Upcoming MPEG Standards: HEVC and MMT and their Prospects
Professor Dr.-Ing. Jörn Ostermann
Information Technology Laboratory
Faculty of Electrical Engineering and Computer Science
Leibniz Universität Hannover, Germany
Prof. Dr.-Ing. Jörn Ostermann received his Dipl.-Ing. and Dr.-Ing. from the University of Hannover in 1988 and 1994, respectively. From 1988 to 1994, he worked as a Research Assistant at the Institut für Theoretische Nachrichtentechnik, conducting research in low bit-rate and object-based analysis-synthesis video coding. In 1994 and 1995 he worked on video coding in the Visual Communications Research Department at AT&T Bell Labs. He was a member of Image Processing and Technology Research within AT&T Labs - Research from 1996 to 2003. Since 2003 he has been a Full Professor and Head of the Institut für Informationsverarbeitung at Leibniz Universität Hannover, Germany. In 2007, he became head of the Laboratory for Information Technology.
From 1993 to 1994, Prof. Jörn Ostermann chaired the European COST 211 sim group, coordinating research in low bit-rate video coding. Within MPEG-4, he organized the evaluation of video tools to start defining the standard. He chaired the Ad Hoc Group on Coding of Arbitrarily-Shaped Objects in MPEG-4 Video. Since 2008, he has been the Chair of the Requirements Group of MPEG (ISO/IEC JTC1 SC29 WG11).
Prof. Jörn Ostermann was a scholar of the German National Foundation. In 1998, he received the AT&T Standards Recognition Award and the ISO award. He is a Fellow of the IEEE, member of the IEEE Technical Committee on Multimedia Signal Processing and past chair of the IEEE CAS Visual Signal Processing and Communications (VSPC) Technical Committee. Prof. Jörn Ostermann served as a Distinguished Lecturer of the IEEE CAS Society. He has published more than 100 research papers and book chapters and is coauthor of a graduate level text book on video communications. He also holds more than 30 patents. His current research interests are video coding and streaming, 3D modeling, face animation, and computer-human interfaces.
In 2010, MPEG began work on two important new standardization projects. Jointly with ITU-T Study Group 16, MPEG started the development of a new video coding standard named High Efficiency Video Coding (HEVC). Furthermore, MPEG identified that the transport of MPEG media over the Internet often lacks performance and interoperability. As a result, the MPEG Media Transport (MMT) project was launched.
The new video coding standard, HEVC, will be jointly developed by experts from both MPEG and ITU-T SG 16 in a group named the Joint Collaborative Team on Video Coding (JCT-VC). Compared to the latest video coding standard, AVC|H.264, it targets increased compression efficiency, with an emphasis on video sequences at HD resolution and beyond. In addition to broadcasting applications, HEVC will also cater to the mobile market.
In the past, an MPEG-3 project was planned with the intention of replacing MPEG-2 for HD video content, until it was realized that increasing the resolution did not require an entirely new standard. Instead, it was sufficient to add another profile to MPEG-2 allowing for higher video resolutions. What is different today?
It is expected that additional computational complexity will be affordable for video codecs. Furthermore, the statistical properties of the video signals of modern video cameras differ significantly from those of cameras available 10 years ago, when AVC|H.264 was standardized. As of June 2010, HEVC promises to reduce the bit rate for video coding by 50% compared to AVC|H.264, using previously explored coding tools like the adaptive loop filter, extended macroblock sizes, larger transform sizes, internal bit-depth increase, and adaptive quantization matrix selection, as well as new tools like decoder-side motion vector derivation, decoder-side motion estimation, adaptive interpolation filters, and adaptive prediction error coding in the spatial and frequency domains. Unlike previous calls for proposals at the beginning of a standardization process, all 27 HEVC proposals were based on a hybrid coder with motion compensation, and no proposal used wavelets.
In order to transport MPEG media over the Internet, MPEG relied on the work of other standardization bodies like 3GPP or IETF. Furthermore, the real-time transport of MPEG media over the Internet using standards is currently not possible when high quality is required. Protocols like RTP and RTSP require the use of additional proprietary commands in order to provide acceptable service over the Internet. To enable progressive download and streaming services, MPEG started working on HTTP streaming as well as MPEG Media Transport (MMT) in July 2010 and October 2010, respectively. The goal is to offer services based solely on existing protocols like HTTP or RTP, services based on UDP and TCP, as well as services that are based on IP with additional interfaces to the data link layer. It is expected that MMT will enable delivery of MPEG media with special focus given to HEVC.
Professor Oussama Khatib
Professor of Computer Science
Artificial Intelligence Laboratory
Department of Computer Science Stanford University
Prof. Oussama Khatib received his Doctorate degree in Electrical Engineering from Sup'Aero, Toulouse, France, in 1980. He is a Professor of Computer Science at Stanford University.
He is Co-Editor of the Springer Tracts in Advanced Robotics series, and has served on the Editorial Boards of several journals as well as Chair or Co-Chair for numerous international conferences. He co-edited the Springer Handbook of Robotics, which received the PROSE Award for Excellence in Physical Sciences & Mathematics and was also the winner in the category of Engineering & Technology.
Prof. Oussama Khatib is a Fellow of the IEEE and has served RAS as a Distinguished Lecturer, as a member of the Administrative Committee, and as the Program Chair of ICRA 2000. He is the President of the International Foundation of Robotics Research (IFRR) and a recipient of the Japan Robot Association (JARA) Award in Research and Development and the IEEE Robotics and Automation Society Pioneer Award.
Robotics is rapidly expanding into the human environment and vigorously engaging its emerging challenges. From a largely dominant industrial focus, robotics has undergone, by the turn of the new millennium, a major transformation in scope and dimensions. This expansion has been brought about by the maturity of the field and the advances in its related technologies to address the pressing needs for human-centered robotic applications. Interacting, exploring, and working with humans, the new generation of robots will increasingly touch people and their lives, in homes, workplaces, and communities, providing support in services, entertainment, education, manufacturing, personal health care, and assistance. The successful introduction of robots in human environments will rely on the development of competent and practical systems that are dependable, safe, and easy to use. This presentation focuses on the efforts to develop human-friendly robotic systems that combine the essential characteristics of safety, human-compatibility, and performance with emphasis on (i) new design concepts and novel sensing modalities; (ii) efficient planning and whole-body robot control strategies; and (iii) robotic-based synthesis of human motion and skills.
In human-friendly robot design, our effort has focused on the development of intrinsically safe robotic systems that possess the requisite capabilities and performance to interact and work with humans. The novel design concept was based on a hybrid actuation approach that consists of biomimetic pneumatic muscles combined with small electric motors. The flexible muscles and the lightweight mechanism allow for human safety, while the electric motors compensate for the slower dynamics and nonlinearities of the pneumatics. This concept was shown to significantly decrease the inherent danger of robotic manipulators, as measured in terms of the reflected mass perceived at the point of impact. Safety can be further enhanced by the addition of robot skin to provide impact reduction and tactile sensing capabilities for advanced sensor based behaviors.
Redundancy is a major challenge in the planning and control of humanoid robots. Inspired by human behaviors, our early work in robot control encoded tasks and diverse constraints into artificial potential fields capturing human-like goal-driven behaviors. To implement such behaviors on robots with complex human-like structures we developed a unified whole-body task-oriented control structure that addresses dynamics in the context of multiple tasks, multi-point contacts, and multiple constraints. The performance and effectiveness of this approach have been demonstrated through extensive robot dynamic simulations and implementations on physical robots for experimental validation. The new framework provides a multi-task prioritized control architecture allowing the simultaneous execution of multiple objectives in a hierarchical manner, analogous to natural human motion.
Initially motivated by the development of human-like skills in robotics, our extensive study of the human musculoskeletal system has brought insights and results that proved extremely valuable in human biomechanics. Understanding human motion is a complex procedure that requires accurate reconstruction of movement sequences, modeling of musculoskeletal kinematics, dynamics, and actuation, and suitable criteria for the characterization of performance. These issues have much in common with the problems of articulated body systems studied in robotics research. Building on methodologies and techniques developed in robotics, a host of new effective tools have been established for the synthesis of human motion. These include efficient algorithms for the simulation of musculoskeletal systems, novel physio-mechanical criteria and performance measures, real-time tracking and reconstruction of human motion, and accurate human performance characterization. These developments are providing new avenues for exploring human motion -- with exciting prospects for novel clinical therapies, athletic training, and performance improvement.
The following six tutorial sessions will be held on December 14, 2010. Tutorials 1-3 will be held in the morning, while Tutorials 4-6 will be held in the afternoon.
Morning session (9.00 am to 12.00 pm)
3D Video Processing Techniques for Free-Viewpoint Television
Gwangju Institute of Science and Technology (GIST), Korea
In recent years, various multimedia services have become available, and the demand for three-dimensional television (3DTV) is growing rapidly. Since 3DTV is considered the next-generation broadcasting service that can deliver real and immersive experiences by supporting user-friendly interactions, a number of advanced 3D video processing technologies have been studied. Among them, multi-view video coding (MVC) is the key technology for various applications including free-viewpoint television (FVT). In order to support free-viewpoint video services, we need to develop efficient techniques for 3D video processing.
Human-Vision Friendly Processing for Images and Graphics
Nanyang Technological University, Singapore
Since the human visual system (HVS) is the ultimate receiver and appreciator for the majority (if not all) of naturally captured images and computer-generated graphics, it would be better to use a perceptual criterion in system design, implementation, and optimization, instead of the traditional, mathematically defined one (e.g., MSE, SNR, PSNR, QoS, or their relatives). After millions of years of evolution, the HVS has developed unique characteristics, which can be turned into advantages for system design. Making machines perceive as the HVS does can result in resource savings (for instance, bandwidth, memory space, computing power) and performance enhancement (such as the resultant visual quality, and new functionalities). Significant research effort has been made during the past decade toward modelling the HVS' mechanisms and applying the resultant models to various situations (quality evaluation, image/video compression, watermarking, channel coding, signal restoration/enhancement, computer graphics, visual content retrieval, etc.).
In this tutorial, we will first introduce the problem formulation, the relevant physiological/psychological knowledge, and the work so far in the related fields. The basic engineering modules (like signal decomposition, visual attention, and visibility determination) are then to be discussed. The issues and difficulties related to the two major mechanisms in most current systems (i.e., feature detection and pooling) are to be highlighted and explored. Afterward, different perceptually-driven techniques will be presented for picture quality evaluation, signal compression, enhancement, communication, and computer graphics, with proper case studies whenever possible. The last part of the tutorial is devoted to a summary, points of further discussion and possible future research directions, based upon our experience in both academic and industrial pursuits.
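For reference, the traditional mathematically defined criteria mentioned above are simple to compute. The following is a minimal sketch of PSNR, the most common of them; the function name and interface are our own illustration, not material from the tutorial:

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference image and a
    test image, assuming pixel values in [0, peak]."""
    mse = np.mean((np.asarray(ref, dtype=float) - np.asarray(test, dtype=float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```

Note that PSNR weights every pixel error equally, whereas perceptual criteria weight errors by their visibility (masking, attention, contrast sensitivity); this gap is precisely what motivates the HVS-based approach described above.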
Brain-Computer Interface Technology and Applications
Kai Keng Ang, Fabien Pierre Robert Lotte, Cuntai Guan
Institute for Infocomm Research, A*STAR, Singapore
A brain-computer interface (BCI), sometimes called a brain-machine interface, is a device that responds to neural processes from the brain to provide a direct communication pathway between the brain and an external device. Research on BCIs began in the 1970s, and recent advances in BCI technology have produced devices that augment or even restore human functions in ways that were only possible in science fiction a few years ago. This tutorial will present an overview of current BCI technologies, ranging from invasive, to semi-invasive using ECoG, to non-invasive using EEG, MEG, NIRS, and fMRI. Recently, there has been much interest in BCI technology to help improve the quality of life and to restore function for people with severe motor disabilities. One strategy is to use a BCI to translate brain signals that involve motor or mental imagery into commands for controlling a robot, bypassing the normal motor output neural pathways. This tutorial will focus on the signal processing and machine learning techniques used to detect motor imagery. Finally, this tutorial will present how recent BCI technology can help to improve the lives of people with neurological disorders such as advanced amyotrophic lateral sclerosis, and help restore more effective motor control to people after stroke or other traumatic brain disorders.
The first part of the tutorial will focus on an overview of BCI technologies: Invasive techniques, semi-invasive techniques using ECoG, and non-invasive techniques using EEG, MEG, NIRS and fMRI.
The second part of the tutorial will focus on the neurophysiological background on motor imagery, how to apply machine learning and signal processing algorithms to detect motor imagery from EEG signals, and how to interpret the computed solution.
The last part of the tutorial will focus on how BCI technology can help to improve lives of people with advanced amyotrophic lateral sclerosis. It will also describe how BCI technology can help to restore more effective motor control to people after stroke or other traumatic brain disorders by helping to guide activity-dependent brain plasticity.
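To give a concrete flavor of the motor-imagery detection pipeline mentioned in the second part, the following sketch implements Common Spatial Patterns (CSP), a widely used spatial-filtering technique for two-class motor-imagery EEG. This is our own illustration under simplified assumptions (known two-class trials, no artifact handling), not necessarily the exact method taught in the tutorial, and all function names are our own:

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(trials_a, trials_b, n_pairs=1):
    """Compute CSP spatial filters from two classes of EEG trials.

    trials_a, trials_b: lists of (channels x samples) arrays.
    Returns a (2*n_pairs x channels) matrix of spatial filters.
    """
    def mean_cov(trials):
        # Trace-normalized spatial covariance, averaged over trials.
        covs = [x @ x.T / np.trace(x @ x.T) for x in trials]
        return np.mean(covs, axis=0)

    ca, cb = mean_cov(trials_a), mean_cov(trials_b)
    # Generalized eigenproblem ca w = lambda (ca + cb) w; eigenvalues come
    # back in ascending order, so the last columns maximize class-a variance
    # and the first columns maximize class-b variance.
    _, vecs = eigh(ca, ca + cb)
    n_ch = ca.shape[0]
    idx = list(range(n_pairs)) + list(range(n_ch - n_pairs, n_ch))
    return vecs[:, idx].T

def log_var_features(trials, filters):
    """Log-variance features of spatially filtered trials (the usual CSP
    feature fed to a classifier such as LDA)."""
    return np.array([np.log(np.var(filters @ x, axis=1)) for x in trials])
```

In a full BCI pipeline these log-variance features would then be passed to a simple classifier; CSP is popular precisely because it reduces multichannel EEG to a few highly discriminative features.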
Afternoon session (2.00 pm - 5.00 pm)
Image Denoising - The SURE-LET Methodology
The Chinese University of Hong Kong, Hong Kong SAR, China
The goal of this tutorial is to introduce attendees to a new approach for dealing with noisy data - typically, images or videos here.
Image denoising consists in approximating the noiseless image by performing some, usually non-linear, processing of the noisy image. Most standard techniques involve assumptions on the result of this processing (sparsity, low high-frequency content, etc.); i.e., on the denoised image.
Instead, the SURE-LET methodology that we promote consists in approximating the processing itself (seen as a function) by a linear combination of elementary non-linear processings (LET: Linear Expansion of Thresholds), and in optimizing the coefficients of this combination by minimizing a statistically unbiased estimate of the mean-square error (SURE: Stein's Unbiased Risk Estimate, for additive Gaussian noise).
This tutorial will introduce the technique to attendees and outline its advantages (fast, noise-robust, flexible, image-adaptive). A very complete set of results will be shown and compared with the state of the art. Extensions of the approach to Poisson noise reduction, with application to fluorescence microscopy imaging, will also be shown.
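To make the principle concrete, here is a minimal one-dimensional sketch of SURE-LET with a two-term LET (the identity plus a smooth shrinkage function); the basis choice, threshold setting, and names are our own simplifications, not the tutorial's actual transform-domain algorithm:

```python
import numpy as np

def sure_let_denoise(y, sigma, T=None):
    """Denoise a 1-D signal y corrupted by additive Gaussian noise of
    known standard deviation sigma, using a two-term LET:
        f(y) = a1 * y + a2 * y * exp(-y^2 / (2 T^2)).
    The weights (a1, a2) are found by minimizing SURE, which for a LET
    reduces to solving a small linear system -- no clean signal needed."""
    n = y.size
    if T is None:
        T = 2.0 * sigma  # illustrative threshold choice
    # Elementary estimates and their divergences (sums of derivatives),
    # which SURE needs to estimate the MSE without the clean signal.
    f1 = y
    div1 = n
    g = np.exp(-y ** 2 / (2 * T ** 2))
    f2 = y * g
    div2 = np.sum(g * (1.0 - y ** 2 / T ** 2))
    # SURE(a) = ||F^T a - y||^2 - n sigma^2 + 2 sigma^2 sum_k a_k div_k
    # is quadratic in a, so its minimizer solves M a = F y - sigma^2 div.
    F = np.stack([f1, f2])                       # 2 x n basis matrix
    M = F @ F.T                                  # Gram matrix
    c = F @ y - sigma ** 2 * np.array([div1, div2])
    a = np.linalg.solve(M, c)
    return a @ F
```

Because SURE is quadratic in the LET coefficients, the optimization costs only a tiny linear solve; this is what makes the method fast compared with iterative denoising schemes.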
Emotion Recognition and Cognitive Load Measurement from Speech
Julien Epps, Fang Chen, Bo Yin
National ICT Australia, Australia
Research in speech processing has seen a gradual movement in attention from speech recognition and related applications towards paralinguistic speech processing problems in recent years. A wide range of paralinguistic classification problems have been considered, relating for example to the recognition of speaker identity, language, emotion, mental state, gender and age. In the general area of emotion recognition from speech, the number of papers published annually has increased by an order of magnitude over the past decade.
One application area of interest in paralinguistic speech classification is the measurement of cognitive load or mental workload. It is about a century since the proposal of the Yerkes-Dodson law, which states that there is an optimum mental arousal for performing a task, below and above which performance will deteriorate. Despite this, few methods have been demonstrated to measure cognitive load in practice, and fewer still in real time. Speech-based methods are attractive because they are non-intrusive, inexpensive and can be real-time.
Like other paralinguistic classification tasks, cognitive load measurement is a challenging problem, and one that must account for variability posed by linguistic, contextual and speaker-specific characteristics. Unlike some other paralinguistic classification tasks, cognitive load measurement requires classification along an ordinal scale, motivating the use of very specific machine learning techniques.
This tutorial introduces and examines some of the key research problems for emotion recognition and cognitive load measurement from speech: understanding the psychophysiological basis of emotion and cognitive load during speech production, extracting suitable features from the speech signal, reducing feature variability due to speaker and linguistic content, developing machine learning methods applicable to the task, comparing and evaluating diverse methods, robustness, and constructing suitable databases. The discussion of cognitive load is framed in the wider context of emotion recognition from speech, and some key insights from this area will be covered. The tutorial will also briefly discuss the use of other biomedical signals for cognitive load measurement. Participants will be exposed to likely future challenges, both during the tutorial presentation and during the ensuing discussion.
Human Biometrics: Will it be Reality or Fantasy?
Waleed H. Abdulla
The University of Auckland, New Zealand
The 2001 MIT Technology Review indicated that biometrics is one of the emerging technologies that will change the world. Biometrics technology was initially treated as an exotic topic, but it has recently become a fast-growing industry due to the urgent need to secure people's property, from goods to information.
Human biometrics is the automated recognition of a person using inherent distinctive physiological and/or involuntary behavioral features. Physiological features include facial characteristics, fingerprints, palm prints, iris patterns, and many more. Examples of behavioral features are signature writing dynamics, gait, speaker characteristics, and keyboard typing dynamics. However, most biometric identifiers are a combination of physiological and behavioral features, and they should not be exclusively classified into either category. For example, speech is partially determined by the biological structure of the speaker's vocal tract and partially by the way that person speaks. Also, fingerprints may be physiological in nature, but the usage of the input device (e.g., how a user touches the fingerprint scanner and the pressure on the sensor) depends on the person's behavior. A car mechanic has a different touch from a computer geek! Thus, the input to the recognition engine is a combination of physiological and behavioral characteristics. Behavior can help resolve the confusion that arises when distinguishing parents, children, and siblings by their voice, gait, signature, etc. The same argument applies to facial recognition. The faces of identical twins may completely match at birth, but as they grow, their facial features change based on behavior developed from profession, way of living, environment, etc.
This tutorial will cover all the main aspects of this fast-growing technology. We will discuss whether we are about to enter an era in which people no longer need to carry identity or credit cards yet can still purchase goods and travel to other countries. The attendees will be introduced to the following:
1. The fundamentals of Human Biometrics.
2. Types of biometrics.
3. Biometric systems structure.
4. Assessment of the performance of the biometric systems.
3DTV and Multi-View Video Processing
Yo-Sung Ho, Toshiaki Fujii
Ultra-Low Power and Low Energy Design for Signal Processing
Signal Processing for Brain Computer Interfaces
Signal Processing for Cognitive Radio
Advances in Digital Filters and Filter Banks
Advances in Sparse Signal Processing
Andy W. H. Khong
Recent Topics on Signal Processing for Active Noise Control
Yoshinobu Kajikawa, Waleed H. Abdulla
High-Efficiency Video Coding (HEVC)
Chun-Jen Tsai, Wen-Hsiao Peng, and Yo-Sung Ho
Medical Image Processing
Cuntai Guan, Jimmy Liu
Recent Advances in Signal Representations - Filters, Transforms and Sparse Representations
Shogo Muramatsu, Akira Hirabayashi
Computer-Assisted Language Learning (CALL) Based on Speech and Language Technologies
Advanced Signal Processing of Brain Signals: Methods and Applications
Justin Dauwels, Francois-Benoit Vialatte
Signal and Information Processing in Agriculture
Advanced Technologies for Robust and Secured Video Sharing and Delivery
Jo Yew Tham
Panel Session 1
With the exponential growth of media content on the Web, the ability to search for media entities based not just on text annotations but also on visual content has become important. Popular commercial search engines, like Bing and Google image search, and start-ups like Gazopa, TinEye, and SnapTell, are already offering search services, albeit limited, based on both text and visual content. As commercial-scale search services require the handling of millions of media entities within interactive time, and with visibly improved performance beyond what can be done with annotated text, are research and lab technologies ready for such offerings? Have years of media content analysis research made any important contributions toward such services, and what should we focus on next to make a better impact?
Kiyoharu Aizawa, University of Tokyo, Japan
Clement Liang-Tien Chia, Nanyang Technological University, Singapore
Alexander C. Loui, Eastman Kodak Company, USA
Jialie Shen, Singapore Management University, Singapore
Panel Session 2
Data hiding tries to hide a secret message in a cover medium by slightly modifying the components of that medium. It finds many applications in covert communications, watermarking, and invisible labeling. On the other hand, media forensics seeks to find the traces of any imperceptible modifications made to a medium. Both data hiding and forensics have attracted attention in recent years. Though there have been many research reports in data hiding and forensics, there are still many challenges to overcome.
In this panel session, we have invited a few speakers to share their viewpoints and experiences on different aspects of data hiding and forensics, including recent progress and potential applications.
K. J. Ray Liu, University of Maryland, USA
C.-C. Jay Kuo, University of Southern California, USA
Mohan Kankanhalli, National University of Singapore, Singapore
H.-Y. Mark Liao, Institute of Information Science, Academia Sinica, Taiwan ROC
Oscar Au, Hong Kong University of Science and Technology, Hong Kong SAR, China
Panel Session 3
The improvement of our living environment using information technologies has received more and more attention in recent years. The main purpose of this panel is to introduce young researchers to a wide diversity of applications and issues and to stimulate their interest and awareness along this research direction. In this panel, we have invited five leading researchers in this emerging field to share their experience and findings. They will discuss the opportunities and challenges in several application domains, including smart oil fields, water treatment, health informatics, tsunami and disaster mitigation, smart grids for energy distribution, and policy-making.
Antonio Ortega, University of Southern California, USA
Cheng Fu, Nanyang Technological University, Singapore
Ping Yang, Chinese Academy of Science, China
Khairul Munadi, Syiah Kula University, Indonesia
Margaret Tan, Nanyang Technological University, Singapore
Overview Session I
December 17, Friday
Fri-AM. OS1.1 Next Generation Wireless Communication Systems
Fri-AM. OS1.2 Overview on Mocap Data Compression
Fri-AM. OS1.3 Selected Topics from Recent Research in Signal and Image Processing at Tokyo Metropolitan University
Fri-AM. OS1.4 Taiwan E-Learning and Digital Archives Program and its R&D Effort
Fri-AM. OS1.5 On the Performance of Affine Projection Algorithm and Normalized LMS Algorithm
Overview Session II
December 17, Friday
C.-C. Jay Kuo
Fri-AM. OS2.1 Selected Topics from ASR Research for Asian Languages at Tokyo Tech
Fri-AM. OS2.2 An Overview of Free Viewpoint Depth-Image-Based Rendering (DIBR)
Fri-AM. OS2.3 Human Biometrics: Current Status and Future Vision
Fri-AM. OS2.4 Micro-Grid State Estimation Using Belief Propagation on Factor Graphs