APSIPA Distinguished Lecturers (1 January 2016 - 31 December 2017)
Wen-Huang Cheng, Academia Sinica, Taiwan


Wen-Huang Cheng received the B.S. and M.S. degrees in computer science and information engineering from National Taiwan University, Taipei, Taiwan, in 2002 and 2004, respectively, and the Ph.D. (Hons.) degree from the Graduate Institute of Networking and Multimedia of the same university in 2008.

He is currently an Associate Research Fellow with the Research Center for Information Technology Innovation (CITI), Academia Sinica, Taipei, Taiwan, where he is the Founding Leader of the Multimedia Computing Laboratory (MCLab), and an Assistant Research Fellow with a joint appointment in the Institute of Information Science. Before joining Academia Sinica, he was a Principal Researcher with MagicLabs, HTC Corporation, Taoyuan, Taiwan, from 2009 to 2010. His current research interests include multimedia content analysis, multimedia big data, deep learning, computer vision, mobile multimedia computing, social media, and human-computer interaction.

Dr. Cheng has received numerous research awards, including the Outstanding Youth Electrical Engineer Award from the Chinese Institute of Electrical Engineering in 2015, the Top 10% Paper Award from the 2015 IEEE International Workshop on Multimedia Signal Processing, the Outstanding Reviewer Award from the 2015 ACM International Conference on Internet Multimedia Computing and Service, the Prize Award of the Multimedia Grand Challenge from the 2014 ACM Multimedia Conference, the K. T. Li Young Researcher Award from the ACM Taipei/Taiwan Chapter in 2014, the Outstanding Young Scholar Awards from the Ministry of Science and Technology in 2014 and 2012, the Outstanding Social Youth of Taipei Municipality in 2014, the Best Reviewer Award from the 2013 Pacific-Rim Conference on Multimedia, and the Best Poster Paper Award from the 2012 International Conference on 3D Systems and Applications.


Lecture 1: Sensing Visual Semantics for Interactive Multimedia Applications
For the effective development of interactive multimedia applications, one key technology is multimedia content analysis, especially its achievable semantic level, i.e., the level of comprehension a multimedia system attains over the multimedia content. However, visual entities such as objects in real-world photos and videos are usually captured under uncontrolled conditions, with varying viewpoints, positions, scales, and background clutter. In this lecture, we will therefore present sensing techniques for robust visual semantics retrieval and recognition in real-world scenes. Several application scenarios will be showcased to demonstrate the effectiveness of the proposed sensing techniques. In particular, one application analyzes the fashion trends of clothes in real video content. Another is mobile vision for locating visual objects precisely while achieving real-time performance. The third and last application is video-based human posture and gesture detection, which benefits the creation of serious gaming environments for professional training purposes.

Lecture 2: Exploring Social Semantics from Multimedia Big Data
Users are key elements in social multimedia, and a huge amount of user-generated multimedia content (multimedia big data) is created and exchanged through the social interactions among users. Exploring social semantics from multimedia big data is thus an effective way to understand users and their behaviors. In particular, popularity prediction on social media is a specific type of social semantics and has attracted extensive attention because of its widespread applications, such as online marketing, trend detection, and resource allocation. Generally, given historical user-item pairs, popularity prediction is defined as the problem of estimating the rating scores, view counts, or click-throughs of a new post on social media. In this lecture, we first review existing research on popularity prediction, which predominantly focuses on exploring the correlation between popularity and user-item factors such as item content, user cues, social relations, and user-item interactions. In fact, time also exerts a crucial impact on popularity but is often overlooked. We further present techniques for investigating popularity prediction from two complementary perspectives by factoring popularity into two contextual associations, i.e., the user-item context and the time-sensitive context. The user-item context links popularity to user-specific and item-specific contextual information, which can be derived from user-item sharing behaviors on social media. The time-sensitive context is affected by 'change over time' information (associated with the sharing time of photos), including user activeness variability and photo prevalence variability. Finally, further research on exploring social semantics from multimedia big data will be addressed.
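At its simplest, the prediction problem defined above can be cast as regression from user-item features to a popularity score. The following toy sketch (not the lecture's actual model; the features and weights are hypothetical) illustrates fitting such a predictor on historical user-item pairs and scoring a new post:

```python
import numpy as np

# Hypothetical features per historical post: e.g. poster's follower count,
# a content score, and the posting hour -- all synthetic here.
rng = np.random.default_rng(0)
X = rng.random((200, 3))                        # 200 historical user-item pairs
true_w = np.array([2.0, 1.0, 0.5])              # assumed latent weights
y = X @ true_w + 0.01 * rng.standard_normal(200)  # popularity (e.g. log views)

# Ridge regression: w = (X^T X + lam*I)^{-1} X^T y
lam = 1e-3
w = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

# Predict the popularity of a new post from its features
x_new = np.array([0.8, 0.2, 0.5])
pred = x_new @ w
```

The same framing extends naturally to the lecture's two contexts: time-sensitive signals (posting time, user activeness) simply enter as additional features or as a separate factored model.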

Gene Cheung, National Institute of Informatics, Japan

Gene Cheung (M'00-SM'07) received the B.S. degree in electrical engineering from Cornell University in 1995, and the M.S. and Ph.D. degrees in electrical engineering and computer science from the University of California, Berkeley, in 1998 and 2000, respectively.

He was a senior researcher at Hewlett-Packard Laboratories Japan, Tokyo, from 2000 until 2009. He is now an associate professor at the National Institute of Informatics in Tokyo, Japan. He has also been an adjunct associate professor at the Hong Kong University of Science & Technology (HKUST) since 2015.

His research interests include image & video representation, immersive visual communication, and graph signal processing. He served as associate editor for the IEEE Transactions on Multimedia (2007-2011) and the DSP Applications Column in the IEEE Signal Processing Magazine (2010-2014). He currently serves as associate editor for the IEEE Transactions on Image Processing (2015-present), the SPIE Journal of Electronic Imaging (2014-present), and the APSIPA Journal on Signal & Information Processing (2011-present), and as area editor for EURASIP Signal Processing: Image Communication (2011-present). He will serve as associate editor for the IEEE Transactions on Circuits and Systems for Video Technology starting in 2016. He served as lead guest editor of the special issue on "Interactive Media Processing for Immersive Communication" in the IEEE Journal of Selected Topics in Signal Processing, published in March 2015. He served as a member of the Multimedia Signal Processing Technical Committee (MMSP-TC) of the IEEE Signal Processing Society (2012-2014), and is a member of the Image, Video, and Multidimensional Signal Processing Technical Committee (IVMSP-TC) (2015-2017). He has also served as technical program co-chair of the International Packet Video Workshop (PV) 2010 and the IEEE International Workshop on Multimedia Signal Processing (MMSP) 2015, area chair for the IEEE International Conference on Image Processing (ICIP) 2010, 2012-2013, and 2015, track co-chair for the Multimedia Signal Processing track in the IEEE International Conference on Multimedia and Expo (ICME) 2011, symposium co-chair for the CSSMA Symposium in IEEE GLOBECOM 2012, and area chair for ICME 2013-2015. He was an invited plenary speaker at IEEE MMSP 2013 on the topic "3D visual communication: media representation, transport and rendering".
He is a co-author of the best student paper at the IEEE Workshop on Streaming and Media Communications 2011 (in conjunction with ICME 2011), best paper finalists at ICME 2011, ICIP 2011, and ICME 2015, the best paper runner-up at ICME 2012, and the best student paper at ICIP 2013.


Lecture 1: Graph Signal Processing for Image Coding & Restoration
Graph signal processing (GSP) is the study of discrete signals that live on structured data kernels described by graphs. By allowing a more flexible graphical description of the underlying data kernel, GSP can be viewed as a generalization of traditional signal processing techniques that target signals on regular kernels, while still providing a frequency-domain interpretation of the observed signals. Though an image is a regularly sampled signal on a 2D grid, one can nonetheless consider an image patch as a graph-signal on a sparsely connected graph defined in a signal-dependent manner. Recent GSP works have shown that such an approach can lead to a compact signal representation in the graph Fourier domain, resulting in noticeable gains in image compression and restoration. Specifically, in this talk I will overview recent advances in GSP as applied to image processing. I will first describe how a Graph Fourier Transform (GFT), a generalization of known transforms like the Discrete Cosine Transform (DCT), can be defined in a signal-dependent manner and leads to compression gains for piecewise smooth images, outperforming H.264 intra by up to 6.8 dB. I will then describe how suitable graph-signal smoothness priors can be constructed for a graph-based image denoising algorithm, outperforming the state-of-the-art BM3D by up to 2 dB for piecewise smooth images. Similar graph-signal smoothness priors can also be used for other image restoration problems, such as de-quantization of compressed JPEG images.
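As a minimal illustration of the GFT and smoothness-prior ideas above (a textbook sketch on a tiny hand-built graph, not the talk's signal-adaptive construction): the GFT basis is the eigenvector set of the graph Laplacian L = D - W, the eigenvalues act as graph frequencies, and x^T L x measures how smoothly a signal varies across edges.

```python
import numpy as np

# 4-node path graph with unit edge weights (chosen for illustration)
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(W.sum(axis=1)) - W       # combinatorial graph Laplacian
evals, U = np.linalg.eigh(L)         # eigenvalues = graph frequencies (ascending)

x = np.array([1.0, 1.1, 1.2, 1.3])   # a smooth signal on the path
x_hat = U.T @ x                       # forward GFT: project onto the basis
x_rec = U @ x_hat                     # inverse GFT (U is orthonormal)

# Smoothness prior: x^T L x = sum over edges of (x_i - x_j)^2,
# small for signals that vary slowly across connected nodes
smoothness = x @ L @ x
```

For a smooth signal like this one, nearly all the energy of `x_hat` lands in the lowest graph frequency, which is exactly the compaction property the talk exploits for compression and as a denoising prior.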

Lecture 2: 3D Image Representation & Coding for Interactive Navigation
In the color-plus-depth image representation, a 3D scene is represented by color (RGB) and depth images as observed from multiple viewpoints, and intermediate virtual views can be further rendered via depth-image-based rendering (DIBR). However, conventional transform coding plus lossy quantization of depth images can lead to geometric distortion, resulting in undesirable bleeding artifacts in DIBR-synthesized images. Observing that disparity information, like motion vectors in video coding, should be coarsely represented but losslessly coded, in this talk I first introduce a graph-based representation called GBR-plus that compactly represents disparity information to displace entire pixel patches from one reference view to a target view in a graphical manner. Second, I discuss how disparity information in GBR-plus can be approximated and then efficiently coded using arithmetic edge coding (AEC). Finally, to enable interactive view navigation at the client, so that any viewpoint image can be flexibly decoded from a number of decoding paths, I present a new distributed source coding (DSC) framework called merge frame that does not require traditional channel coding or bit-plane coding, while achieving identical merging and good rate-distortion performance.
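The core disparity idea behind displacing patches between views can be sketched in a few lines (a toy illustration with a uniform integer disparity and a made-up sign convention, not the GBR-plus format itself): in a rectified stereo pair, a pixel at column x in the reference view reappears at column x - d in the target view, where d is its disparity.

```python
import numpy as np

ref = np.arange(16, dtype=float).reshape(4, 4)   # tiny reference "image"
disparity = 1                                     # assumed uniform disparity

# Displace the whole patch by the disparity; columns with no source pixel
# remain zero (these are the disocclusion holes DIBR must fill).
target = np.zeros_like(ref)
target[:, :-disparity] = ref[:, disparity:]
```

Real depth-based rendering uses per-pixel disparities derived from depth, which is why lossy depth coding errors translate directly into geometric displacement errors in the synthesized view.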

Zhu Li, University of Missouri, USA


Zhu Li is an Associate Professor with the Dept of Computer Science & Electrical Engineering (CSEE) at the University of Missouri, Kansas City. He received his PhD in Electrical & Computer Engineering from Northwestern University, Evanston, in 2004. He was Senior Staff Researcher/Senior Manager with Samsung Research America's Multimedia Standards Research Lab in Dallas from 2012 to 2015, Senior Staff Researcher/Group Lead with FutureWei (Huawei)'s Media Lab in Bridgewater, NJ, from 2010 to 2012, an Assistant Professor with the Dept of Computing, The Hong Kong Polytechnic University, from 2008 to 2010, and a Principal Staff Research Engineer with the Multimedia Research Lab (MRL), Motorola Labs, Schaumburg, Illinois, from 2000 to 2008.

His research interests include audio-visual analytics and machine learning, with applications in large-scale video repository annotation, search, and recommendation, as well as video adaptation, source-channel coding, and distributed optimization issues in wireless video networks. He has 30+ issued or pending patents and 90+ publications in book chapters, journals, conference proceedings, and standards contributions in these areas. He is an IEEE Senior Member, an elected member of the IEEE Multimedia Signal Processing (MMSP) Technical Committee, 2014-16, elected Vice Chair of the IEEE Multimedia Communication Technical Committee (MMTC), 2008-2010, and its Standards Liaison, 2014-16. He is an Associate Editor for the IEEE Transactions on Multimedia, the IEEE Transactions on Circuits & Systems for Video Technology, and the Springer Journal of Signal Processing Systems, and co-editor of the Springer-Verlag book "Intelligent Video Communication: Techniques and Applications". He has served on numerous conference and workshop TPCs, was a symposium co-chair at IEEE ICC 2010, and served on the Best Paper Award Committee for IEEE ICME 2010.

He received the Best Poster Paper Award at the IEEE Int'l Conf on Multimedia & Expo (ICME) in Toronto, 2006, and the Best Paper Award at the IEEE Int'l Conf on Image Processing (ICIP) in San Antonio, 2007.


Lecture 1: Robust Visual Object Re-Identification Against Very Large Repositories - The MPEG Mobile Visual Search Standardization Research
Visual object identification against a very large repository is a key technical challenge in a variety of mobile visual search and virtual reality/augmented reality applications. MPEG created a working group on Compact Descriptors for Visual Search (CDVS) to develop the relevant technology and standard to address this issue and enable mobile visual search and object re-identification applications. In this talk, I will review the key technical challenges in the CDVS pipeline and cover the novel contributions made in the CDVS work on alternative interest point detection, more efficient aggregation schemes, indexing/hashing issues, and retrieval system optimization, as well as future directions of research in this area.

Lecture 2: Visual Recognition over Large Repositories with Subspace Indexing on Grassmann Manifolds
In large-scale visual pattern recognition applications, when the subject set is large, traditional linear models like PCA/LDA/LPP become inadequate in capturing the non-linearity and local variations of the visual appearance manifold. Kernelized non-linear solutions can alleviate the problem to a certain degree, but face the computational challenge of solving an eigen-problem of size n x n for n training samples. In this work, we developed a novel solution: we first apply a data partition to the big data training set to obtain a rich set of local data patch models, and then compute the hierarchical structure of this rich model set via subspace clustering on the Grassmannian manifold, using a VQ-like algorithm with a data partition locality constraint. At query time, a probe image is first projected into the data space partition to obtain the probe model, and the optimal local model is computed by traversing the model hierarchy tree. Simulation results demonstrate the effectiveness of this solution in capturing a larger degree of freedom (DoF) of the problem, with good computational efficiency and recognition accuracy, for applications in large-subject-set face recognition and image tagging.
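The partition-then-local-model idea can be sketched in a heavily simplified form (synthetic data, a flat 2-way partition instead of the lecture's Grassmannian model hierarchy): partition the training set, fit a local PCA subspace per partition, then route a probe to its nearest partition and project it with that local model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two synthetic clusters with different local linear structure
A = rng.standard_normal((100, 5)) * [3, 1, 0.1, 0.1, 0.1]
B = rng.standard_normal((100, 5)) * [0.1, 0.1, 3, 1, 0.1] + 10
data = np.vstack([A, B])

# Simple 2-means data partition
centers = data[[0, 100]].copy()
for _ in range(10):
    labels = np.argmin(((data[:, None] - centers) ** 2).sum(-1), axis=1)
    centers = np.array([data[labels == k].mean(0) for k in range(2)])

# Local PCA basis (top-2 principal directions) per partition
bases = []
for k in range(2):
    Xk = data[labels == k] - centers[k]
    _, _, Vt = np.linalg.svd(Xk, full_matrices=False)
    bases.append(Vt[:2])             # local 2D subspace for this patch of data

# Route a probe to its nearest partition, then project with the local model
probe = np.full(5, 10.0)
k = int(np.argmin(((probe - centers) ** 2).sum(-1)))
coords = bases[k] @ (probe - centers[k])
```

The local models capture variation that a single global PCA would smear out; the lecture's contribution organizes many such local subspaces into a searchable hierarchy via clustering on the Grassmannian, so the optimal local model is found by tree traversal rather than exhaustive comparison.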

Jiaying Liu, Peking University, China

Dr. Jiaying Liu received the B.E. degree in computer science from Northwestern Polytechnical University, Xi'an, China, and the Ph.D. degree with the Best Graduate Honor in computer science from Peking University, Beijing, China, in 2005 and 2010, respectively.
She is currently an Associate Professor with the Institute of Computer Science and Technology, Peking University. She has authored or co-authored over 60 papers and holds 10 granted patents. Her current research interests include image/video processing, computer vision, and video compression.

Dr. Liu was a Visiting Scholar with the University of Southern California, Los Angeles, from 2007 to 2008. Supported by the Star Track program, she was a Visiting Researcher at Microsoft Research Asia (MSRA) in 2015. She has also served as a TC member of APSIPA IVM since 2015.

She is also engaged in computing education. She has run the MOOC courses "C++ Programming" and "Fundamental Algorithm Design" on Coursera/edX/ChineseMOOC, with more than 30,000 students enrolled. She also received the Peking University Teaching Excellence Award.

Chia-Hung Yeh, National Sun Yat-Sen University, Taiwan

Chia-Hung Yeh (M'03-SM'12) received his B.S. and Ph.D. degrees from National Chung Cheng University, Taiwan, in 1997 and 2002, respectively, both from the Department of Electrical Engineering. Dr. Yeh joined the Department of Electrical Engineering, National Sun Yat-sen University (NSYSU) as an assistant professor in 2007, became an associate professor in 2010, and was promoted to full professor in February 2013. Dr. Yeh's research interests include multimedia communication, multimedia database management, and image/audio/video signal processing. He served on the Editorial Boards of the Journal of Visual Communication and Image Representation and the EURASIP Journal on Advances in Signal Processing. In addition, he has rich experience organizing conferences, serving as keynote speaker, session chair, and technical program committee and program committee member for international and domestic conferences. Dr. Yeh has co-authored more than 170 technical papers in international conferences and journals and holds 42 patents in the U.S., Taiwan, and China. He received the 2007 Young Researcher Award of NSYSU, the 2011 Distinguished Young Engineer Award from the Chinese Institute of Electrical Engineering, the 2013 Distinguished Young Researcher Award of NSYSU, the 2013 IEEE MMSP Top 10% Paper Award, and the 2014 IEEE GCCE Outstanding Poster Award.


Lecture 1: A Light-weight 3D Reconstruction System
3D models allow us to explore all dimensions of objects, e.g., monuments, sites, even whole city regions. Over the last decade, a significant number of 3D-related approaches and applications, such as 3D printing, 3D films, and 3D archiving, have become popular research topics. Furthermore, to meet the 3D industry's requirements for high accuracy and flexibility, 3D reconstruction approaches that can be performed anywhere and anytime are in high demand. To achieve this goal, two main challenges need to be overcome: the acquisition of acceptable input data and the computational complexity of the reconstruction procedure. Much research has been devoted to developing techniques for laser scanner calibration. However, the cost of a 3D laser scanner restricts its general use, since it is not affordable for most people. In addition, due to its size, such a device cannot be part of a mobile-based application. In this lecture, we will present a light-weight 3D object reconstruction approach. It aims to fulfill the increasing demand for fast and reliable 3D reconstruction in a mobile environment, so that people can directly use their own portable devices to reconstruct desired objects into 3D models.

Lecture 2: New Intra Coding Schemes for High Efficiency Video Coding
Video coding is a procedure that compresses digital video data to reduce the bandwidth required when transmitting a video. The goal of video coding is to compress a large amount of data efficiently for transmission over the Internet while keeping acceptable visual quality in the reconstructed video. Intra coding plays an important role in video coding because it prevents error propagation and maintains better visual quality; in addition, it requires far less computation than inter coding because no motion estimation is needed. However, the current intra coding methods in the HEVC standard are still inefficient, so new intra coding schemes are required to further improve coding efficiency. For these reasons, two new directions, pattern matching and predictive texture synthesis, are used to enhance intra coding efficiency and achieve better coding performance than HEVC intra prediction.
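To ground what "intra prediction" means here, the following toy sketch shows HEVC-style DC prediction in spirit (simplified: one fixed 4x4 block with made-up sample values, none of the standard's mode signaling or reference filtering): a block is predicted as the mean of its already-reconstructed top and left neighbor samples, and only the residual is coded.

```python
import numpy as np

# A 4x4 luma block and its reconstructed neighbors (illustrative values)
block = np.array([[52, 53, 54, 55],
                  [52, 53, 54, 55],
                  [52, 53, 54, 55],
                  [52, 53, 54, 55]], dtype=float)
top  = np.array([51, 52, 53, 54], dtype=float)   # row above the block
left = np.array([52, 52, 52, 52], dtype=float)   # column left of the block

# DC mode: predict every sample as the mean of the neighbor samples
dc = (top.sum() + left.sum()) / (top.size + left.size)
pred = np.full_like(block, dc)
residual = block - pred                           # what actually gets coded
```

Because the residual has much smaller magnitude than the raw samples, it transforms and quantizes far more compactly; the schemes in this lecture aim to shrink that residual further via pattern matching and texture synthesis.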

Jiangtao Wen, Tsinghua University, China


Jiangtao (Gene) Wen received the BS, MS and Ph.D. degrees with honors from Tsinghua University, Beijing, China, in 1992, 1994 and 1996 respectively, all in Electrical Engineering.

From 1996 to 1998, he was a Staff Research Fellow at UCLA, where he conducted research on multimedia coding and communications. Many of his inventions were later adopted by international standards such as H.263, MPEG and H.264. After UCLA, he served as the Principal Scientist of PacketVideo Corp. (NASDAQ: WAVE/DCM), the CTO of Morphbius Technology Inc., the Director of Video Codec Technologies of Mobilygen Corp (NASDAQ: MXIM), the Senior Director of Technology of Ortiva Wireless (NASDAQ: ALLT) and consulted for Stretch Inc., Ocarina Networks (NASDAQ: DELL) and QuickFire Networks (NASDAQ: FB). Since 2009, Dr. Wen has held a Professorship at the Department of Computer Science and Technology of Tsinghua University. He was a Visiting Professor at Princeton University in 2010 and 2011.

Dr. Wen's research focuses on multimedia communication over challenging networks and computational photography. He has authored many widely referenced papers in related fields, and products deploying technologies he developed are in wide use worldwide. Dr. Wen holds over 40 patents, with numerous others pending. He is an Associate Editor for the IEEE Transactions on Circuits and Systems for Video Technology (CSVT) and a recipient of the 2010 IEEE Trans. CSVT Best Paper Award.
Dr. Wen was elected a Fellow of the IEEE in 2011. He is the Director of the Research Institute of the Internet of Things of Tsinghua University, and a Co-Director of the Ministry of Education Tsinghua-Microsoft Joint Lab of Multimedia and Networking.
Besides teaching and conducting research, Dr. Wen also invests in high technology companies as an angel investor.