APSIPA Distinguished Lecturers (1 January 2014 - 31 December 2015)
Nam Ik Cho, Seoul National University, Korea


Nam Ik Cho received the B.S., M.S., and Ph.D. degrees in control and instrumentation engineering from Seoul National University, Seoul, Korea, in 1986, 1988, and 1992, respectively. From 1991 to 1993, he was a Research Associate at the Engineering Research Center for Advanced Control and Instrumentation, Seoul National University. From 1994 to 1998, he was with the University of Seoul, Seoul, Korea, as an Assistant Professor of Electrical Engineering, and in 1996 he was a visiting scholar in the Department of Electrical Engineering, University of California, Santa Barbara. He joined the Department of Electrical and Computer Engineering, Seoul National University, in 1999, where he is currently a Professor. During 2011-2013, he served as a Vice Dean of the College of Engineering, Seoul National University. His research interests include image processing, adaptive filtering, and computer vision. In these areas, he has published about 60 journal papers and more than 100 conference papers, and holds 10 US patents.

Dr. Cho is currently a Senior Member of the IEEE Signal Processing Society and serves as a handling editor of Signal Processing (Elsevier). He was the special session chair of IEEE BMSB 2012 and the finance chair of IEEE ISPACS 2004, and he received the Chester Sall Award (2014) from the IEEE Consumer Electronics Society.


Lecture 1: Hierarchical prediction for lossless image compression
Abstract: A new hierarchical prediction scheme is presented for the lossless compression of color images and color filter array (CFA) images. For the lossless compression of an RGB image, the image is first decorrelated by a reversible color transform, and the Y component is then encoded by a conventional lossless grayscale image compression method. For encoding the chrominance images, a hierarchical scheme is proposed that enables the use of upper, left, and lower pixels for pixel prediction, whereas conventional raster-scan prediction methods use only upper and left pixels. An appropriate context model for the prediction error is also defined, and arithmetic coding is applied to the error signal in each context. On several sets of images, the proposed method is shown to reduce bit rates further compared to JPEG 2000 and JPEG XR.

For CFA compression, the CFA data is subsampled and encoded in order, so that each subimage contains only one color component (R, G, or B in the case of a Bayer CFA image). Subsampling separates the green pixels into two sets, one of which is encoded by a conventional grayscale encoder; the green pixels in the other set are then predicted from the already encoded greens. All of the greens are then used for the prediction of red, and all of these pixels are used for the blue prediction. In this process, the predictors are designed considering the direction of edges in the neighborhood. By gathering information from the prediction process, such as edge activities and neighboring errors, the magnitude of the prediction error is also estimated. From this, the pdf of the prediction error conditioned on the neighboring pixels (the context) is estimated, and context-adaptive arithmetic coding is applied to further reduce the bit rate.

Experimental results on real and simulated CFA images show that the proposed method produces lower bit rates (bpp) than conventional lossless image compression methods as well as recently developed lossless CFA compression algorithms. The lecture is a condensed version of the following papers:

  1. S. Kim and N. I. Cho, "Hierarchical prediction and context adaptive coding for lossless color image compression," IEEE Transactions on Image Processing, Jan. 2014.
  2. S. Kim and N. I. Cho, "Lossless compression of color filter array images by hierarchical prediction and context modeling," IEEE Trans. Circuits and Systems for Video Technology, accepted.
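The first step of the color-image scheme above, decorrelation by a reversible color transform, can be sketched in a few lines. The transform below is the well-known JPEG2000-style reversible color transform (RCT), chosen purely as an illustration; the papers may use a different reversible transform.

```python
import numpy as np

def rct_forward(rgb):
    """JPEG2000-style reversible color transform (integer-to-integer, lossless)."""
    r = rgb[..., 0].astype(np.int32)
    g = rgb[..., 1].astype(np.int32)
    b = rgb[..., 2].astype(np.int32)
    y = (r + 2 * g + b) >> 2   # luminance-like component (floor division by 4)
    u = b - g                  # chrominance
    v = r - g                  # chrominance
    return y, u, v

def rct_inverse(y, u, v):
    """Exact inverse: recovers the original RGB integers."""
    g = y - ((u + v) >> 2)
    b = u + g
    r = v + g
    return np.stack([r, g, b], axis=-1)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(16, 16, 3), dtype=np.uint8)
y, u, v = rct_forward(img)
restored = rct_inverse(y, u, v)
lossless = np.array_equal(restored, img.astype(np.int32))
```

Because every step is an exact integer operation, the inverse recovers the original RGB values bit-for-bit, which is what makes the Y/U/V components safe to entropy-code losslessly.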

Lecture 2: Camera captured document image processing for enhancing visual quality and recognition rates
Rectification methods for document images are presented here. Specifically, algorithms that remove geometric distortions in camera-captured document images are studied. Unlike conventional methods, we formulate document dewarping as an optimization problem defined on discrete representations of text-blocks and text-lines. We model the geometric distortions caused by cameras and curved surfaces as a generalized cylindrical surface and a camera rotation, and develop a cost function whose minimization yields the parameters of these models. Our cost function is based on the geometric relations between a point in an input image and its corresponding point in the rectified image. We also encode constraints on text-lines and text-blocks in rectified images into the cost function. Due to the discrete representation of text-lines, our cost function is well defined and is minimized via the Levenberg-Marquardt algorithm. Since our method does not depend on special assumptions about the input, it works for various layouts and handles curved surfaces as well as planes within the same framework. Moreover, we extend our approach to unfolded book surfaces. Experimental results show that our method works in very challenging cases and compares favorably with conventional methods. This presentation is a condensed version of the following papers, with some new results:

  1. Hyung Il Koo and Nam Ik Cho, "Skew estimation of natural images based on a salient line detector," Journal of Electronic Imaging, Vol. 22, No. 1, 013020, January 2013.
  2. Hyung Il Koo, and Nam Ik Cho, "Text-Line Extraction in Handwritten Chinese Documents Based on an Energy Minimization Framework," IEEE Transactions on Image Processing, Vol. 21, No. 3, pp. 1169-1175, March 2012.
  3. Hyung Il Koo, Jinho Kim, and Nam Ik Cho, "Composition of a Dewarped and Enhanced Document Image From Two View Images," IEEE Trans. Image Processing, Vol. 18, No. 7, pp. 1551-1562, July 2009.

Minh N. Do, University of Illinois at Urbana-Champaign, USA

Minh N. Do is an Associate Professor in the Department of Electrical and Computer Engineering at the University of Illinois at Urbana-Champaign (UIUC). His work covers image and multi-dimensional signal processing, wavelets and multiscale geometric analysis, computational imaging, and visual information representation, and has led to about 45 journal papers. He received a Silver Medal from the 32nd International Mathematical Olympiad in 1991, a University Medal from the University of Canberra in 1997, a Doctorate Award from the Swiss Federal Institute of Technology Lausanne (EPFL) in 2001, a CAREER Award from the National Science Foundation in 2003, and a Young Author Best Paper Award from IEEE in 2008. He was named a Beckman Fellow at the Center for Advanced Study, UIUC, in 2006, and received a Xerox Award for Faculty Research from the College of Engineering, UIUC, in 2007. He was an Associate Editor of the IEEE Transactions on Image Processing, and a member of the IEEE Technical Committees on Signal Processing Theory and Methods, and on Image, Video, and Multidimensional Signal Processing. He was elevated to IEEE Fellow in 2014 for contributions to image representation and computational imaging. He is a co-founder and Chief Scientist of Personify (formerly Nuvixa), a spin-off from UIUC to commercialize depth-based visual communication.


Lecture 1: Beyond Wavelets: Directional Multiresolution Image Representation
Efficient representation of visual information lies at the foundation of many image processing tasks, including compression, filtering, feature extraction, and inverse problems. Efficiency of a representation refers to its power to capture significant information of objects of interest using a compact description. For practical applications, this representation has to be realized by structured transforms and fast algorithms.

In this lecture, after demonstrating that the commonly used separable wavelet transforms are inadequate for processing pictorial information, I will present our recent construction of a "true" two-dimensional (2-D) representation that can capture the intrinsic geometrical structure of natural images. The resulting image expansion is composed of contour segments, and is thus named contourlets. The contourlet transform has a fast iterated filter bank algorithm, a precise connection between the continuous and discrete domains through multiresolution analysis, and the optimal approximation rate for 2-D piecewise smooth functions with discontinuities along smooth curves. Experiments with real images indicate the potential of contourlets in image processing applications. Furthermore, by utilizing ideas from harmonic analysis, visual perception, computer vision, and signal processing, we look for fruitful new interactions between these fields.

Lecture 2: Immersive Visual Communication with Depth
The ubiquity of digital cameras has had a great impact on visual communication, as can be seen from the explosive growth of visual content on the Internet and the default inclusion of a digital camera in cellphones and laptops. The recent emergence of low-cost and fast depth cameras (e.g., Kinect) provides a great opportunity to revolutionize visual communication further by enabling immersive and interactive capabilities. Depth measurements provide perfect complementary information to traditional color imaging in capturing the three-dimensional (3D) scene. By effectively integrating color and depth information, we develop real-time systems that capture a live scene and render 3D free-viewpoint video, and potentially augment the captured scene with 3D virtual worlds. Such systems can provide unprecedented immersive and interactive 3D viewing experiences for personalized distance learning and tele-presence.

Lecture 3: Computational Imaging: From Formation to Processing

As more than 70 percent of the human body's sensors are in the eyes and video is estimated to represent 90 percent of all Internet traffic by 2016, imaging and video technology are going to be major application areas in personal computing. In this lecture, I will describe some of our current efforts in developing computational techniques for imaging and videography. These include depth-image-based rendering, interpolation and upconversion, coded exposure, deconvolution and restoration, inverse rendering, and relighting.

Woon-Seng Gan, Nanyang Technological University, Singapore


Woon-Seng Gan received his BEng (1st Class Hons) and PhD degrees, both in Electrical and Electronic Engineering, from the University of Strathclyde, UK, in 1989 and 1993 respectively. He is currently an Associate Professor and the Head of the Information Engineering Division, School of Electrical and Electronic Engineering, Nanyang Technological University. His research interests span a wide range of related areas, including adaptive signal processing, active noise control, directional sound systems, psychoacoustic signal processing, and real-time embedded systems.
Dr. Gan has published more than 200 refereed international journal and conference papers and has been granted five Singapore/US patents. He co-authored Digital Signal Processors: Architectures, Implementations, and Applications (Prentice Hall, 2005) and was the lead author of Embedded Signal Processing with the Micro Signal Architecture (Wiley-IEEE, 2007). His book Subband Adaptive Filtering: Theory and Implementation was published by John Wiley in August 2009. He also authored a chapter in Rick Lyons's Streamlining Digital Signal Processing: A Tricks of the Trade Guidebook, 2nd Edition (Wiley-IEEE Press, 2012).
He is currently a Fellow of the Audio Engineering Society (AES), a Fellow of the Institution of Engineering and Technology (IET), a Senior Member of the IEEE, and a Professional Engineer of Singapore. In 2012, he became the Series Editor of the new SpringerBriefs in Signal Processing. He is also an Associate Technical Editor of the Journal of the Audio Engineering Society (JAES); Associate Editor of the IEEE Transactions on Audio, Speech, and Language Processing (ASLP); an Editorial Board member of the Asia-Pacific Signal and Information Processing Association (APSIPA) Transactions on Signal and Information Processing; and Associate Editor of the EURASIP Journal on Audio, Speech, and Music Processing. He is currently a member of the Design and Implementation of Signal Processing Systems (DiSPS) technical committee and the Industry DSP Technology (IDSP) standing committee of the IEEE Signal Processing Society. He is a member of the Board of Governors of APSIPA.


Lecture 1: Audio Projection: Directional Sound and Its Applications in Immersive Communication
This lecture is a condensed version of a previous APSIPA tutorial (2012). It reviews the historical development leading up to modern-day directional loudspeakers based on the parametric acoustic array effect. Key challenges associated with the performance of emitters and their applications will be addressed. In particular, we present signal processing techniques to overcome some physical limitations inherent in this type of directional loudspeaker. New results will also be presented to chart a roadmap for future advancement in this new area of sound projection. In addition, we examine the need for psychoacoustic processing to enhance the 3D spatial perception and audio quality of parametric loudspeakers. I will also review some of the new work currently being carried out by different research groups. Finally, the significance of parametric loudspeakers in immersive communication will be described, and I will put forward some research challenges in parametric loudspeakers and discuss how signal processing techniques may help to overcome them.

Lecture 2: Subband Adaptive Filtering: Theory and Implementation
This lecture describes, analyzes, and generalizes a new class of subband adaptive filters (SAFs), called the normalized SAF (NSAF), whereby the adaptive filter is no longer separated into subfilters. Instead, subband signals, which are normalized by their respective subband input variance, are used to adapt the fullband tap weights of a modeling filter. The modeling filter is placed before the decimators, which differs from the conventional structure, where a set of subfilters is placed after the decimators. In this configuration, the modeling filter operates on a set of subband input signals at the original sampling rate. The weight adjustment applied to the modeling filter at each iteration is a linear combination of normalized subband regressors. Implicitly, the weight-control mechanism is driven by an equalized input spectrum, which is a composite of the normalized versions of the contiguous spectral bands of the original spectrum. The equalized spectrum accelerates convergence due to its reduced spectral dynamic range, while avoiding any possible aliasing and band-edge effects. Moreover, computational reductions can be attained by decimating the adaptation rate of the modeling filter, with the convergence properties remaining intact.
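As a rough illustration of the update described above, the sketch below identifies an unknown FIR system with an NSAF-style adaptation: subband errors are formed at the decimated rate, each correction is normalized by its subband regressor energy, and a single fullband weight vector is adapted. The two-band Haar analysis filter bank, step size, and signal lengths are illustrative choices, not those of the lecture or the book.

```python
import numpy as np

rng = np.random.default_rng(1)
M = 8                                   # fullband modeling-filter length
w_true = rng.normal(size=M)             # unknown system to identify
n = 20000
u = rng.normal(size=n)                  # white excitation
d = np.convolve(u, w_true)[:n]          # desired (noise-free) response

# Two-band Haar analysis filters (illustrative choice of filter bank)
h = [np.array([1.0, 1.0]) / np.sqrt(2), np.array([1.0, -1.0]) / np.sqrt(2)]
u_sb = [np.convolve(u, hi)[:n] for hi in h]   # subband inputs at the original rate
d_sb = [np.convolve(d, hi)[:n] for hi in h]   # subband desired signals

w = np.zeros(M)
mu, eps = 0.5, 1e-8
for k in range(M, n, 2):                # adapt at the decimated (2-band) rate
    update = np.zeros(M)
    for ui, di in zip(u_sb, d_sb):
        x = ui[k:k - M:-1]              # subband regressor (newest sample first)
        e = di[k] - x @ w               # decimated subband error
        update += e * x / (x @ x + eps)  # per-band normalized correction
    w += mu * update

misalignment = np.linalg.norm(w - w_true) / np.linalg.norm(w_true)
```

With a white input the update behaves much like NLMS, but for colored inputs the per-band normalization equalizes the spectral dynamic range, which is the source of the faster convergence discussed above.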

Lecture 3: Recent Work on 3-D Audio Rendering for Headphones
In this lecture, we examine how 3D sound can be rendered over headphones and discuss some of the key problems in creating realistic 3D sound for headphone listening. We examine some of the latest techniques for creating highly personalized and realistic 3D sound, and how these techniques can overcome existing problems and become ready for prime-time use in gaming, entertainment, and augmented reality. In particular, we outline one of our latest works: personalized 3D audio headphones that do not require extensive training to customize audio cues to the geometry of the listener's ear, relying instead on emitter geometry and positioning, together with cue-extraction techniques, to create a highly immersive and enhanced sound stage in headphone listening.

Lecture 4: Recent advances on active noise control: open issues and innovative applications
In this lecture, we briefly review broadband and narrowband feedforward and adaptive feedback ANC systems, with a focus on signal processing algorithms. We concentrate on research and development from the last decade, following the detailed tutorial publications. In particular, we introduce the audio-integrated algorithm and the concepts of psychoacoustics and virtual sensing for ANC. We also comprehensively review online secondary-path modeling techniques and ANC without a secondary-path model, which remain critical for some practical applications. Finally, we highlight ANC applications in the medical and consumer electronics fields, which are important for motivating new ANC applications beyond the traditional ones in industry and transportation, and we identify related difficulties and open research issues in each area. This lecture is based on an overview paper published in the APSIPA Transactions in 2012.

Yao-Win Peter Hong, National Tsing Hua University, Taiwan


Y.-W. Peter Hong received his B.S. degree in Electrical Engineering from National Taiwan University, Taipei, Taiwan, in 1999, and his Ph.D. degree in Electrical Engineering from Cornell University, Ithaca, NY, in 2005. He joined the Institute of Communications Engineering and the Department of Electrical Engineering at National Tsing Hua University, Hsinchu, Taiwan, in Fall 2005, where he is now a Full Professor. His research interests include physical layer secrecy, cooperative communications, distributed signal processing for sensor networks, and PHY-MAC cross-layer designs for wireless networks.
Dr. Hong received the best paper award for young authors from the IEEE IT/COM Society Taipei/Tainan Chapter in 2005, the best paper award at MILCOM 2005, and the Junior Faculty Research Awards from the College of EECS and from National Tsing Hua University in 2009 and 2010, respectively. He also received the IEEE Communication Society Asia-Pacific Outstanding Young Researcher Award in 2010, the Y. Z. Hsu Scientific Paper Award and the National Science Council (NSC) Wu Ta-You Memorial Award in 2011, and the Chinese Institute of Electrical Engineering (CIEE) Outstanding Young Electrical Engineer Award in 2012. His coauthored paper received the Best Paper Award from the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) in 2013.
Dr. Hong is currently an Associate Editor for IEEE Transactions on Signal Processing and IEEE Transactions on Information Forensics and Security. He is also the leading coauthor of the books "Cooperative Communications and Networking: Technologies and System Design" (with W.-J. Huang and C.-C. Jay Kuo) and "Signal Processing Approaches to Secure Physical Layer Communications in Multi-Antenna Wireless Systems" (with P.-C. Lan and C.-C. Jay Kuo) published by Springer in 2010 and in 2013, respectively. Dr. Hong has also served as guest editor of EURASIP Special Issue on Cooperative MIMO Multicell Networks and of IJSNET Special Issue on Advances in Theory and Applications of Wireless, Ad Hoc, and Sensor Networks.


Lecture 1: MIMO Signal Processing Techniques to Enhance Physical Layer Secrecy in Wireless Communications
This talk provides an overview of signal processing techniques used to enhance physical layer secrecy in the data transmission phase of multi-antenna wireless communication systems. Wireless physical layer secrecy has attracted much attention in recent years due to the broadcast nature of the wireless medium and its inherent vulnerability to eavesdropping. Motivated by results in information theory, signal processing techniques have been developed to enlarge the signal quality difference at the destination(s) and the eavesdropper(s). In particular, in the data transmission phase, secrecy beamforming and precoding schemes as well as the use of artificial noise have been used to enhance signal quality at the destination while limiting the signal strength at the eavesdropper. These techniques will first be introduced for point-to-point MIMO systems as well as systems with multiple destinations and eavesdroppers. Then, the techniques will be further extended to wireless relay systems, where additional spatial degrees of freedom are provided and additional security threats are present. This talk is part of Dr. Hong's tutorial provided at APSIPA ASC 2013.

Lecture 2: Discriminatory Channel Estimation - A Training and Channel Estimation Approach to Enhance Physical Layer Secrecy
Due to the broadcast nature of the wireless medium, secrecy considerations have become increasingly important in the design of wireless communication systems. Motivated by results in information theory, signal processing techniques have been developed to enlarge the signal quality difference at the legitimate receiver (LR) and the eavesdropper, or unauthorized receiver (UR). In this talk, a secrecy-enhancing training procedure, named the discriminatory channel estimation (DCE) scheme, will be introduced as a channel estimation approach to achieve this task. In contrast to most studies on physical layer secrecy, which focus on the data transmission phase, studies on DCE focus on the channel estimation phase and aim to provide a practical signal processing technique to discriminate between the channel estimation performances at the LR and the UR. By doing so, the difference between the effective signal-to-noise ratios of the two users can be enlarged, thus leaving more room for secrecy coding or modulation schemes in the data transmission phase. A key feature of DCE designs is the insertion of artificial noise (AN) in the training signal to degrade the channel estimation performance at the UR. Two DCE schemes will be discussed, namely, the feedback-and-retraining and the two-way-training based DCE schemes. In both schemes, the optimal power allocation between training data and AN is determined by minimizing the normalized mean squared error (NMSE) of the LR subject to a lower limit constraint on the NMSE of the UR.
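The key AN idea above, degrading the UR without touching the LR, can be illustrated with a toy null-space projection. The antenna count and Gaussian channels below are invented for illustration, and a real DCE scheme would use the estimated (not true) LR channel and optimize the training/AN power split.

```python
import numpy as np

rng = np.random.default_rng(2)
Nt = 4                                                   # transmit antennas
h_lr = rng.normal(size=Nt) + 1j * rng.normal(size=Nt)    # legitimate receiver's channel
h_ur = rng.normal(size=Nt) + 1j * rng.normal(size=Nt)    # unauthorized receiver's channel

# Projector onto the null space of h_lr: AN sent through it vanishes at the LR
P = np.eye(Nt) - np.outer(h_lr, h_lr.conj()) / (h_lr.conj() @ h_lr)

# 100 artificial-noise vectors, projected onto the null space
an = P @ (rng.normal(size=(Nt, 100)) + 1j * rng.normal(size=(Nt, 100)))

# Received AN under the y = h^H x convention
rx_lr = h_lr.conj() @ an   # what the artificial noise contributes at the LR
rx_ur = h_ur.conj() @ an   # ... and at the UR

lr_power = np.mean(np.abs(rx_lr) ** 2)   # ~0: LR's training is undisturbed
ur_power = np.mean(np.abs(rx_ur) ** 2)   # large: UR's estimate is degraded
```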

Lecture 3: Energy Harvesting Communications for Wireless Cellular and Sensor Networks
Recent advances in energy harvesting technology have enabled the development of wireless devices that are able to support their own operations through the collection of ambient energy, e.g., from solar, wind, vibrational, or thermal sources. This helps reduce the energy cost, increase the deployment flexibility, and prolong the lifetime of wireless communication devices. However, when relying on energy harvested from the environment, the strict energy causality and energy storage constraints as well as the randomness of the energy arrival may have a significant impact on the communication performance and design. This talk gives an overview of recent advances and challenges in this research area, and introduces our solution to some of these problems. For wireless cellular networks, energy-aware rate and power allocation policies, beamforming, and scheduling policies will be discussed. For wireless sensor networks, energy-aware relaying (or information forwarding) and sensor deployment schemes will be discussed, especially for distributed parameter estimation problems.

Lecture 4: Distributed Channel-Aware Transmission Strategies for Conventional, Cooperative and Cognitive Wireless Networks
This talk introduces ways to incorporate channel state information in the uplink channel access policies of conventional, cooperative, and cognitive wireless networks, using a decentralized random-access approach. Here, users are allowed to make transmission decisions based on local channel state information (CSI) and/or spectrum occupancy information (SOI) to maximize the sum throughput of the system. We show that proper use of CSI and SOI can help better exploit the advantages of multiuser diversity, cooperative diversity, and opportunistic spectrum access. In the past, these issues have been addressed mostly from a centralized perspective, where a central controller coordinates transmissions; here, these advantages are exploited in a decentralized fashion, with users making independent transmission decisions based only on local information. For conventional wireless networks, the optimal uplink channel access policy, which determines the transmission probability, rate, and power, is first derived by exploiting only the uplink CSI. For cooperative networks, a similar channel-aware policy and an associated partner selection policy can be derived based on both the uplink and interuser CSI. These concepts can also be applied to cognitive radio environments, where transmission decisions should be made based not only on the channel quality but also on spectrum occupancy.
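The multiuser-diversity gain from purely local CSI can be seen in a toy slotted-access simulation: each user transmits only when its own fading gain exceeds a threshold chosen so that, on average, one user transmits per slot. The Rayleigh-fading model and all parameters below are illustrative, not taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(3)
K, slots = 10, 20000
gains = rng.exponential(scale=1.0, size=(slots, K))  # i.i.d. fading power gains

# Threshold such that P(g > g_th) = 1/K => on average one user transmits per slot
g_th = np.log(K)                                     # inverse CCDF of Exp(1)

tx = gains > g_th                          # each user's decision uses only local CSI
success = tx.sum(axis=1) == 1              # collision model: lone transmitter succeeds
aware_gain = gains[success][tx[success]].mean()   # channel gain of successful users

random_gain = gains[:, 0].mean()           # baseline: a randomly scheduled user (~1.0)
```

The successful user's average gain is roughly ln K + 1 under Exp(1) fading, versus 1 for a randomly scheduled user, and this is achieved without any central coordination.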

Tan Lee, Chinese University of Hong Kong, Hong Kong

Tan Lee is an Associate Professor at the Department of Electronic Engineering, the Chinese University of Hong Kong (CUHK). He also holds a concurrent administrative position as the Associate Dean for Student Affairs, Faculty of Engineering, CUHK. Tan Lee has been working on speech- and language-related research since the early 1990s. His work covers many different areas, including automatic speech and speaker recognition, text-to-speech, speech enhancement, language identification, pathological speech analysis, hearing and speaking aids, and music signal processing. Tan Lee initiated and coordinated a number of pioneering projects on the research and development of Chinese spoken language technologies in Hong Kong. He has led 8 projects funded by the General Research Fund (GRF) of the Hong Kong Research Grants Council (RGC). Tan Lee works closely with medical doctors and speech and hearing professionals in applying signal processing techniques to human communication disorders. He is the Director of the newly established Language and Communication Disorders Research Laboratory at the CUHK Shenzhen Research Institute. Tan Lee was the Chairman of the IEEE Hong Kong Chapter of Signal Processing in 2005-2006. He is an associate editor of the EURASIP Journal on Advances in Signal Processing. Tan Lee received the CUHK Vice-Chancellor's Exemplary Teaching Award in 2004 and the Engineering Faculty's Exemplary Teaching Award in multiple years during 2001-2009.


Lecture 1: Robust Pitch Estimation
Pitch is an important attribute of the human voice. It carries abundant information that is pertinent to speech communication, and pitch estimation is a fundamental problem in many areas of speech research. Pitch estimation can be based on time-domain waveform periodicity, frequency-domain harmonicity, or both. For noise-corrupted speech, these signal characteristics are distorted to a certain extent, such that conventional methods become unreliable or even fail completely. This lecture will start with a comparative review of state-of-the-art approaches to robust pitch estimation. A new method of pitch estimation for speech signals at very low signal-to-noise ratios is then presented. A robust temporal-spectral representation of pitch is derived by accumulating spectral peaks over consecutive time frames. Since the harmonic structure of voiced speech changes much more slowly than the noise spectrum across neighboring time frames, spectral peaks related to pitch harmonics stand out above the noise through this temporal accumulation. Prior knowledge of speech and noise is incorporated in a simple yet effective way with the use of sparse estimation techniques. The performance of the new method is compared with existing algorithms under a wide variety of noise conditions.
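A minimal sketch of the accumulation idea: magnitude spectra of consecutive frames are summed so that stationary pitch harmonics rise above the fluctuating noise floor, and the pitch is then chosen by harmonic summation. The synthetic signal, frame length, and SNR below are invented for illustration and are far simpler than real speech.

```python
import numpy as np

fs, N, frames = 8000, 1024, 50
f0_true = 156.25                     # exactly bin 20 of a 1024-point FFT at 8 kHz
rng = np.random.default_rng(4)

t = np.arange(frames * N) / fs
voiced = sum(np.sin(2 * np.pi * h * f0_true * t) for h in range(1, 5))  # 4 harmonics
noisy = voiced + 2.0 * rng.normal(size=t.size)   # heavily noise-corrupted

# Accumulate magnitude spectra over consecutive frames
acc = np.zeros(N // 2)
for i in range(frames):
    frame = noisy[i * N:(i + 1) * N] * np.hanning(N)
    acc += np.abs(np.fft.rfft(frame))[:N // 2]

# Harmonic summation over candidate pitch bins (~78-320 Hz)
cands = np.arange(10, 41)
score = np.array([sum(acc[h * b] for h in range(1, 5)) for b in cands])
f0_est = cands[score.argmax()] * fs / N
```

A single noisy frame often has spurious peaks comparable to the harmonics, but after summing 50 frames the harmonic bins dominate, which is the effect the lecture exploits.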

Lecture 2: Dealing with Imperfections in Human Speech Communication
While speech is the most preferred and natural modality of communication for human beings, speech production and perception are extremely complicated processes that require interdisciplinary knowledge to understand thoroughly. From a signal processing point of view, we are interested in the unique properties of acoustic speech signals and investigate how these properties contribute to effective communication from the speaker to the listener. Along the human speech communication pathway, there are many imperfections related to disabilities of the speakers or listeners. Regardless of the causes of these disabilities, the acoustic speech signals transmitted from the speaker to the listener remain the most relevant physical observables, which can be manipulated by applying effective signal processing methods. In this lecture, I will describe some of our recent work on applying speech processing techniques to various problems of speech and hearing disorders, including cochlear implant processing strategies for improving speech perception of tonal languages, electrolarynx systems with pitch control, and the analysis and assessment of disordered voice and speech.

Lecture 3: Spoken Language Recognition with Prosodic Features
Spoken language recognition (SLR) refers to the process of automatically determining the language of a spoken utterance. It has many applications in computer processing of multimedia and multilingual information. In state-of-the-art SLR systems, cepstral features and their variants are commonly used, while another important component of human speech, namely prosody, is largely overlooked. Although it is generally agreed that prosodic features are useful for identifying a spoken language, most previous attempts at prosody-based SLR were not very successful. In this lecture, we argue that effective SLR requires the joint contributions of a comprehensive set of prosodic attributes derived to represent F0 contours, intensity contours, and segmental durations in many different ways. We consider not only a variety of acoustic measurements but also multiple variants of each raw measurement obtained by applying different normalization methods. The prosodic attributes are modeled by the bag-of-n-grams approach with a support vector machine (SVM), as in conventional phonotactic SLR systems. Results of large-scale SLR experiments are reported to demonstrate the effectiveness of prosodic features.
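As a toy illustration of the bag-of-n-grams modeling of prosodic attributes, the sketch below quantizes an F0 contour into up/down/steady tokens and builds the bigram count vector that would feed an SVM. The quantization rule, threshold, and contour values are invented for illustration, not those of the lecture.

```python
import numpy as np
from collections import Counter
from itertools import product

def f0_to_symbols(f0, thresh=2.0):
    """Quantize frame-to-frame F0 movement into Up / Down / Steady tokens."""
    delta = np.diff(f0)
    return ["U" if d > thresh else "D" if d < -thresh else "S" for d in delta]

def bigram_features(symbols):
    """Bag-of-bigrams count vector over the 9 possible U/D/S bigrams."""
    vocab = ["".join(p) for p in product("UDS", repeat=2)]
    counts = Counter(a + b for a, b in zip(symbols, symbols[1:]))
    return np.array([counts[v] for v in vocab])

# A made-up rising-falling-rising F0 contour (Hz per frame)
f0 = np.array([120, 125, 131, 130, 124, 118, 118, 119, 125, 132], float)
feat = bigram_features(f0_to_symbols(f0))
```

One such count vector per utterance (typically with higher-order n-grams and several prosodic streams) is what a phonotactic-style SVM back-end would consume.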

Lecture 4: Unsupervised Acoustic Modeling for Spoken Language Applications
Acoustic modeling is an important problem in many spoken language applications. It aims at providing compact yet accurate statistical representations for a set of sub-word units. Conventional acoustic modeling is a highly supervised process that requires plenty of speech data with transcriptions. Such resources may not be available for many languages and in many real-world situations. In this lecture, a new approach to unsupervised acoustic modeling is presented. Unlike conventional approaches that cluster speech frames directly, the new approach is based on clustering Gaussian components that are estimated from untranscribed training data. The resulting acoustic models may not be linguistically meaningful like conventional phoneme models in continuous speech recognition, but they are useful in many spoken language applications where transcribed data are scarce.

Ekachai Leelarasmee, Chulalongkorn University, Thailand

Ekachai Leelarasmee received his Ph.D. in Electrical Engineering from the University of California at Berkeley in 1982. He is currently the Deputy Director of the International School of Engineering and an Adjunct Professor in Electrical Engineering at Chulalongkorn University. He has served the IEEE Thailand Section since 2004 in various positions, including Section Chair. He received an Outstanding Volunteer Award from IEEE Region 10 in 2006 and is currently the president of the IEEE Thailand Solid-State Circuits Society Chapter and a member of the Technical Committee of the Signal Processing Group of APSIPA. Dr. Ekachai has organized several IEEE conferences as Technical Program Chair and gave a tutorial session at APSIPA 2013. His current interests include integrated circuit and embedded system design.


Lecture 1: Tunable Quadrature Sinusoidal Waveform Generators
Quadrature sinusoidal waveforms are two sinusoidal signals with a 90-degree phase difference. They are needed in high-data-rate modulation schemes such as quadrature phase shift keying (QPSK). This lecture discusses different techniques to design an oscillator that generates quadrature sinusoidal signals whose frequency can be adjusted electronically. Basic circuits such as ring, LC, and relaxation oscillators will be reviewed to show the conditions for oscillation and how they can be modified to be frequency tunable as well as to produce quadrature signals. The deviation from quadrature phase of each method will be discussed. The direct digital frequency synthesizer, in which a very accurate frequency can be set with good quadrature phase, will also be described.
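The direct digital frequency synthesizer lends itself to a brief software model. The sketch below (the table and accumulator widths are arbitrary illustrative choices) shows why the quadrature relationship is exact by construction: both channels read the same sine table, with the cosine channel offset by a quarter period, and the frequency is tuned purely through the tuning word:

```python
import math

TABLE_BITS = 10                      # 1024-entry sine lookup table
TABLE_SIZE = 1 << TABLE_BITS
ACC_BITS = 32                        # phase accumulator width
sine_table = [math.sin(2 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE)]

def ddfs(f_out, f_clk, n_samples):
    """Yield (sin, cos) sample pairs at frequency f_out, clocked at f_clk."""
    step = round(f_out / f_clk * (1 << ACC_BITS))   # frequency tuning word
    acc = 0
    for _ in range(n_samples):
        idx = acc >> (ACC_BITS - TABLE_BITS)        # top bits address the table
        quarter = (idx + TABLE_SIZE // 4) % TABLE_SIZE
        yield sine_table[idx], sine_table[quarter]  # exactly 90 degrees apart
        acc = (acc + step) & ((1 << ACC_BITS) - 1)  # wrap the accumulator

# One full period of a 1 kHz quadrature pair at a 48 kHz clock:
samples = list(ddfs(f_out=1000.0, f_clk=48000.0, n_samples=48))
```

In hardware the table and accumulator are fixed-point, and frequency resolution is f_clk / 2^ACC_BITS, which is why DDFS frequency setting is so accurate.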

Chia-Wen Lin, National Tsing Hua University, Taiwan


Chia-Wen Lin received his Ph.D. degree in electrical engineering from National Tsing Hua University (NTHU), Hsinchu, Taiwan, in 2000. He is currently an Associate Professor with the Department of Electrical Engineering and the Institute of Communications Engineering, NTHU. He was with the Department of Computer Science and Information Engineering, National Chung Cheng University, Taiwan, during 2000-2007. Prior to joining academia, he worked for the Information and Communications Research Laboratories, Industrial Technology Research Institute, Hsinchu, Taiwan, during 1992-2000. His research interests include image and video processing and video networking. Dr. Lin is a Steering Committee member of the IEEE Transactions on Multimedia. He is an Associate Editor of the IEEE Transactions on Circuits and Systems for Video Technology, the IEEE Transactions on Multimedia, IEEE Multimedia, and the Journal of Visual Communication and Image Representation, and an Area Editor of EURASIP Signal Processing: Image Communication. He is currently Chair of the Multimedia Systems and Applications Technical Committee of the IEEE Circuits and Systems Society and a Steering Committee member of the IEEE International Conference on Multimedia & Expo (ICME). He served as Technical Program Co-Chair of ICME 2010 and Special Session Co-Chair of ICME 2009. His papers won a Top 10% Paper Award at IEEE MMSP 2013 and the Young Investigator Award at VCIP 2005. He received the Young Faculty Award presented by CCU in 2005 and the Young Investigator Award presented by the National Science Council, Taiwan, in 2006.


Lecture 1: Video Retargeting: Algorithms, Applications, and Quality Assessment
Video retargeting from a full-resolution video to a lower-resolution display inevitably causes information loss. Content-aware video retargeting techniques have been studied to avoid losing critical visual information while resizing a video, and maintaining the spatio-temporal coherence of a retargeted video is critical to its visual quality. In this lecture, we will first show how to use a panoramic mosaic to guide the scaling of corresponding regions of video frames in a shot so as to ensure good spatio-temporal coherence. Second, we will show how the proposed video retargeting scheme can be used to construct a scalable video coder that supports content-adaptive spatial scalability (e.g., base-layer and enhancement-layer videos with different resolutions and aspect ratios) with good coding efficiency. Finally, we will present an objective quality assessment scheme based on geometric distortion and information loss for automatically evaluating the visual quality of a retargeted image.

Lecture 2: Self-Learning-Based Structured Noise Removal in Image/Video
Decomposing an image into multiple semantic components has been an active research topic for various image processing applications, such as image denoising, enhancement, and inpainting. In this lecture, we introduce a unified framework for image decomposition based on the use of sparsity and morphological diversity in image mixtures. We will show how the proposed framework can effectively deal with several image denoising tasks, including rain streak removal, general denoising, and joint super-resolution and deblocking for highly compressed images. Based on advanced sparse representation and the morphological diversity of images, the proposed framework first learns an over-complete dictionary from the high-spatial-frequency parts of an input image for reconstruction purposes. An unsupervised clustering technique is then applied to the dictionary atoms to identify the morphological component corresponding to the noise pattern of interest (e.g., rain streaks, blocking artifacts, or Gaussian noise). The proposed self-learning-based approach allows one to identify and disregard this morphological component during image reconstruction in an unsupervised way, so that image denoising can be achieved in a fully automatic manner.
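The unsupervised atom-classification step can be illustrated with a deliberately tiny sketch (hypothetical 3x3 atoms and one hand-picked morphological feature; the actual framework learns an over-complete dictionary by sparse coding): each atom is scored by how much of its gradient energy lies across columns, which is high for vertical rain-like streaks, and a one-dimensional 2-means clustering separates the streak-like atoms from the rest:

```python
def streak_score(atom):
    """Fraction of gradient energy across columns of a 3x3 patch;
    near 1 for atoms containing vertical (rain-like) streaks."""
    h = sum((atom[r][c + 1] - atom[r][c]) ** 2 for r in range(3) for c in range(2))
    v = sum((atom[r + 1][c] - atom[r][c]) ** 2 for r in range(2) for c in range(3))
    return h / (h + v + 1e-9)

def two_means(xs, iters=20):
    """1-D 2-means clustering; returns a 0/1 label for each value."""
    c0, c1 = min(xs), max(xs)
    for _ in range(iters):
        a = [x for x in xs if abs(x - c0) <= abs(x - c1)]
        b = [x for x in xs if abs(x - c0) > abs(x - c1)]
        if not a or not b:
            break
        c0, c1 = sum(a) / len(a), sum(b) / len(b)
    return [0 if abs(x - c0) <= abs(x - c1) else 1 for x in xs]

# Hypothetical dictionary atoms: two with vertical streaks (indices 0
# and 3) and two without vertical structure.
atoms = [[[0, 1, 0], [0, 1, 0], [0, 1, 0]],
         [[0, 0, 0], [1, 1, 1], [0, 0, 0]],
         [[0.2, 0.2, 0.2]] * 3,
         [[1, 0, 0], [1, 0, 0], [1, 0, 0]]]

labels = two_means([streak_score(a) for a in atoms])
```

Atoms labeled as the streak cluster would then be excluded when reconstructing the image, removing the structured noise.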

Nam Ling, Santa Clara University, USA


Nam Ling received the B.Eng. degree from Singapore and the M.S. and Ph.D. degrees from the U.S.A. He is currently the Sanfilippo Family Chair Professor of Santa Clara University (U.S.A.) and the Chair of its Department of Computer Engineering. From 2002 to 2010, he was an Associate Dean of its School of Engineering. Currently, he is also a Consulting Professor for the National University of Singapore, a Guest Professor for Shanghai Jiao Tong University (China), and a Tsuiying Chair Professor for Lanzhou University (China). He has more than 170 publications (including books) in video coding and systolic arrays. He also has several adopted standards contributions and U.S. patents. He is an IEEE Fellow for his contributions to video coding algorithms and architectures, and an IET Fellow. He was named IEEE Distinguished Lecturer twice and received the IEEE ICCE Best Paper Award (First Place). He is currently also an APSIPA Distinguished Lecturer. He received six awards from the University: four at the University level (Outstanding Achievement, Recent Achievement in Scholarship, President's Recognition, and Sustained Excellence in Scholarship) and two at the School/College level (Researcher of the Year and Teaching Excellence). He has served as Keynote Speaker for IEEE APCCAS, VCVP (twice), JCPC, IEEE ICAST, IEEE ICIEA, and IET FC & U-Media, as well as a Distinguished Speaker for IEEE ICIEA. He is/was General Chair/Co-Chair for IEEE Hot Chips, VCVP (twice), IEEE ICME, and IET U-Media, and has served as Technical Program Co-Chair for IEEE ISCAS, APSIPA ASC, IEEE APCCAS, IEEE SiPS (twice), DCV, and IEEE VCIP. He was Technical Committee Chair for IEEE CASCOM TC and IEEE TCMM, and has served as Guest Editor/Associate Editor for the IEEE TCAS-I, IEEE J-STSP, Springer JSPS, and Springer MSSP journals. He has delivered more than 110 invited colloquia worldwide and has served as a Visiting Professor/Consultant/Scientist for many institutions and companies.


Lecture 1: 3D Video Coding and its Related Research
In this talk, we will discuss the latest state-of-the-art 3D High Efficiency Video Coding (3D-HEVC) technology and related research. Following the success of HEVC, jointly developed by the Joint Collaborative Team on Video Coding (JCT-VC) of the ISO/IEC Moving Picture Experts Group (MPEG) and the ITU-T Video Coding Experts Group (VCEG), a second team, the Joint Collaborative Team on 3D Video (JCT-3V), focuses on developing its 3D extension. This technology goes beyond the traditional stereoscopic and multi-view representations of video to include the use of depth maps and view synthesis. More powerful 3D capabilities, coupled with much higher resolution and perceptual quality, are targeted at the devices and content expected for theater, home, and mobile applications. In 3D-HEVC, coding dependent views and depth maps, as well as synthesizing intermediate views, poses new challenges in achieving high coding efficiency while maintaining reasonable computational complexity. In the talk, we will discuss the coding tools in 3D-HEVC and the research challenges, and highlight some of our current related research.

Lecture 2: Research on High Efficiency Video Coding and its 3D Extension
In this talk, we will discuss the latest state-of-the-art video coding technology, High Efficiency Video Coding (HEVC), and related research. HEVC was jointly developed by the Joint Collaborative Team on Video Coding (JCT-VC) of the ISO/IEC Moving Picture Experts Group (MPEG) and the ITU-T Video Coding Experts Group (VCEG). HEVC achieves roughly a 50% bit-rate reduction compared to the H.264/AVC video coding standard. A significant increase in resolution and perceptual quality can be expected for home and mobile video applications with devices and content using HEVC. Beyond 2D video, work on the 3D extension of HEVC is in progress and is expected to be finalized in 2015. In the talk, we will discuss the coding tools in HEVC and the associated research challenges, and highlight some of our current research related to HEVC and its 3D extension.

Koichi Shinoda, Tokyo Institute of Technology, Japan


Koichi Shinoda received his B.S. in 1987 and his M.S. in 1989, both in physics, from the University of Tokyo. He received his D.Eng. in computer science from Tokyo Institute of Technology in 2001. In 1989, he joined NEC Corporation and was involved in research on automatic speech recognition. From 1997 to 1998, he was a visiting scholar with Bell Labs, Lucent Technologies. From 2001, he was an Associate Professor with the University of Tokyo. He is currently a Professor at the Tokyo Institute of Technology. His research interests include speech recognition, video information retrieval, and human interfaces. Dr. Shinoda received the Awaya Prize from the Acoustical Society of Japan in 1997 and the Excellent Paper Award from the Institute of Electronics, Information, and Communication Engineers (IEICE) in 1998. He is an Associate Editor of Computer Speech and Language and a Subject Editor of Speech Communication. He is a member of IEEE, ACM, ASJ, IEICE, IPSJ, and JSAI.


Lecture 1: Speaker adaptation techniques for speech recognition
Speaker adaptation techniques were extensively studied in the early 1990s and remain among the essential techniques in automatic speech recognition. They are a form of transfer learning in which the parameters of a speaker-independent model are modified to fit the acoustic characteristics of an individual, using a small amount of his or her utterances. These techniques have been successfully applied not only to differences between speakers, but also to differences in channels, noise environments, and so on. In this lecture, we first explain the fundamental speaker adaptation techniques: Maximum A Posteriori (MAP) estimation, Maximum Likelihood Linear Regression (MLLR), and Eigenvoice. We then explain how they are combined with each other, with other training techniques such as discriminative learning, and with structural learning such as Structural MAP (SMAP) and SMAPLR. We also discuss how these techniques are applied to robust speech recognition in noisy environments.
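For the MAP case, the core update has a simple closed form. The sketch below shows the standard interpolation for a single Gaussian mean (tau is the prior weight; the value 10 is an arbitrary illustration, and the one-dimensional features are hypothetical):

```python
def map_adapt_mean(prior_mean, frames, tau=10.0):
    """MAP estimate of a Gaussian mean: interpolates between the
    speaker-independent prior mean and the sample mean of the
    adaptation frames, weighted by the frame count n and tau."""
    n = len(frames)
    return (tau * prior_mean + sum(frames)) / (tau + n)

# Few adaptation frames: the estimate stays near the prior mean (0.0).
print(map_adapt_mean(0.0, [1.0] * 2))
# Many frames: the estimate approaches the speaker's data mean (1.0).
print(map_adapt_mean(0.0, [1.0] * 990))
```

This asymptotic behavior is what makes MAP robust with little data yet consistent with much data; MLLR instead shares a linear transform across many Gaussians, which adapts faster when data are scarce.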

Lecture 2: Robust video information retrieval using speech technologies
The amount of video data on the Internet has been increasing rapidly. These videos vary widely and are in most cases of low quality, so robust techniques for video indexing are in strong demand. In automatic video semantic indexing, a user submits a textual query for a desired object or scene to a search system, which returns video shots that include that object or scene. In this application, many techniques developed in speech research have been successfully employed. For example, a method using Gaussian mixture model (GMM) supervectors and support vector machines (SVMs) was recently shown to be very effective; in this method, speech technologies such as speaker verification and speaker adaptation play very important roles. In this lecture, we first introduce the activities of the NIST TRECVID workshop, a showcase of state-of-the-art video search technologies, and then discuss several techniques, such as SIFT and HOG features, bag of visual words, Fisher kernels, multi-modal frameworks, and fast tree search, that achieve robustness against the variety of Internet video.
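The supervector idea can be sketched compactly (hypothetical one-dimensional features, and hard assignment in place of the soft posteriors a real system would use): each universal background model (UBM) component mean is MAP-adapted toward the clip's frames, and the adapted means are concatenated into a fixed-length vector that an SVM can then classify:

```python
def supervector(ubm_means, frames, tau=8.0):
    """Hard-assign each frame to its nearest UBM component, MAP-adapt
    each component mean, and concatenate the adapted means."""
    assigned = {i: [] for i in range(len(ubm_means))}
    for x in frames:
        i = min(range(len(ubm_means)), key=lambda k: abs(x - ubm_means[k]))
        assigned[i].append(x)
    return [(tau * m + sum(assigned[i])) / (tau + len(assigned[i]))
            for i, m in enumerate(ubm_means)]

# Hypothetical 3-component 1-D UBM and one clip's frame features.
sv = supervector([-1.0, 0.0, 1.0], [0.9, 1.1, -0.8])
print(sv)  # middle component keeps the UBM prior mean (no frames assigned)
```

Because every clip is mapped to a vector of the same dimension (number of components times feature dimension), clips of different lengths become directly comparable by an SVM kernel.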

Hsin-Min Wang, Academia Sinica, Taiwan


Hsin-Min Wang received the B.S. and Ph.D. degrees in electrical engineering from National Taiwan University in 1989 and 1995, respectively.
In October 1995, he joined the Institute of Information Science, Academia Sinica, where he is now a research fellow and deputy director. He was an adjunct associate professor with National Taipei University of Technology and National Chengchi University. He currently serves as the president of the Association for Computational Linguistics and Chinese Language Processing (ACLCLP), a managing editor of the Journal of Information Science and Engineering, and an editorial board member of the International Journal of Computational Linguistics and Chinese Language Processing. His major research interests include spoken language processing, natural language processing, multimedia information retrieval, and pattern recognition.
Dr. Wang received the Chinese Institute of Engineers (CIE) Technical Paper Award in 1995 and the ACM Multimedia Grand Challenge First Prize in 2012. He is a life member of APSIPA, ACLCLP, and Institute of Information & Computing Machinery (IICM), a senior member of IEEE, and a member of the International Speech Communication Association (ISCA) and ACM.


Lecture 1: Emotion-based Audio Music Annotation and Retrieval
One of the most exciting but challenging endeavors in music research is to develop a computational model that comprehends the affective content of music signals and organizes a music collection according to emotion. Recently, we have proposed a novel acoustic emotion Gaussians (AEG) model that defines a proper generative process of emotion perception in music. As a generative model, AEG permits easy and straightforward interpretations of the model learning processes. To bridge the acoustic feature space and music emotion space, a set of latent feature classes, which are learned from data, is introduced to perform the end-to-end semantic mappings between the two spaces. Based on the space of latent feature classes, the AEG model is applicable to both automatic music emotion annotation and emotion-based music retrieval. This lecture will cover the topics of an ACM Multimedia 2012 full paper and its several related preceding and following conference papers.

Lecture 2: Social Tags-based Audio Music Annotation and Retrieval
Music tags are keywords that people use to describe different aspects of a music clip, such as genre, mood, and instrument. With the explosive growth of digital music available on the Web, automatic music tagging, which can be used to annotate unknown music or retrieve desirable music, is becoming increasingly important. Audio tag classification is one of the evaluation tasks in the annual Music Information Retrieval Evaluation eXchange (MIREX) campaign, in which we have achieved good performance since our first participation in 2009. We have formulated the music tagging task as a novel cost-sensitive multi-label (CSML) learning problem. More recently, we have developed a novel content-based query-by-tag music search system for untagged music databases. The new tag query interface allows users to input multiple tags with multiple levels of preference (denoted as an MTML query) by colorizing desired tags in a web-based tag cloud interface. To enable MTML content-based music retrieval, a probabilistic fusion model (denoted GMFM), which consists of two mixture models, namely a Gaussian mixture model and a multinomial mixture model, is used to jointly model the auditory features and tag labels of a song. Two indexing methods and their corresponding matching methods, namely pseudo-song-based matching and tag-affinity-based matching, are incorporated into the pre-learned GMFM. In this lecture, I will present our recent research results on automatic music tagging and tags-based music retrieval.