APSIPA Distinguished Lecturers
(1 January 2014 - 31 December 2015)

Nam Ik Cho, Seoul National University, Korea
Biography:
Nam Ik Cho received the B.S., M.S., and Ph.D. degrees in control
and instrumentation engineering from Seoul National University,
Seoul, Korea, in 1986, 1988, and 1992, respectively. From
1991 to 1993, he was a Research Associate of the Engineering
Research Center for Advanced Control and Instrumentation,
Seoul National University. From 1994 to 1998, he was with
the University of Seoul, Seoul, Korea, as an Assistant Professor
of Electrical Engineering. He worked as a visiting scholar
in the Dept. of Electrical Eng., University of California,
Santa Barbara in 1996. He joined the Dept. of Electrical and
Computer Engineering, Seoul National University, in 1999,
where he is currently a Professor. During 2011-2013, he served
as a Vice Dean of the College of Engineering, Seoul National
University. His research interests include image processing,
adaptive filtering, and computer vision. In these areas, he has
published about 60 journal papers, more than 100 conference
papers, and 10 US patents.
Dr. Cho is currently a senior member of the IEEE Signal Processing
Society and serves as a handling editor of Signal Processing
(Elsevier). He was the special session chair of IEEE BMSB 2012
and the finance chair of IEEE ISPACS 2004. He received the Chester
Sall Award (2014) from the IEEE Consumer Electronics Society.
Lectures:
Lecture 1: Hierarchical prediction for lossless image compression
Abstract:
A new hierarchical prediction scheme is presented for the
lossless compression of color images and color filter array
(CFA) images. For the lossless compression of an RGB image,
it is first decorrelated by a reversible color transform, and
then the Y component is encoded by a conventional lossless
grayscale image compression method. For encoding the chrominance
images,
a hierarchical scheme that enables the use of upper, left
and lower pixels for the pixel prediction is proposed, whereas
the conventional raster scan prediction methods use upper
and left pixels. An appropriate context model for the prediction
error is also defined, and arithmetic coding is applied to the
error signal corresponding to each context. For several sets of
images, it is shown that the proposed method further reduces the
bit rates compared to JPEG2000 and JPEG-XR. For CFA compression,
the CFA data are subsampled and encoded in order, where, in the
case of a Bayer CFA image, each subimage contains only one kind
of color component (R, G, or B). By subsampling, the green pixels
are separated into two sets, one of which is encoded by a
conventional grayscale encoder. The green pixels in the other set
are then predicted from the already encoded greens. All of the
greens are then used for the prediction of red, and all of these
pixels are used for the blue prediction. In this process, the
predictors are designed considering the direction of edges in the
neighborhood.
By gathering information from the prediction process, such as
edge activities and neighboring errors, the magnitude of the
prediction error is also estimated. From this, the pdf of the
prediction error conditioned on neighboring pixels (the context)
is estimated, and context-adaptive arithmetic coding is applied
to reduce the resulting bits further. Experimental results on
real and simulated CFA images show that the proposed method
produces a lower bpp than conventional lossless image compression
methods as well as recently developed lossless CFA compression
algorithms. This lecture is a condensed version of the following
papers:
- S. Kim and N. I. Cho, "Hierarchical prediction and context
adaptive coding for lossless color image compression," IEEE
Transactions on Image Processing, Jan. 2014.
- S. Kim and N. I. Cho, "Lossless compression of color filter
array images by hierarchical prediction and context modeling,"
IEEE Transactions on Circuits and Systems for Video Technology,
accepted.
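As a rough illustration of the hierarchical idea (not the papers' actual predictor), the sketch below assumes the even rows of a chrominance plane are already losslessly encoded, then predicts each odd-row pixel from its upper, left, and lower neighbors, which raster-scan prediction cannot do:

```python
import numpy as np

def hierarchical_predict(plane):
    """Toy hierarchical predictor for one chrominance plane.

    Even rows are assumed already (losslessly) encoded, so their exact
    values are available; each odd-row pixel is then predicted from its
    upper, left, and lower neighbors -- the key difference from
    raster-scan prediction, which can only use upper and left pixels.
    """
    h, w = plane.shape
    pred = plane.astype(np.int32).copy()    # even rows: prediction = value
    for y in range(1, h - 1, 2):            # odd rows: lower row is known too
        for x in range(w):
            up, down = int(plane[y - 1, x]), int(plane[y + 1, x])
            left = int(plane[y, x - 1]) if x > 0 else (up + down) // 2
            pred[y, x] = (up + down + 2 * left) // 4   # weighted predictor

    return pred

img = np.arange(64, dtype=np.uint8).reshape(8, 8)   # smooth test gradient
residual = img.astype(np.int32) - hierarchical_predict(img)
# these small residuals are what a context-adaptive arithmetic coder encodes
```

On a smooth gradient the upper/lower average tracks the true pixel closely, so the residuals handed to the entropy coder stay small.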
Lecture 2: Camera-captured document image processing for enhancing
visual quality and recognition rates
Rectification methods for document images are presented in this
lecture. Specifically, algorithms that remove geometric distortions
in camera-captured document images are studied. Unlike conventional
methods, we formulate the document dewarping as an optimization
problem defined on discrete representations of text-blocks
and text-lines. We model the geometric distortions caused
by cameras and curved surfaces as a generalized cylindrical
surface and a camera rotation, and develop a cost function whose
minimization yields the parameters of the models. Our cost
function is based on geometric relations between a point in
an input image and its corresponding point in the rectified
image. We also encode the constraints on text-lines and text-blocks
in rectified images into the cost function. Owing to the discrete
representation of text-lines, our cost function is well-defined
and minimized via the Levenberg-Marquardt algorithm. Since
our method does not depend on special assumptions on inputs,
it works for various layouts and handles curved surfaces as
well as planes in the same framework. Moreover, we extend
our approach to unfolded book surfaces. Experimental results
show that our method works for very challenging cases and
compares favorably with conventional methods. This presentation
is a condensed version of the following papers, with some new
results:
- Hyung Il Koo and Nam Ik Cho, "Skew estimation of natural images
based on a salient line detector," Journal of Electronic Imaging,
Vol. 22, No. 1, pp. 013020, January 2013.
- Hyung Il Koo and Nam Ik Cho, "Text-Line Extraction in
Handwritten Chinese Documents Based on an Energy Minimization
Framework," IEEE Transactions on Image Processing, Vol. 21,
No. 3, pp. 1169-1175, March 2012.
- Hyung Il Koo, Jinho Kim, and Nam Ik Cho, "Composition of a
Dewarped and Enhanced Document Image From Two View Images,"
IEEE Transactions on Image Processing, Vol. 18, No. 7,
pp. 1551-1562, July 2009.
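The abstract above casts rectification as nonlinear least squares solved by Levenberg-Marquardt. The toy sketch below fits a curled-baseline model y = a*sin(b*x) + c to text-line sample points with a minimal LM loop; the model and its parameters are illustrative stand-ins for the papers' generalized cylindrical surface and camera-rotation parameters:

```python
import numpy as np

def levenberg_marquardt(f, jac, p0, n_iter=60, lam=1e-2):
    """Minimal Levenberg-Marquardt loop (numpy only, for illustration)."""
    p = np.asarray(p0, dtype=float)
    for _ in range(n_iter):
        r, J = f(p), jac(p)
        A = J.T @ J + lam * np.eye(len(p))     # damped normal equations
        p_new = p - np.linalg.solve(A, J.T @ r)
        if np.sum(f(p_new) ** 2) < np.sum(r ** 2):
            p, lam = p_new, lam * 0.5          # accept step, reduce damping
        else:
            lam *= 2.0                         # reject step, damp harder
    return p

# toy "text line": sample points on a curled baseline y = a*sin(b*x) + c
x = np.linspace(0.0, 1.0, 50)
y = 0.05 * np.sin(4.0 * x) + 0.2

def residual(p):
    a, b, c = p
    return a * np.sin(b * x) + c - y

def jacobian(p):
    a, b, c = p
    return np.stack([np.sin(b * x),            # d(residual)/da
                     a * x * np.cos(b * x),    # d(residual)/db
                     np.ones_like(x)], axis=1) # d(residual)/dc

p_hat = levenberg_marquardt(residual, jacobian, [0.04, 3.5, 0.1])
```

The accept/reject rule keeps the squared residual monotonically non-increasing, which is why LM behaves robustly on these curved-surface fits.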
Minh N. Do, University of Illinois at Urbana-Champaign, USA
Biography:
Minh N. Do is an Associate Professor in the Department of Electrical
and Computer Engineering at the University of Illinois at Urbana-Champaign
(UIUC). His work covers image and multi-dimensional signal processing,
wavelets and multiscale geometric analysis, computational imaging,
and visual information representation, and has led to about
45 journal papers. He received a Silver Medal from the 32nd
International Mathematical Olympiad in 1991, a University Medal
from the University of Canberra in 1997, a Doctorate Award from
the Swiss Federal Institute of Technology Lausanne (EPFL) in
2001, a CAREER Award from the National Science Foundation in
2003, and a Young Author Best Paper Award from IEEE in 2008.
He was named a Beckman Fellow at the Center for Advanced Study,
UIUC, in 2006, and received a Xerox Award for Faculty Research
from the College of Engineering, UIUC, in 2007. He was an Associate
Editor of the IEEE Transactions on Image Processing, and a member
of the IEEE Technical Committees on Signal Processing Theory
and Methods, and on Image, Video, and Multidimensional Signal
Processing. He was elevated to IEEE Fellow in 2014 for
contributions to image representation and computational imaging.
He is a co-founder and Chief Scientist of Personify (formerly
Nuvixa), a spin-off from UIUC to commercialize depth-based visual
communication.
Lectures:
Lecture 1: Beyond Wavelets: Directional Multiresolution Image
Representation
Efficient representation of visual information lies at the foundation
of many image processing tasks, including compression, filtering,
feature extraction, and inverse problems. Efficiency of a representation
refers to its power to capture significant information of objects
of interest using a compact description. For practical applications,
this representation has to be realized by structured transforms
and fast algorithms.
In this lecture, after demonstrating that the commonly used
separable wavelet transforms are inadequate for processing pictorial
information, I will present our recent construction of a "true"
two-dimensional (2-D) representation that can capture the intrinsic
geometrical structure of natural images. The resulting image
expansion is composed of contour segments, and thus is named
contourlets. The contourlet transform has a fast iterated filter
bank algorithm, a precise connection between the continuous
and discrete domains using multiresolution analysis, and the
optimal approximation rate for 2-D piecewise smooth functions
with discontinuities along smooth curves. Experiments with real
images indicate the potential of contourlets in image processing
applications. Furthermore, by utilizing ideas from harmonic
analysis, visual perception, computer vision, and signal processing,
we look for a new fruitful interaction between these fields.
Lecture 2: Immersive Visual Communication with Depth
The ubiquity of digital cameras has made a great impact on visual
communication as can be seen from the explosive growth of visual
contents on the Internet and the default inclusion of a digital
camera on cellphones and laptops. The recent emergence of low-cost
and fast depth cameras (e.g. the Kinect) provides a great opportunity
to revolutionize visual communication further by enabling immersive
and interactive capabilities. Depth measurements provide perfectly
complementary information to traditional color imaging in
capturing the three-dimensional (3D) scene. By effectively
integrating color and depth information, we develop real-time
systems that capture the live scene, render 3D free-viewpoint
videos, and potentially augment the captured scene with 3D virtual
worlds. Such systems can provide unprecedented immersive and
interactive 3D viewing experiences for personalized distance
learning and tele-presence.
Lecture 3: Computational Imaging: From Formation to Processing
As more than 70 percent of the human body's sensors are in the
eyes and video is estimated to represent 90 percent of all Internet
traffic by 2016, imaging and video technology are going to be
major application areas in personal computing. In this lecture,
I will describe some of our current efforts in developing computational
techniques for imaging and videography. These include depth-image-based
rendering, interpolation and upconversion, coded exposure, deconvolution
and restoration, inverse rendering, and relighting.
Woon-Seng Gan, Nanyang Technological University, Singapore
Biography:
Woon-Seng
Gan received his BEng (1st Class Hons) and PhD degrees, both
in Electrical and Electronic Engineering from the University
of Strathclyde, UK in 1989 and 1993 respectively. He is currently
an Associate Professor and the Head of Information Engineering
Division, School of Electrical and Electronic Engineering in
Nanyang Technological University. His research interests span
a wide range of related areas, including adaptive signal
processing, active noise control, directional sound systems,
psychoacoustic signal processing, and real-time embedded systems.
Dr. Gan has published more than 200 refereed international journal
and conference papers, and has been granted five Singapore/US
patents. He co-authored the book Digital Signal Processors:
Architectures, Implementations, and Applications (Prentice Hall,
2005) and was the lead author of Embedded Signal Processing with
the Micro Signal Architecture (Wiley-IEEE, 2007). His book Subband
Adaptive Filtering: Theory and Implementation was published by
John Wiley in August 2009. He also authored a book chapter in
Rick Lyons' Streamlining Digital Signal Processing: A Tricks of
the Trade Guidebook, 2nd Edition (Wiley-IEEE Press, 2012).
He is currently a Fellow of the Audio Engineering Society (AES),
a Fellow of the Institution of Engineering and Technology (IET),
a Senior Member of the IEEE, and a Professional Engineer of
Singapore. In 2012, he became the Series Editor of the new
SpringerBriefs in Signal Processing. He is also an Associate
Technical Editor of the Journal of the Audio Engineering Society
(JAES); an Associate Editor of the IEEE Transactions on Audio,
Speech, and Language Processing (ASLP); an editorial member of
the Asia-Pacific Signal and Information Processing Association
(APSIPA) Transactions on Signal and Information Processing; and
an Associate Editor of the EURASIP Journal on Audio, Speech, and
Music Processing. He is a technical committee member of the
Design and Implementation of Signal Processing Systems (DiSPS)
and the Industry DSP Technology (IDSP) standing committees of the
IEEE Signal Processing Society, and a member of the Board of
Governors of APSIPA.
Lectures:
Lecture 1: Audio Projection: Directional Sound and Its Applications
in Immersive Communication
This lecture is a condensed version of a previous APSIPA tutorial
(2012). It reviews the historical development leading up to
modern-day directional loudspeakers based on the parametric
acoustic array effect. Key challenges associated with the
performance of emitters and applications will be addressed. In
particular, we present signal processing techniques to overcome
some physical limitations inherent in this type of directional
loudspeaker. New results will also be presented to chart the
roadmap for future advances in this area of sound projection.
In addition, we examine the need for psychoacoustic processing
to enhance the 3D spatial perception and audio quality of
parametric loudspeakers. I will also review some of the new work
currently being carried out by different research groups.
Finally, the significance of using parametric loudspeakers in
immersive communication will be described, and I will put forward
some research challenges in parametric loudspeakers and discuss
how signal processing techniques may help to overcome them.
Lecture 2: Subband Adaptive Filtering: Theory and Implementation
This lecture describes, analyzes, and generalizes a new class
of subband adaptive filters (SAFs), called the normalized SAF
(NSAF), whereby the adaptive filter is no longer separated into
subfilters. Instead, subband signals, which are normalized by
their respective subband input variance, are used to adapt the
fullband tap weights of a modeling filter. The modeling filter
is placed before the decimators, which is different from that
of the conventional structure where a set of subfilters are
placed after the decimators. In such a configuration, the modeling
filter operates on a set of subband input signals at the original
sampling rate. The weight adjustment applied on the modeling
filter, at each iteration, is a linear combination of normalized
subband regressors. Implicitly, the weight-control mechanism
is driven by an equalized input spectrum, which is a composite
of the normalized version of the contiguous spectral bands of
the original spectrum. The equalized spectrum accelerates the
convergence due to its reduced spectral dynamic range, while
avoiding any possible aliasing and band-edge effects. Moreover,
computational reduction can be attained by decimating the adaptation
rate of the modeling filter, while the convergence properties
remain intact.
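A minimal numerical sketch of the NSAF structure described above, using a toy two-band Haar analysis bank (the lecture's actual filter bank and parameters are not specified here): the fullband tap weights are adapted by a linear combination of normalized subband regressors, with the error sampled at the decimated rate.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 16                                  # fullband adaptive filter length
w_true = rng.standard_normal(M)         # unknown system to identify
w = np.zeros(M)

u = rng.standard_normal(4000)           # white excitation
d = np.convolve(u, w_true)[: len(u)]    # desired (unknown-system output)

# toy two-band Haar analysis filter bank placed before the decimators
h = [np.array([1.0, 1.0]) / np.sqrt(2), np.array([1.0, -1.0]) / np.sqrt(2)]
u_sub = [np.convolve(u, hi)[: len(u)] for hi in h]  # subband inputs (full rate)
d_sub = [np.convolve(d, hi)[: len(d)] for hi in h]  # subband desired signals

mu, eps = 0.5, 1e-8
N = len(h)                              # number of subbands = decimation factor
for k in range(M, len(u) // N):         # k: decimated time index
    n = k * N
    delta = np.zeros(M)
    for i in range(N):
        x = u_sub[i][n - M + 1 : n + 1][::-1]   # subband regressor (full rate)
        e = d_sub[i][n] - x @ w                 # error at the decimated rate
        delta += e * x / (x @ x + eps)          # normalized subband contribution
    w += mu * delta                     # combination of normalized regressors

mse = np.mean((w - w_true) ** 2)
```

Because convolution commutes, each subband desired signal is exactly the subband input filtered by the unknown system, so the fullband weights converge to it.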
Lecture 3: Recent Work on 3-D Audio Rendering for Headphones
In this lecture, we examine how 3D sound can be rendered over
headphones and look at some of the key problems in creating
realistic 3D sound over headphones. We examine some of the latest
techniques for creating highly personalized and realistic 3D
sound, and how these new techniques can overcome existing problems
and become ready for prime-time use in gaming, entertainment, and
augmented reality. In particular, we outline one of our latest
works: personalized 3D audio headphones that do not require
extensive training to customize audio cues to the geometry of
the listener's ears, relying instead on the emitters' geometry
and positioning, together with cue-extraction techniques, to
create a highly immersive and enhanced sound stage in headphone
listening.
Lecture 4: Recent advances on active noise control: open issues and innovative applications
In this lecture, we briefly review broadband and narrowband feedforward and adaptive feedback ANC systems, with a focus on signal processing algorithms. We concentrate on research and development in the decade since the last detailed tutorial publications. In particular, we introduce the audio-integrated algorithm and the concepts of psychoacoustics and virtual sensing for ANC. We also comprehensively review online secondary-path modeling techniques and ANC without a secondary-path model, which remain critical for some practical applications. Finally, we highlight ANC applications in the medical and consumer electronics fields, which are important for motivating new ANC applications in addition to traditional applications in industry and transportation, and we identify related difficulties and open research issues in each area. This lecture is based on an overview paper published in the APSIPA Transactions in 2012.
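As a concrete reference point for the feedforward algorithms reviewed here, below is a minimal single-channel filtered-x LMS (FxLMS) sketch; the simulated primary and secondary paths are arbitrary illustrative FIR filters, and the secondary-path model is assumed exact (the online-modeling case discussed above relaxes exactly this assumption):

```python
import numpy as np

rng = np.random.default_rng(1)
L = 16                                     # ANC controller length
P = np.concatenate(([0.0], rng.standard_normal(7) * 0.5))  # primary path
S = np.array([0.0, 0.8, 0.3])              # secondary path (one-sample delay)

x = rng.standard_normal(8000)              # reference noise signal
d = np.convolve(x, P)[: len(x)]            # noise reaching the error microphone

w = np.zeros(L)                            # adaptive controller taps
xf = np.convolve(x, S)[: len(x)]           # filtered-x: reference through S
mu = 0.005
e_hist = np.zeros(len(x))
y_buf = np.zeros(len(S))                   # recent anti-noise samples entering S

for n in range(L, len(x)):
    xr = x[n - L + 1 : n + 1][::-1]        # controller regressor
    y_buf = np.roll(y_buf, 1)
    y_buf[0] = xr @ w                      # current anti-noise sample
    e = d[n] - S @ y_buf                   # residual noise at the error mic
    e_hist[n] = e
    xfr = xf[n - L + 1 : n + 1][::-1]      # filtered-x regressor
    w += mu * e * xfr                      # FxLMS weight update
```

Filtering the reference through the secondary-path model before the update is what keeps the gradient aligned despite the acoustic delay between loudspeaker and error microphone.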
Yao-Win Peter Hong, National Tsing Hua University, Taiwan
Biography:
Y.-W. Peter Hong received his B.S. degree in Electrical Engineering
from National Taiwan University, Taipei, Taiwan, in 1999, and
his Ph.D. degree in Electrical Engineering from Cornell University,
Ithaca, NY, in 2005. He joined the Institute of Communications
Engineering and the Department of Electrical Engineering at
National Tsing Hua University, Hsinchu, Taiwan, in Fall 2005,
where he is now a Full Professor. His research interests include
physical layer secrecy, cooperative communications, distributed
signal processing for sensor networks, and PHY-MAC cross-layer
designs for wireless networks.
Dr. Hong received the best paper award for young authors from
the IEEE IT/COM Society Taipei/Tainan Chapter in 2005, the best
paper award in MILCOM 2005, the Junior Faculty Research Award
from the College of EECS and from National Tsing Hua University
in 2009 and 2010, respectively. He also received the IEEE Communication
Society Asia-Pacific Outstanding Young Researcher Award in 2010,
the Y. Z. Hsu Scientific Paper Award and the National Science
Council (NSC) Wu Ta-You Memorial Award in 2011, and the Chinese
Institute of Electrical Engineering (CIEE) Outstanding Young
Electrical Engineer Award in 2012. His coauthored paper received
the Best Paper Award from the Asia-Pacific Signal and Information
Processing Association Annual Summit and Conference (APSIPA
ASC) in 2013.
Dr. Hong is currently an Associate Editor for IEEE Transactions
on Signal Processing and IEEE Transactions on Information Forensics
and Security. He is also the leading coauthor of the books "Cooperative
Communications and Networking: Technologies and System Design"
(with W.-J. Huang and C.-C. Jay Kuo) and "Signal Processing
Approaches to Secure Physical Layer Communications in Multi-Antenna
Wireless Systems" (with P.-C. Lan and C.-C. Jay Kuo) published
by Springer in 2010 and in 2013, respectively. Dr. Hong has
also served as guest editor of EURASIP Special Issue on Cooperative
MIMO Multicell Networks and of IJSNET Special Issue on Advances
in Theory and Applications of Wireless, Ad Hoc, and Sensor Networks.
Lectures:
Lecture 1: MIMO Signal Processing Techniques to Enhance Physical
Layer Secrecy in Wireless Communications
This talk provides an overview of signal processing techniques
used to enhance physical layer secrecy in the data transmission
phase of multi-antenna wireless communication systems. Wireless
physical layer secrecy has attracted much attention in recent
years due to the broadcast nature of the wireless medium and
its inherent vulnerability to eavesdropping. Motivated by results
in information theory, signal processing techniques have been
developed to enlarge the signal quality difference at the destination(s)
and the eavesdropper(s). In particular, in the data transmission
phase, secrecy beamforming and precoding schemes as well as
the use of artificial noise have been used to enhance signal
quality at the destination while limiting the signal strength
at the eavesdropper. These techniques will first be introduced
for point-to-point MIMO systems as well as systems with multiple
destinations and eavesdroppers. Then, the techniques will be
further extended to wireless relay systems, where additional
spatial degrees of freedom are provided and additional security
threats are present. This talk is part of Dr. Hong's tutorial
provided at APSIPA ASC 2013.
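A common artificial-noise construction of the kind mentioned above places the noise in the null space of the legitimate channel, so it degrades only the eavesdropper. A minimal MISO sketch (the channels are drawn at random purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
Nt = 4                                              # transmit antennas
h_lr = rng.standard_normal(Nt) + 1j * rng.standard_normal(Nt)  # legitimate
h_ev = rng.standard_normal(Nt) + 1j * rng.standard_normal(Nt)  # eavesdropper

# beamform the data symbol along the legitimate channel (MRT)
f = h_lr.conj() / np.linalg.norm(h_lr)

# artificial noise confined to the null space of the legitimate channel
_, _, Vh = np.linalg.svd(h_lr[None, :])
null_basis = Vh[1:].conj().T                        # Nt x (Nt-1) basis
z = null_basis @ (rng.standard_normal(Nt - 1) + 1j * rng.standard_normal(Nt - 1))

s = 1.0 + 0.0j                                      # data symbol
tx = f * s + 0.5 * z                                # transmitted vector

rx_lr = h_lr @ tx                   # AN cancels at the legitimate receiver
rx_ev = h_ev @ tx                   # AN interferes at the eavesdropper
an_leak_lr = abs(h_lr @ z)          # should be numerically zero
```

The legitimate receiver sees only the beamformed symbol, while the eavesdropper's SNR is degraded by the noise component, enlarging the signal-quality gap the talk describes.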
Lecture 2: Discriminatory Channel Estimation - A Training
and Channel Estimation Approach to Enhance Physical Layer Secrecy
Due to the broadcast nature of the wireless medium, secrecy
considerations have become increasingly important in the design
of wireless communication systems. Motivated by results in information
theory, signal processing techniques have been developed to
enlarge the signal quality difference at the legitimate receiver
(LR) and the eavesdropper (or unauthorized receiver (UR)). In
this talk, a secrecy-enhancing training procedure, named the
discriminatory channel estimation (DCE) scheme, will be introduced
as a channel estimation approach to achieve this task. In contrast
to most studies on physical layer secrecy, which focus on the
data transmission phase, studies on DCE focus on the channel
estimation phase and aim to provide a practical signal processing
technique to discriminate between the channel estimation performances
at LR and UR. By doing so, the difference between the effective
signal-to-noise ratios of the two users can be enhanced, thus
leaving more room for secrecy coding or modulation schemes
in the data transmission phase. A key feature of DCE designs
is the insertion of artificial noise (AN) in the training signal
to degrade the channel estimation performance at UR. Two DCE
schemes will be discussed, namely, the feedback-and-retraining
and the two-way training based DCE schemes. In both of these
schemes, the optimal power allocation between training data
and AN is determined by minimizing the normalized mean squared
error (NMSE) of LR subject to a lower limit constraint on the
NMSE of UR.
Lecture 3: Energy Harvesting Communications for Wireless
Cellular and Sensor Networks
Recent advances in energy harvesting technology have enabled
the development of wireless devices that are able to support
their own operations through the collection of ambient energy,
e.g., from solar, wind, vibrational, or thermal sources. This
helps reduce the energy cost, increase the deployment flexibility,
and prolong the lifetime of wireless communication devices.
However, when relying on energy harvested from the environment,
the strict energy causality and energy storage constraints as
well as the randomness of the energy arrival may have a significant
impact on the communication performance and design. This talk
gives an overview of recent advances and challenges in this
research area, and introduces our solution to some of these
problems. For wireless cellular networks, energy-aware rate
and power allocation policies, beamforming, and scheduling policies
will be discussed. For wireless sensor networks, energy-aware
relaying (or information forwarding) and sensor deployment schemes
will be discussed, especially for distributed parameter estimation
problems.
Lecture 4: Distributed Channel-Aware Transmission Strategies
for Conventional, Cooperative and Cognitive Wireless Networks
This talk introduces ways to incorporate channel state information
in the uplink channel access policies of conventional, cooperative,
and cognitive wireless networks, using a decentralized random-access
approach. Here, users are allowed to make transmission decisions
based on local channel state information (CSI) and/or spectrum
occupancy information (SOI) to maximize the sum throughput of
the system. We show that proper use of CSI and SOI can help
better exploit the advantages of multiuser diversity, cooperative
diversity, and opportunistic spectrum access. In the past, these
issues have been addressed mostly from a centralized perspective,
where a central controller is used to coordinate transmissions.
This talk introduces ways to exploit these advantages in a decentralized
fashion, allowing users to make independent transmission decisions
based only on local information. For conventional wireless networks,
the optimal uplink channel access policy which determines the
transmission probability, rate, and power, is first derived
by exploiting only the uplink CSI. For cooperative networks,
a similar channel-aware policy and an associated partner selection
policy can be derived based on both the uplink and interuser
CSI. These concepts can also be applied to cognitive radio environments
where transmission decisions should not only be made based on
the channel quality but also on the spectrum occupancy.
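A toy Monte Carlo can illustrate the multiuser-diversity gain of channel-aware random access under a simple collision model; the threshold rule and the Exp(1) fading below are illustrative assumptions, not the talk's exact policy:

```python
import numpy as np

rng = np.random.default_rng(3)
K, T = 10, 20000                        # users, time slots
g = rng.exponential(1.0, size=(T, K))   # i.i.d. Rayleigh-fading power gains

def run(transmit):
    """Average sum throughput under a collision model: a slot carries
    log2(1 + g) bits only when exactly one user transmits."""
    total = 0.0
    for t in range(T):
        tx = np.flatnonzero(transmit(g[t]))
        if len(tx) == 1:                # 0 => idle slot, >1 => collision
            total += np.log2(1.0 + g[t, tx[0]])
    return total / T

# baseline slotted ALOHA: transmit with probability 1/K, ignoring the channel
aloha = run(lambda gt: rng.random(K) < 1.0 / K)

# channel-aware policy: transmit only when the local gain is strong;
# threshold ln(K) gives the same transmit probability 1/K for Exp(1) gains
aware = run(lambda gt: gt > np.log(K))
```

Both schemes transmit with the same per-user probability, so collisions are equally likely; the gain comes entirely from the fact that a channel-aware success always rides a strong channel.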
Tan Lee, Chinese University of Hong Kong, Hong Kong
Biography:
Tan Lee is an Associate Professor at the Department of Electronic
Engineering, the Chinese University of Hong Kong (CUHK). He
also holds a concurrent administrative position as the Associate
Dean for Student Affairs, Faculty of Engineering, CUHK. Tan
Lee has been working on speech- and language-related research
since the early 1990s. His work covers many different areas, including
automatic speech and speaker recognition, text-to-speech, speech
enhancement, language identification, pathological speech analysis,
hearing and speaking aids, and music signal processing. Tan
Lee initiated and coordinated a number of pioneering projects
on the research and development of Chinese spoken language technologies
in Hong Kong. He led 8 projects funded by the General Research
Funds (GRF) from the Hong Kong Research Grants Council (RGC).
Tan Lee works closely with medical doctors, and speech and hearing
professionals, in applying signal processing techniques to human
communication disorder problems. He is the Director of the newly
established Language and Communication Disorders Research Laboratory
at CUHK Shenzhen Research Institute. Tan Lee was the Chairman
of the IEEE Hong Kong Chapter of Signal Processing in 2005-2006.
He is an associate editor of the EURASIP Journal on Advances
in Signal Processing. Tan Lee received the CUHK Vice-Chancellor's
Exemplary Teaching Award in 2004 and the Engineering Faculty's
Exemplary Teaching Awards for multiple years during 2001-2009.
Lectures:
Lecture 1: Robust Pitch Estimation
Pitch is an important attribute of human voice. It carries abundant
information that is pertinent to speech communication. Pitch
estimation is a fundamental problem in many areas of speech
research. Pitch estimation can be based on time-domain waveform
periodicity, frequency-domain harmonicity, or both. For noise-corrupted
speech, these signal characteristics would be distorted to a certain
extent, such that the conventional methods become unreliable
or even completely fail. This lecture will start with a comparative
review on the state-of-the-art approaches to robust pitch estimation.
A new method of pitch estimation for speech signals at very
low signal-to-noise ratios is presented. A robust temporal-spectral
representation of pitch is derived by accumulating spectral
peaks over consecutive time frames. Since the harmonic structure
of voiced speech changes much more slowly than the noise spectrum
in neighboring time frames, spectral peaks related to pitch
harmonics would stand out over the noise through the temporal
accumulation. Prior knowledge of speech and noise is incorporated
in a simple-yet-effective way with the use of sparse estimation
techniques. The performance of the new method is compared with
other existing algorithms on a wide variety of noise conditions.
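The temporal-accumulation idea can be sketched in a few lines: summing magnitude spectra over consecutive frames reinforces the slowly varying harmonics while the noise spectrum averages out. The synthetic harmonic signal, frame sizes, and search band below are illustrative, not the lecture's actual settings:

```python
import numpy as np

fs, f0 = 8000, 200.0
t = np.arange(fs) / fs                               # 1 s of "voiced speech"
clean = sum(np.sin(2 * np.pi * f0 * k * t) / k for k in range(1, 5))
rng = np.random.default_rng(4)
noisy = clean + 2.0 * rng.standard_normal(len(t))    # heavily noise-corrupted

frame, hop = 512, 256
spec_sum = np.zeros(frame // 2 + 1)
for start in range(0, len(noisy) - frame, hop):
    seg = noisy[start : start + frame] * np.hanning(frame)
    spec_sum += np.abs(np.fft.rfft(seg))             # accumulate over frames

freqs = np.fft.rfftfreq(frame, 1.0 / fs)
band = (freqs > 50) & (freqs < 500)                  # plausible F0 range
f0_hat = freqs[band][np.argmax(spec_sum[band])]      # strongest accumulated peak
```

A single noisy frame may place spurious peaks anywhere, but after accumulation the harmonic bins stand well above the noise floor, which is the effect the lecture exploits at very low SNRs.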
Lecture 2: Dealing with Imperfections in Human Speech Communication
While speech is the most preferred and natural modality of communication
for human beings, speech production and perception are extremely
complicated processes that require inter-disciplinary knowledge
to understand thoroughly. From a signal processing point of view,
we are interested in the unique properties of acoustic speech
signals and investigate how these properties contribute to effective
communication from the speaker to the listener. Along the human
speech communication pathway, there are many imperfections that
are related to disabilities of the speakers or listeners. Regardless
of the causes of the disabilities, the acoustic speech signals
transmitted from the speaker to the listener remain the most
relevant physical observables, which can be intervened upon or manipulated
by applying effective signal processing methods. In this lecture,
I will describe some of our recent work on applying speech
processing techniques to address various problems of speech
and hearing disorders. They include cochlear implant processing
strategies for improving speech perception of tonal languages,
electrolarynx systems with pitch control, analysis and assessment
of disordered voice and speech.
Lecture 3: Spoken Language Recognition with Prosodic Features
Spoken language recognition (SLR) refers to the process of automatically
determining the language of a spoken utterance. It has many
applications in computer processing of multimedia and multi-lingual
information. In state-of-the-art SLR systems, cepstral features
and their variants are commonly used. The use of another important
component of human speech, namely prosody, is largely overlooked.
Although it is generally agreed that prosodic features are useful
to identify a spoken language, most of the previous attempts
at prosody-based SLR were not quite successful. In this lecture,
we argue that effective SLR requires the joint contributions
of a comprehensive set of prosodic attributes that are derived
to represent F0 and intensity contours, and segmental durations
in many different ways. We consider not only a variety of acoustic
measurements but also multiple variants of each raw measurement
obtained by applying different normalization methods. The prosodic
attributes are modeled by the bag of n-grams approach with support
vector machine (SVM) as in conventional phonotactic SLR systems.
Results of large-scale SLR experiments are reported to demonstrate
the effectiveness of using prosodic features.
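A minimal sketch of the bag-of-n-grams modeling of prosodic attributes described above; the symbol inventory, thresholds, and normalization here are illustrative, not the lecture's actual attribute set, and in a full system an SVM would be trained on these counts:

```python
import numpy as np
from collections import Counter

def prosodic_ngrams(f0, n=2):
    """Bag-of-n-grams over a quantized F0 contour (toy prosodic attribute)."""
    f0 = np.asarray(f0, dtype=float)
    z = (f0 - f0.mean()) / (f0.std() + 1e-9)   # per-utterance normalization
    delta = np.diff(z)                         # frame-to-frame F0 movement
    sym = np.digitize(delta, [-0.05, 0.05])    # 0 = fall, 1 = level, 2 = rise
    return Counter(tuple(sym[i : i + n]) for i in range(len(sym) - n + 1))

rising = prosodic_ngrams(np.linspace(100, 200, 20))   # steadily rising contour
falling = prosodic_ngrams(np.linspace(200, 100, 20))  # steadily falling contour
```

Each utterance is thus reduced to a sparse count vector over discrete prosodic tokens, mirroring how phonotactic SLR systems count phone n-grams.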
Lecture 4: Unsupervised Acoustic Modeling for Spoken Language
Applications
Acoustic modeling is an important problem in many spoken language
applications. It aims at providing compact yet accurate statistical
representations for a set of sub-word units. Conventional acoustic
modeling is a highly supervised process that requires plenty
of speech data with transcriptions. Such resources may not be
available for many languages and in many real-world situations.
In this lecture, a new approach to unsupervised acoustic modeling
is presented. Unlike conventional approaches that cluster speech
frames directly, the new approach is based on clustering of
Gaussian components that are estimated from untranscribed training
data. The resulting acoustic models may not be linguistically
meaningful in the way conventional phoneme models in continuous
speech recognition are. However, these acoustic models are useful in many
spoken language applications, when transcribed data are scarce.
Ekachai Leelarasmee, Chulalongkorn University, Thailand
Biography:
Ekachai Leelarasmee received his Ph.D. in Electrical Engineering
from the University of California at Berkeley in 1982. He is
currently the Deputy Director of the International School of
Engineering and an Adjunct Professor of Electrical Engineering
at Chulalongkorn University. He has served the IEEE Thailand
Section since 2004 in various positions, including Section Chair.
He was awarded an Outstanding Volunteer Award by IEEE Region 10
in 2006 and is currently the president of the IEEE Thailand
Solid-State Circuits Society Chapter and a member of the Technical
Committee of the Signal Processing Group of APSIPA. Dr. Ekachai
has organized several IEEE conferences as their Technical Program
Chair and gave a tutorial session at APSIPA 2013. His current
interests include integrated circuit and embedded system design.
Lectures:
Lecture 1: Tunable Quadrature Sinusoidal Waveform Generators
Quadrature sinusoidal waveforms are two sinusoidal signals with
a 90-degree phase difference. They are needed in high-data-rate
modulation schemes such as quadrature phase shift keying (QPSK).
This lecture covers different techniques for designing an oscillator
that generates quadrature sinusoidal signals whose frequency can
be adjusted electronically. Basic circuits such as ring, LC, and
relaxation oscillators will be reviewed to show the conditions for
oscillation and how they can be modified to be frequency tunable
while producing quadrature signals. The deviation from quadrature
phase of each method will be discussed. Direct digital frequency
synthesis, in which a very accurate frequency can be set with good
quadrature phase, will also be described.
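The direct digital frequency synthesis approach can be sketched numerically: a phase accumulator steps by a frequency tuning word each clock tick, and its truncated value indexes a sine lookup table; reading the table a quarter period ahead yields the quadrature (cosine) output by construction. The parameter values below are illustrative.

```python
# Minimal DDFS sketch: a phase accumulator plus a sine lookup table
# produces two outputs exactly 90 degrees apart.
import numpy as np

def ddfs_quadrature(f_out, f_clk, n_samples, acc_bits=32, lut_bits=10):
    # One period of a sine wave, sampled at 2**lut_bits points.
    lut = np.sin(2 * np.pi * np.arange(2**lut_bits) / 2**lut_bits)
    # Frequency tuning word: phase step per clock tick.
    ftw = round(f_out / f_clk * 2**acc_bits)
    phase = (ftw * np.arange(n_samples)) % 2**acc_bits
    idx = phase >> (acc_bits - lut_bits)             # truncate to LUT address
    i = lut[(idx + 2**lut_bits // 4) % 2**lut_bits]  # cosine: +90 deg offset
    q = lut[idx]                                     # sine
    return i, q

i, q = ddfs_quadrature(f_out=1e3, f_clk=1e6, n_samples=1000)
```

Frequency resolution here is f_clk / 2**acc_bits, which is why a DDFS can be tuned very precisely; quadrature accuracy is limited only by lookup-table quantization, not by analog component matching.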
Chia-Wen Lin, National
Tsing Hua University, Taiwan
Biography:
Chia-Wen Lin received his Ph.D. degree in electrical engineering
from National Tsing Hua University (NTHU), Hsinchu, Taiwan,
in 2000. He is currently an Associate Professor with the Department
of Electrical Engineering and the Institute of Communications
Engineering, NTHU. He was with the Department of Computer Science
and Information Engineering, National Chung Cheng University,
Taiwan, during 2000-2007. Prior to joining academia, he worked
for the Information and Communications Research Laboratories,
Industrial Technology Research Institute, Hsinchu, Taiwan, during
1992-2000. His research interests include image and video processing
and video networking. Dr. Lin is a Steering Committee member
of the IEEE Transactions on Multimedia. He is Associate Editor
of the IEEE Transactions on Circuits and Systems for Video Technology,
the IEEE Transactions on Multimedia, the IEEE Multimedia, and
the Journal of Visual Communication and Image Representation.
He is also an Area Editor of EURASIP Signal Processing: Image
Communication. He is currently Chair of the Multimedia Systems
and Applications Technical Committee of the IEEE Circuits and
Systems Society. He is Steering Committee Member of IEEE International
Conference on Multimedia & Expo (ICME). He served as Technical
Program Co-Chair of ICME 2010, and Special Session Co-Chair
of ICME 2009. His papers won a Top 10% Paper Award at IEEE MMSP
2013 and the Young Investigator Award at VCIP 2005. He received
the Young Faculty Award from CCU in 2005 and the Young Investigator
Award from the National Science Council, Taiwan, in 2006.
Lectures:
Lecture 1: Video Retargeting: Algorithms, Applications, and
Quality Assessment
Video retargeting from a full-resolution video to a lower-resolution
display will inevitably cause information loss. Content-aware
video retargeting techniques have been studied to avoid critical
visual information loss while resizing a video. Maintaining
the spatio-temporal coherence of a retargeted video is critical
to visual quality. In this lecture, we will first show
how to use a panoramic mosaic to guide the scaling of corresponding
regions of video frames in a video shot to ensure good spatio-temporal
coherence. Second, we will show how the proposed video retargeting
scheme can be used to construct a scalable video coder which
supports content-adaptive spatial scalability (e.g., the base-layer
and enhancement-layer videos are of different resolution and
different aspect ratios) with good coding efficiency. Finally,
we will present an objective quality assessment scheme based
on geometric distortion and information loss for automatically
evaluating the visual quality of a retargeted image.
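To make the idea of content-aware resizing concrete, here is a deliberately tiny illustration, not the mosaic-guided method of the lecture: the image is narrowed by repeatedly dropping the column with the lowest gradient energy, so low-information regions absorb most of the resizing while salient structure is preserved.

```python
# Toy content-aware retargeting: shrink width by deleting the column
# with the least total gradient energy at each step.
import numpy as np

def retarget_width(img, new_width):
    img = img.astype(float)
    while img.shape[1] > new_width:
        # Per-column "importance": summed horizontal gradient magnitude.
        energy = np.abs(np.gradient(img, axis=1)).sum(axis=0)
        img = np.delete(img, np.argmin(energy), axis=1)
    return img

demo = np.tile(np.linspace(0.0, 1.0, 40), (20, 1))  # smooth toy "image"
out = retarget_width(demo, 30)
print(out.shape)
```

Real retargeting operates on whole frames of a shot jointly (hence the panoramic mosaic in the lecture), precisely because applying a per-frame scheme like this one independently would break temporal coherence.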
Lecture 2: Self-Learning-Based Structured Noise Removal in
Image/Video
Decomposing an image into multiple semantic components has been
an active research topic for various image processing applications,
such as image denoising, enhancement, and inpainting. In this
lecture, we introduce a unified framework for image decomposition
based on the uses of sparsity and morphological diversity in
image mixtures. We will show how the proposed framework can
effectively deal with several image denoising tasks, including
rain streaks removal, general denoising, joint super-resolution
and deblocking for a highly compressed image. Based on advanced
sparse representation and morphological diversity of images,
the proposed framework first learns an over-complete dictionary
from the high spatial frequency parts of an input image for
reconstruction purposes. An unsupervised clustering technique
is applied to the dictionary atoms for identifying the morphological
component corresponding to the noise pattern of interest (e.g.,
rain streaks, blocking artifacts, or Gaussian noise). The proposed
self-learning based approach allows one to identify and disregard
the above morphological component during image reconstruction
in an unsupervised way, and thus image denoising can be achieved
in a fully automatic manner.
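The pipeline described above can be sketched with off-the-shelf components; the choices below (box-blur high-pass, k-means over atoms) are my own stand-ins, not the authors' implementation.

```python
# Self-learning sketch: learn an over-complete dictionary from
# high-frequency patches of a noisy image, then cluster the atoms so
# that one cluster can be treated as the structured-noise component.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.feature_extraction.image import extract_patches_2d

rng = np.random.default_rng(0)
image = rng.normal(size=(64, 64))     # stand-in for a noisy input image

# High-frequency part: image minus a crude local mean (3x3 box blur).
pad = np.pad(image, 1, mode="edge")
blur = sum(pad[i:i + 64, j:j + 64] for i in range(3) for j in range(3)) / 9.0
highfreq = image - blur

patches = extract_patches_2d(highfreq, (8, 8), max_patches=200, random_state=0)
patches = patches.reshape(len(patches), -1)

# Over-complete dictionary learned from the input image itself.
dico = MiniBatchDictionaryLearning(n_components=32, random_state=0).fit(patches)

# Cluster the atoms; in the real method, the cluster matching the noise
# pattern (rain streaks, blocking artifacts, ...) is identified and
# excluded during reconstruction.
groups = KMeans(n_clusters=2, n_init=10, random_state=0).fit(dico.components_)
print(np.bincount(groups.labels_))
```

Because the dictionary is learned from the degraded input alone, no external training pairs are needed, which is what makes the approach "self-learning".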
Nam Ling, Santa Clara University, USA
Biography:
Nam Ling received the B.Eng. degree from Singapore and the M.S.
and Ph.D. degrees from the U.S.A. He is currently the Sanfilippo
Family Chair Professor of Santa Clara University (U.S.A) and
the Chair of its Department of Computer Engineering. From 2002
to 2010, he was an Associate Dean for its School of Engineering.
Currently, he is also a Consulting Professor for the National
University of Singapore, a Guest Professor for Shanghai Jiao
Tong University (China), and a Tsuiying Chair Professor for
Lanzhou University (China). He has more than 170 publications
(including books) in video coding and systolic arrays. He also
has several adopted standards contributions and U.S. patents.
He is an IEEE Fellow for his contributions to video coding
algorithms and architectures. He is also an IET Fellow. He was
named IEEE Distinguished Lecturer twice and received the IEEE
ICCE Best Paper Award (First Place). He is currently also an
APSIPA Distinguished Lecturer. He received six awards from the
University, four at the University level (Outstanding Achievement,
Recent Achievement in Scholarship, President's Recognition,
and Sustained Excellence in Scholarship) and two at the School/College
level (Researcher of the Year and Teaching Excellence). He has
served as Keynote Speaker for IEEE APCCAS, VCVP (twice), JCPC,
IEEE ICAST, IEEE ICIEA, and IET FC & U-Media, as well as
a Distinguished Speaker for IEEE ICIEA. He has served as General
Chair/Co-Chair for IEEE Hot Chips, VCVP (twice), IEEE ICME, and IET
U-Media, and as Technical Program Co-Chair for
IEEE ISCAS, APSIPA ASC, IEEE APCCAS, IEEE SiPS (twice), DCV,
and IEEE VCIP. He was Chair of the IEEE CASCOM TC and IEEE TCMM
technical committees, and has served as Guest Editor/Associate
Editor for IEEE TCAS I, IEEE J-STSP, Springer JSPS, and Springer
MSSP journals. He has delivered more than 110 invited colloquia
worldwide and has served as Visiting Professor/Consultant/Scientist
for many institutions/companies.
Lectures:
Lecture 1: 3D Video Coding and its Related Research
In this talk, we will discuss the latest state-of-the-art 3D-High
Efficiency Video Coding (3D-HEVC) technology and its related
research. Following the success of HEVC jointly developed by
the Joint Collaborative Team on Video Coding (JCT-VC) of ISO/IEC
Moving Picture Experts Group (MPEG) and ITU-T Video Coding Experts
Group (VCEG), a second team known as the Joint Collaborative
Team on 3D Video (JCT-3V) focuses on developing its 3D extension.
This technology goes beyond the traditional stereoscopic and
multi-view representations of video and extends to include the
use of depth maps and view synthesis. More powerful 3D capabilities,
coupled with much higher resolution and perceptual quality, are
targeted at devices and content for theater,
home, and mobile applications. In 3D-HEVC, coding of dependent
views and depth maps, as well as synthesizing intermediate views,
pose new challenges in achieving high coding efficiency while
maintaining a reasonable computational complexity. In the talk,
we will discuss the coding tools in 3D-HEVC, research challenges,
and highlight some of our current related research.
Lecture 2: Research on High Efficiency Video Coding and its
3D Extension
In this talk, we will discuss the latest state-of-the-art video
coding technology of High Efficiency Video Coding (HEVC) and
related research. HEVC was jointly developed by the Joint Collaborative
Team on Video Coding (JCT-VC) of ISO/IEC Moving Picture Experts
Group (MPEG) and ITU-T Video Coding Experts Group (VCEG). HEVC
development has achieved a 50% bit-rate reduction compared
to the H.264/AVC video coding standard. Significant
increase in resolution and perceptual quality can be expected
for home and mobile video applications with devices and content
using HEVC. Beyond 2D video, the work for the 3D extension of
HEVC is in progress and is expected to reach its final stage
in 2015. In the talk, we will discuss the coding tools in HEVC,
research challenges addressing HEVC, and highlight some of our
current research related to HEVC and its 3D extension.
Koichi Shinoda, Tokyo
Institute of Technology, Japan
Biography:
Koichi Shinoda received his B.S. in 1987 and his M.S. in 1989,
both in physics, from the University of Tokyo. He received his
D.Eng. in computer science from Tokyo Institute of Technology
in 2001. In 1989, he joined NEC Corporation and was involved
in research on automatic speech recognition. From 1997 to 1998,
he was a visiting scholar with Bell Labs, Lucent Technologies.
From 2001, he was an Associate Professor with the University
of Tokyo. He is currently a Professor at the Tokyo Institute
of Technology. His research interests include speech recognition,
video information retrieval, and human interfaces. Dr. Shinoda
received the Awaya Prize from the Acoustical Society of Japan
in 1997 and the Excellent Paper Award from the Institute of
Electronics, Information, and Communication Engineers (IEICE)
in 1998. He is an Associate Editor of Computer Speech and Language
and a Subject Editor of Speech Communication. He is a member
of IEEE, ACM, ASJ, IEICE, IPSJ, and JSAI.
Lectures:
Lecture 1: Speaker adaptation techniques for speech recognition
Speaker adaptation techniques were extensively studied in the early
1990s and remain among the essential techniques in automatic
speech recognition. They are a type of transfer learning,
in which the parameters of a speaker-independent model are modified
so that they fit the acoustic characteristics of an individual,
with a small amount of his/her utterances. These techniques
are successfully applied not only to the difference of speakers,
but also to those of channels, noise environments, and so on.
In this lecture, we first explain fundamental speaker adaptation
techniques: Maximum A Posteriori (MAP) estimation, Maximum Likelihood
Linear Regression (MLLR), and Eigenvoice. We then show how they are
combined with each other, with other training techniques such as
discriminative learning, and with structure learning, as in
Structural MAP (SMAP) and SMAPLR. We also discuss how
these techniques are applied to robust speech recognition in
noisy environments.
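The core of MAP adaptation can be shown for a single Gaussian component: the speaker-independent mean acts as a prior that is pulled toward the speaker's data in proportion to the occupancy count. This is a standard textbook formulation, with toy data; the relevance factor tau is a typical design parameter, not a fixed value.

```python
# MAP adaptation of one Gaussian mean:
#   mu_map = (tau * mu_prior + sum_t gamma_t x_t) / (tau + sum_t gamma_t)
import numpy as np

def map_adapt_mean(mu_prior, frames, gammas, tau=16.0):
    gammas = np.asarray(gammas)[:, None]      # frame occupancies, column form
    num = tau * mu_prior + (gammas * frames).sum(axis=0)
    return num / (tau + gammas.sum())

mu_si = np.zeros(3)              # speaker-independent mean
obs = np.ones((10, 3))           # speaker's adaptation frames (toy data)
post = np.full(10, 1.0)          # occupancy of this component per frame
mu_sd = map_adapt_mean(mu_si, obs, post)
print(mu_sd)
```

With little data the adapted mean stays near the prior; as occupancy grows, it converges to the data mean, which is exactly the behavior that makes MAP robust with small amounts of speaker data.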
Lecture 2: Robust video information retrieval using speech
technologies
The amount of video data on the Internet is rapidly increasing.
These videos vary widely and are often of low quality, so robust
techniques for video indexing are in strong demand.
In automatic video semantic indexing, a user submits a textual
input query for a desired object or a scene to a search system,
which returns video shots that include the object or scene.
In this application, many techniques developed in speech research
have been successfully employed. For example, a new method using
Gaussian-mixture model (GMM) supervectors and support vector
machines (SVMs) was recently proven to be very effective. In
this method, speech technologies such as speaker verification
and speaker adaptation techniques play very important roles.
In this lecture, we first introduce the activities of the NIST
TRECVID workshop, a showcase of state-of-the-art video search
technologies, and then discuss several techniques, such as SIFT
and HOG features, bag of visual words, the Fisher kernel,
multi-modal frameworks, and fast tree search, for achieving
robustness against the variety of Internet video.
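The GMM-supervector idea mentioned above can be sketched end to end: adapt a universal background GMM toward each clip's features (a relevance-MAP update of the means), stack the adapted means into one long vector, and classify those vectors with an SVM. This is a simplified illustration with synthetic data, not the exact system from the lecture.

```python
# GMM supervectors + SVM, in miniature.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

rng = np.random.default_rng(0)
ubm = GaussianMixture(n_components=4, random_state=0)
ubm.fit(rng.normal(size=(400, 5)))        # universal background model

def supervector(frames, ubm, tau=10.0):
    post = ubm.predict_proba(frames)                  # frame occupancies
    counts = post.sum(axis=0)[:, None]
    fo = post.T @ frames                              # first-order stats
    means = (tau * ubm.means_ + fo) / (tau + counts)  # relevance-MAP means
    return means.ravel()                              # stacked supervector

clips = [rng.normal(loc=l, size=(50, 5)) for l in (0, 0, 1, 1)]
X = np.stack([supervector(c, ubm) for c in clips])
y = [0, 0, 1, 1]
clf = SVC(kernel="linear").fit(X, y)
```

The adaptation step is the same MAP machinery used in speaker recognition, which is why speech technologies transfer so directly to video semantic indexing.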
Hsin-Min Wang,
Academia Sinica, Taiwan
Biography:
Hsin-Min Wang received the B.S. and Ph.D. degrees in electrical
engineering from National Taiwan University in 1989 and 1995,
respectively.
In October 1995, he joined the Institute of Information Science,
Academia Sinica, where he is now a research fellow and deputy
director. He was an adjunct associate professor with National
Taipei University of Technology and National Chengchi University.
He currently serves as the president of the Association for
Computational Linguistics and Chinese Language Processing (ACLCLP),
a managing editor of Journal of Information Science and Engineering,
and an editorial board member of International Journal of Computational
Linguistics and Chinese Language Processing. His major research
interests include spoken language processing, natural language
processing, multimedia information retrieval, and pattern recognition.
Dr. Wang received the Chinese Institute of Engineers (CIE) Technical
Paper Award in 1995 and the ACM Multimedia Grand Challenge First
Prize in 2012. He is a life member of APSIPA, ACLCLP, and Institute
of Information & Computing Machinery (IICM), a senior member
of IEEE, and a member of the International Speech Communication
Association (ISCA) and ACM.
Lectures:
Lecture 1: Emotion-based Audio Music Annotation and Retrieval
One of the most exciting but challenging endeavors in music
research is to develop a computational model that comprehends
the affective content of music signals and organizes a music
collection according to emotion. Recently, we have proposed
a novel acoustic emotion Gaussians (AEG) model that defines
a proper generative process of emotion perception in music.
As a generative model, AEG permits easy and straightforward
interpretations of the model learning processes. To bridge the
acoustic feature space and music emotion space, a set of latent
feature classes, which are learned from data, is introduced
to perform the end-to-end semantic mappings between the two
spaces. Based on the space of latent feature classes, the AEG
model is applicable to both automatic music emotion annotation
and emotion-based music retrieval. This lecture will cover the
topics of an ACM Multimedia 2012 full paper and its several
related preceding and following conference papers.
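The role of the latent feature classes can be rendered schematically; the toy below is my own rendering of the idea, not the authors' AEG implementation. Each latent class carries a center in acoustic space and a mean in the valence-arousal emotion plane; a clip's predicted emotion is the posterior-weighted mix of the class emotion means.

```python
# Schematic acoustic-to-emotion mapping through latent feature classes.
import numpy as np

rng = np.random.default_rng(0)
K = 4                                       # number of latent feature classes
acoustic_means = rng.normal(size=(K, 6))    # per-class acoustic centers
emotion_means = rng.uniform(-1, 1, (K, 2))  # per-class (valence, arousal)

def predict_emotion(features):
    # Posterior over classes from isotropic, equal-prior Gaussians.
    d2 = ((features[None, :] - acoustic_means) ** 2).sum(axis=1)
    w = np.exp(-0.5 * d2)
    w /= w.sum()
    return w @ emotion_means                # expected valence-arousal point

clip = rng.normal(size=6)                   # stand-in acoustic feature vector
va = predict_emotion(clip)
print(va)
```

Running the mapping in the forward direction gives annotation (audio to emotion); inverting it, scoring songs by how well their posteriors explain a query point in the emotion plane, gives emotion-based retrieval from the same model.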
Lecture 2: Social Tags-based Audio Music Annotation and Retrieval
Music tags correspond to keywords that people use to describe
different aspects of a music clip, such as genre, mood, and
instrument. With the explosive growth of digital music available
on the Web, automatic music tagging, which can be used to annotate
unknown music or retrieve desirable music, is becoming increasingly
important. Audio tag classification is one of the evaluation
tasks in the Music Information Retrieval Evaluation eXchange
(MIREX) annual campaign. We have achieved good performance since
our first participation in 2009. We have formulated the music
tagging task as a novel cost-sensitive multi-label (CSML) learning
problem. More recently, we have further developed a novel content-based
query-by-tag music search system for an untagged music database.
The new tag query interface allows users to input multiple tags
with multiple levels of preference (denoted as an MTML query)
by colorizing desired tags in a web-based tag cloud interface.
To enable MTML content-based music retrieval, a probabilistic
fusion model (denoted as GMFM), which consists of two mixture
models, namely a Gaussian mixture model and a multinomial mixture
model, is used to jointly model the auditory features and tag
labels of a song. Two indexing methods and their corresponding
matching methods, namely pseudo song-based matching and tag
affinity-based matching, are incorporated into the pre-learned
GMFM. In this lecture, I will present our recent research results
on automatic music tagging and tags-based music retrieval.
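The multi-tag, multi-level (MTML) query itself reduces to a weighted match between user preference levels and per-song tag affinities. The bare-bones sketch below uses random stand-in affinity scores in place of GMFM outputs; tag names and numbers are invented for the demo.

```python
# Ranking songs against an MTML query: preference-weighted tag affinities.
import numpy as np

tags = ["happy", "calm", "guitar", "electronic"]
rng = np.random.default_rng(0)
affinity = rng.uniform(size=(5, len(tags)))   # 5 songs x per-tag affinities

# MTML query: a preference level per tag (0 = don't care), as a user
# would express by colorizing tags in the tag-cloud interface.
query = {"happy": 3.0, "guitar": 1.0}
w = np.array([query.get(t, 0.0) for t in tags])

scores = affinity @ w
ranking = np.argsort(scores)[::-1]            # best-matching songs first
print(ranking)
```

In the actual system, the affinity column for each song comes from the pre-learned GMFM via pseudo song-based or tag affinity-based matching, so untagged music can still be ranked against a tag query.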