Session Program Day 1
Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2022
Session | Room | Chair | |
TuAM1-1 (SS13: Advanced Topics on Sound Event and Scene Analysis) | Chiang Mai 1 | Nobutaka Ono, Keisuke Imoto, Tatsuya Komatsu | |
Date | Time | Title | Authors |
8 November 2022 | 10.35-10.55 | On Sorting and Padding Multiple Targets for Sound Event Localization and Detection With Permutation Invariant and Location-Based Training | Robin Scheibler; Tatsuya Komatsu; Yusuke Fujita; Michael Hentschel |
10.55-11.15 | How Information on Acoustic Scenes and Sound Events Mutually Benefits Event Detection and Scene Classification Tasks | Ami Igarashi; Keisuke Imoto; Yuka Komatsu; Shunsuke Tsubaki; Shuto Hario; Tatsuya Komatsu | |
11.15-11.35 | Compressed Sensing of Sparse Spectrum Using Distributed Sound-To-Light Conversion Device Blinkies | Satoshi Motoyama; Natsuki Ueno; Yuma Kinoshita; Nobutaka Ono | |
11.35-11.55 | CochlScene: Acquisition of Acoustic Scene Data Using Crowdsourcing | Il-Young Jeong; Jeongsoo Park | |
11.55-12.15 | Vision Transformer Based Audio Classification Using Patch-Level Feature Fusion | Juan Luo; Jielong Yang; Eng Siong Chng; Xionghu Zhong | |
12.15-12.35 | Self-Consistency Training With Hierarchical Temporal Aggregation for Sound Event Detection | Yunlong Li; Xiujuan Zhu; Mingyu Wang; Ying Hu | |
Session | Room | Chair | |
TuAM1-2 (Speech, Language, and Audio 1) | Chiang Mai 2 | Tomoki Toda | |
Date | Time | Title | Authors |
8 November 2022 | 10.35-10.55 | Music Similarity Calculation of Individual Instrumental Sounds Using Metric Learning | Yuka Hashizume; Li Li; Tomoki Toda |
10.55-11.15 | Investigation of Noise-Reverberation-Robustness of Modulation Spectral Features for Speech-Emotion Recognition | Taiyang Guo; Sixia Li; Masashi Unoki; Shogo Okada | |
11.15-11.35 | Combine Waveform and Spectral Methods for Single-Channel Speech Enhancement | Miao Li; Hui Zhang; Xueliang Zhang | |
11.35-11.55 | Perceptual Loss Function for Speech Enhancement Based on Generative Adversarial Learning | Xin Bai; Xueliang Zhang; Hui Zhang; Haifeng Huang | |
11.55-12.15 | Joint Speech Activity and Overlap Detection With Multi-Exit Architecture | Ziqing Du; Kai Liu; Xucheng Wan; Huan Zhou | |
Session | Room | Chair | |
TuAM1-3 (Human Biometrics and Security Systems) | Chiang Mai 3 | Jessada Karnjana | |
Date | Time | Title | Authors |
8 November 2022 | 10.35-10.55 | On Wrist Vein Recognition for Human Biometrics | Felix Marattukalam; David Cole; Pranav Gulati; Waleed H. Abdulla |
10.55-11.15 | Continuous Authentication on Unconstrained Activities Using Window and Cycle Based Segmentation | Lina Septiana; Narishige Abe; Tomoaki Matsunami; Hidetsugu Uchida; Kazuki Osamura; Shigefumi Yamada | |
11.15-11.35 | Smoothed Teager Energy Cepstral Feature for Replay Attack Detection on Voice Assistants | Madhu R Kamble; Anand Therattil; Hemant A. Patil; M. Ali Basha Shaik; Vikram Vij | |
11.35-11.55 | Disentangled Speaker Representation Learning via Mutual Information Minimization | Sung Hwan Mun; Min Hyun Han; Minchan Kim; Dongjune Lee; Nam Soo Kim | |
11.55-12.15 | Contribution of Timbre and Shimmer Features to Deepfake Speech Detection | Anuwat Chaiwongyen; Norranat Songsriboonsit; Suradej Duangpummet; Jessada Karnjana; Waree Kongprawechnon; Masashi Unoki | |
12.15-12.35 | Combined 2D and 3D Convolution Residual Attention Network for Hand Gesture Recognition | Chang-Ting Tsai; Jian-Jiun Ding | |
Session | Room | Chair | |
TuAM1-4 (Signal Image and Information Processing Theory and Methods) | Board Room 2 | Daranee Hormdee | |
Date | Time | Title | Authors |
8 November 2022 | 10.35-10.55 | Investigate Bidirectional Functional Brain Networks Using Directed Information | Qiang Li |
10.55-11.15 | Effective ASR Error Correction Leveraging Phonetic, Semantic Information and N-Best Hypotheses | Hsin-Wei Wang; Bi-Cheng Yan; Yi-Cheng Wang; Berlin Chen | |
11.15-11.35 | A Lossless Audio Codec Based on Hierarchical Residual Prediction | Taiyo Mineo; Hayaru Shouno | |
11.35-11.55 | Investigating Low-Distortion Speech Enhancement With Discrete Cosine Transform Features for Robust Speech Recognition | Yu-Sheng Tsao; Jeih-weih Hung; Kuan-Hsun Ho; Berlin Chen | |
11.55-12.15 | Consistent MDT-Tucker: A Hankel Structure Constrained Tucker Decomposition in Delay Embedded Space | Ryuki Yamamoto; Hidekata Hontani; Akira Imakura; Tatsuya Yokota | |
12.15-12.35 | Sound Reproduction With a Circular Loudspeaker Array Using Differential Beamforming Method | Yankai Zhang; Jiayi Mao; Yefeng Cai; Chao Ye | |
Session | Room | Chair | |
TuAM1-5 (SS01: Reconfigurable Computing and Performance Evaluation) | Board Room 3 | Ukrit Mankong | |
Date | Time | Title | Authors |
8 November 2022 | 10.35-10.55 | Design and System Implementation of a Configurable Optical Interconnection Network | Bowen Yang; Junyong Deng; Jiaying Luo; Yu Feng |
10.55-11.15 | 2S-AGCN Human Behavior Recognition Based on New Partition Strategy | Jin Wu; Lei Wang; Gege Chong; Haoran Feng | |
11.15-11.35 | Design of Optimal FIR Digital Filter by Swarm Optimization Technique | Jin Wu; Yaqiong Gao; Ling Yang; Zhengdong Su | |
11.35-11.55 | Design and Implementation of Reconfigurable Array Structure for Convolutional Neural Network Supporting Data Reuse | Rui Shan; Ziqing Huo; Xiaoshuo Li; Huan Chang; Rui Qin | |
11.55-12.15 | DBR: A Depth-Branch-Resorting Algorithm for Locality Exploration in Graph Processing | Lin Jiang; Ru Feng; Junjie Wang; Junyong Deng | |
12.15-12.35 | Performance Evaluation of Popularity-Aware Dynamic Clustering Scheme for Distributed Caching in ICN | Mikiya Yoshida; Yusuke Ito; Yurino Sato; Hiroyuki Koga | |
Session | Room | Chair | |
TuAM1-6 (SS03: Security Techniques of Speaker Recognition) | Chiang Mai 4 | Xiao-Lei Zhang | |
Date | Time | Title | Authors |
8 November 2022 | 10.35-10.55 | Masking Speech Feature to Detect Adversarial Examples for Speaker Verification | Xing Chen; Jiadi Yao; Xiao-Lei Zhang |
10.55-11.15 | F0 Modification via PV-TSM Algorithm for Speaker Anonymization Across Gender | Candy Olivia Mawalim; Shogo Okada; Masashi Unoki | |
11.15-11.35 | Pay Attention to Hard Trials | Lantian Li; Di Wang; Dong Wang | |
11.35-11.55 | A Multi-Task Framework of Speaker Recognition With TTS Data Augmentation | Xingjia Xie; Yiming Zhi; Beibei Ouyang; Qingyang Hong; Lin Li | |
11.55-12.15 | Source Tracing: Detecting Voice Spoofing | Tinglong Zhu; Xingming Wang; Xiaoyi Qin; Ming Li | |
12.15-12.35 | Replay Attack Detection Based on Voice and Non-Voice Sections for Speaker Verification | Ananda Garin Mills; Patthranit Kaewcharuay; Pannathorn Sathirasattayanon; Suradej Duangpummet; Kasorn Galajit; Jessada Karnjana; Pakinee Aimmanee | |
Session | Room | Chair | |
TuAM1-7 (Speech, Language, and Audio 2) | Chiang Mai 5 | Natthanan Promsuk | |
Date | Time | Title | Authors |
8 November 2022 | 10.35-10.55 | Learning Emotion Information for Expressive Speech Synthesis Using Multi-Resolution Modulation-Filtered Cochleagram | Kaili Zhang; Masashi Unoki |
10.55-11.15 | VocEmb4SVS: Improving Singing Voice Separation With Vocal Embeddings | Chenyi Li; Yi Li; Xuhao Du; Yaolong Ju; Shichao Hu; Zhiyong Wu | |
11.15-11.35 | Dialect-Aware Semi-Supervised Learning for End-To-End Multi-Dialect Speech Recognition | Sayaka Shiota; Ryo Imaizumi; Ryo Masumura; Hitoshi Kiya | |
11.35-11.55 | Design and Construction of Japanese Multimodal Utterance Corpus With Improved Emotion Balance and Naturalness | Daisuke Horii; Akinori Ito; Takashi Nose | |
11.55-12.15 | Non-Parallel Voice Conversion Based on Free-Energy Minimization of Speaker-Conditional Restricted Boltzmann Machine | Takuya Kishida; Toru Nakashika | |
12.15-12.35 | The TNT Team System Descriptions of Cantonese, Mongolian and Kazakh for IARPA OpenASR21 Challenge | Kai Tang; Jing Zhao; Jinghao Yan; Jian Kang; Haoyu Wang; Jinpeng Li; Shuzhou Chai; Guan-Bo Wang; Shen Huang; Guoguo Chen; Pengfei Hu; Wei-Qiang Zhang | |
Session | Room | Chair | |
TuAM1-8 (SS10: Real-world sensing technologies of human function) | Board Room 4 | Yumie Ono/Toshihisa Tanaka | |
Date | Time | Title | Authors |
8 November 2022 | 10.35-10.55 | Evaluation of Cognitive Test Results Using Concentration Estimation From Facial Videos | Terumi Umematsu; Masanori Tsujikawa; Hideyuki Sawada |
10.55-11.15 | Clustering of Advertising Images Using Electroencephalogram | Ingon Chanpornpakdi; Motoi Noda; Toshihisa Tanaka; Yuval Harpaz; Amir B. Geva | |
11.15-11.35 | Evaluation of Influence of Positions and Numbers of EEG Electrodes on Quantification of Independent Component Matrix | Ingon Chanpornpakdi; Ryohei Mizuochi; Maro G Machizawa | |
11.35-11.55 | Wearable Microfluidic Biosensor for Real-Time Sweat Content Monitoring | Hiroyuki Kudo; Yuto Goto | |
11.55-12.15 | Ear-EEG Based Eye State Classification Using Convolutional Neural Network | Chang-Hee Han; Han-Jeong Hwang | |
12.15-12.35 | Development of Virtual-Reality-Based Exergame for Lower-Extremity Rehabilitation of Stroke Patients | Mamiko Sasakawa; Daigo Ito; Ryo Ogura; Takanori Tominaga; Yumie Ono | |
Session | Room | Chair | |
TuPM1-1 (Speech, Language, and Audio 1) | Chiang Mai 1 | Rohan Kumar Das | |
Date | Time | Title | Authors |
8 November 2022 | 15.20-15.40 | Is Your Baby Fine at Home? Baby Cry Sound Detection in Domestic Environments | Tanmay Khandelwal; Rohan Kumar Das; Eng-Siong Chng |
15.40-16.00 | Acoustic Echo and Noise Canceller Using Shared-Error Normalized Least Mean Square Algorithm | Kenta Iwai; Takanobu Nishiura | |
16.00-16.20 | Subband-Based Spectrogram Fusion for Speech Enhancement by Combining Mapping and Masking Approaches | Hao Shi; Longbiao Wang; Sheng Li; Jianwu Dang; Tatsuya Kawahara | |
16.20-16.40 | Neural Virtual Microphone Estimator: Application to Multi-Talker Reverberant Mixtures | Hanako Segawa; Tsubasa Ochiai; Marc Delcroix; Tomohiro Nakatani; Rintaro Ikeshita; Shoko Araki; Takeshi Yamada; Shoji Makino | |
16.40-17.00 | SE-Mixer: Towards an Efficient Attention-Free Neural Network for Speech Enhancement | Kai Wang; Bengbeng He; Wei-Ping Zhu | |
17.00-17.20 | How Should We Evaluate Synthesized Environmental Sounds | Yuki Okamoto; Keisuke Imoto; Shinnosuke Takamichi; Takahiro Fukumori; Yoichi Yamashita | |
17.20-17.40 | FeatureCut: An Adaptive Data Augmentation for Automated Audio Captioning | Zhongjie Ye; Yuqing Wang; Helin Wang; Dongchao Yang; Yuexian Zou | |
Session | Room | Chair | |
TuPM1-2 (Signal Processing Systems: Design and Implementation) | Chiang Mai 2 | Kasemsit Teeyapan | |
Date | Time | Title | Authors |
8 November 2022 | 15.20-15.40 | Robust Steerable Differential Beamformer for Concentric Circular Array With Directional Microphones | Weilong Huang; Jinwei Feng |
15.40-16.00 | A Deep Proximal-Unfolding Method for Monaural Speech Dereverberation | Meihuang Wang; Minmin Yuan; Andong Li; Chengshi Zheng; Xiaodong Li | |
16.00-16.20 | Speech Enhancement Using Self-Supervised Pre-Trained Model and Vector Quantization | Xiao-Ying Zhao; Qiu-Shi Zhu; Jie Zhang | |
16.20-16.40 | HouseX: A Fine-Grained House Music Dataset and Its Potential in the Music Industry | Xinyu Li | |
16.40-17.00 | Interpretable Control for Emotional Text-To-Speech System Toward Development of Sympathetic Educational-Support Robots | Jingyi Feng; Tomohiro Yoshikawa; Tomoki Toda | |
17.00-17.20 | Direction-Aware Target Speaker Extraction With a Dual-Channel System Based on Conditional Variational Autoencoders Under Underdetermined Conditions | Rui Wang; Li Li; Tomoki Toda | |
17.20-17.40 | LCN: Label Correction Based on Network Prediction for Cross-Modal Retrieval With Noisy Labels | Daiki Okamura; Ryosuke Harakawa; Masahiro Iwahashi | |
Session | Room | Chair | |
TuPM1-3 (Signal Image and Information Processing Theory and Methods) | Chiang Mai 3 | Tatsuya Yokota | |
Date | Time | Title | Authors |
8 November 2022 | 15.20-15.40 | Using Self-Learning Representations for Objective Assessment of Patient Voice in Dysphonia | Shaoxiang Dang; Tetsuya Matsumoto; Yoshinori Takeuchi; Hiroaki Kudo; Takashi Tsuboi; Yasuhiro Tanaka; Masahisa Katsuno |
15.40-16.00 | Fast Signal Completion Algorithm With Cyclic Convolutional Smoothing | Hiromu Takayama; Tatsuya Yokota | |
16.00-16.20 | Single-Channel Speech Enhancement Student Under Multi-Channel Speech Enhancement Teacher | Yuzhu Zhang; Hui Zhang; Xueliang Zhang | |
16.20-16.40 | Distance-Based Dynamic Weight: A Novel Framework for Multi-Source Information Fusion | Cuiping Cheng; Xiaoning Zhang; Taihao Li | |
16.40-17.00 | Improvement of the Direction-Of-Arrival Estimation Method Using a Single Channel Microphone by Correcting a Spectral Slope of Speech | Masaki Ikeuchi; Hiroki Tanji; Takahiro Murakami | |
17.00-17.20 | Studying Human-Based Speaker Diarization and Comparing to State-Of-The-Art Systems | Simon W. McKnight; Aidan O. T. Hogg; Vincent W. Neo; Patrick A. Naylor | |
17.20-17.40 | Optimization of CU Partition Based on Texture Degree in H.266/VVC | Jingyuan Tang; Songlin Sun | |
Session | Room | Chair | |
TuPM1-4 (SS02: Deep Learning Systems and Applications for Cloud, Fog, and Edge) | Board Room 2 | Jia-Ching Wang | |
Date | Time | Title | Authors |
8 November 2022 | 15.20-15.40 | Selection of Supplementary Acoustic Data for Meta-Learning in Under-Resourced Speech Recognition | I-Ting Hsieh; Chung-Hsien Wu; Zhe-Hong Zhao |
15.40-16.00 | Using Prosodic Phrase-Based VQVAE on Audio ALBERT for Speech Emotion Recognition | Jia-Hao Hsu; Chung-Hsien Wu; Tsung-Hsien Yang | |
16.00-16.20 | ESPnet-ONNX: Bridging a Gap Between Research and Production | Masao Someki; Yosuke Higuchi; Tomoki Hayashi; Shinji Watanabe | |
16.20-16.40 | Multi-Loss Function in Robust Convolutional Autoencoder for Reconstruction Low-Quality Fingerprint Image | Farchan Hakim Raswa; Franki Halberd; Agus Harjoko; Wahyono; Chung-Ting Lee; Yung-Hui Li; Jia-Ching Wang | |
Session | Room | Chair | |
TuPM1-5 (Research Review) | Board Room 3 | Jesin James | |
Date | Time | Title | Authors |
8 November 2022 | 15.20-15.40 | EmotionGUI: Visualisation and Annotation of Emotions in a 2D Space for Multi-Modal Signals | Jesin James; Felix Marattukalam; Owen Eng; Aron Jeremiah |
15.40-16.00 | Enhancing the Performance of Automatic Speech Recognition With Optical Microphone Technology Through Data Augmentation Approach: A Pilot Study | Ruei-Ci Shen; Ji-Yan Han; Ying-Hui Lai | |
16.00-16.20 | Process Monitoring Based on Nearest Correlation and Variational Graph Auto-Encoder and Its Application to Tennessee Eastman Process | Yoshiaki Uchida; Koichi Fujiwara | |
16.20-16.40 | Decoding of Individual Emotions Induced During Interaction With Voice-User Interface Using Electroencephalography | Jun-Seok Lee; Ga-Young Choi; Ji-Yoon Lee; Jong-Gyu Shin; Sang-Ho Kim; Han-Jeong Hwang | |
16.40-17.00 | Leverage Limited Features of Partial Fingerprint Recognition Using Improved Siamese Network With Self-Spatial Attention | Farchan Hakim Raswa; Franki Halberd; Agus Harjoko; Chung-Ting Lee; Yung-Hui Li; Pao-Chi Chang; Jia-Ching Wang | |
17.00-17.20 | Design and Signal Analysis of a Compact Antenna for UWB MIMO Systems | Long Jin; Yangmiao Lin; Iickho Song; Ruohan Zhang | |
17.20-17.40 | A Filtered-x Active Noise Control Algorithm Robust to Impulsive Noise Using Novel Subband Adaptive Filter Algorithm | Chan Park; Minho Lee; PooGyeon Park | |
Session | Room | Chair | |
TuPM1-6 (Speech, Language, and Audio 2) | Chiang Mai 4 | Christian H Ritz | |
Date | Time | Title | Authors |
8 November 2022 | 15.20-15.40 | Neural Conversational Speech Synthesis With Flexible Control of Emotion Dimensions | Hiroki Mori; Hironao Nishino |
15.40-16.00 | Temporal Feedback Convolutional Recurrent Neural Networks for Speech Command Recognition | Taejun Kim; Juhan Nam | |
16.20-16.40 | Impact of Compression on the Performance of the Room Impulse Response Interpolation Approach to Spatial Audio Synthesis | Hualin Ren; Christian Ritz; Jiahong Zhao; Daeyoung Jang | |
16.40-17.00 | Machine Anomalous Sound Detection Based on Self-Supervised Classification | Shuxian Wang; Jun Du; Yajian Wang | |
17.00-17.20 | A Study on Low-Latency Recognition-Synthesis-Based Any-To-One Voice Conversion | Yi-Yang Ding; Li-Juan Liu; Yu Hu; Zhen-Hua Ling | |
17.20-17.40 | Speech Enhancement With Perceptually-Motivated Optimization and Dual Transformations | Xucheng Wan; Kai Liu; Ziqing Du; Huan Zhou | |
Session | Room | Chair | |
TuPM1-7 (SS12: Advanced signal detection and inspection technology) | Chiang Mai 5 | Settha Tangkawanit | |
Date | Time | Title | Authors |
8 November 2022 | 15.20-15.40 | Automatic Sound Detection and Notification System Using MFCC | Jaruwat Patmanee; Prapatson Kotipang; Pawarisorn Sinpeang; Surachet Kanprachar; Settha Tangkawanit |
15.40-16.00 | Sound Identification Using MFCC With Machine Learning | Pattarapong Kammee; Chairat Pinthong; Surachet Kanprachar; Settha Tangkawanit | |
16.20-16.40 | Direct-Lattice Adaptive Notch Filter for Frequency Estimation and Tracking | Prayuth Inban; Rachu Punchalard; Chawalit Benjangkaprasert | |
16.40-17.00 | Distance Estimation Between Camera and Vehicles From an Image Using YOLO and Machine Learning | Rattapoom Waranusast; Panomkhawn Riyamongkol; Pattanawadee Pattanathaburt | |
17.00-17.20 | OCR Application for Cancer Care | Settha Tangkawanit; Jiraporn Pooksook; Jirarat Ieamsaard; Panupong Sornkhom | |
17.20-17.40 | The Development of Mobile Application for Assisting COVID-19 Antigen Test Kit Results Reading | Rattapoom Waranusast; Pattanawadee Pattanathaburt | |
17.40-18.00 | Matched Filter Detector for Textile Fiber Classification of Signals With Near-Infrared Spectrum | Suchart Yammen; Wachira Limsripraphan | |
CONFERENCE FORMAT
The conference is planned to be held in person. However, if travel restrictions prevent some authors from attending, they will be allowed to upload a video of their oral presentation. The presenter must still attend the session online for Q&A. Unlike in a hybrid conference, however, the presentations will not be live-streamed. For more information, please contact: apsipa2022@gmail.com