Working Technical Program

Please note that this is not a final version and some changes may occur in the final program.

 

Date Time Room Session name Presentation type Paper code PaperID Paper title Authors

2017-08-21 11:00-13:00 Poster 1 Speech analysis and representation 2 Poster Mon-P-1-1-1 67 Low-dimensional representation of spectral envelope without deterioration for full-band speech analysis/synthesis system Masanori Morise, Kenji Ozawa, Genta Miyashita
2017-08-21 11:00-13:00 Poster 1 Speech analysis and representation 2 Poster Mon-P-1-1-2 210 Robust Source-Filter Separation of Speech Signal in the Phase Domain Erfan Loweimi, Jon Barker, Oscar Saz Torralba, Thomas Hain
2017-08-21 11:00-13:00 Poster 1 Speech analysis and representation 2 Poster Mon-P-1-1-3 382 A Time-Warping Pitch Tracking Algorithm considering fast f0 changes Simon Stone, Peter Steiner, Peter Birkholz
2017-08-21 11:00-13:00 Poster 1 Speech analysis and representation 2 Poster Mon-P-1-1-4 436 A modulation property of time-frequency derivatives of filtered phase and its application to aperiodicity and FO estimation Hideki Kawahara, Ken-Ichi Sakakibara, Masanori Morise, Hideki Banno, Tomoki Toda
2017-08-21 11:00-13:00 Poster 1 Speech analysis and representation 2 Poster Mon-P-1-1-5 624 Non-Local Estimation of Speech Signal for Vowel Onset Point Detection in Varied Environments Avinash Kumar, Syed Shahnawazuddin, Gayadhar Pradhan
2017-08-21 11:00-13:00 Poster 1 Speech analysis and representation 2 Poster Mon-P-1-1-6 678 Time-domain envelope modulating the noise component of excitation in a continuous residual-based vocoder for statistical parametric speech synthesis Mohammed Salah Al-Radhi, Tamás Gábor Csapó, Géza Németh
2017-08-21 11:00-13:00 Poster 1 Speech analysis and representation 2 Poster Mon-P-1-1-7 781 Wavelet Speech Enhancement Based on Robust Principal Component Analysis Chia-Lung Wu, Hsiang-Ping Hsu, Syu-Siang Wang, Jeih-weih Hung, Ying-Hui Lai, Hsin-Min Wang, Yu Tsao
2017-08-21 11:00-13:00 Poster 1 Speech analysis and representation 2 Poster Mon-P-1-1-8 790 Vowel Onset Point Detection using Sonority Information Bidisha Sharma, S R Mahadeva Prasanna
2017-08-21 11:00-13:00 Poster 1 Speech analysis and representation 2 Poster Mon-P-1-1-9 1232 Analytic Filter Bank for Speech Analysis, Feature Extraction and Perceptual Studies Unto K. Laine
2017-08-21 11:00-13:00 Poster 1 Speech analysis and representation 2 Poster Mon-P-1-1-10 1422 Novel Shifted Real Spectrum for Exact Signal Reconstruction Meet Soni, Rishabh Tak, Hemant Patil
2017-08-21 11:00-13:00 Poster 1 Speech analysis and representation 2 Poster Mon-P-1-1-11 1681 Learning the mapping function from voltage amplitudes to sensor positions in 3D-EMA using deep neural networks Christian Kroos, Mark D. Plumbley
2017-08-21 14:30-14:50 E10 Far-field Speech Recognition Oral Mon-O-2-10-1 1510 Generation of simulated utterances in virtual rooms to train deep-neural networks for far-field speech recognition in Google Home Chanwoo Kim, Ananya Misra, K.K. Chin, Thad Hughes, Arun Narayanan, Tara Sainath, Michiel Bacchiani
2017-08-21 14:50-15:10 E10 Far-field Speech Recognition Oral Mon-O-2-10-2 733 Neural network-based spectrum estimation for online WPE dereverberation Keisuke Kinoshita, Marc Delcroix, Haeyong Kwon, Takuma Mori, Tomohiro Nakatani
2017-08-21 15:10-15:30 E10 Far-field Speech Recognition Oral Mon-O-2-10-3 852 Factorial Modeling for Effective Suppression of Directional Noise Osamu Ichikawa, Takashi Fukuda, Gakuto Kurata, Steven Rennie
2017-08-21 15:30-15:50 E10 Far-field Speech Recognition Oral Mon-O-2-10-4 853 On Design of Robust Deep Models for CHiME-4 Multi-Channel Speech Recognition with Multiple Configurations of Array Microphones Yan-Hui Tu, Jun Du, Lei Sun, Feng Ma, Chin-Hui Lee
2017-08-21 15:50-16:10 E10 Far-field Speech Recognition Oral Mon-O-2-10-5 234 Acoustic Modeling for Google Home Bo Li, Tara Sainath, Joe Caroselli, Arun Narayanan, Michiel Bacchiani, Ananya Misra, Izhak Shafran, Hasim Sak, Golan Pundak, K.K. Chin, Khe Chai Sim, Ron Weiss, Kevin Wilson, Ehsan Variani, Chanwoo Kim, Olivier Siohan, Mitchell Weintraub, Erik McDermott, Richard Rose, Matt Shannon
2017-08-21 16:10-16:30 E10 Far-field Speech Recognition Oral Mon-O-2-10-6 398 On multi-domain training and adaptation of end-to-end RNN acoustic models for distant speech recognition Seyedmahdad Mirsamadi, John H.L. Hansen
2017-08-21 11:00-11:20 E10 Multimodal and Articulatory Synthesis Oral Mon-O-1-10-1 325 The Influence of Synthetic Voice on the Evaluation of a Virtual Character Joao Cabral, Benjamin Cowan, Katja Zibrek, Rachel McDonnell
2017-08-21 11:20-11:40 E10 Multimodal and Articulatory Synthesis Oral Mon-O-1-10-2 900 Articulatory Text-to-Speech Synthesis using the Digital Waveguide Mesh driven by a Deep Neural Network Amelia Gully, Takenori Yoshimura, Damian Murphy, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda
2017-08-21 11:40-12:00 E10 Multimodal and Articulatory Synthesis Oral Mon-O-1-10-3 936 An HMM/DNN comparison for synchronized text-to-speech and tongue motion synthesis Sébastien Le Maguer, Ingmar Steiner, Alexander Hewer
2017-08-21 12:00-12:20 E10 Multimodal and Articulatory Synthesis Oral Mon-O-1-10-4 1410 VCV Synthesis using Task Dynamics to Animate a Factor-based Articulatory Model Rachel Alexander, Tanner Sorensen, Asterios Toutios, Shrikanth Narayanan
2017-08-21 12:20-12:40 E10 Multimodal and Articulatory Synthesis Oral Mon-O-1-10-5 1438 Beyond the Listening Test: An interactive approach to TTS Evaluation Joseph Mendelson, Matthew Aylett
2017-08-21 12:40-13:00 E10 Multimodal and Articulatory Synthesis Oral Mon-O-1-10-6 1762 Integrating Articulatory Information into Deep Learning-Based Text-to-Speech Synthesis Beiming Cao, Myungjong Kim, Jan van Santen, Ted Mau, Jun Wang
2017-08-21 14:30-16:30 Poster 2 Speech Production and Perception Poster Mon-P-2-2-1 742 Critical articulators identification from RT-MRI of the vocal tract Samuel Silva, António Teixeira
2017-08-21 14:30-16:30 Poster 2 Speech Production and Perception Poster Mon-P-2-2-2 1580 Semantic Edge Detection for Tracking Vocal Tract Air-tissue Boundaries in Real-time Magnetic Resonance Images Krishna Somandepalli, Asterios Toutios, Shrikanth Narayanan
2017-08-21 14:30-16:30 Poster 2 Speech Production and Perception Poster Mon-P-2-2-3 1016 Vocal Tract Airway Tissue Boundary Tracking for rtMRI using Shape and Appearance Priors Sasan Asadiabadi, Engin Erzin
2017-08-21 14:30-16:30 Poster 2 Speech Production and Perception Poster Mon-P-2-2-4 201 An objective critical distance measure based on the relative level of spectral valley Ananthapadmanabha T V, Ramakrishnan Angarai Ganesan, Shubham Sharma
2017-08-21 14:30-16:30 Poster 2 Speech Production and Perception Poster Mon-P-2-2-5 608 Database of volumetric and real-time vocal tract MRI for speech science Tanner Sorensen, Zisis Iason Skordilis, Asterios Toutios, Yoon-Chul Kim, Yinghua Zhu, Jangwon Kim, Adam Lammert, Vikram Ramanarayanan, Louis Goldstein, Dani Byrd, Krishna Nayak, Shrikanth Narayanan
2017-08-21 14:30-16:30 Poster 2 Speech Production and Perception Poster Mon-P-2-2-6 1267 The Influence on Realization and Perception of Lexical Tones from Affricate’s Aspiration Chong Cao, Yanlu Xie, Qi Zhang, Jinsong Zhang
2017-08-21 14:30-16:30 Poster 2 Speech Production and Perception Poster Mon-P-2-2-7 122 Audiovisual recalibration of vowel categories Matthias Franken, Frank Eisner, Jan-Mathijs Schoffelen, Dan Acheson, Peter Hagoort, James McQueen
2017-08-21 14:30-16:30 Poster 2 Speech Production and Perception Poster Mon-P-2-2-8 194 The effect of gesture on persuasive speech Judith Peters, Marieke Hoetjes
2017-08-21 14:30-16:30 Poster 2 Speech Production and Perception Poster Mon-P-2-2-9 1069 Auditory-visual integration of talker gender in Cantonese tone perception Wei Lai
2017-08-21 14:30-16:30 Poster 2 Speech Production and Perception Poster Mon-P-2-2-10 139 Event-related potentials associated with somatosensory effect in audio-visual speech perception Takayuki Ito, Hiroki Ohashi, Eva Montas, Vincent Gracco
2017-08-21 14:30-16:30 Poster 2 Speech Production and Perception Poster Mon-P-2-2-11 353 When a dog is a cat and how it changes your pupil size: Pupil dilation in response to information mismatch Lena F. Renner, Marcin Wlodarczak
2017-08-21 14:30-16:30 Poster 2 Speech Production and Perception Poster Mon-P-2-2-12 1236 Cross-modal Analysis between Phonation Differences and Texture Images based on Sentiment Correlations Win Thuzar Kyaw, Yoshinori Sagisaka
2017-08-21 14:30-16:30 Poster 2 Speech Production and Perception Poster Mon-P-2-2-13 48 Wireless neck-surface accelerometer and microphone on flex circuit with application to noise-robust monitoring of Lombard speech Daryush Mehta, Patrick Chwalek, Thomas Quatieri, Laura Brattain
2017-08-21 14:30-16:30 Poster 2 Speech Production and Perception Poster Mon-P-2-2-14 1371 Video-based tracking of jaw movements during speech: Preliminary results and future directions Andrea Bandini, Aravind Namasivayam, Yana Yunusova
2017-08-21 14:30-16:30 Poster 2 Speech Production and Perception Poster Mon-P-2-2-15 1374 Accurate Synchronization of Speech and EGG signal using Phase Information Sunil Kumar S B, K Sreenivasa Rao, Tanumay Mandal
2017-08-21 14:30-14:50 Main hall Neural networks for language modeling Oral Mon-O-2-1-1 1310 Approaches for Neural-Network Language Model Adaptation Min Ma, Michael Nirschl, Fadi Biadsy, Shankar Kumar
2017-08-21 14:50-15:10 Main hall Neural networks for language modeling Oral Mon-O-2-1-2 818 A Batch Noise Contrastive Estimation Approach for Training Large Vocabulary Language Models Youssef Oualil, Dietrich Klakow
2017-08-21 15:10-15:30 Main hall Neural networks for language modeling Oral Mon-O-2-1-3 513 Investigating Bidirectional Recurrent Neural Network Language Models for Speech Recognition Xie Chen, Anton Ragni, Xunying Liu, Mark Gales
2017-08-21 15:30-15:50 Main hall Neural networks for language modeling Oral Mon-O-2-1-4 564 Fast Neural Network Language Model Lookups at N-Gram Speeds Yinghui Huang, Abhinav Sethy, Bhuvana Ramabhadran
2017-08-21 15:50-16:10 Main hall Neural networks for language modeling Oral Mon-O-2-1-5 723 Empirical Exploration of Novel Architectures and Objectives for Language Models Gakuto Kurata, Abhinav Sethy, Bhuvana Ramabhadran, George Saon
2017-08-21 16:10-16:30 Main hall Neural networks for language modeling Oral Mon-O-2-1-6 1442 Residual Memory Networks in Language Modeling: Improving the Reputation of Feed-Forward Networks Karel Beneš, Murali Baskar, Lukáš Burget
2017-08-21 14:30-14:50 C6 Perception of dialects and L2 Oral Mon-O-2-6-1 1031 End-to-End Acoustic Feedback in Language Learning for Correcting Devoiced French Final-Fricatives Sucheta Ghosh, Camille Fauth, Yves Laprie, Aghilas Sini
2017-08-21 14:50-15:10 C6 Perception of dialects and L2 Oral Mon-O-2-6-2 18 Dialect perception by older children Ewa Jacewicz, Robert A. Fox
2017-08-21 15:10-15:30 C6 Perception of dialects and L2 Oral Mon-O-2-6-3 207 Perception of non-contrastive variations in American English by Japanese learners: Flaps are less favored than stops Kiyoko Yoneyama, Mafuyu Kitahara, Keiichi Tajima
2017-08-21 15:30-15:50 C6 Perception of dialects and L2 Oral Mon-O-2-6-4 1150 How L1 speakers perceive L2 prosody: The cumulative effect of intonation, rhythm, and speech rate on accentedness and comprehensibility ratings Lieke van Maastricht, Tim Zee, Emiel Krahmer, Marc Swerts
2017-08-21 15:50-16:10 C6 Perception of dialects and L2 Oral Mon-O-2-6-5 763 Effects of Pitch Fall and L1 on Vowel Length Identification in L2 Japanese Izumi Takiguchi
2017-08-21 16:10-16:30 C6 Perception of dialects and L2 Oral Mon-O-2-6-6 1210 A Preliminary Study of Prosodic Disambiguation by Chinese EFL Learners Yuanyuan Zhang, Hongwei Ding
2017-08-21 11:00-13:00 D8 Special Session: Interspeech 2017 Automatic Speaker Verification Spoofing and Countermeasures Challenge I Special Session Mon-SS-1-8-1 1111 The ASVspoof 2017 Challenge: Assessing the Limits of Replay Spoofing Attack Detection Tomi Kinnunen, Md Sahidullah, Héctor Delgado, Massimiliano Todisco, Nicholas Evans, Junichi Yamagishi, Kong Aik Lee
2017-08-21 11:00-13:00 D8 Special Session: Interspeech 2017 Automatic Speaker Verification Spoofing and Countermeasures Challenge I Special Session Mon-SS-1-8-2 1085 ResNet and Model Fusion for Automatic Spoofing Detection Zhuxin Chen, Zhifeng Xie, Weibin Zhang, Xiangmin Xu
2017-08-21 11:00-13:00 D8 Special Session: Interspeech 2017 Automatic Speaker Verification Spoofing and Countermeasures Challenge I Special Session Mon-SS-1-8-3 360 Audio replay attack detection with deep learning frameworks Galina Lavrentyeva, Sergey Novoselov, Egor Malykh, Alexandr Kozlov, Oleg Kudashev, Vadim Shchemelinin
2017-08-21 11:00-13:00 D8 Special Session: Interspeech 2017 Automatic Speaker Verification Spoofing and Countermeasures Challenge I Special Session Mon-SS-1-8-4 450 Experimental analysis of features for replay attack detection - Results on the ASVspoof 2017 Challenge Roberto Javier Font Ruiz, María José Cano Vicente, Juan Manuel Espín López
2017-08-21 11:00-13:00 D8 Special Session: Interspeech 2017 Automatic Speaker Verification Spoofing and Countermeasures Challenge I Special Session Mon-SS-1-8-5 906 Countermeasures for Automatic Speaker Verification Replay Spoofing Attack: On Data Augmentation, Feature Representation, Classification and Fusion Weicheng Cai, Danwei Cai, Wenbo Liu, Ming Li
2017-08-21 11:00-13:00 D8 Special Session: Interspeech 2017 Automatic Speaker Verification Spoofing and Countermeasures Challenge I Special Session Mon-SS-1-8-6 776 Audio Replay Attack Detection with High-Frequency Features Marcin Witkowski, Stanisław Kacprzak, Piotr Żelasko, Konrad Kowalczyk, Jakub Gałka
2017-08-21 11:00-13:00 D8 Special Session: Interspeech 2017 Automatic Speaker Verification Spoofing and Countermeasures Challenge I Special Session Mon-SS-1-8-7 1362 Novel Variable Length Teager Energy Separation Based Instantaneous Frequency Features for Replay Detection Hemant Patil, Madhu Kamble, Tanvina Patel, Meet Soni
2017-08-21 11:00-13:00 D8 Special Session: Interspeech 2017 Automatic Speaker Verification Spoofing and Countermeasures Challenge I Special Session Mon-SS-1-8-8 1246 Ensemble learning for countermeasure of audio replay spoofing attack in ASVspoof2017 Zhe Ji, Zhi-Yi Li, Peng Li, Maobo An, Shengxiang Gao, Dan Wu, Faru Zhao
2017-08-21 11:00-13:00 D8 Special Session: Interspeech 2017 Automatic Speaker Verification Spoofing and Countermeasures Challenge I Special Session Mon-SS-1-8-9 676 SFF Anti-Spoofer: IIIT-H Submission for Automatic Speaker Verification Spoofing and Countermeasures Challenge 2017 K N R K Raju Alluri, Sivanand Achanta, Sudarsana Reddy Kadiri, Suryakanth V Gangashetty, Anil Kumar Vuppala
2017-08-21 11:00-13:00 D8 Special Session: Interspeech 2017 Automatic Speaker Verification Spoofing and Countermeasures Challenge I Special Session Mon-SS-1-8-10 1377 Replay Attack Detection using DNN for Channel Discrimination Parav Nagarsheth, Elie Khoury, Kailash Patil, Matt Garland
2017-08-21 11:00-13:00 D8 Special Session: Interspeech 2017 Automatic Speaker Verification Spoofing and Countermeasures Challenge I Special Session Mon-SS-1-8-11 304 Feature selection based on CQCCs for Automatic Speaker Verification spoofing Wang Xianliang, Xiao Yanhong, Zhu Xuan
2017-08-21 11:00-13:00 D8 Special Session: Interspeech 2017 Automatic Speaker Verification Spoofing and Countermeasures Challenge I Special Session Mon-SS-1-8-12 930 Spoof Detection Using Source, Instantaneous Frequency and Cepstral Features Sarfaraz Jelil, Rohan Kumar Das, S R Mahadeva Prasanna, Rohit Sinha
2017-08-21 11:00-13:00 D8 Special Session: Interspeech 2017 Automatic Speaker Verification Spoofing and Countermeasures Challenge I Special Session Mon-SS-1-8-13 456 A Study on Replay Attack and Anti-Spoofing for Automatic Speaker Verification Lantian Li, Yixiang Chen, Dong Wang, Thomas Fang Zheng
2017-08-21 11:00-13:00 Poster 4 Search, Computational Strategies and Language Modeling Poster Mon-P-1-4-1 1671 Rescoring-aware Beam Search for Reduced Search Errors in Contextual Automatic Speech Recognition Ian Williams, Petar Aleksic
2017-08-21 11:00-13:00 Poster 4 Search, Computational Strategies and Language Modeling Poster Mon-P-1-4-2 1683 Comparison of Different Decoding Strategies for CTC Acoustic Models Thomas Zenkel, Ramon Sanabria, Florian Metze, Jan Niehues, Matthias Sperber, Sebastian Stüker, Alex Waibel
2017-08-21 11:00-13:00 Poster 4 Search, Computational Strategies and Language Modeling Poster Mon-P-1-4-3 1680 Phone duration modeling for LVCSR using neural networks Hossein Hadian, Daniel Povey, Hossein Sameti, Sanjeev Khudanpur
2017-08-21 11:00-13:00 Poster 4 Search, Computational Strategies and Language Modeling Poster Mon-P-1-4-4 343 Towards better decoding and language model integration in sequence to sequence models Jan Chorowski, Navdeep Jaitly
2017-08-21 11:00-13:00 Poster 4 Search, Computational Strategies and Language Modeling Poster Mon-P-1-4-5 547 Empirical Evaluation of Parallel Training Algorithms on Acoustic Modeling Wenpeng Li, Binbin Zhang, Lei Xie, Dong Yu
2017-08-21 11:00-13:00 Poster 4 Search, Computational Strategies and Language Modeling Poster Mon-P-1-4-6 1343 Binary Deep Neural Networks for Speech Recognition Xu Xiang, Yanmin Qian, Kai Yu
2017-08-21 11:00-13:00 Poster 4 Search, Computational Strategies and Language Modeling Poster Mon-P-1-4-7 1583 Hierarchical Constrained Bayesian Optimization for Joint Feature, Acoustic Model and Decoder Parameter Optimization Akshay Chandrashekaran, Ian Lane
2017-08-21 11:00-13:00 Poster 4 Search, Computational Strategies and Language Modeling Poster Mon-P-1-4-8 717 Use of Global and Acoustic Features Associated with Contextual Factors to Adapt Language Models for Spontaneous Speech Recognition Shohei Toyama, Daisuke Saito, Nobuaki Minematsu
2017-08-21 11:00-13:00 Poster 4 Search, Computational Strategies and Language Modeling Poster Mon-P-1-4-9 1247 Joint Learning of Correlated Sequence Labeling Tasks Using Bidirectional Recurrent Neural Networks Vardaan Pahuja, Anirban Laha, Shachar Mirkin, Vikas Raykar, Lili Kotlerman, Guy Lev
2017-08-21 11:00-13:00 Poster 4 Search, Computational Strategies and Language Modeling Poster Mon-P-1-4-10 729 Estimation of Gap Between Current Language Models and Human Performance Xiaoyu Shen, Youssef Oualil, Clayton Greenberg, Mittul Singh, Dietrich Klakow
2017-08-21 11:00-13:00 Poster 4 Search, Computational Strategies and Language Modeling Poster Mon-P-1-4-11 204 A phonological phrase sequence modelling approach for resource efficient and robust real-time punctuation recovery Anna Moró, György Szaszák
2017-08-21 14:30-16:30 Poster 4 Prosody and Text Processing Poster Mon-P-2-4-1 35 An RNN Model of Text Normalization Richard Sproat, Navdeep Jaitly
2017-08-21 14:30-16:30 Poster 4 Prosody and Text Processing Poster Mon-P-2-4-2 487 Weakly-Supervised Phrase Assignment from Text in a Speech-Synthesis System Using Noisy Labels Asaf Rendel, Raul Fernandez, Zvi Kons, Andrew Rosenberg, Ron Hoory, Bhuvana Ramabhadran
2017-08-21 14:30-16:30 Poster 4 Prosody and Text Processing Poster Mon-P-2-4-3 521 Prosody Aware Word-level Encoder Based on BLSTM-RNNs for DNN-based Speech Synthesis Yusuke Ijima, Nobukatsu Hojo, Ryo Masumura, Taichi Asami
2017-08-21 14:30-16:30 Poster 4 Prosody and Text Processing Poster Mon-P-2-4-4 669 Global Syllable Vectors for Building TTS Front-End with Deep Learning Jinfu Ni, Yoshinori Shiga, Hisashi Kawai
2017-08-21 14:30-16:30 Poster 4 Prosody and Text Processing Poster Mon-P-2-4-5 708 Prosody Control of Utterance Sequence for Information Delivering Ishin Fukuoka, Kazuhiko Iwata, Tetsunori Kobayashi
2017-08-21 14:30-16:30 Poster 4 Prosody and Text Processing Poster Mon-P-2-4-6 949 Multi-Task Learning for Prosodic Structure Generation using BLSTM RNN with Structured Output Layer Yuchen Huang, Zhiyong Wu, Runnan Li, Helen Meng, Lianhong Cai
2017-08-21 14:30-16:30 Poster 4 Prosody and Text Processing Poster Mon-P-2-4-7 1086 Investigating Efficient Feature Representation Methods and Training Objective for BLSTM-Based Phone Duration Prediction Yibin Zheng, Jianhua Tao, Zhengqi Wen, Ya Li, Bin Liu
2017-08-21 14:30-16:30 Poster 4 Prosody and Text Processing Poster Mon-P-2-4-8 1144 Discrete Duration Model For Speech Synthesis Bo Chen, Tianling Bian, Kai Yu
2017-08-21 14:30-16:30 Poster 4 Prosody and Text Processing Poster Mon-P-2-4-9 1152 Comparison of Modeling Target in LSTM-RNN Duration Model Bo Chen, Jiahao Lai, Kai Yu
2017-08-21 14:30-16:30 Poster 4 Prosody and Text Processing Poster Mon-P-2-4-10 1340 Learning word vector representations based on acoustic counts Manuel Sam Ribeiro, Oliver Watts, Junichi Yamagishi
2017-08-21 14:30-16:30 Poster 4 Prosody and Text Processing Poster Mon-P-2-4-11 1507 Synthesising uncertainty: the interplay of vocal effort and hesitation disfluencies Eva Szekely, Joseph Mendelson, Joakim Gustafson
2017-08-21 14:30-14:50 B4 Speech analysis and representation 1 Oral Mon-O-2-4-1 1179 Phone Classification using a Non-Linear Manifold with Broad Phone Class Dependent DNNs Linxue Bai, Peter Jancovic, Martin Russell, Philip Weber, Steve Houghton
2017-08-21 14:50-15:10 B4 Speech analysis and representation 1 Oral Mon-O-2-4-2 70 An Investigation of Crowd Speech for Room Occupancy Estimation Siyuan Chen, Julien Epps, Eliathamby Ambikairajah, Phu Le
2017-08-21 15:10-15:30 B4 Speech analysis and representation 1 Oral Mon-O-2-4-3 726 Time-Frequency Coherence for Periodic-Aperiodic Decomposition of Speech Signals Karthika Vijayan, Jitendra Dhiman, Chandra Sekhar Seelamantula
2017-08-21 15:30-15:50 B4 Speech analysis and representation 1 Oral Mon-O-2-4-4 316 Musical Speech: a New Methodology for Transcribing Speech Prosody Alexsandro Meireles, Antônio Simões, Antonio Celso Ribeiro, Beatriz Raposo de Medeiros
2017-08-21 15:50-16:10 B4 Speech analysis and representation 1 Oral Mon-O-2-4-5 1074 Estimation of Place of Articulation of Fricatives from Spectral Characteristics for Speech Training K S Nataraj, Prem C. Pandey, Hirak Dasgupta
2017-08-21 16:10-16:30 B4 Speech analysis and representation 1 Oral Mon-O-2-4-6 389 Estimation of the Probability Distribution of Spectral Fine Structure in the Speech Source Tom Bäckström
2017-08-21 11:00-11:20 C6 Acoustic and articulatory phonetics Oral Mon-O-1-6-1 1601 Phonetic Correlates of Pharyngeal and Pharyngealized Consonants in Saudi, Lebanese, and Jordanian Arabic: an rt-MRI Study Zainab Hermes, Marissa Barlaz, Ryan Shosted, Zhi-Pei Liang, Brad Sutton
2017-08-21 11:20-11:40 C6 Acoustic and articulatory phonetics Oral Mon-O-1-6-2 1039 Glottal opening and strategies of production of fricatives Benjamin Elie, Yves Laprie
2017-08-21 11:40-12:00 C6 Acoustic and articulatory phonetics Oral Mon-O-1-6-3 1292 Acoustics and articulation of medial versus final coronal stop gemination contrasts in Moroccan Arabic Mohamed Yassine Frej, Christopher Carignan, Catherine T. Best
2017-08-21 12:00-12:20 C6 Acoustic and articulatory phonetics Oral Mon-O-1-6-4 1553 How are four-level length distinctions produced? Evidence from Moroccan Arabic Giuseppina Turco, Karim Shoul, Rachid Ridouane
2017-08-21 12:20-12:40 C6 Acoustic and articulatory phonetics Oral Mon-O-1-6-5 1552 Vowels in the Barunga variety of North Australian Kriol Caroline Jones, Katherine Demuth, Weicong Li, Andre Almeida
2017-08-21 12:40-13:00 C6 Acoustic and articulatory phonetics Oral Mon-O-1-6-6 1304 Nature of contrast and coarticulation: Evidence from Mizo tones and Assamese vowel harmony Indranil Dutta, Irfan S., Pamir Gogoi, Priyankoo Sarmah
2017-08-21 11:00-11:20 A2 Multimodal Paralinguistics Oral Mon-O-1-2-1 98 Multimodal markers of persuasive speech: designing a Virtual Debate Coach Volha Petukhova, Manoj Raju, Harry Bunt
2017-08-21 11:20-11:40 A2 Multimodal Paralinguistics Oral Mon-O-1-2-2 179 Acoustic-Prosodic and Physiological Response to Stressful Interactions in Children with Autism Spectrum Disorder Daniel Bone, Julia Mertens, Emily Zane, Sungbok Lee, Shrikanth Narayanan, Ruth Grossman
2017-08-21 11:40-12:00 A2 Multimodal Paralinguistics Oral Mon-O-1-2-3 1278 A Stepwise Analysis of Aggregated Crowdsourced Labels Describing Multimodal Emotional Behaviors Alec Burmania, Carlos Busso
2017-08-21 12:00-12:20 A2 Multimodal Paralinguistics Oral Mon-O-1-2-4 999 An information theoretic analysis of the temporal synchrony between head gestures and prosodic patterns in spontaneous speech Gaurav Fotedar, Prasanta Ghosh
2017-08-21 12:20-12:40 A2 Multimodal Paralinguistics Oral Mon-O-1-2-5 1088 Multimodal Prediction of Affect Dimensions Fusing Multiple Regression Techniques Dongyan Huang, Wan Ding, Mingyu Xu, Huaiping Ming, Xinguo Yu, Minghui Dong, Haizhou Li
2017-08-21 12:40-13:00 A2 Multimodal Paralinguistics Oral Mon-O-1-2-6 1329 Co-production of speech and pointing gestures in clear and perturbed interactive tasks: multimodal designation strategies Marion Dohen, Benjamin Roustan
2017-08-21 14:30-16:30 Poster 1 Speech Perception Poster Mon-P-2-1-1 2 Factors Affecting the Intelligibility of Low-pass Filtered Speech Lei Wang, Fei Chen
2017-08-21 14:30-16:30 Poster 1 Speech Perception Poster Mon-P-2-1-2 4 Phonetic Restoration of Temporally Reversed Speech Shi-yu Wang, Fei Chen
2017-08-21 14:30-16:30 Poster 1 Speech Perception Poster Mon-P-2-1-3 83 Simultaneous articulatory and acoustic distortion in L1 and L2 Listening: Locally time-reversed “fast” speech Mako Ishida
2017-08-21 14:30-16:30 Poster 1 Speech Perception Poster Mon-P-2-1-4 618 Lexically Guided Perceptual Learning in Mandarin Chinese L. Ann Burchfield, San-hei Kenny Luk, Mark Antoniou, Anne Cutler
2017-08-21 14:30-16:30 Poster 1 Speech Perception Poster Mon-P-2-1-5 948 The effect of spectral profile on the intelligibility of emotional speech in noise Chris Davis, Chee Seng Chong, Jeesun Kim
2017-08-21 14:30-16:30 Poster 1 Speech Perception Poster Mon-P-2-1-6 1517 Whether long-term tracking of speech rate affects perception depends on who is talking Merel Maslowski, Antje S. Meyer, Hans Rutger Bosker
2017-08-21 14:30-16:30 Poster 1 Speech Perception Poster Mon-P-2-1-7 1719 Emotional thin-slicing: a proposal for a short- and long-term division of emotional speech Daniel Oliveira Peres, Dominic Watt, Waldemar Ferreira Netto
2017-08-21 14:30-16:30 Poster 1 Speech Perception Poster Mon-P-2-1-8 1735 Predicting epenthetic vowel quality from acoustics Adriana Guevara-Rukoz, Erika Parlato-Oliveira, Shi Yu, Yuki Hirose, Sharon Peperkamp, Emmanuel Dupoux
2017-08-21 14:30-16:30 Poster 1 Speech Perception Poster Mon-P-2-1-9 282 The effect of spectral tilt on size discrimination of voiced speech sounds Toshie Matsui, Toshio Irino, Kodai Yamamoto, Hideki Kawahara, Roy Patterson
2017-08-21 14:30-16:30 Poster 1 Speech Perception Poster Mon-P-2-1-10 532 Misperceptions of the emotional content of natural and vocoded speech in a car Jaime Lorenzo-Trueba, Cassia Valentini-Botinhao, Gustav Eje Henter, Junichi Yamagishi
2017-08-21 14:30-16:30 Poster 1 Speech Perception Poster Mon-P-2-1-11 375 The relative cueing power of F0 and duration in German prominence perception Oliver Niebuhr, Jana Winkler
2017-08-21 14:30-16:30 Poster 1 Speech Perception Poster Mon-P-2-1-12 570 Perception and acoustics of vowel nasality in Brazilian Portuguese Luciana Marques, Rebecca Scarborough
2017-08-21 14:30-16:30 Poster 1 Speech Perception Poster Mon-P-2-1-13 1742 Sociophonetic realizations guide subsequent lexical access Jonny Kim, Katie Drager
2017-08-21 11:00-11:20 B4 Dereverberation, echo cancellation and speech Oral Mon-O-1-4-1 461 Improving Speaker Verification for Reverberant Conditions with Deep Neural Network Dereverberation Processing Peter Guzewich, Stephen Zahorian
2017-08-21 11:20-11:40 B4 Dereverberation, echo cancellation and speech Oral Mon-O-1-4-2 46 Stepsize Control for Acoustic Feedback Cancellation Based on the Detection of Reverberant Signal Periods and the Estimated System Distance Philipp Bulling, Klaus Linhard, Arthur Wolf, Gerhard Schmidt
2017-08-21 11:40-12:00 B4 Dereverberation, echo cancellation and speech Oral Mon-O-1-4-3 1084 A Delay-Flexible Stereo Acoustic Echo Cancellation for DFT-Based In-Car Communication (ICC) Systems Jan Franzen, Tim Fingscheidt
2017-08-21 12:00-12:20 B4 Dereverberation, echo cancellation and speech Oral Mon-O-1-4-4 78 Speech Enhancement Based on Harmonic Estimation combined with MMSE to Improve Speech Intelligibility for Cochlear Implant Recipients Dongmei Wang, John H.L. Hansen
2017-08-21 12:20-12:40 B4 Dereverberation, echo cancellation and speech Oral Mon-O-1-4-5 771 Improving speech intelligibility in binaural hearing aids by estimating a time-frequency mask with a weighted least squares classifier David Ayllon, Roberto Gil-Pita, Manuel Rosa-Zurera
2017-08-21 12:40-13:00 B4 Dereverberation, echo cancellation and speech Oral Mon-O-1-4-6 858 Simulations of high-frequency vocoder on Mandarin speech recognition for acoustic hearing preserved cochlear implant Tsung-Chen Wu, Tai-Shih Chi, Chia-Fone Lee
2017-08-21 14:30-14:50 A2 Pathological Speech and Language Oral Mon-O-2-2-1 378 Dominant Distortion Classification for Pre-Processing of Vowels in Remote Biomedical Voice Analysis Amir Hossein Poorjam, Jesper Rindom Jensen, Max A. Little, Mads Græsbøll Christensen
2017-08-21 14:50-15:10 A2 Pathological Speech and Language Oral Mon-O-2-2-2 626 Automatic Paraphasia Detection from Aphasic Speech: A Preliminary Study Duc Le, Keli Licata, Emily Mower Provost
2017-08-21 15:10-15:30 A2 Pathological Speech and Language Oral Mon-O-2-2-3 819 Evaluation of the neurological state of people with Parkinson’s disease using i-vectors Nicanor Garcia, Juan Rafael Orozco-Arroyave, Luis Fernando D’Haro, Najim Dehak, Elmar Noeth
2017-08-21 15:30-15:50 A2 Pathological Speech and Language Oral Mon-O-2-2-4 138 Objective Severity Assessment From Disordered Voice Using Estimated Glottal Airflow Yu-Ren Chien, Michal Borsky, Jon Gudnason
2017-08-21 15:50-16:10 A2 Pathological Speech and Language Oral Mon-O-2-2-5 1007 Earlier Identification of Children with Autism Spectrum Disorder: An Automatic Vocalisation-based Approach Florian Pokorny, Björn Schuller, Peter Marschik, Raymond Brueckner, Pär Nyström, Nicholas Cummins, Sven Bölte, Christa Einspieler, Terje Falck-Ytter
2017-08-21 16:10-16:30 A2 Pathological Speech and Language Oral Mon-O-2-2-6 1078 Convolutional Neural Network to Model Articulation Impairments in Patients with Parkinson’s Disease Juan Camilo Vásquez Correa, Juan Rafael Orozco-Arroyave, Elmar Noeth
2017-08-21 11:00-13:00 Poster 2 Speech and audio segmentation and classification 2 Poster Mon-P-1-2-1 74 Multilingual I-Vector based Statistical Modeling for Music Genre Classification Jia Dai, Wei Xue, Wenju Liu
2017-08-21 11:00-13:00 Poster 2 Speech and audio segmentation and classification 2 Poster Mon-P-1-2-2 309 Indoor/Outdoor Audio Classification using Foreground Speech Segmentation Banriskhem K. Khonglah, Deepak K T, S R Mahadeva Prasanna
2017-08-21 11:00-13:00 Poster 2 Speech and audio segmentation and classification 2 Poster Mon-P-1-2-3 440 Attention based CLDNNs for short-duration acoustic scene classification Jinxi Guo, Ning Xu, Li-Jia Li, Abeer Alwan
2017-08-21 11:00-13:00 Poster 2 Speech and audio segmentation and classification 2 Poster Mon-P-1-2-4 746 Frame-wise dynamic threshold based polyphonic acoustic event detection Xianjun Xia, Roberto Togneri, Ferdous Sohel, David Huang
2017-08-21 11:00-13:00 Poster 2 Speech and audio segmentation and classification 2 Poster Mon-P-1-2-5 792 Enhanced Feature Extraction for Speech Detection in Media Audio Inseon Jang, ChungHyun Ahn, Jeongil Seo, Younseon Jang
2017-08-21 11:00-13:00 Poster 2 Speech and audio segmentation and classification 2 Poster Mon-P-1-2-6 831 Unsupervised Filterbank Learning Using Convolutional Restricted Boltzmann Machine for Environmental Sound Classification Hardik Sailor, Dharmesh Agrawal, Hemant Patil
2017-08-21 11:00-13:00 Poster 2 Speech and audio segmentation and classification 2 Poster Mon-P-1-2-7 982 Audio Classification Using Class-Specific Learned Descriptors Sukanya Sonowal, Tushar Sandhan, Inkyu Choi, Nam Soo Kim
2017-08-21 11:00-13:00 Poster 2 Speech and audio segmentation and classification 2 Poster Mon-P-1-2-8 1160 Hidden Markov Model Variational Autoencoder for Acoustic Unit Discovery Janek Ebbers, Jahn Heymann, Lukas Drude, Thomas Glarner, Reinhold Haeb-Umbach, Bhiksha Raj
2017-08-21 11:00-13:00 Poster 2 Speech and audio segmentation and classification 2 Poster Mon-P-1-2-9 1238 Virtual Adversarial Training and Data Augmentation for Acoustic Event Detection with Gated Recurrent Neural Networks Matthias Zöhrer, Franz Pernkopf
2017-08-21 11:00-13:00 Poster 2 Speech and audio segmentation and classification 2 Poster Mon-P-1-2-10 1386 Montreal Forced Aligner: trainable text-speech alignment using Kaldi Michael McAuliffe, Michaela Socolof, Sarah Mihuc, Michael Wagner, Morgan Sonderegger
2017-08-21 11:00-13:00 Poster 2 Speech and audio segmentation and classification 2 Poster Mon-P-1-2-11 1388 A robust Voiced/Unvoiced phoneme classification from whispered speech using the “color” of whispered phonemes and Deep Neural Network Nisha Meenakshi, Prasanta Ghosh
2017-08-21 14:30-16:30 E397 Show & Tell 2 Show&Tell Mon-S&T-2-B-1 10035 An apparatus to investigate western opera singing skill learning using performance and result biofeedback, and measuring its neural correlates Aurore Jaumard-Hakoun, Samy Chikhi, Takfarinas Medani, Angelika Nair, Gérard Dreyfus, François-Benoît Vialatte
2017-08-21 14:30-16:30 E397 Show & Tell 2 Show&Tell Mon-S&T-2-B-2 10044 PercyConfigurator — Perception Experiments as a Service Christoph Draxler
2017-08-21 14:30-16:30 E397 Show & Tell 2 Show&Tell Mon-S&T-2-B-3 10045 System for speech transcription and post-editing in Microsoft Word Askars Salimbajevs, Indra Ikauniece
2017-08-21 14:30-16:30 E397 Show & Tell 2 Show&Tell Mon-S&T-2-B-4 10047 Emojive! Collecting Emotion Data from Speech and Facial Expression using Mobile Game App Ji Ho Park, Nayeon Lee, Dario Bertero, Anik Dey, Pascale Fung
2017-08-21 14:30-16:30 E397 Show & Tell 2 Show&Tell Mon-S&T-2-B-5 10059 Mylly – The Mill: A new platform for processing speech and text corpora easily and efficiently Mietta Lennes, Jussi Piitulainen, Martin Matthiesen
2017-08-21 11:00-13:00 F11 Special Session: Speech Technology for Code-Switching in Multilingual Communities Special Session Mon-SS-1-11-1 301 Longitudinal Speaker Clustering and Verification Corpus with Code-Switching Frisian-Dutch Speech Emre Yilmaz, Jelske Dijkstra, Hans Van de Velde, Frederik Kampstra, Jouke Algra, Henk van den Heuvel, David van Leeuwen
2017-08-21 11:00-13:00 F11 Special Session: Speech Technology for Code-Switching in Multilingual Communities Special Session Mon-SS-1-11-2 391 Exploiting Untranscribed Broadcast Data for Improved Code-Switching Detection Emre Yilmaz, Henk van den Heuvel, David van Leeuwen
2017-08-21 11:00-13:00 F11 Special Session: Speech Technology for Code-Switching in Multilingual Communities Special Session Mon-SS-1-11-3 1198 Jee haan, I’d like both, por favor: Elicitation of a Code-Switched Corpus of Hindi-English and Spanish-English Human-Machine Dialog Vikram Ramanarayanan, David Suendermann-Oeft
2017-08-21 11:00-13:00 F11 Special Session: Speech Technology for Code-Switching in Multilingual Communities Special Session Mon-SS-1-11-4 1244 On building mixed lingual speech synthesis systems SaiKrishna Rallabandi, Alan W Black
2017-08-21 11:00-13:00 F11 Special Session: Speech Technology for Code-Switching in Multilingual Communities Special Session Mon-SS-1-11-5 1259 Speech Synthesis for Mixed-Language Navigation Instructions Khyathi Chandu, Sai Krishna Rallabandi, Sunayana Sitaram, Alan W Black
2017-08-21 11:00-13:00 F11 Special Session: Speech Technology for Code-Switching in Multilingual Communities Special Session Mon-SS-1-11-6 1373 Addressing Code-Switching in French/Algerian Arabic Speech Amazouz Djegdjiga, Martine Adda-Decker, Lori Lamel
2017-08-21 11:00-13:00 F11 Special Session: Speech Technology for Code-Switching in Multilingual Communities Special Session Mon-SS-1-11-7 1429 Metrics for modeling code-switching across corpora Wally Guzman, Joseph Ricard, Jacqueline Serigos, Barbara Bullock, Almeida Jacqueline Toribio
2017-08-21 11:00-13:00 F11 Special Session: Speech Technology for Code-Switching in Multilingual Communities Special Session Mon-SS-1-11-8 1437 Synthesising isiZulu-English code-switch bigrams using word embeddings Ewald van der Westhuizen, Thomas Niesler
2017-08-21 11:00-13:00 F11 Special Session: Speech Technology for Code-Switching in Multilingual Communities Special Session Mon-SS-1-11-9 1663 Crowdsourcing Universal Part-Of-Speech Tags for Code-Switching Victor Soto, Julia Hirschberg
2017-08-21 14:30-16:30 D8 Special Session: Interspeech 2017 Automatic Speaker Verification Spoofing and Countermeasures Challenge II Special Session Mon-SS-2-8-1 1111 The ASVspoof 2017 Challenge: Assessing the Limits of Replay Spoofing Attack Detection Tomi Kinnunen, Md Sahidullah, Héctor Delgado, Massimiliano Todisco, Nicholas Evans, Junichi Yamagishi, Kong Aik Lee
2017-08-21 14:30-16:30 D8 Special Session: Interspeech 2017 Automatic Speaker Verification Spoofing and Countermeasures Challenge II Special Session Mon-SS-2-8-2 1085 ResNet and Model Fusion for Automatic Spoofing Detection Zhuxin Chen, Zhifeng Xie, Weibin Zhang, Xiangmin Xu
2017-08-21 14:30-16:30 D8 Special Session: Interspeech 2017 Automatic Speaker Verification Spoofing and Countermeasures Challenge II Special Session Mon-SS-2-8-3 360 Audio replay attack detection with deep learning frameworks Galina Lavrentyeva, Sergey Novoselov, Egor Malykh, Alexandr Kozlov, Oleg Kudashev, Vadim Shchemelinin
2017-08-21 14:30-16:30 D8 Special Session: Interspeech 2017 Automatic Speaker Verification Spoofing and Countermeasures Challenge II Special Session Mon-SS-2-8-4 450 Experimental analysis of features for replay attack detection – Results on the ASVspoof 2017 Challenge Roberto Javier Font Ruiz, María José Cano Vicente, Juan Manuel Espín López
2017-08-21 14:30-16:30 D8 Special Session: Interspeech 2017 Automatic Speaker Verification Spoofing and Countermeasures Challenge II Special Session Mon-SS-2-8-5 906 Countermeasures for Automatic Speaker Verification Replay Spoofing Attack: On Data Augmentation, Feature Representation, Classification and Fusion Weicheng Cai, Danwei Cai, Wenbo Liu, Ming Li
2017-08-21 14:30-16:30 D8 Special Session: Interspeech 2017 Automatic Speaker Verification Spoofing and Countermeasures Challenge II Special Session Mon-SS-2-8-6 776 Audio Replay Attack Detection with High-Frequency Features Marcin Witkowski, Stanisław Kacprzak, Piotr Żelasko, Konrad Kowalczyk, Jakub Gałka
2017-08-21 14:30-16:30 D8 Special Session: Interspeech 2017 Automatic Speaker Verification Spoofing and Countermeasures Challenge II Special Session Mon-SS-2-8-7 1362 Novel Variable Length Teager Energy Separation Based Instantaneous Frequency Features for Replay Detection Hemant Patil, Madhu Kamble, Tanvina Patel, Meet Soni
2017-08-21 14:30-16:30 D8 Special Session: Interspeech 2017 Automatic Speaker Verification Spoofing and Countermeasures Challenge II Special Session Mon-SS-2-8-8 1246 Ensemble learning for countermeasure of audio replay spoofing attack in ASVspoof2017 Zhe Ji, Zhi-Yi Li, Peng Li, Maobo An, Shengxiang Gao, Dan Wu, Faru Zhao
2017-08-21 14:30-16:30 D8 Special Session: Interspeech 2017 Automatic Speaker Verification Spoofing and Countermeasures Challenge II Special Session Mon-SS-2-8-9 676 SFF Anti-Spoofer: IIIT-H Submission for Automatic Speaker Verification Spoofing and Countermeasures Challenge 2017 K N R K Raju Alluri, Sivanand Achanta, Sudarsana Reddy Kadiri, Suryakanth V Gangashetty, Anil Kumar Vuppala
2017-08-21 14:30-16:30 D8 Special Session: Interspeech 2017 Automatic Speaker Verification Spoofing and Countermeasures Challenge II Special Session Mon-SS-2-8-10 1377 Replay Attack Detection using DNN for Channel Discrimination Parav Nagarsheth, Elie Khoury, Kailash Patil, Matt Garland
2017-08-21 14:30-16:30 D8 Special Session: Interspeech 2017 Automatic Speaker Verification Spoofing and Countermeasures Challenge II Special Session Mon-SS-2-8-11 304 Feature selection based on CQCCs for Automatic Speaker Verification spoofing Wang Xianliang, Xiao Yanhong, Zhu Xuan
2017-08-21 14:30-16:30 D8 Special Session: Interspeech 2017 Automatic Speaker Verification Spoofing and Countermeasures Challenge II Special Session Mon-SS-2-8-12 930 Spoof Detection Using Source, Instantaneous Frequency and Cepstral Features Sarfaraz Jelil, Rohan Kumar Das, S R Mahadeva Prasanna, Rohit Sinha
2017-08-21 14:30-16:30 D8 Special Session: Interspeech 2017 Automatic Speaker Verification Spoofing and Countermeasures Challenge II Special Session Mon-SS-2-8-13 456 A Study on Replay Attack and Anti-Spoofing for Automatic Speaker Verification Lantian Li, Yixiang Chen, Dong Wang, Thomas Fang Zheng
2017-08-21 11:00-11:20 Main hall Conversational Telephone Speech Recognition Oral Mon-O-1-1-1 1513 Improved Single System Conversational Telephone Speech Recognition with VGG Bottleneck Features William Hartmann, Roger Hsiao, Tim Ng, Jeff Ma, Francis Keith, Man-hung Siu
2017-08-21 11:20-11:40 Main hall Conversational Telephone Speech Recognition Oral Mon-O-1-1-2 145 Student-teacher training with diverse decision tree ensembles Jeremy H. M. Wong, Mark Gales
2017-08-21 11:40-12:00 Main hall Conversational Telephone Speech Recognition Oral Mon-O-1-1-3 460 Embedding-Based Speaker Adaptive Training of Deep Neural Networks Xiaodong Cui, Vaibhava Goel, George Saon
2017-08-21 12:00-12:20 Main hall Conversational Telephone Speech Recognition Oral Mon-O-1-1-4 1058 Improving Deliverable Speech-to-text Systems with Multilingual Knowledge Transfer Jeff Ma, Francis Keith, Owen Kimball, Man-hung Siu, Tim Ng
2017-08-21 12:20-12:40 Main hall Conversational Telephone Speech Recognition Oral Mon-O-1-1-5 405 English Conversational Telephone Speech Recognition by Humans and Machines George Saon, Gakuto Kurata, Tom Sercu, Kartik Audhkhasi, Samuel Thomas, Dimitrios Dimitriadis, Xiaodong Cui, Bhuvana Ramabhadran, Michael Picheny, Lynn-Li Lim, Bergul Roomi, Phil Hall
2017-08-21 12:40-13:00 Main hall Conversational Telephone Speech Recognition Oral Mon-O-1-1-6 1544 Comparing Human and Machine Errors in Conversational Speech Transcription Andreas Stolcke, Jasha Droppo
2017-08-21 14:30-16:30 E306 Show & Tell 1 Show&Tell Mon-S&T-2-A-1 10034 Prosograph: A Tool for Prosody Visualisation of Large Speech Corpora Alp Oktem, Mireia Farrús, Leo Wanner
2017-08-21 14:30-16:30 E306 Show & Tell 1 Show&Tell Mon-S&T-2-A-2 10048 ChunkitApp: Investigating the relevant units of online speech processing Svetlana Vetchinnikova, Anna Mauranen, Nina Mikusova
2017-08-21 14:30-16:30 E306 Show & Tell 1 Show&Tell Mon-S&T-2-A-3 10049 Extending the EMU Speech Database Management System: Cloud Hosting, Team Collaboration, Automatic Revision Control Markus Jochim
2017-08-21 14:30-16:30 E306 Show & Tell 1 Show&Tell Mon-S&T-2-A-4 10051 HomeBank: A repository for long-form real-world audio recordings of children Anne Warlaumont, Mark VanDam, Elika Bergelson, Alejandrina Cristia
2017-08-21 14:30-16:30 E306 Show & Tell 1 Show&Tell Mon-S&T-2-A-5 10052 A system for real-time collaborative transcription correction Peter Bell, Joachim Fainberg, Catherine Lai, Mark Sinclair
2017-08-22 16:00-16:20 E10 Stance, Credibility, and Deception Oral Tue-O-5-10-1 159 Inferring Stance from Prosody Nigel Ward, Jason Carlson, Olac Fuentes, Diego Castan, Elizabeth Shriberg, Andreas Tsiartas
2017-08-22 16:20-16:40 E10 Stance, Credibility, and Deception Oral Tue-O-5-10-2 1706 Exploring Dynamic Measures of Stance in Spoken Interaction Gina-Anne Levow, Richard A. Wright
2017-08-22 16:40-17:00 E10 Stance, Credibility, and Deception Oral Tue-O-5-10-3 1035 Opinion Dynamics Modeling for Movie Review Transcripts Classification with Hidden Conditional Random Fields Valentin Barriere, Chloé Clavel, Slim Essid
2017-08-22 17:00-17:20 E10 Stance, Credibility, and Deception Oral Tue-O-5-10-4 121 Transfer Learning Between Concepts for Human Behavior Modeling: An Application to Sincerity and Deception Prediction Qinyi Luo, Rahul Gupta, Shrikanth Narayanan
2017-08-22 17:20-17:40 E10 Stance, Credibility, and Deception Oral Tue-O-5-10-5 384 The Sound of Deception – What Makes a Speaker Credible? Anne Schröder, Simon Stone, Peter Birkholz
2017-08-22 17:40-18:00 E10 Stance, Credibility, and Deception Oral Tue-O-5-10-6 1723 Hybrid Acoustic-Lexical Deep Learning Approach for Deception Detection Gideon Mendels, Sarah Ita Levitan, Kai-Zhan Lee, Julia Hirschberg
2017-08-22 13:30-13:50 Main hall WaveNet and Novel Paradigms Oral Tue-O-4-1-1 314 Speaker-dependent WaveNet vocoder Akira Tamamori, Tomoki Hayashi, Kazuhiro Kobayashi, Kazuya Takeda, Tomoki Toda
2017-08-22 13:50-14:10 Main hall WaveNet and Novel Paradigms Oral Tue-O-4-1-2 336 Waveform Modeling Using Stacked Dilated Convolutional Neural Networks for Speech Bandwidth Extension Yu Gu, Zhen-Hua Ling
2017-08-22 14:10-14:30 Main hall WaveNet and Novel Paradigms Oral Tue-O-4-1-3 488 Direct modeling of frequency spectra and waveform generation based on phase recovery for DNN-based speech synthesis Shinji Takaki, Hirokazu Kameoka, Junichi Yamagishi
2017-08-22 14:30-14:50 Main hall WaveNet and Novel Paradigms Oral Tue-O-4-1-4 628 A Hierarchical Encoder-Decoder Model for Statistical Parametric Speech Synthesis Srikanth Ronanki, Oliver Watts, Simon King
2017-08-22 14:50-15:10 Main hall WaveNet and Novel Paradigms Oral Tue-O-4-1-5 986 Statistical voice conversion with WaveNet-based waveform generation Kazuhiro Kobayashi, Tomoki Hayashi, Akira Tamamori, Tomoki Toda
2017-08-22 15:10-15:30 Main hall WaveNet and Novel Paradigms Oral Tue-O-4-1-6 1107 Google’s Next-Generation Real-Time Unit-Selection Synthesizer using Sequence-To-Sequence LSTM-based Autoencoders Vincent Wan, Yannis Agiomyrgiannakis, Hanna Silen, Jakub Vit
2017-08-22 10:00-10:20 B4 Speaker Recognition Oral Tue-O-3-4-1 620 Deep Neural Network Embeddings for Text-Independent Speaker Verification David Snyder, Daniel Garcia-Romero, Dan Povey, Sanjeev Khudanpur
2017-08-22 10:20-10:40 B4 Speaker Recognition Oral Tue-O-3-4-2 1018 Tied Variational Autoencoder Backends for i-Vector Speaker Recognition Jesus Villalba, Niko Brummer, Najim Dehak
2017-08-22 10:40-11:00 B4 Speaker Recognition Oral Tue-O-3-4-3 1182 Improved Gender Independent Speaker Recognition Using Convolutional Neural Network Based Bottleneck Features Shivesh Ranjan, John H.L. Hansen
2017-08-22 11:00-11:20 B4 Speaker Recognition Oral Tue-O-3-4-4 49 Autoencoder based Domain Adaptation for Speaker Recognition under Insufficient Channel Information Suwon Shon, Seongkyu Mun, Wooil Kim, Hanseok Ko
2017-08-22 11:20-11:40 B4 Speaker Recognition Oral Tue-O-3-4-5 829 Nonparametrically Trained Probabilistic Linear Discriminant Analysis for i-Vector Speaker Verification Abbas Khosravani, Mohammad Mehdi Homayounpour
2017-08-22 11:40-12:00 B4 Speaker Recognition Oral Tue-O-3-4-6 144 DNN bottleneck features for speaker clustering Jesús Jorrín, Leibny Paola Garcia Perera, Luis Buera
2017-08-22 10:00-12:00 Poster 2 Speaker Characterization and Recognition Poster Tue-P-3-2-1 633 Speaker Verification via Estimating Total Variability Space Using Probabilistic Partial Least Squares Chen Chen, Jiqing Han, Yilin Pan
2017-08-22 10:00-12:00 Poster 2 Speaker Characterization and Recognition Poster Tue-P-3-2-2 452 Deep Speaker Feature Learning for Text-independent Speaker Verification Lantian Li, Yixiang Chen, Ying Shi, Zhiyuan Tang, Dong Wang
2017-08-22 10:00-12:00 Poster 2 Speaker Characterization and Recognition Poster Tue-P-3-2-3 93 Duration mismatch compensation using four-covariance model and deep neural network for speaker verification Pierre-Michel Bousquet, Mickael Rouvier
2017-08-22 10:00-12:00 Poster 2 Speaker Characterization and Recognition Poster Tue-P-3-2-4 1586 Extended Variability Modeling and Unsupervised Adaptation for PLDA Speaker Recognition Alan McCree, Greg Sell, Daniel Garcia-Romero
2017-08-22 10:00-12:00 Poster 2 Speaker Characterization and Recognition Poster Tue-P-3-2-5 438 Improving the Effectiveness of Speaker Verification Domain Adaptation With Inadequate In-Domain Data Jonas Borgstrom, Elliot Singer, Douglas Reynolds, Seyed Omid Sadjadi
2017-08-22 10:00-12:00 Poster 2 Speaker Characterization and Recognition Poster Tue-P-3-2-6 656 I-Vector DNN Scoring and Calibration for Noise Robust Speaker Verification Zhili Tan, Manwai Mak
2017-08-22 10:00-12:00 Poster 2 Speaker Characterization and Recognition Poster Tue-P-3-2-7 803 Analysis of Score Normalization in Multilingual Speaker Recognition Pavel Matejka, Oldrich Plchot, Ondřej Novotný, Lukas Burget, Mireia Diez Sánchez, Jan Černocký
2017-08-22 10:00-12:00 Poster 2 Speaker Characterization and Recognition Poster Tue-P-3-2-8 1062 Alternative Approaches to Neural Network based Speaker Verification Anna Silnova, Lukas Burget, Jan Černocký
2017-08-22 10:00-12:00 Poster 2 Speaker Characterization and Recognition Poster Tue-P-3-2-9 219 A Distribution Free Formulation of the Total Variability Model Ruchir Travadi, Shrikanth Narayanan
2017-08-22 10:00-12:00 Poster 2 Speaker Characterization and Recognition Poster Tue-P-3-2-10 545 Recursive Whitening Transformation for Speaker Recognition on Language Mismatched Condition Suwon Shon, Seongkyu Mun, Hanseok Ko
2017-08-22 10:00-12:00 Poster 2 Speaker Characterization and Recognition Poster Tue-P-3-2-11 668 Domain mismatch modeling of out-domain i-vectors for PLDA speaker verification Md Hafizur Rahman, Ivan Himawan, David Dean, Sridha Sridharan
2017-08-22 13:30-15:30 Poster 2 Acoustic Models for ASR II Poster Tue-P-4-2-1 1323 Backstitch: Counteracting Finite-sample Bias via Negative Steps Yiming Wang, Vijayaditya Peddinti, Hainan Xu, Xiaohui Zhang, Dan Povey, Sanjeev Khudanpur
2017-08-22 13:30-15:30 Poster 2 Acoustic Models for ASR II Poster Tue-P-4-2-2 779 Node pruning based on Entropy of Weights and Node Activity for Small-footprint Acoustic Model based on Deep Neural Networks Ryu Takeda, Kazuhiro Nakadai, Kazunori Komatani
2017-08-22 13:30-15:30 Poster 2 Acoustic Models for ASR II Poster Tue-P-4-2-3 584 Complex-valued restricted Boltzmann machine for direct learning of frequency spectra Toru Nakashika, Shinji Takaki, Junichi Yamagishi
2017-08-22 13:30-15:30 Poster 2 Acoustic Models for ASR II Poster Tue-P-4-2-4 1284 End-to-End Training of Acoustic Models for Large Vocabulary Continuous Speech Recognition with TensorFlow Ehsan Variani, Tom Bagby, Erik McDermott, Michiel Bacchiani
2017-08-22 13:30-15:30 Poster 2 Acoustic Models for ASR II Poster Tue-P-4-2-5 1557 An Efficient Phone N-gram Forward-backward Computation Using Dense Matrix Multiplication Khe Chai Sim, Arun Narayanan
2017-08-22 13:30-15:30 Poster 2 Acoustic Models for ASR II Poster Tue-P-4-2-6 1747 Parallel Neural Network Features for Improved Tandem Acoustic Modeling Zoltán Tüske, Wilfried Michel, Ralf Schlüter, Hermann Ney
2017-08-22 13:30-15:30 Poster 2 Acoustic Models for ASR II Poster Tue-P-4-2-7 1581 Acoustic feature learning with deep variational canonical correlation analysis Qingming Tang, Weiran Wang, Karen Livescu
2017-08-22 16:00-18:00 F11 Special Session: Acoustic Manifestations of Social Characteristics Special Session Tue-SS-5-11-1 28 Clear Speech – Mere Speech? How segmental and prosodic speech reduction shape the impression that speakers create on listeners Oliver Niebuhr
2017-08-22 16:00-18:00 F11 Special Session: Acoustic Manifestations of Social Characteristics Special Session Tue-SS-5-11-2 293 Relationships between speech timing and perceived hostility in a French corpus of political debates Charlotte Kouklia, Nicolas Audibert
2017-08-22 16:00-18:00 F11 Special Session: Acoustic Manifestations of Social Characteristics Special Session Tue-SS-5-11-3 328 Towards Speaker Characterization: Identifying and Predicting Dimensions of Person Attribution Laura Fernández Gallardo, Benjamin Weiss
2017-08-22 16:00-18:00 F11 Special Session: Acoustic Manifestations of Social Characteristics Special Session Tue-SS-5-11-4 623 Prosodic analysis of attention-drawing speech Carlos Ishi, Jun Arai, Norihiro Hagita
2017-08-22 16:00-18:00 F11 Special Session: Acoustic Manifestations of Social Characteristics Special Session Tue-SS-5-11-5 1055 Perceptual and acoustic correlates of gender in the prepubertal voice Adrian Simpson, Riccarda Funk, Frederik Palmer
2017-08-22 16:00-18:00 F11 Special Session: Acoustic Manifestations of Social Characteristics Special Session Tue-SS-5-11-6 1248 To see or not to see: Interlocutor visibility and likeability influence convergence in intonation Katrin Schweitzer, Michael Walsh, Antje Schweitzer
2017-08-22 16:00-18:00 F11 Special Session: Acoustic Manifestations of Social Characteristics Special Session Tue-SS-5-11-7 1394 Acoustic correlates of parental role and gender identity in the speech of expecting parents Melanie Weirich, Adrian Simpson
2017-08-22 16:00-18:00 F11 Special Session: Acoustic Manifestations of Social Characteristics Special Session Tue-SS-5-11-8 1732 A Semi-Supervised Learning Approach for Acoustic-Prosodic Personality Perception in Under-Resourced Domains Rubén Solera-Ureña, Helena Moniz, Fernando Batista, Vera Cabarrao, Anna Pompili, Ramón Fernández-Astudillo, Joana Campos, Ana Paiva, Isabel Trancoso
2017-08-22 16:00-18:00 F11 Special Session: Acoustic Manifestations of Social Characteristics Special Session Tue-SS-5-11-9 1746 Effects of Talker Dialect, Gender & Race on Accuracy of Bing Speech and YouTube Automatic Captions Rachael Tatman, Conner Kasten
2017-08-22 16:00-16:20 Main hall Neural Network Acoustic Models for ASR II Oral Tue-O-5-1-1 1705 Recurrent Neural Aligner: An Encoder-Decoder Neural Network Model for Sequence to Sequence Mapping Hasim Sak, Matt Shannon, Kanishka Rao, Francoise Beaufays
2017-08-22 16:20-16:40 Main hall Neural Network Acoustic Models for ASR II Oral Tue-O-5-1-2 429 Highway-LSTM and Recurrent Highway Networks for Speech Recognition Golan Pundak, Tara Sainath
2017-08-22 16:40-17:00 Main hall Neural Network Acoustic Models for ASR II Oral Tue-O-5-1-3 775 Improving speech recognition by revising gated recurrent units Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio
2017-08-22 17:00-17:20 Main hall Neural Network Acoustic Models for ASR II Oral Tue-O-5-1-4 856 Stochastic Recurrent Neural Network for Speech Recognition Jen-Tzung Chien, Chen Shen
2017-08-22 17:20-17:40 Main hall Neural Network Acoustic Models for ASR II Oral Tue-O-5-1-5 1064 Frame and Segment Level Recurrent Neural Networks for Phone Classification Martin Ratajczak, Sebastian Tschiatschek, Franz Pernkopf
2017-08-22 17:40-18:00 Main hall Neural Network Acoustic Models for ASR II Oral Tue-O-5-1-6 1695 Deep Learning-based Telephony Speech Recognition in the Wild Kyu Han, Seongjun Hahm, Byung-Hak Kim, Jungsuk Kim, Ian Lane
2017-08-22 13:30-13:50 E10 Voice Conversion 1 Oral Tue-O-4-10-1 247 Voice Conversion Using Sequence-to-Sequence Learning of Context Posterior Probabilities Hiroyuki Miyoshi, Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari
2017-08-22 13:50-14:10 E10 Voice Conversion 1 Oral Tue-O-4-10-2 349 Learning Latent Representations for Speech Generation and Transformation Wei-Ning Hsu, Yu Zhang, James Glass
2017-08-22 14:10-14:30 E10 Voice Conversion 1 Oral Tue-O-4-10-3 961 Parallel-data-free Many-to-many Voice Conversion based on DNN Integrated with Eigenspace Using a Non-parallel Speech Corpus Tetsuya Hashimoto, Hidetsugu Uchida, Daisuke Saito, Nobuaki Minematsu
2017-08-22 14:30-14:50 E10 Voice Conversion 1 Oral Tue-O-4-10-4 970 Sequence-to-Sequence Voice Conversion with Similarity Metric Learned Using Generative Adversarial Networks Takuhiro Kaneko, Hirokazu Kameoka, Kaoru Hiramatsu, Kunio Kashino
2017-08-22 14:50-15:10 E10 Voice Conversion 1 Oral Tue-O-4-10-5 1453 A mouth opening effect based on pole modification for expressive singing voice transformation Luc Ardaillon, Axel Roebel
2017-08-22 15:10-15:30 E10 Voice Conversion 1 Oral Tue-O-4-10-6 1434 Siamese Autoencoders for Speech Style Extraction and Switching Applied to Voice Identification and Conversion Seyed Hamidreza Mohammadi, Alexander Kain
2017-08-22 10:00-10:20 C6 Phonation and voice quality Oral Tue-O-3-6-1 1155 Creak as a feature of lexical stress in Estonian Kätlin Aare, Pärtel Lippus, Juraj Šimko
2017-08-22 10:20-10:40 C6 Phonation and voice quality Oral Tue-O-3-6-2 1535 Cross-speaker Variation in Voice Source Correlates of Focus and Deaccentuation Irena Yanushevskaya, Ailbhe Ní Chasaide, Christer Gobl
2017-08-22 10:40-11:00 C6 Phonation and voice quality Oral Tue-O-3-6-3 604 Acoustic Characterization of Word-final Glottal Stops in Mizo and Assam Sora Sishir Kalita, Wendy Lalhminghlui, Luke Horo, Priyankoo Sarmah, S R Mahadeva Prasanna, Samarendra Dandapat
2017-08-22 11:00-11:20 C6 Phonation and voice quality Oral Tue-O-3-6-4 79 Iterative Optimal Preemphasis for Improved Glottal-Flow Estimation by Iterative Adaptive Inverse Filtering Parham Mokhtari, Hiroshi Ando
2017-08-22 11:20-11:40 C6 Phonation and voice quality Oral Tue-O-3-6-5 870 Automatic Measurement of Pre-aspiration Yaniv Sheena, Michaela Hejna, Yossi Adi, Joseph Keshet
2017-08-22 11:40-12:00 C6 Phonation and voice quality Oral Tue-O-3-6-6 1774 Acoustic and electroglottographic study of breathy and modal vowels as produced by heritage and native Gujarati speakers Kiranpreet Nara
2017-08-22 16:00-16:20 C6 Prosody (rhythm, stress, quantity, phrasing) Oral Tue-O-5-6-1 544 Similar prosodic structure perceived differently in German and English Heather Kember, Ann-Kathrin Grohe, Katharina Zahner, Bettina Braun, Andrea Weber, Anne Cutler
2017-08-22 16:20-16:40 C6 Prosody (rhythm, stress, quantity, phrasing) Oral Tue-O-5-6-2 1214 Disambiguate or not? – The role of prosody in unambiguous and potentially ambiguous anaphora production in strictly Mandarin parallel structures Luying Hou, Bert Le Bruyn, René Kager
2017-08-22 16:40-17:00 C6 Prosody (rhythm, stress, quantity, phrasing) Oral Tue-O-5-6-3 1514 Acoustic Properties of Canonical and Non-Canonical Stress in French, Turkish, Armenian and Brazilian Portuguese Angeliki Athanasopoulou, Irene Vogel, Hossep Dolatian
2017-08-22 17:00-17:20 C6 Prosody (rhythm, stress, quantity, phrasing) Oral Tue-O-5-6-4 987 Phonological complexity, segment rate and speech tempo perception Leendert Plug, Rachel Smith
2017-08-22 17:20-17:40 C6 Prosody (rhythm, stress, quantity, phrasing) Oral Tue-O-5-6-5 29 On the Duration of Mandarin Tones Jing Yang, Yu Zhang, Aijun Li, Li Xu
2017-08-22 17:40-18:00 C6 Prosody (rhythm, stress, quantity, phrasing) Oral Tue-O-5-6-6 1134 The formant dynamics of long close vowels in three varieties of Swedish Otto Ewald, Eva Liina Asu, Susanne Schötz
2017-08-22 13:30-13:50 A2 Models of Speech Perception Oral Tue-O-4-2-1 567 A Comparison of Sentence-level Speech Intelligibility Metrics Alexander Kain, Max Del Giudice, Kris Tjaden
2017-08-22 13:50-14:10 A2 Models of Speech Perception Oral Tue-O-4-2-2 196 An auditory model of speaker size perception for voiced speech sounds Toshio Irino, Eri Takimoto, Toshie Matsui, Roy Patterson
2017-08-22 14:10-14:30 A2 Models of Speech Perception Oral Tue-O-4-2-3 1048 The recognition of compounds: a computational account Louis ten Bosch, Lou Boves, Mirjam Ernestus
2017-08-22 14:30-14:50 A2 Models of Speech Perception Oral Tue-O-4-2-4 1158 Humans do not maximize the probability of correct decision when recognizing DANTALE words in noise Mohsen Zareian Jahromi, Jan Østergaard, Jesper Jensen
2017-08-22 14:50-15:10 A2 Models of Speech Perception Oral Tue-O-4-2-5 1360 Single-ended prediction of listening effort based on automatic speech recognition Rainer Huber, Constantin Spille, Bernd T. Meyer
2017-08-22 15:10-15:30 A2 Models of Speech Perception Oral Tue-O-4-2-6 1611 Modeling categorical perception with the receptive fields of auditory neurons Chris Neufeld
2017-08-22 16:00-18:00 Poster 4 Speech-enhancement Poster Tue-P-5-4-1 62 A Post-filtering Approach Based on Locally Linear Embedding Difference Compensation for Speech Enhancement Yichiao Wu, Hsin-Te Hwang, Syu-Siang Wang, Chin-Cheng Hsu, Yu Tsao, Hsin-Min Wang
2017-08-22 16:00-18:00 Poster 4 Speech-enhancement Poster Tue-P-5-4-2 240 Multi-target Ensemble Learning for Monaural Speech Separation Hui Zhang, Xueliang Zhang, Guanglai Gao
2017-08-22 16:00-18:00 Poster 4 Speech-enhancement Poster Tue-P-5-4-3 543 Improved Example-based Speech Enhancement by Using Deep Neural Network Acoustic Model for Noise Robust Example Search Atsunori Ogawa, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani
2017-08-22 16:00-18:00 Poster 4 Speech-enhancement Poster Tue-P-5-4-4 1041 Subjective intelligibility of deep neural network-based speech enhancement Femke B. Gelderblom, Tron V. Tronstad, Erlend M. Viggen
2017-08-22 16:00-18:00 Poster 4 Speech-enhancement Poster Tue-P-5-4-5 1157 Real-Time Modulation Enhancement of Temporal Envelopes for Increasing Speech Intelligibility Maria Koutsogiannaki, Holly Francois, Kihyun Choo, Eunmi Oh
2017-08-22 16:00-18:00 Poster 4 Speech-enhancement Poster Tue-P-5-4-6 1173 On the influence of modifying magnitude and phase spectrum to enhance noisy speech signals Hans-Guenter Hirsch, Michael Gref
2017-08-22 16:00-18:00 Poster 4 Speech-enhancement Poster Tue-P-5-4-7 1243 MixMax Approximation as a Super-Gaussian Log-Spectral Amplitude Estimator for Speech Enhancement Robert Rehr, Timo Gerkmann
2017-08-22 16:00-18:00 Poster 4 Speech-enhancement Poster Tue-P-5-4-8 1257 Binary mask estimation strategies for constrained imputation-based speech enhancement Ricard Marxer, Jon Barker
2017-08-22 16:00-18:00 Poster 4 Speech-enhancement Poster Tue-P-5-4-9 1465 A Fully Convolutional Network for Speech Enhancement Serim Park, Jinwon Lee
2017-08-22 16:00-18:00 Poster 4 Speech-enhancement Poster Tue-P-5-4-10 1492 Speech enhancement using non-negative spectrogram models with mel-generalized cepstral regularization Li Li, Hirokazu Kameoka, Tomoki Toda, Shoji Makino
2017-08-22 16:00-18:00 Poster 4 Speech-enhancement Poster Tue-P-5-4-11 1504 A comparison of perceptually motivated loss functions for binary mask estimation in speech separation Danny Websdale, Ben Milner
2017-08-22 16:00-18:00 Poster 4 Speech-enhancement Poster Tue-P-5-4-12 1620 Conditional Generative Adversarial Networks for Speech Enhancement and Noise-Robust Speaker Verification Daniel Michelsanti, Zheng-Hua Tan
2017-08-22 16:00-18:00 Poster 4 Speech-enhancement Poster Tue-P-5-4-13 1672 Speech Enhancement Using Bayesian Wavenet Kaizhi Qian, Yang Zhang, Shiyu Chang, Xuesong Yang, Dinei Florencio, Mark Hasegawa-Johnson
2017-08-22 16:00-18:00 Poster 4 Speech-enhancement Poster Tue-P-5-4-14 297 Binaural Reverberant Speech Separation Based on Deep Neural Networks Xueliang Zhang, DeLiang Wang
2017-08-22 16:00-18:00 Poster 4 Speech-enhancement Poster Tue-P-5-4-15 1225 On the quality and intelligibility of noisy speech processed for near-end listening enhancement Catalin Zorila, Yannis Stylianou
2017-08-22 13:30-13:50 C6 Prosody (tone and intonation) Oral Tue-O-4-6-1 1635 The Vocative Chant and Beyond: German Calling Melodies under Routine and Urgent Contexts Sergio Quiroz, Marzena Zygis
2017-08-22 13:50-14:10 C6 Prosody (tone and intonation) Oral Tue-O-4-6-2 1044 Comparing languages using hierarchical prosodic analysis Juraj Šimko, Antti Suni, Katri Hiovain, Martti Vainio
2017-08-22 14:10-14:30 C6 Prosody (tone and intonation) Oral Tue-O-4-6-3 264 Intonation Facilitates Prediction of Focus even in the Presence of Lexical Tones Martin Ho Kwan Ip, Anne Cutler
2017-08-22 14:30-14:50 C6 Prosody (tone and intonation) Oral Tue-O-4-6-4 839 Mind the peak: When museum is temporarily understood as musical in Australian English Katharina Zahner, Heather Kember, Bettina Braun
2017-08-22 14:50-15:10 C6 Prosody (tone and intonation) Oral Tue-O-4-6-5 1353 Pashto intonation patterns Luca Rognoni, Judith Bishop, Miriam Corris
2017-08-22 15:10-15:30 C6 Prosody (tone and intonation) Oral Tue-O-4-6-6 175 A new model of final lowering in spontaneous monologue Kikuo Maekawa
2017-08-22 13:30-15:30 E397 Show & Tell 4 Show&Tell Tue-S&T-4-B-1 10030 Evolving recurrent neural networks that process and classify raw audio in a streaming fashion Adrien Daniel
2017-08-22 13:30-15:30 E397 Show & Tell 4 Show&Tell Tue-S&T-4-B-2 10032 Combining Gaussian mixture models and segmental feature models for speaker recognition Milana Milošević, Ulrike Glavitsch
2017-08-22 13:30-15:30 E397 Show & Tell 4 Show&Tell Tue-S&T-4-B-3 10036 Did you laugh enough today? – Deep Neural Networks for Mobile and Wearable Laughter Trackers Gerhard Hagerer, Nicholas Cummins, Florian Eyben, Björn Schuller
2017-08-22 13:30-15:30 E397 Show & Tell 4 Show&Tell Tue-S&T-4-B-4 10037 Low-Frequency Ultrasonic Communication for Speech Broadcasting in Public Transportation Kwang Myung Jeon, Nam Kyun Kim, Chan Woong Kwak, Jung Min Moon, Hong Kook Kim
2017-08-22 13:30-15:30 E397 Show & Tell 4 Show&Tell Tue-S&T-4-B-5 10039 Real-time Speech Enhancement with GCC-NMF: Demonstration on the Raspberry Pi and NVIDIA Jetson Sean Wood, Jean Rouat
2017-08-22 16:00-16:20 B4 Glottal Source Modeling Oral Tue-O-5-4-1 15 A new cosine series antialiasing function and its application to aliasing-free glottal source models for speech and singing synthesis Hideki Kawahara, Ken-Ichi Sakakibara, Hideki Banno, Masanori Morise, Tomoki Toda, Toshio Irino
2017-08-22 16:20-16:40 B4 Glottal Source Modeling Oral Tue-O-5-4-2 400 Speaking style conversion from normal to Lombard speech using a glottal vocoder and Bayesian GMMs Ana Ramírez López, Shreyas Seshadri, Lauri Juvela, Okko Räsänen, Paavo Alku
2017-08-22 16:40-17:00 B4 Glottal Source Modeling Oral Tue-O-5-4-3 848 Reducing mismatch in training of DNN-based glottal excitation models in a statistical parametric text-to-speech system Lauri Juvela, Bajibabu Bollepalli, Junichi Yamagishi, Paavo Alku
2017-08-22 17:00-17:20 B4 Glottal Source Modeling Oral Tue-O-5-4-4 1202 Semi Parametric Concatenative TTS with Instant Voice Modification Capabilities Alexander Sorin, Slava Shechtman, Asaf Rendel
2017-08-22 17:20-17:40 B4 Glottal Source Modeling Oral Tue-O-5-4-5 1722 Modeling laryngeal muscle activation noise for low-order physiological based speech synthesis Rodrigo Manriquez, Sean Peterson, Pavel Prado, Patricio Orio, Matias Zañartu
2017-08-22 17:40-18:00 B4 Glottal Source Modeling Oral Tue-O-5-4-6 1647 Direct Modelling of Magnitude and Phase Spectra for Statistical Parametric Speech Synthesis Felipe Espic, Cassia Valentini-Botinhao, Simon King
2017-08-22 10:00-10:20 E10 Emotion Recognition Oral Tue-O-3-10-1 200 Efficient Emotion Recognition from Speech Using Deep Learning on Spectrograms Aharon Satt, Shai Rozenberg, Ron Hoory
2017-08-22 10:20-10:40 E10 Emotion Recognition Oral Tue-O-3-10-2 713 Interaction and Transition Model for Speech Emotion Recognition in Dialogue Ruo Zhang, Atsushi Ando, Satoshi Kobashikawa, Yushi Aono
2017-08-22 10:40-11:00 E10 Emotion Recognition Oral Tue-O-3-10-3 1637 Progressive Neural Networks for Transfer Learning in Emotion Recognition John Gideon, Soheil Khorram, Zakaria Aldeneh, Dimitrios Dimitriadis, Emily Mower Provost
2017-08-22 11:00-11:20 E10 Emotion Recognition Oral Tue-O-3-10-4 1494 Jointly Predicting Arousal, Valence and Dominance with Multi-Task Learning Srinivas Parthasarathy, Carlos Busso
2017-08-22 11:20-11:40 E10 Emotion Recognition Oral Tue-O-3-10-5 94 Discretized Continuous Speech Emotion Recognition with Multi-Task Deep Recurrent Neural Network Duc Le, Zakaria Aldeneh, Emily Mower Provost
2017-08-22 11:40-12:00 E10 Emotion Recognition Oral Tue-O-3-10-6 736 Towards Speech Emotion Recognition “in the wild” using Aggregated Corpora and Deep Multi-Task Learning Jaebok Kim, Gwenn Englebienne, Khiet Truong, Vanessa Evers
2017-08-22 16:00-18:00 Poster 1 L1 and L2 Acquisition Poster Tue-P-5-1-1 379 An Automatically Aligned Corpus of Child-directed Speech Micha Elsner, Kiwako Ito
2017-08-22 16:00-18:00 Poster 1 L1 and L2 Acquisition Poster Tue-P-5-1-2 9 A comparison of Danish listeners’ processing cost in judging the truth value of Norwegian, Swedish, and English sentences Ocke-Schwen Bohn, Trine Askjær-Jørgensen
2017-08-22 16:00-18:00 Poster 1 L1 and L2 Acquisition Poster Tue-P-5-1-3 1065 The acquisition of focal lengthening in Stockholm Swedish Anna Sara Hexeberg Romøren, Aoju Chen
2017-08-22 16:00-18:00 Poster 1 L1 and L2 Acquisition Poster Tue-P-5-1-4 1282 On the role of temporal variability in the acquisition of the German vowel length contrast Felicitas Kleber
2017-08-22 16:00-18:00 Poster 1 L1 and L2 Acquisition Poster Tue-P-5-1-5 1607 A data-driven approach for perceptually validated acoustic features for children’s sibilant fricative productions Patrick Reidy, Mary Beckman, Jan Edwards, Benjamin Munson
2017-08-22 16:00-18:00 Poster 1 L1 and L2 Acquisition Poster Tue-P-5-1-6 64 Quality Assessment of ESL Learner’s Sentence Prosody with TTS Synthesized Voice as Reference Yujia Xiao, Frank Soong
2017-08-22 16:00-18:00 Poster 1 L1 and L2 Acquisition Poster Tue-P-5-1-7 143 Mechanisms of Tone Sandhi Rule Application by Non-native Speakers Si Chen, Yunjuan He, Chun Wah Yuen, Bei Li, Yike Yang
2017-08-22 16:00-18:00 Poster 1 L1 and L2 Acquisition Poster Tue-P-5-1-8 289 Changes in early L2 cue-weighting of non-native speech: Evidence from learners of Mandarin Chinese Seth Wiener
2017-08-22 16:00-18:00 Poster 1 L1 and L2 Acquisition Poster Tue-P-5-1-9 1600 Directing Attention during Perceptual Training: A Preliminary Study of Phonetic Learning in Southern Min by Mandarin Speakers Ying Chen, Eric Pederson
2017-08-22 16:00-18:00 Poster 1 L1 and L2 Acquisition Poster Tue-P-5-1-10 332 Prosody analysis of L2 English for naturalness evaluation through speech modification Dean Luo, Ruxin Luo, Lixin Wang
2017-08-22 16:00-18:00 Poster 1 L1 and L2 Acquisition Poster Tue-P-5-1-11 337 Measuring Encoding Efficiency in Swedish and English Language Learner Speech Production Gintare Grigonyte, Gerold Schneider
2017-08-22 16:00-18:00 Poster 1 L1 and L2 Acquisition Poster Tue-P-5-1-12 369 Lexical adaptation to a novel accent in German: A comparison between German, Swedish, and Finnish listeners Adriana Hanulikova, Jenny Ekström
2017-08-22 16:00-18:00 Poster 1 L1 and L2 Acquisition Poster Tue-P-5-1-13 743 Qualitative differences in L3 learners’ neurophysiological response to L1 versus L2 transfer Alejandra Keidel Fernández, Thomas Hörberg
2017-08-22 16:00-18:00 Poster 1 L1 and L2 Acquisition Poster Tue-P-5-1-14 1052 Articulation rate in Swedish child-directed speech increases as a function of the age of the child even when surprisal is controlled for Johan Sjons, Thomas Hörberg, Robert Östling, Johannes Bjerva
2017-08-22 16:00-18:00 Poster 1 L1 and L2 Acquisition Poster Tue-P-5-1-15 714 The relationship between the perception and production of non-native tones Kaile Zhang, Gang Peng
2017-08-22 16:00-18:00 Poster 1 L1 and L2 Acquisition Poster Tue-P-5-1-16 1110 MMN responses in adults after exposure to bimodal and unimodal frequency distributions of rotated speech Ellen Marklund, Elísabet Eir Cortes, Johan Sjons
2017-08-22 13:30-15:30 Poster 3 Dialog modelling Poster Tue-P-4-3-1 651 Online End-of-Turn Detection from Speech based on Stacked Time-Asynchronous Sequential Networks Ryo Masumura, Taichi Asami, Hirokazu Masataki, Ryo Ishii, Ryuichiro Higashinaka
2017-08-22 13:30-15:30 Poster 3 Dialog modelling Poster Tue-P-4-3-2 1176 Improving prediction of speech activity using multi-participant respiratory state Marcin Wlodarczak, Kornel Laskowski, Mattias Heldner, Kätlin Aare
2017-08-22 13:30-15:30 Poster 3 Dialog modelling Poster Tue-P-4-3-3 1495 Turn-Taking Offsets and Dialogue Context Peter Heeman, Rebecca Lunsford
2017-08-22 13:30-15:30 Poster 3 Dialog modelling Poster Tue-P-4-3-4 1593 Towards Deep End-of-Turn Prediction for Situated Spoken Dialogue Systems Angelika Maier, Julian Hough, David Schlangen
2017-08-22 13:30-15:30 Poster 3 Dialog modelling Poster Tue-P-4-3-5 837 End-of-Utterance Prediction by Prosodic Features and Phrase-Dependency Structure in Spontaneous Japanese Speech Yuichi Ishimoto, Takehiro Teraoka, Mika Enomoto
2017-08-22 13:30-15:30 Poster 3 Dialog modelling Poster Tue-P-4-3-6 965 A Turn-taking Estimation Model based on Joint Embedding of Lexical and Prosodic Contents Chaoran Liu, Carlos Ishi, Hiroshi Ishiguro
2017-08-22 13:30-15:30 Poster 3 Dialog modelling Poster Tue-P-4-3-7 457 Social Signal Detection in Spontaneous Dialogue Using Bidirectional LSTM-CTC Hirofumi Inaguma, Koji Inoue, Masato Mimura, Tatsuya Kawahara
2017-08-22 13:30-15:30 Poster 3 Dialog modelling Poster Tue-P-4-3-8 1568 Entrainment in Multi-Party Spoken Dialogues at Multiple Linguistic Levels Zahra Rahimi, Anish Kumar, Diane Litman, Susannah Paletz, Mingzhi Yu
2017-08-22 13:30-15:30 Poster 3 Dialog modelling Poster Tue-P-4-3-9 1604 Measuring Synchrony in Task-based Dialogues Justine Reverdy, Carl Vogel
2017-08-22 13:30-15:30 Poster 3 Dialog modelling Poster Tue-P-4-3-10 161 Sequence to Sequence Modeling for User Simulation in Dialog Systems Paul Crook, Alex Marin
2017-08-22 13:30-15:30 Poster 3 Dialog modelling Poster Tue-P-4-3-11 1213 Issues in Human and Automated Scoring of Fluency, Pronunciation and Intonation During Human–Machine Spoken Dialog Interactions Vikram Ramanarayanan, Patrick Lange, Keelan Evanini, Hillary Molloy, David Suendermann-Oeft
2017-08-22 13:30-15:30 Poster 3 Dialog modelling Poster Tue-P-4-3-12 725 Hierarchical LSTMs with Joint Learning for Estimating Customer Satisfaction from Contact Center Calls Atsushi Ando, Ryo Masumura, Hosana Kamiyama, Satoshi Kobashikawa, Yushi Aono
2017-08-22 13:30-15:30 Poster 3 Dialog modelling Poster Tue-P-4-3-13 1032 Domain-independent User Satisfaction Reward Estimation for Dialogue Policy Learning Stefan Ultes, Paweł Budzianowski, Iñigo Casanueva, Nikola Mrkšić, Lina M. Rojas Barahona, Pei-Hao Su, Tsung-Hsien Wen, Milica Gasic, Steve Young
2017-08-22 13:30-15:30 Poster 3 Dialog modelling Poster Tue-P-4-3-14 1006 Analysis of the Relationship between Prosodic Features of Fillers and Its Forms or Occurrence Positions Shizuka Nakamura, Ryosuke Nakanishi, Katsuya Takanashi, Tatsuya Kawahara
2017-08-22 16:00-16:20 D8 Speech recognition for language learning Oral Tue-O-5-8-1 250 Bidirectional LSTM-RNN for Improving Automated Assessment of Non-native Children’s Speech Yao Qian, Keelan Evanini, Xinhao Wang, Chong Min Lee, Matthew Mulholland
2017-08-22 16:20-16:40 D8 Speech recognition for language learning Oral Tue-O-5-8-2 728 Automatic Scoring of Shadowing Speech based on DNN Posteriors and their DTW Junwei Yue, Fumiya Shiozawa, Shohei Toyama, Yutaka Yamauchi, Kayoko Ito, Daisuke Saito, Nobuaki Minematsu
2017-08-22 16:40-17:00 D8 Speech recognition for language learning Oral Tue-O-5-8-3 1174 Off-Topic Spoken Response Detection Using Siamese Convolutional Neural Networks Chong Min Lee, Su-Youn Yoon, Xinhao Wang, Matthew Mulholland, Ikkyu Choi, Keelan Evanini
2017-08-22 17:00-17:20 D8 Speech recognition for language learning Oral Tue-O-5-8-4 1350 Phonological Feature Based Mispronunciation Detection and Diagnosis using Multi-Task DNNs and Active Learning Vipul Arora, Aditi Lahiri, Henning Reetz
2017-08-22 17:20-17:40 D8 Speech recognition for language learning Oral Tue-O-5-8-5 1522 Detection of Mispronunciations and Disfluencies in Children Reading Aloud Jorge Proença, Carla Lopes, Michael Tjalve, Andreas Stolcke, Sara Candeias, Fernando Perdigão
2017-08-22 17:40-18:00 D8 Speech recognition for language learning Oral Tue-O-5-8-6 366 Automatic assessment of non-native prosody by measuring distances on prosodic label sequences David Escudero-Mancebo, César González-Ferreras, Eva Estebas-Vilaplana, Lourdes Aguilar
2017-08-22 10:00-12:00 F11 Special Session: Speech and Human-Robot Interaction Special Session Tue-SS-3-11-1 631 Motion analysis in vocalized surprise expressions Carlos Ishi, Takashi Minato, Hiroshi Ishiguro
2017-08-22 10:00-12:00 F11 Special Session: Speech and Human-Robot Interaction Special Session Tue-SS-3-11-2 730 Automatic Classification of Autistic Child Vocalisations: A Novel Database and Results Alice Baird, Shahin Amiriparian, Nicholas Cummins, Alyssa M. Alcorn, Anton Batliner, Sergey Pugachevskiy, Michael Freitag, Maurice Gerczuk, Björn Schuller
2017-08-22 10:00-12:00 F11 Special Session: Speech and Human-Robot Interaction Special Session Tue-SS-3-11-3 926 Crowd-Sourced Design of Artificial Attentive Listeners Catharine Oertel, Patrik Jonell, Dimosthenis Kontogiorgos, Joseph Mendelson, Jonas Beskow, Joakim Gustafson
2017-08-22 10:00-12:00 F11 Special Session: Speech and Human-Robot Interaction Special Session Tue-SS-3-11-4 1223 Elicitation Design for Acoustic Depression Classification: An Investigation of Articulation Effort, Linguistic Complexity, and Word Affect Brian Stasak, Julien Epps, Roland Goecke
2017-08-22 10:00-12:00 F11 Special Session: Speech and Human-Robot Interaction Special Session Tue-SS-3-11-5 1308 Robustness over time-varying channels in DNN-HMM ASR based human-robot interaction Jose Novoa, Jorge Wuth, Juan Pablo Escudero, Josue Fredes, Rodrigo Mahu, Richard Stern, Nestor Becerra Yoma
2017-08-22 10:00-12:00 F11 Special Session: Speech and Human-Robot Interaction Special Session Tue-SS-3-11-6 1395 Analysis of Engagement and User Experience with a Laughter Responsive Social Robot Bekir Berker Türker, Zana Buçinca, Engin Erzin, Yücel Yemez, Metin Sezgin
2017-08-22 10:00-12:00 F11 Special Session: Speech and Human-Robot Interaction Special Session Tue-SS-3-11-7 1431 Studying the link between inter-speaker coordination and speech imitation through human-machine interactions Leonardo Lancia, Thierry Chaminade, Noël Nguyen, Laurent Prévot
2017-08-22 10:00-12:00 F11 Special Session: Speech and Human-Robot Interaction Special Session Tue-SS-3-11-8 1606 Enhancing Backchannel Prediction Using Word Embeddings Robin Rüde, Markus Müller, Sebastian Stüker, Alex Waibel
2017-08-22 10:00-12:00 Poster 1 Short Utterances Speaker Recognition Poster Tue-P-3-1-1 137 A Generative Model for Score Normalization in Speaker Recognition Albert Swart, Niko Brummer
2017-08-22 10:00-12:00 Poster 1 Short Utterances Speaker Recognition Poster Tue-P-3-1-2 1419 Content Normalization for Text-dependent Speaker Verification Subhadeep Dey, Srikanth Madikeri, Petr Motlicek, Marc Ferras
2017-08-22 10:00-12:00 Poster 1 Short Utterances Speaker Recognition Poster Tue-P-3-1-3 1608 End-to-End Text-Independent Speaker Verification with Triplet Loss on Short Utterances Chunlei Zhang, Kazuhito Koishida
2017-08-22 10:00-12:00 Poster 1 Short Utterances Speaker Recognition Poster Tue-P-3-1-4 883 Adversarial Network Bottleneck Features for Noise Robust Speaker Verification Hong Yu, Zheng-Hua Tan, Zhanyu Ma, Jun Guo
2017-08-22 10:00-12:00 Poster 1 Short Utterances Speaker Recognition Poster Tue-P-3-1-5 1125 What Does the Speaker Embedding Encode? Shuai Wang, Yanmin Qian, Kai Yu
2017-08-22 10:00-12:00 Poster 1 Short Utterances Speaker Recognition Poster Tue-P-3-1-6 266 Incorporating Local Acoustic Variability Information into Short Duration Speaker Verification Jianbo Ma, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Kong Aik Lee
2017-08-22 10:00-12:00 Poster 1 Short Utterances Speaker Recognition Poster Tue-P-3-1-7 1036 DNN i-vector Speaker Verification with Short, Text-constrained Test Utterances Jinghua Zhong, Wenping Hu, Frank Soong, Helen Meng
2017-08-22 10:00-12:00 Poster 1 Short Utterances Speaker Recognition Poster Tue-P-3-1-8 734 Time-Varying Autoregressions for Speaker Verification in Reverberant Conditions Ville Vestman, Dhananjaya Gowda, Md Sahidullah, Paavo Alku, Tomi Kinnunen
2017-08-22 10:00-12:00 Poster 1 Short Utterances Speaker Recognition Poster Tue-P-3-1-9 1575 Deep Speaker Embeddings for Short-Duration Speaker Verification Gautam Bhattacharya, Md Jahangir Alam, Patrick Kenny
2017-08-22 10:00-12:00 Poster 1 Short Utterances Speaker Recognition Poster Tue-P-3-1-10 157 Using Voice Quality Features to Improve Short-Utterance, Text-Independent Speaker Verification Systems Soo Jin Park, Gary Yeung, Jody Kreiman, Patricia Keating, Abeer Alwan
2017-08-22 10:00-12:00 Poster 1 Short Utterances Speaker Recognition Poster Tue-P-3-1-11 108 Gain Compensation for Fast I-Vector Extraction over Short Duration Kong Aik Lee, Haizhou Li
2017-08-22 10:00-12:00 Poster 1 Short Utterances Speaker Recognition Poster Tue-P-3-1-12 1050 Joint Training of Expanded End-to-end DNN for Text-dependent Speaker Verification Hee-Soo Heo, Jee-Weon Jung, Il-Ho Yang, Sung-Hyun Yoon, Ha-Jin Yu
2017-08-22 10:00-10:20 D8 Speech Synthesis: Prosody Oral Tue-O-3-8-1 246 An RNN-based Quantized F0 Model with Multi-tier Feedback Links for Text-to-Speech Synthesis Xin Wang, Shinji Takaki, Junichi Yamagishi
2017-08-22 10:20-10:40 D8 Speech Synthesis: Prosody Oral Tue-O-3-8-2 419 Phrase break prediction for long-form reading TTS: exploiting the text structure information Viacheslav Klimkov, Adam Nadolski, Alexis Moinet, Bartosz Putrycz, Roberto Barra-Chicote, Thomas Merritt, Thomas Drugman
2017-08-22 10:40-11:00 D8 Speech Synthesis: Prosody Oral Tue-O-3-8-3 688 Physically constrained statistical F0 prediction for electrolaryngeal speech enhancement Kou Tanaka, Hirokazu Kameoka, Tomoki Toda, Satoshi Nakamura
2017-08-22 11:00-11:20 D8 Speech Synthesis: Prosody Oral Tue-O-3-8-4 719 DNN-SPACE: DNN-HMM-based Generative Model of Voice F0 Contours for Statistical Phrase/Accent Command Estimation Nobukatsu Hojo, Ohsugi Yasuhito, Yusuke Ijima, Hirokazu Kameoka
2017-08-22 11:20-11:40 D8 Speech Synthesis: Prosody Oral Tue-O-3-8-5 1355 Controlling prominence realisation in parametric DNN-based speech synthesis Zofia Malisz, Harald Berthelsen, Jonas Beskow, Joakim Gustafson
2017-08-22 11:40-12:00 D8 Speech Synthesis: Prosody Oral Tue-O-3-8-6 1528 Increasing Recall of Lengthening Detection via Semi-Automatic Classification Simon Betz, Jana Voße, Sina Zarrieß, Petra Wagner
2017-08-22 16:00-18:00 Poster 3 Source separation and voice activity detection Poster Tue-P-5-3-1 40 Audio Content based Geotagging in Multimedia Anurag Kumar, Benjamin Elizalde, Bhiksha Raj
2017-08-22 16:00-18:00 Poster 3 Source separation and voice activity detection Poster Tue-P-5-3-2 55 Time Delay Histogram Based Speech Source Separation Using a Planar Array Zhaoqiong Huang, Zhanzhong Cao, Dongwen Ying, Jielin Pan, Yonghong Yan
2017-08-22 16:00-18:00 Poster 3 Source separation and voice activity detection Poster Tue-P-5-3-3 135 Excitation Source Features for Improving the Detection of Vowel Onset and Offset Points in a Speech Sequence Gayadhar Pradhan, Avinash Kumar, Syed Shahnawazuddin
2017-08-22 16:00-18:00 Poster 3 Source separation and voice activity detection Poster Tue-P-5-3-4 189 A Contrast Function and Algorithm for Blind Separation of Audio Signals Wei Gao, Roberto Togneri, Victor Sreeram
2017-08-22 16:00-18:00 Poster 3 Source separation and voice activity detection Poster Tue-P-5-3-5 199 Weighted Spatial Covariance Matrix Estimation for MUSIC based TDOA Estimation of Speech Source Chenglin Xu, Xiong Xiao, Sining Sun, Wei Rao, Eng Siong Chng, Haizhou Li
2017-08-22 16:00-18:00 Poster 3 Source separation and voice activity detection Poster Tue-P-5-3-6 229 Speaker Direction-of-Arrival Estimation Based On Frequency-Independent Beampattern Feng Guo, Yuhang Cao, Zheng Liu, Jiaen Liang, Baoqing Li, Xiaobing Yuan
2017-08-22 16:00-18:00 Poster 3 Source separation and voice activity detection Poster Tue-P-5-3-7 271 A Mask Estimation Method Integrating Data Field Model for Speech Enhancement Xianyun Wang, Changchun Bao, Feng Bao
2017-08-22 16:00-18:00 Poster 3 Source separation and voice activity detection Poster Tue-P-5-3-8 496 Improved end-of-query detection for streaming speech recognition Matt Shannon, Gabor Simko, Shuo-Yiin Chang, Carolina Parada
2017-08-22 16:00-18:00 Poster 3 Source separation and voice activity detection Poster Tue-P-5-3-9 593 Using Approximated Auditory Roughness as a Pre-filtering Feature for Human Screaming and Affective Speech AED Di He, Zuofu Cheng, Mark Hasegawa-Johnson, Deming Chen
2017-08-22 16:00-18:00 Poster 3 Source separation and voice activity detection Poster Tue-P-5-3-10 754 Improving Source Separation via Multi-Speaker Representations Jeroen Zegers, Hugo Van hamme
2017-08-22 16:00-18:00 Poster 3 Source separation and voice activity detection Poster Tue-P-5-3-11 940 Multiple Sound Source Counting and Localization Based on Spatial Principal Eigenvector Bing Yang, Hong Liu, Cheng Pang
2017-08-22 16:00-18:00 Poster 3 Source separation and voice activity detection Poster Tue-P-5-3-12 954 Subband selection for binaural speech source localization Karthik Girija Ramesan, Prasanta Ghosh
2017-08-22 16:00-18:00 Poster 3 Source separation and voice activity detection Poster Tue-P-5-3-13 1227 Unmixing Convolutive Mixtures by Exploiting Amplitude Co-modulation: Methods and Evaluation on Mandarin Speech Recordings Bo-Rui Chen, Huang-Yi Lee, Yi-Wen Liu
2017-08-22 16:00-18:00 Poster 3 Source separation and voice activity detection Poster Tue-P-5-3-14 1573 Bimodal Recurrent Neural Network for Audiovisual Voice Activity Detection Fei Tao, Carlos Busso
2017-08-22 16:00-18:00 Poster 3 Source separation and voice activity detection Poster Tue-P-5-3-15 1673 Domain-Specific Utterance End-Point Detection for Speech Recognition Roland Maas, Ariya Rastrow, Kyle Goehner, Gautam Tiwari, Shaun Joseph, Bjorn Hoffmeister
2017-08-22 16:00-18:00 Poster 3 Source separation and voice activity detection Poster Tue-P-5-3-16 1760 Speech detection and enhancement using single microphone for distant speech applications in reverberant environments Vinay Kothapally, John H.L. Hansen
2017-08-22 13:30-15:30 Poster 1 Acoustic Models for ASR I Poster Tue-P-4-1-1 129 An exploration of dropout with LSTMs Gaofeng Cheng, Vijayaditya Peddinti, Dan Povey, Vimal Manohar, Sanjeev Khudanpur, Yonghong Yan
2017-08-22 13:30-15:30 Poster 1 Acoustic Models for ASR I Poster Tue-P-4-1-2 477 Residual LSTM: Design of a Deep Recurrent Architecture for Distant Speech Recognition Jaeyoung Kim, Mostafa El-Khamy, Jungwon Lee
2017-08-22 13:30-15:30 Poster 1 Acoustic Models for ASR I Poster Tue-P-4-1-3 873 Unfolded Deep Recurrent Convolutional Neural Network with Jump Ahead Connections for Acoustic Modeling Tien Dung Tran, Marc Delcroix, Shigeki Karita, Michael Hentschel, Atsunori Ogawa, Tomohiro Nakatani
2017-08-22 13:30-15:30 Poster 1 Acoustic Models for ASR I Poster Tue-P-4-1-4 554 Forward-backward Convolutional LSTM for Acoustic Modeling Shigeki Karita, Atsunori Ogawa, Marc Delcroix, Tomohiro Nakatani
2017-08-22 13:30-15:30 Poster 1 Acoustic Models for ASR I Poster Tue-P-4-1-5 1737 Convolutional Recurrent Neural Networks for Small-Footprint Keyword Spotting Sercan Arik, Markus Kliegl, Rewon Child, Joel Hestness, Andrew Gibiansky, Chris Fougner, Ryan Prenger, Adam Coates
2017-08-22 13:30-15:30 Poster 1 Acoustic Models for ASR I Poster Tue-P-4-1-6 1233 Deep Activation Mixture Model for Speech Recognition Chunyang Wu, Mark Gales
2017-08-22 13:30-15:30 Poster 1 Acoustic Models for ASR I Poster Tue-P-4-1-7 920 Ensembles of Multi-scale VGG Acoustic Models Michael Heck, Masayuki Suzuki, Takashi Fukuda, Gakuto Kurata, Satoshi Nakamura
2017-08-22 13:30-15:30 Poster 1 Acoustic Models for ASR I Poster Tue-P-4-1-8 338 Training Context-Dependent DNN Acoustic Models using Probabilistic Sampling Tamás Grósz, Gábor Gosztolya, László Tóth
2017-08-22 13:30-15:30 Poster 1 Acoustic Models for ASR I Poster Tue-P-4-1-9 899 A Comparative Evaluation of GMM-Free State Tying Methods for ASR Tamás Grósz, Gábor Gosztolya, László Tóth
2017-08-22 16:00-18:00 Poster 2 Voice, Speech and Hearing Disorders Poster Tue-P-5-2-1 25 Float Like a Butterfly Sting Like a Bee: Changes in Speech Preceded Parkinsonism Diagnosis for Muhammad Ali Visar Berisha, Julie Liss, Timothy Huston, Alan Wisler, Yishan Jiao, Jonathan Eig
2017-08-22 16:00-18:00 Poster 2 Voice, Speech and Hearing Disorders Poster Tue-P-5-2-2 335 Cepstral and entropy analyses in vowels excerpted from continuous speech of dysphonic and control speakers Antonella Castellana, Andreas Selamtzis, Giampiero Salvi, Alessio Carullo, Arianna Astolfi
2017-08-22 16:00-18:00 Poster 2 Voice, Speech and Hearing Disorders Poster Tue-P-5-2-3 478 Classification of bulbar ALS from kinematic features of the jaw and lips: Towards computer-mediated assessment Andrea Bandini, Jordan Green, Lorne Zinman, Yana Yunusova
2017-08-22 16:00-18:00 Poster 2 Voice, Speech and Hearing Disorders Poster Tue-P-5-2-4 589 Zero Frequency Filter Based Analysis of Voice Disorders Nagaraj Adiga, Vikram C M, Keerthi Pullela, S R Mahadeva Prasanna
2017-08-22 16:00-18:00 Poster 2 Voice, Speech and Hearing Disorders Poster Tue-P-5-2-5 1245 Hypernasality Severity Analysis in Cleft Lip and Palate Speech Using Vowel Space Area Nikitha K, Sishir Kalita, CM Vikram, M. Pushpavathi, S R Mahadeva Prasanna
2017-08-22 16:00-18:00 Poster 2 Voice, Speech and Hearing Disorders Poster Tue-P-5-2-6 1363 Automatic Prediction of Speech Evaluation Metrics for Dysarthric Speech Imed Laaridh, Waad Ben Kheder, Corinne Fredouille, Christine Meunier
2017-08-22 16:00-18:00 Poster 2 Voice, Speech and Hearing Disorders Poster Tue-P-5-2-7 416 Apkinson – A mobile monitoring solution for Parkinson’s disease Philipp Klumpp, Thomas Janu, Tomás Arias-Vergara, Juan Camilo Vásquez Correa, Juan Rafael Orozco-Arroyave, Elmar Noeth
2017-08-22 16:00-18:00 Poster 2 Voice, Speech and Hearing Disorders Poster Tue-P-5-2-8 762 Dysprosody differentiate between Parkinson’s disease, progressive supranuclear palsy, and multiple system atrophy Jan Hlavnička, Tereza Tykalová, Roman Čmejla, Jiří Klempíř, Evžen Růžička, Jan Rusz
2017-08-22 16:00-18:00 Poster 2 Voice, Speech and Hearing Disorders Poster Tue-P-5-2-9 1222 Interpretable Objective Assessment of Dysarthric Speech based on Deep Neural Networks Ming Tu, Visar Berisha, Julie Liss
2017-08-22 16:00-18:00 Poster 2 Voice, Speech and Hearing Disorders Poster Tue-P-5-2-10 1318 Deep Autoencoder based Speech Features for Improved Dysarthric Speech Recognition Bhavik Vachhani, Chitralekha Bhat, Biswajit Das, Sunil Kumar Kopparapu
2017-08-22 16:00-18:00 Poster 2 Voice, Speech and Hearing Disorders Poster Tue-P-5-2-11 1740 Prediction of Speech Delay from Acoustic Measurements Jason Lilley, Madhavi Ratnagiri, H Timothy Bunnell
2017-08-22 16:00-18:00 Poster 2 Voice, Speech and Hearing Disorders Poster Tue-P-5-2-12 329 The Frequency Range of “The Ling Six Sounds” in Standard Chinese Aijun Li, Hua Zhang, Wen Sun
2017-08-22 16:00-18:00 Poster 2 Voice, Speech and Hearing Disorders Poster Tue-P-5-2-13 1698 Production of sustained vowels and categorical perception of tones in Mandarin among cochlear-implanted children Wentao Gu, Jiao Yin, James Mahshie
2017-08-22 13:30-15:30 F11 Special Session: Incremental Processing and Responsive Behaviour Special Session Tue-SS-4-11-1 396 Adjusting the Frame: Biphasic Performative Control of Speech Rhythm Samuel Delalez, Christophe d’Alessandro
2017-08-22 13:30-15:30 F11 Special Session: Incremental Processing and Responsive Behaviour Special Session Tue-SS-4-11-2 738 Incremental Dialogue Act Recognition: token- vs chunk-based classification Eustace Ebhotemhen, Volha Petukhova, Dietrich Klakow
2017-08-22 13:30-15:30 F11 Special Session: Incremental Processing and Responsive Behaviour Special Session Tue-SS-4-11-3 1042 A Computational Model for Phonetically Responsive Spoken Dialogue Systems Eran Raveh, Ingmar Steiner, Bernd Möbius
2017-08-22 13:30-15:30 F11 Special Session: Incremental Processing and Responsive Behaviour Special Session Tue-SS-4-11-4 1676 Attentional factors in listeners’ uptake of gesture cues during speech processing Raheleh Saryazdi, Craig Chambers
2017-08-22 10:00-10:20 Main hall Neural Network Acoustic Models for ASR I Oral Tue-O-3-1-1 233 A Comparison of Sequence-to-Sequence Models for Speech Recognition Rohit Prabhavalkar, Kanishka Rao, Tara Sainath, Bo Li, Leif Johnson, Navdeep Jaitly
2017-08-22 10:20-10:40 Main hall Neural Network Acoustic Models for ASR I Oral Tue-O-3-1-2 1073 CTC in the Context of Generalized Full-Sum HMM Training Albert Zeyer, Eugen Beck, Ralf Schlüter, Hermann Ney
2017-08-22 10:40-11:00 Main hall Neural Network Acoustic Models for ASR I Oral Tue-O-3-1-3 1296 Advances in Joint CTC-Attention based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM Takaaki Hori, Shinji Watanabe, Yu Zhang, William Chan
2017-08-22 11:00-11:20 Main hall Neural Network Acoustic Models for ASR I Oral Tue-O-3-1-4 71 Multitask Learning with CTC and Segmental CRF for Speech Recognition Liang Lu, Lingpeng Kong, Chris Dyer, Noah Smith
2017-08-22 11:20-11:40 Main hall Neural Network Acoustic Models for ASR I Oral Tue-O-3-1-5 546 Direct Acoustics-to-Word Models for English Conversational Speech Recognition Kartik Audhkhasi, Bhuvana Ramabhadran, George Saon, Michael Picheny, David Nahamoo
2017-08-22 11:40-12:00 Main hall Neural Network Acoustic Models for ASR I Oral Tue-O-3-1-6 1164 Reducing the Computational Complexity of Two-Dimensional LSTMs Bo Li, Tara Sainath
2017-08-22 13:30-13:50 D8 Emotion Modeling Oral Tue-O-4-8-1 619 Speech Emotion Recognition with Emotion-Pair based Framework Considering Emotion Distribution Information in Dimensional Emotion Space Xi Ma, Zhiyong Wu, Jia Jia, Mingxing Xu, Helen Meng, Lianhong Cai
2017-08-22 13:50-14:10 D8 Emotion Modeling Oral Tue-O-4-8-2 1421 Adversarial Auto-encoders for Speech Based Emotion Recognition Saurabh Sahu, Rahul Gupta, Ganesh Sivaraman, Wael Abdalmageed, Carol Espy-Wilson
2017-08-22 14:10-14:30 D8 Emotion Modeling Oral Tue-O-4-8-3 512 An Investigation of Emotion Prediction Uncertainty Using Gaussian Mixture Regression Ting Dang, Vidhyasaharan Sethu, Julien Epps, Eliathamby Ambikairajah
2017-08-22 14:30-14:50 D8 Emotion Modeling Oral Tue-O-4-8-4 548 Capturing Long-term Temporal Dependencies with Convolutional Networks for Continuous Emotion Recognition Soheil Khorram, Zakaria Aldeneh, Dimitrios Dimitriadis, Melvin McInnis, Emily Mower Provost
2017-08-22 14:50-15:10 D8 Emotion Modeling Oral Tue-O-4-8-5 1181 Voice-to-affect mapping: inferences on language voice baseline settings Ailbhe Ní Chasaide, Irena Yanushevskaya, Christer Gobl
2017-08-22 15:10-15:30 D8 Emotion Modeling Oral Tue-O-4-8-6 917 Attentive Convolutional Neural Network based Speech Emotion Recognition: A Study on the Impact of Input Features, Signal Length, and Acted Speech Michael Neumann, Ngoc Thang Vu
2017-08-22 13:30-13:50 B4 Source separation and auditory scene analysis Oral Tue-O-4-4-1 830 A Maximum Likelihood Approach to Deep Neural Network Based Nonlinear Spectral Mapping for Single-Channel Speech Separation Yannan Wang, Jun Du, Lirong Dai, Chin-Hui Lee
2017-08-22 13:50-14:10 B4 Source separation and auditory scene analysis Oral Tue-O-4-4-2 721 Deep clustering-based beamforming for separation with unknown number of sources Takuya Higuchi, Keisuke Kinoshita, Marc Delcroix, Katerina Zmolikova, Tomohiro Nakatani
2017-08-22 14:10-14:30 B4 Source separation and auditory scene analysis Oral Tue-O-4-4-3 66 Time-frequency masking for blind source separation with preserved spatial cues Shadi Pirhosseinloo, Kostas Kokkinakis
2017-08-22 14:30-14:50 B4 Source separation and auditory scene analysis Oral Tue-O-4-4-4 832 Variational Recurrent Neural Networks for Speech Separation Jen-Tzung Chien, Kuan-Ting Kuo
2017-08-22 14:50-15:10 B4 Source separation and auditory scene analysis Oral Tue-O-4-4-5 188 Detecting overlapped speech on short timeframes using deep learning Valentin Andrei, Horia Cucu, Corneliu Burileanu
2017-08-22 15:10-15:30 B4 Source separation and auditory scene analysis Oral Tue-O-4-4-6 549 Ideal ratio mask estimation using deep neural networks for monaural speech segregation in noisy reverberant conditions Xu Li, Junfeng Li, Yonghong Yan
2017-08-22 10:00-10:20 A2 Models of Speech Production Oral Tue-O-3-2-1 181 Functional principal component analysis of vocal tract area functions Jorge Lucero
2017-08-22 10:20-10:40 A2 Models of Speech Production Oral Tue-O-3-2-2 260 Analysis of acoustic-to-articulatory speech inversion across different accents and languages Ganesh Sivaraman, Carol Espy-Wilson, Martijn Wieling
2017-08-22 10:40-11:00 A2 Models of Speech Production Oral Tue-O-3-2-3 617 Integrated mechanical model of [r]-[l] and [b]-[m]-[w] producing consonant cluster [br] Takayuki Arai
2017-08-22 11:00-11:20 A2 Models of Speech Production Oral Tue-O-3-2-4 804 A Speaker Adaptive DNN Training Approach for Speaker-independent Acoustic Inversion Leonardo Badino, Luca Franceschi, Raman Arora, Michele Donini, Massimiliano Pontil
2017-08-22 11:20-11:40 A2 Models of Speech Production Oral Tue-O-3-2-5 1010 Acoustic-to-articulatory mapping based on mixture of probabilistic canonical correlation analysis Hidetsugu Uchida, Daisuke Saito, Nobuaki Minematsu
2017-08-22 11:40-12:00 A2 Models of Speech Production Oral Tue-O-3-2-6 1488 Test-retest repeatability of articulatory strategies using real-time magnetic resonance imaging Tanner Sorensen, Asterios Toutios, Johannes Toger, Louis Goldstein, Shrikanth Narayanan
2017-08-22 13:30-15:30 E306 Show & Tell 3 Show&Tell Tue-S&T-4-A-1 10017 Applications of the BBN Sage Speech Processing Platform Ralf Meermeier, Sean Colbath
2017-08-22 13:30-15:30 E306 Show & Tell 3 Show&Tell Tue-S&T-4-A-2 10025 Bob Speaks Kaldi Milos Cernak, Alain Komaty, Amir Mohammadi, Andre Anjos, Sebastien Marcel
2017-08-22 13:30-15:30 E306 Show & Tell 3 Show&Tell Tue-S&T-4-A-3 10028 Real time pitch shifting with formant structure preservation using the phase vocoder Michał Lenarczyk
2017-08-22 13:30-15:30 E306 Show & Tell 3 Show&Tell Tue-S&T-4-A-4 10043 A Signal Processing Approach for Speaker Separation using SFF Analysis Nivedita Chennupati, Narayana Murthy BHVS, Bayya Yegnanarayana
2017-08-22 13:30-15:30 E306 Show & Tell 3 Show&Tell Tue-S&T-4-A-5 10056 Speech Recognition and Understanding on Hardware-Accelerated DSP Georg Stemmer, Munir Georges, Joachim Hofer, Piotr Rozen, Josef Bauer, Jakub Nowicki, Tobias Bocklet, Hannah Colett, Ohad Falik, Michael Deisher, Sylvia Downing
2017-08-22 16:00-16:20 A2 Speaker Recognition Evaluation Oral Tue-O-5-2-1 203 The I4U Mega Fusion and Collaboration for NIST Speaker Recognition Evaluation 2016 Kong Aik Lee, Ville Hautamaki, Tomi Kinnunen, Anthony Larcher, Chunlei Zhang, Andreas Nautsch, Themos Stafylakis, Gang Liu, Mickael Rouvier, Wei Rao, Federico Alegre, Jianbo Ma, Manwai Mak, Achintya Sarkar, Héctor Delgado, Rahim Saeidi, Hagai Aronowitz, Aleksandr Sizov, Hanwu Sun, Guangsen Wang, Trung Hieu Nguyen, Bin Ma, Ville Vestman, Md Sahidullah, Miikka Halonen, Anssi Kanervisto, Gael Le Lan, Fahimeh Bahmaninezhad, Sergey Isadskiy, Christian Rathgeb, Christoph Busch, Georgios Tzimiropoulos, Qi Qian, Zhibin Wang, Qingen Zhao, Tianzhou Wang, Hao Li, Jian Xue, Shenghuo Zhu, Rong Jin, Tuo Zhao, Pierre-Michel Bousquet, Moez Ajili, Waad Ben Kheder, Driss Matrouf, Zhi Hao Lim, Chenglin Xu, Haihua Xu, Xiong Xiao, Eng Siong Chng, Benoit Fauve, Vidhyasaharan Sethu, Kaavya Sriskandaraja, W. W. Lin, Zheng-Hua Tan, Dennis Alexander Lehmann Thomsen, Massimiliano Todisco, Nicholas Evans, Haizhou Li, John H.L. Hansen, Jean-Francois Bonastre
2017-08-22 16:20-16:40 A2 Speaker Recognition Evaluation Oral Tue-O-5-2-2 537 The MIT-LL, JHU and LRDE NIST 2016 Speaker Recognition Evaluation System Pedro Torres-Carrasquillo, Fred Richardson, Shahan Nercessian, Douglas Sturim, William Campbell, Youngjune Gwon, Swaroop Vattam, Najim Dehak, Harish Mallidi, Phani Sankar Nidadavolu, Ruizhi Li, Reda Dehak
2017-08-22 16:40-17:00 A2 Speaker Recognition Evaluation Oral Tue-O-5-2-3 797 Nuance – Politecnico di Torino’s 2016 NIST Speaker Recognition Evaluation System Daniele Colibro, Claudio Vair, Emanuele Dalmasso, Kevin Farrell, Gennady Karvitsky, Sandro Cumani, Pietro Laface
2017-08-22 17:00-17:20 A2 Speaker Recognition Evaluation Oral Tue-O-5-2-4 555 UTD-CRSS Systems for 2016 NIST Speaker Recognition Evaluation Chunlei Zhang, Fahimeh Bahmaninezhad, Shivesh Ranjan, Chengzhu Yu, Navid Shokouhi, John H.L. Hansen
2017-08-22 17:20-17:40 A2 Speaker Recognition Evaluation Oral Tue-O-5-2-5 1498 Analysis and Description of ABC Submission to NIST SRE 2016 Oldrich Plchot, Pavel Matejka, Anna Silnova, Ondřej Novotný, Mireia Diez, Johan Rohdin, Ondrej Glembek, Niko Brummer, Albert Swart, Jesús Jorrín, Leibny Paola Garcia Perera, Luis Buera, Patrick Kenny, Md Jahangir Alam, Gautam Bhattacharya
2017-08-22 17:40-18:00 A2 Speaker Recognition Evaluation Oral Tue-O-5-2-6 458 The 2016 NIST Speaker Recognition Evaluation Seyed Omid Sadjadi, Timothee Kheyrkhah, Audrey Tong, Craig Greenberg, Douglas Reynolds, Elliot Singer, Lisa Mason, Jaime Hernandez-Cordero
2017-08-23 16:00-16:20 E10 Language models for ASR Oral Wed-O-8-10-1 1203 Effectively Building Tera Scale MaxEnt Language Models Incorporating Non-Linguistic Signals Fadi Biadsy, Mohammadreza Ghodsi, Diamantino Caseiro
2017-08-23 16:20-16:40 E10 Language models for ASR Oral Wed-O-8-10-2 1598 Semi-supervised Adaptation of RNNLMs by Fine-tuning with Domain-specific Auxiliary Features Salil Deena, Raymond W. M. Ng, Pranava Madhyastha, Lucia Specia, Thomas Hain
2017-08-23 16:40-17:00 E10 Language models for ASR Oral Wed-O-8-10-3 147 Approximated and domain-adapted LSTM language models for first-pass decoding in speech recognition Mittul Singh, Youssef Oualil, Dietrich Klakow
2017-08-23 17:00-17:20 E10 Language models for ASR Oral Wed-O-8-10-4 493 Sparse Non-negative Matrix Language Modeling: Maximum Entropy Flexibility on the Cheap Ciprian Chelba, Diamantino Caseiro, Fadi Biadsy
2017-08-23 17:20-17:40 E10 Language models for ASR Oral Wed-O-8-10-5 426 Multi-scale Context Adaptation for Improving Child Automatic Speech Recognition in Child-Adult Spoken Interactions Manoj Kumar, Daniel Bone, Kelly McWilliams, Shanna Williams, Thomas Lyon, Shrikanth Narayanan
2017-08-23 17:40-18:00 E10 Language models for ASR Oral Wed-O-8-10-6 1790 Using Knowledge Graph And Search Query Click Logs in Statistical Language Model For Speech Recognition Weiwu Zhu
2017-08-23 16:00-18:00 Poster 2 Articulatory & Acoustic Phonetics Poster Wed-P-8-2-1 1720 Mental Representation of Japanese Mora: focusing on intrinsic duration Kosuke Sugai
2017-08-23 16:00-18:00 Poster 2 Articulatory & Acoustic Phonetics Poster Wed-P-8-2-2 765 Temporal Dynamics of Lateral Channel Formation in /l/: 3D EMA Data from Australian English Jia Ying, Christopher Carignan, Jason Shaw, Michael Proctor, Donald Derrick, Catherine Best
2017-08-23 16:00-18:00 Poster 2 Articulatory & Acoustic Phonetics Poster Wed-P-8-2-3 1154 Vowel and Consonant Sequences in three Bavarian varieties in Austria Nicola Klingler, Sylvia Moosmüller, Hannes Scheutz
2017-08-23 16:00-18:00 Poster 2 Articulatory & Acoustic Phonetics Poster Wed-P-8-2-4 1609 Acoustic cues to the singleton-geminate contrast: the case of Libyan Arabic sonorants Amel Issa
2017-08-23 16:00-18:00 Poster 2 Articulatory & Acoustic Phonetics Poster Wed-P-8-2-5 838 Mel-cepstral distortion of German vowels in different information density contexts Erika Brandt, Frank Zimmerer, Bistra Andreeva, Bernd Möbius
2017-08-23 16:00-18:00 Poster 2 Articulatory & Acoustic Phonetics Poster Wed-P-8-2-6 1161 Effect of formant and F0 discontinuity on perceived vowel duration: Impacts for concatenative speech synthesis Tomáš Bořil, Pavel Šturm, Radek Skarnitzl, Jan Volín
2017-08-23 16:00-18:00 Poster 2 Articulatory & Acoustic Phonetics Poster Wed-P-8-2-7 578 An ultrasound study of alveolar and retroflex consonants in Arrernte: stressed and unstressed syllables Marija Tabain, Richard Beare
2017-08-23 16:00-18:00 Poster 2 Articulatory & Acoustic Phonetics Poster Wed-P-8-2-8 1140 Reshaping the transformed LF model: generating the glottal source from the waveshape parameter Rd Christer Gobl
2017-08-23 16:00-18:00 Poster 2 Articulatory & Acoustic Phonetics Poster Wed-P-8-2-9 722 Kinematic signatures of prosody in Lombard speech Štefan Beňuš, Juraj Šimko, Mona Lehtinen
2017-08-23 16:00-18:00 Poster 2 Articulatory & Acoustic Phonetics Poster Wed-P-8-2-10 1285 What do Finnish and Central Bavarian have in common? Towards an acoustically based quantity typology Markus Jochim, Felicitas Kleber
2017-08-23 16:00-18:00 Poster 2 Articulatory & Acoustic Phonetics Poster Wed-P-8-2-11 1027 Locating burst onsets using SFF envelope and phase information Bhanu Teja Nellore, RaviShankar Prasad, Sudarsana Reddy Kadiri, Suryakanth V Gangashetty, Bayya Yegnanarayana
2017-08-23 16:00-18:00 Poster 2 Articulatory & Acoustic Phonetics Poster Wed-P-8-2-12 876 A Preliminary Phonetic Investigation of Alphabetic Words in Mandarin Chinese Hongwei Ding, Yuanyuan Zhang, Hongchao Liu, Chu-Ren Huang
2017-08-23 16:00-18:00 Poster 2 Articulatory & Acoustic Phonetics Poster Wed-P-8-2-13 1306 A Quantitative Measure of the Impact of Coarticulation on Phone Discriminability Thomas Schatz, Rory Turnbull, Francis Bach, Emmanuel Dupoux
2017-08-23 13:30-13:50 D8 Lexical and Pronunciation modeling Oral Wed-O-7-8-1 1436 Multitask Sequence-to-Sequence Models for Grapheme-to-Phoneme Conversion Benjamin Milde, Christoph Schmidt, Joachim Köhler
2017-08-23 13:50-14:10 D8 Lexical and Pronunciation modeling Oral Wed-O-7-8-2 588 Acoustic data-driven lexicon learning based on a greedy pronunciation selection framework Xiaohui Zhang, Vimal Manohar, Dan Povey, Sanjeev Khudanpur
2017-08-23 14:10-14:30 D8 Lexical and Pronunciation modeling Oral Wed-O-7-8-3 1081 Semi-Supervised Learning of a Pronunciation Dictionary from Disjoint Phonemic Transcripts and Text Takahiro Shinozaki, Shinji Watanabe, Daichi Mochihashi, Graham Neubig
2017-08-23 14:30-14:50 D8 Lexical and Pronunciation modeling Oral Wed-O-7-8-4 103 Improved subword modeling for WFST-based speech recognition Peter Smit, Sami Virpioja, Mikko Kurimo
2017-08-23 14:50-15:10 D8 Lexical and Pronunciation modeling Oral Wed-O-7-8-5 47 Pronunciation learning with RNN-transducers Antoine Bruguier, Danushen Gnanapragasam, Leif Johnson, Kanishka Rao, Francoise Beaufays
2017-08-23 15:10-15:30 D8 Lexical and Pronunciation modeling Oral Wed-O-7-8-6 1117 Learning Similarity Functions for Pronunciation Variations Einat Naaman, Yossi Adi, Joseph Keshet
2017-08-23 10:00-10:20 C6 Dialog and Prosody Oral Wed-O-6-6-1 1159 Prosodic Event Recognition using Convolutional Neural Networks with Context Information Sabrina Stehwien, Ngoc Thang Vu
2017-08-23 10:20-10:40 C6 Dialog and Prosody Oral Wed-O-6-6-2 453 Prosodic Facilitation and Interference while Judging on the Veracity of Synthesized Statements Ramiro H. Galvez, Štefan Beňuš, Agustín Gravano, Marian Trnka
2017-08-23 10:40-11:00 C6 Dialog and Prosody Oral Wed-O-6-6-3 811 An investigation of pitch matching across adjacent turns in a corpus of spontaneous German Margaret Zellers, Antje Schweitzer
2017-08-23 11:00-11:20 C6 Dialog and Prosody Oral Wed-O-6-6-4 795 The Relationship between F0 Synchrony and Speech Convergence in Dyadic Interaction Sankar Mukherjee, Alessandro D’Ausilio, Noël Nguyen, Luciano Fadiga, Leonardo Badino
2017-08-23 11:20-11:40 C6 Dialog and Prosody Oral Wed-O-6-6-5 424 The role of linguistic and prosodic cues on the prediction of self-reported satisfaction in contact centre phone calls Jordi Luque, Ariadna Sánchez, Carlos Segura, Martí Umbert, Luis Ángel Galindo
2017-08-23 11:40-12:00 C6 Dialog and Prosody Oral Wed-O-6-6-6 124 Cross-linguistic study of the production of turn-taking cues in American English and Argentine Spanish Pablo Brusco, Agustin Gravano, Juan Manuel Pérez
2017-08-23 10:00-12:00 Poster 1 Speech Recognition: Technologies for new applications and paradigms Poster Wed-P-6-1-1 166 Developing On-Line Speaker Diarization System Dimitrios Dimitriadis, Petr Fousek
2017-08-23 10:00-12:00 Poster 1 Speech Recognition: Technologies for new applications and paradigms Poster Wed-P-6-1-2 339 Comparison of Non-parametric Bayesian Mixture Models for Syllable Clustering and Zero-Resource Speech Processing Shreyas Seshadri, Ulpu Remes, Okko Räsänen
2017-08-23 10:00-12:00 Poster 1 Speech Recognition: Technologies for new applications and paradigms Poster Wed-P-6-1-3 1541 Automatic Evaluation of Children Reading Aloud on Sentences and Pseudowords Jorge Proença, Carla Lopes, Michael Tjalve, Andreas Stolcke, Sara Candeias, Fernando Perdigão
2017-08-23 10:00-12:00 Poster 1 Speech Recognition: Technologies for new applications and paradigms Poster Wed-P-6-1-4 388 Off-topic Spoken Response Detection with Word Embeddings Su-Youn Yoon, Chong Min Lee, Ikkyu Choi, Xinhao Wang, Matthew Mulholland, Keelan Evanini
2017-08-23 10:00-12:00 Poster 1 Speech Recognition: Technologies for new applications and paradigms Poster Wed-P-6-1-5 464 Improving Mispronunciation Detection for Non-Native Learners with Multisource Information and LSTM-Based Deep Models Wei Li, Nancy F Chen, Sabato Marco Siniscalchi, Chin-Hui Lee
2017-08-23 10:00-12:00 Poster 1 Speech Recognition: Technologies for new applications and paradigms Poster Wed-P-6-1-6 750 Automatic Explanation Spot Estimation Method Targeted at Text and Figures in Lecture Slides Shoko Tsujimura, Kazumasa Yamamoto, Seiichi Nakagawa
2017-08-23 10:00-12:00 Poster 1 Speech Recognition: Technologies for new applications and paradigms Poster Wed-P-6-1-7 952 Multiview Representation Learning via Deep CCA for Silent Speech Recognition Myungjong Kim, Beiming Cao, Ted Mau, Jun Wang
2017-08-23 10:00-12:00 Poster 1 Speech Recognition: Technologies for new applications and paradigms Poster Wed-P-6-1-8 978 Use of Graphemic Lexicons for Spoken Language Assessment Kate Knill, Mark Gales, Kostas Kyriakopoulos, Anton Ragni, Yu Wang
2017-08-23 10:00-12:00 Poster 1 Speech Recognition: Technologies for new applications and paradigms Poster Wed-P-6-1-9 1079 Distilling Knowledge from an Ensemble of Models for Punctuation Prediction Jiangyan Yi, Jianhua Tao, Zhengqi Wen, Ya Li
2017-08-23 10:00-12:00 Poster 1 Speech Recognition: Technologies for new applications and paradigms Poster Wed-P-6-1-10 1274 A Mostly Data-driven Approach to Inverse Text Normalization Ernest Pusateri, Bharat Ambati, Elizabeth Brooks, Ondrej Platek, Donald McAllaster, Venki Nagesha
2017-08-23 10:00-12:00 Poster 1 Speech Recognition: Technologies for new applications and paradigms Poster Wed-P-6-1-11 1567 Mismatched Crowdsourcing From Multiple Annotator Languages For Recognizing Zero-resourced Languages: A Nullspace Clustering Approach Wenda Chen, Mark Hasegawa-Johnson, Nancy Chen, Boon Pang Lim
2017-08-23 10:00-12:00 Poster 1 Speech Recognition: Technologies for new applications and paradigms Poster Wed-P-6-1-12 1710 Experiments in Character-level Neural Network Models for Punctuation William Gale, Sarangarajan Parthasarathy
2017-08-23 10:00-12:00 Poster 1 Speech Recognition: Technologies for new applications and paradigms Poster Wed-P-6-1-13 1778 Multi-Channel Apollo Mission Speech Transcript Calibration Lakshmish Kaushik, Abhijeet Sangwan, John H.L. Hansen
2017-08-23 10:00-10:20 E10 Acoustic model adaptation Oral Wed-O-6-10-1 519 Large-Scale Domain Adaptation via Teacher-Student Learning Jinyu Li, Michael Seltzer, Xi Wang, Rui Zhao, Yifan Gong
2017-08-23 10:20-10:40 E10 Acoustic model adaptation Oral Wed-O-6-10-2 302 Improving Children’s Speech Recognition through Explicit Pitch Scaling based on Iterative Spectrogram Inversion Waquar Ahmad, Syed Shahnawazuddin, Hemant Kumar Kathania, Gayadhar Pradhan, A. B. Samaddar
2017-08-23 10:40-11:00 E10 Acoustic model adaptation Oral Wed-O-6-10-3 368 RNN-LDA Clustering for Feature Based DNN Adaptation Xurong Xie, Xunying Liu, Tan Lee, Lan Wang
2017-08-23 11:00-11:20 E10 Acoustic model adaptation Oral Wed-O-6-10-4 1342 Robust online i-vectors for unsupervised adaptation of DNN acoustic models: A study in the context of digital voice assistants Harish Arsikere, Sri Garimella
2017-08-23 11:20-11:40 E10 Acoustic model adaptation Oral Wed-O-6-10-5 1446 Semi-supervised Learning with Semantic Knowledge Extraction for Improved Speech Recognition in Air Traffic Control Ajay Srinivasamurthy, Petr Motlicek, Ivan Himawan, Gyorgy Szaszak, Youssef Oualil, Hartmut Helmke
2017-08-23 11:40-12:00 E10 Acoustic model adaptation Oral Wed-O-6-10-6 556 Dynamic Layer Normalization for Adaptive Neural Acoustic Modeling in Speech Recognition Taesup Kim, Inchul Song, Yoshua Bengio
2017-08-23 13:30-15:30 Poster 1 Special Session: Digital Revolution for Under-resourced Languages II Special Session Wed-SS-7-1-1 1009 Transfer Learning and Distillation Techniques to Improve the Acoustic Modeling of Low Resource Languages Basil Abraham, Tejaswi Seeram, Srinivasan Umesh
2017-08-23 13:30-15:30 Poster 1 Special Session: Digital Revolution for Under-resourced Languages II Special Session Wed-SS-7-1-2 268 Machine Assisted Analysis of Vowel Length Contrasts in Wolof Elodie Gauthier, Laurent Besacier, Sylvie Voisin
2017-08-23 13:30-15:30 Poster 1 Special Session: Digital Revolution for Under-resourced Languages II Special Session Wed-SS-7-1-3 582 Deep Autoencoder Based Multi-task Learning Using Probabilistic Transcriptions Amit Das, Mark Hasegawa-Johnson, Karel Vesely
2017-08-23 13:30-15:30 Poster 1 Special Session: Digital Revolution for Under-resourced Languages II Special Session Wed-SS-7-1-4 1398 Nativization of foreign names in TTS for automatic reading of world news in Swahili Joseph Mendelson, Pilar Oplustil, Oliver Watts, Simon King
2017-08-23 13:30-15:30 Poster 1 Special Session: Digital Revolution for Under-resourced Languages II Special Session Wed-SS-7-1-5 226 Extracting Situation Frames from non-English Speech: Evaluation Framework and Pilot Results Nikolaos Malandrakis, Ondrej Glembek, Shrikanth Narayanan
2017-08-23 13:30-15:30 Poster 1 Special Session: Digital Revolution for Under-resourced Languages II Special Session Wed-SS-7-1-6 37 Uniform Multilingual Multi-Speaker Acoustic Model for Statistical Parametric Speech Synthesis of Low-Resourced Languages Alexander Gutkin
2017-08-23 13:30-15:30 Poster 1 Special Session: Digital Revolution for Under-resourced Languages II Special Session Wed-SS-7-1-7 903 Building an ASR corpus using Althingi’s Parliamentary Speeches Inga Rún Helgadóttir, Róbert Kjaran, Anna Björk Nikulásdóttir, Jon Gudnason
2017-08-23 13:30-15:30 Poster 1 Special Session: Digital Revolution for Under-resourced Languages II Special Session Wed-SS-7-1-8 1407 The ABAIR initiative: Bringing Spoken Irish into the Digital Space Ailbhe Ní Chasaide, Neasa Ní Chiaráin, Christoph Wendler, Harald Berthelsen, Andy Murphy, Christer Gobl
2017-08-23 13:30-15:30 Poster 1 Special Session: Digital Revolution for Under-resourced Languages II Special Session Wed-SS-7-1-9 1476 Unsupervised Speech Signal to Symbol Transformation for Zero Resource Speech Applications Saurabhchand Bhati, Shekhar Nayak, Sri Rama Murty Kodukula
2017-08-23 13:30-15:30 Poster 1 Special Session: Digital Revolution for Under-resourced Languages II Special Session Wed-SS-7-1-10 1262 Leveraging Text Data for Word Segmentation for Underresourced Languages Thomas Glarner, Benedikt Boenninghoff, Oliver Walter, Reinhold Haeb-Umbach
2017-08-23 13:30-15:30 Poster 1 Special Session: Digital Revolution for Under-resourced Languages II Special Session Wed-SS-7-1-11 215 The motivation and development of MPAi, a Māori Pronunciation Aid Catherine Watson, Peter Keegan, Margaret Maclagan, Ray Harlow, Jeanette King
2017-08-23 13:30-15:30 Poster 1 Special Session: Digital Revolution for Under-resourced Languages II Special Session Wed-SS-7-1-12 928 Implementation of a Radiology Speech Recognition System for Estonian using Open Source Software Tanel Alumäe, Andrus Paats, Ivo Fridolin, Einar Meister
2017-08-23 13:30-15:30 Poster 1 Special Session: Digital Revolution for Under-resourced Languages II Special Session Wed-SS-7-1-13 1352 Building ASR corpora using Eyra Jon Gudnason, Matthías Pétursson, Róbert Kjaran, Simon Kluepfel, Anna Nikulásdóttir
2017-08-23 13:30-15:30 Poster 1 Special Session: Digital Revolution for Under-resourced Languages II Special Session Wed-SS-7-1-14 1139 Rapid development of TTS corpora for four South African languages Daniel Van Niekerk, Charl Van Heerden, Marelie Davel, Neil Kleynhans, Oddur Kjartansson, Martin Jansche, Linne Ha
2017-08-23 13:30-15:30 Poster 1 Special Session: Digital Revolution for Under-resourced Languages II Special Session Wed-SS-7-1-15 880 Very low resource radio browsing for agile developmental and humanitarian monitoring Armin Saeb, Raghav Menon, Hugh Cameron, William Kibira, John Quinn, Thomas Niesler
2017-08-23 13:30-15:30 Poster 1 Special Session: Digital Revolution for Under-resourced Languages II Special Session Wed-SS-7-1-16 160 Areal and Phylogenetic Features for Multilingual Speech Synthesis Alexander Gutkin, Richard Sproat
2017-08-23 13:30-15:30 Poster 1 Special Session: Digital Revolution for Under-resourced Languages II Special Session Wed-SS-7-1-17 855 Eliciting meaningful units from speech Daniil Kocharov, Tatiana Kachkovskaia, Pavel Skrelin
2017-08-23 13:30-15:30 Poster 1 Special Session: Digital Revolution for Under-resourced Languages II Special Session Wed-SS-7-1-18 1558 First Results in Developing a Medieval Latin Language Charter Dictation System for the East-Central Europe Region Peter Mihajlik, Lili Szabo, Balazs Tarjan, Andras Balog, Krisztina Rabai
2017-08-23 13:30-15:30 Poster 1 Special Session: Digital Revolution for Under-resourced Languages II Special Session Wed-SS-7-1-19 300 On the Linguistic Relevance of Speech Units Learned by Unsupervised Acoustic Modeling Siyuan Feng, Tan Lee
2017-08-23 13:30-15:30 Poster 1 Special Session: Digital Revolution for Under-resourced Languages II Special Session Wed-SS-7-1-20 180 Team ELISA System for DARPA LORELEI Speech Evaluation 2016 Pavlos Papadopoulos, Ruchir Travadi, Colin Vaz, Nikolaos Malandrakis, Ulf Hermjakob, Nima Pourdamghani, Michael Pust, Boliang Zhang, Xiaoman Pan, Di Lu, Ying Lin, Ondrej Glembek, Murali Karthick B, Martin Karafiat, Lukas Burget, Mark Hasegawa-Johnson, Heng Ji, Jonathan May, Kevin Knight, Shrikanth Narayanan
2017-08-23 13:30-15:30 Poster 1 Special Session: Digital Revolution for Under-resourced Languages II Special Session Wed-SS-7-1-21 1028 Joint Estimation of Articulatory Features and Acoustic models for Low-Resource Languages Basil Abraham, Srinivasan Umesh, Neethu Mariam Joy
2017-08-23 13:30-15:30 Poster 1 Special Session: Digital Revolution for Under-resourced Languages II Special Session Wed-SS-7-1-22 1129 Improving DNN Bluetooth Narrowband Acoustic Models by Cross-bandwidth and Cross-lingual Initialization Xiaodan Zhuang, Arnab Ghoshal, Antti-Veikko Rosti, Matthias Paulik, Daben Liu
2017-08-23 16:00-16:20 C6 Multi-channel speech enhancement Oral Wed-O-8-6-1 187 Tight integration of spatial and spectral features for BSS with Deep Clustering embeddings Lukas Drude, Reinhold Haeb-Umbach
2017-08-23 16:20-16:40 C6 Multi-channel speech enhancement Oral Wed-O-8-6-2 667 Speaker-aware neural network based beamformer for speaker extraction in speech mixtures Katerina Zmolikova, Marc Delcroix, Keisuke Kinoshita, Takuya Higuchi, Atsunori Ogawa, Tomohiro Nakatani
2017-08-23 16:40-17:00 C6 Multi-channel speech enhancement Oral Wed-O-8-6-3 1186 Eigenvector-based Speech Mask Estimation using Logistic Regression Lukas Pfeifenberger, Matthias Zöhrer, Franz Pernkopf
2017-08-23 17:00-17:20 C6 Multi-channel speech enhancement Oral Wed-O-8-6-4 1458 Real-time Speech Enhancement with GCC-NMF Sean Wood, Jean Rouat
2017-08-23 17:20-17:40 C6 Multi-channel speech enhancement Oral Wed-O-8-6-5 1464 Coherence-based dual-channel noise reduction algorithm in a complex noisy environment Youna Ji, Jun Byun, Young-cheol Park
2017-08-23 17:40-18:00 C6 Multi-channel speech enhancement Oral Wed-O-8-6-6 1659 Glottal Model Based Speech Beamforming for Ad-Hoc Microphone Arrays Yang Zhang, Dinei Florencio, Mark Hasegawa-Johnson
2017-08-23 16:00-18:00 Poster 4 Voice Conversion 2 Poster Wed-P-8-4-1 63 Voice Conversion from Unaligned Corpora Using Variational Autoencoding Wasserstein Generative Adversarial Networks Chin-Cheng Hsu, Hsin-Te Hwang, Yi-Chiao Wu, Yu Tsao, Hsin-Min Wang
2017-08-23 16:00-18:00 Poster 4 Voice Conversion 2 Poster Wed-P-8-4-2 133 CAB: An Energy-Based Speaker Clustering Model for Rapid Adaptation in Non-Parallel Voice Conversion Toru Nakashika
2017-08-23 16:00-18:00 Poster 4 Voice Conversion 2 Poster Wed-P-8-4-3 664 Phoneme-Discriminative Features for Dysarthric Speech Conversion Ryo Aihara, Tetsuya Takiguchi, Yasuo Ariki
2017-08-23 16:00-18:00 Poster 4 Voice Conversion 2 Poster Wed-P-8-4-4 694 Denoising Recurrent Neural Network for Deep Bidirectional LSTM based Voice Conversion Jie Wu, Dongyan Huang, Lei Xie, Haizhou Li
2017-08-23 16:00-18:00 Poster 4 Voice Conversion 2 Poster Wed-P-8-4-5 841 Speaker Dependent Approach for Enhancing a Glossectomy Patient’s Speech via GMM-based Voice Conversion Kei Tanaka, Sunao Hara, Masanobu Abe, Masaaki Sato, Shogo Minagi
2017-08-23 16:00-18:00 Poster 4 Voice Conversion 2 Poster Wed-P-8-4-6 962 Generative Adversarial Network-based Postfilter for STFT Spectrograms Takuhiro Kaneko, Shinji Takaki, Hirokazu Kameoka, Junichi Yamagishi
2017-08-23 16:00-18:00 Poster 4 Voice Conversion 2 Poster Wed-P-8-4-7 1288 Generative adversarial network-based glottal waveform model for statistical parametric speech synthesis Bajibabu Bollepalli, Lauri Juvela, Paavo Alku
2017-08-23 16:00-18:00 Poster 4 Voice Conversion 2 Poster Wed-P-8-4-8 984 Emotional Voice Conversion with Adaptive Scales F0 based on Wavelet Transform using Limited Amount of Emotional Data Zhaojie Luo, Jinhui Chen, Tetsuya Takiguchi, Yasuo Ariki
2017-08-23 16:00-18:00 Poster 4 Voice Conversion 2 Poster Wed-P-8-4-9 1038 Speaker adaptation in DNN-based speech synthesis using d-vectors Rama Sanand Doddipatla, Norbert Braunschweiler, Ranniery Maia
2017-08-23 16:00-18:00 Poster 4 Voice Conversion 2 Poster Wed-P-8-4-10 1122 Spectro-Temporal Modelling with Time-Frequency LSTM and Structured Output Layer for Voice Conversion Runnan Li, Zhiyong Wu, Yishuang Ning, Lifa Sun, Helen Meng, Lianhong Cai
2017-08-23 16:00-18:00 Poster 4 Voice Conversion 2 Poster Wed-P-8-4-11 1538 Segment Level Voice Conversion with Recurrent Neural Networks Miguel Ramos, Alan W Black, Ramón Astudillo, Isabel Trancoso, Nuno Fonseca
2017-08-23 13:30-13:50 C6 Dialog systems Oral Wed-O-7-6-1 1326 An End-to-End Trainable Neural Network Model with Belief Tracking for Task-Oriented Dialog Bing Liu, Ian Lane
2017-08-23 13:50-14:10 C6 Dialog systems Oral Wed-O-7-6-2 1060 Deep Reinforcement Learning of Dialogue Policies with Less Weight Updates Heriberto Cuayahuitl, Seunghak Yu
2017-08-23 14:10-14:30 C6 Dialog systems Oral Wed-O-7-6-3 1574 Towards End-to-End Spoken Dialogue Systems with Turn Embeddings Ali Orkan Bayer, Evgeny Stepanov, Giuseppe Riccardi
2017-08-23 14:30-14:50 C6 Dialog systems Oral Wed-O-7-6-4 501 Speech and Text Analysis for Multimodal Addressee Detection in Human-Human-Computer Interaction Oleg Akhtiamov, Maxim Sidorov, Alexey Karpov, Wolfgang Minker
2017-08-23 14:50-15:10 C6 Dialog systems Oral Wed-O-7-6-5 1205 Rushing to Judgement: How Do Laypeople Rate Caller Engagement in Thin-Slice Videos of Human–Machine Dialog? Vikram Ramanarayanan, Chee Wee (Ben) Leong, David Suendermann-Oeft
2017-08-23 15:10-15:30 C6 Dialog systems Oral Wed-O-7-6-6 753 Hyperarticulation of Corrections in Multilingual Dialogue Systems Ivan Kraljevski, Diane Hirschfeld
2017-08-23 13:30-15:30 Poster 3 Music and audio processing Poster Wed-P-7-3-1 17 Sinusoidal Partials Tracking for Singing Analysis Using the Heuristic of the Minimal Frequency and Magnitude Difference Kin Wah Edward Lin, Hans Anderson, Clifford So, Simon Lui
2017-08-23 13:30-15:30 Poster 3 Music and audio processing Poster Wed-P-7-3-2 101 Audio Scene Classification with Deep Recurrent Neural Networks Huy Phan, Philipp Koch, Fabrice Katzberg, Marco Maass, Radoslaw Mazur, Alfred Mertins
2017-08-23 13:30-15:30 Poster 3 Music and audio processing Poster Wed-P-7-3-3 119 Automatic time-frequency analysis of echolocation signals using the matched Gaussian multitaper spectrogram Maria Sandsten, Isabella Reinhold, Josefin Starkhammar
2017-08-23 13:30-15:30 Poster 3 Music and audio processing Poster Wed-P-7-3-4 213 Classification-Based Detection of Glottal Closure Instants from Speech Signals Jindrich Matousek, Daniel Tihelka
2017-08-23 13:30-15:30 Poster 3 Music and audio processing Poster Wed-P-7-3-5 222 A Domain Knowledge-Assisted Nonlinear Model for Head-Related Transfer Functions Based on Bottleneck Deep Neural Network Xiaoke Qi, Jianhua Tao
2017-08-23 13:30-15:30 Poster 3 Music and audio processing Poster Wed-P-7-3-6 315 Laryngeal Articulation during Trumpet Performance: An Exploratory Study Luis M.T. Jesus, Bruno Rocha, Andreia Hall
2017-08-23 13:30-15:30 Poster 3 Music and audio processing Poster Wed-P-7-3-7 395 Matrix of Polynomials Model based Polynomial Dictionary Learning Method for Acoustic Impulse Response Modeling Jian Guan, Xuan Wang, Pengming Feng, Jing Dong, Wenwu Wang
2017-08-23 13:30-15:30 Poster 3 Music and audio processing Poster Wed-P-7-3-8 431 Acoustic Scene Classification using a CNN-SuperVector system trained with Auditory and Spectrogram Image Features Rakib Hyder, Shabnam Ghaffarzadegan, Zhe Feng, John H.L. Hansen, Taufiq Hasan
2017-08-23 13:30-15:30 Poster 3 Music and audio processing Poster Wed-P-7-3-9 485 An Environmental Feature Representation for Robust Speech Recognition and for Environment Identification Xue Feng
2017-08-23 13:30-15:30 Poster 3 Music and audio processing Poster Wed-P-7-3-10 486 Attention and Localization based on a Deep Convolutional Recurrent Model for Weakly Supervised Audio Tagging Yong Xu, Qiuqiang Kong, Qiang Huang, Wenwu Wang, Mark D. Plumbley
2017-08-23 13:30-15:30 Poster 3 Music and audio processing Poster Wed-P-7-3-11 866 An audio based piano performance evaluation method using deep neural network based acoustic modeling Jing Pan, Ming Li, Zhanmei Song, Xin Li, Xiaolin Liu, Hua Yi, Manman Zhu
2017-08-23 13:30-15:30 Poster 3 Music and audio processing Poster Wed-P-7-3-12 1000 Music Tempo Estimation Using Sub-band Synchrony Shreyan Chowdhury, Tanaya Guha, Rajesh Hegde
2017-08-23 13:30-15:30 Poster 3 Music and audio processing Poster Wed-P-7-3-13 1469 A Transfer Learning Based Feature Extractor for Polyphonic Sound Event Detection Using Connectionist Temporal Classification Yun Wang, Florian Metze
2017-08-23 13:30-15:30 Poster 3 Music and audio processing Poster Wed-P-7-3-14 1590 A Note Based Query By Humming System using Convolutional Neural Network Naziba Mostafa, Pascale Fung
2017-08-23 13:30-15:30 E306 Show & Tell 5 Show&Tell Wed-S&T-7-A-1 10022 Creating a Voice for MiRo, the World’s First Commercial Biomimetic Robot Roger Moore, Ben Mitchinson
2017-08-23 13:30-15:30 E306 Show & Tell 5 Show&Tell Wed-S&T-7-A-2 10023 A Thematicity-based Prosody Enrichment Tool for CTS Monica Dominguez, Mireia Farrús, Leo Wanner
2017-08-23 13:30-15:30 E306 Show & Tell 5 Show&Tell Wed-S&T-7-A-3 10024 WebSubDub – Experimental system for creating high-quality alternative audio track for TV broadcasting Martin Grůber, Jindrich Matousek, Zdeněk Hanzlíček, Jakub Vít, Daniel Tihelka
2017-08-23 13:30-15:30 E306 Show & Tell 5 Show&Tell Wed-S&T-7-A-4 10026 Voice Conservation and TTS System for People Facing Total Laryngectomy Markéta Jůzová, Daniel Tihelka, Jindrich Matousek, Zdenek Hanzlicek
2017-08-23 13:30-15:30 E306 Show & Tell 5 Show&Tell Wed-S&T-7-A-5 10042 TBT(Toolkit to Build TTS): A High Performance Framework to build Multiple Language HTS Voice Atish Ghone, Rachana Nerpagar, Pranaw Kumar, Arun Baby, Aswin Shanmugam, Sasikumar Mukundan, Hema Murthy
2017-08-23 13:30-13:50 A2 Noise robust speech recognition Oral Wed-O-7-2-1 901 Speech Representation Learning Using Unsupervised Data-Driven Modulation Filtering for Robust ASR Purvi Agrawal, Sriram Ganapathy
2017-08-23 13:50-14:10 A2 Noise robust speech recognition Oral Wed-O-7-2-2 642 Combined Multi-channel NMF-based Robust Beamforming for Noisy Speech Recognition Masato Mimura, Yoshiaki Bando, Kazuki Shimada, Shinsuke Sakai, Kazuyoshi Yoshii, Tatsuya Kawahara
2017-08-23 14:10-14:30 A2 Noise robust speech recognition Oral Wed-O-7-2-3 305 Recognizing Multi-talker Speech with Permutation Invariant Training Dong Yu, Xuankai Chang, Yanmin Qian
2017-08-23 14:30-14:50 A2 Noise robust speech recognition Oral Wed-O-7-2-4 61 Coupled initialization of multi-channel non-negative matrix factorization based on spatial and spectral information Yuuki Tachioka, Tomohiro Narita, Iori Miura, Takanobu Uramoto, Natsuki Monta, Shingo Uenohara, Ken’ichi Furuya, Shinji Watanabe, Jonathan Le Roux
2017-08-23 14:50-15:10 A2 Noise robust speech recognition Oral Wed-O-7-2-5 211 Channel Compensation in the Generalised Vector Taylor Series Approach to Robust ASR Erfan Loweimi, Jon Barker, Thomas Hain
2017-08-23 15:10-15:30 A2 Noise robust speech recognition Oral Wed-O-7-2-6 1570 Robust Speech Recognition Via Anchor Word Representations Brian King, I-Fan Chen, Yonatan Vaizman, Yuzong Liu, Roland Maas, SHK (Hari) Parthasarathi, Bjorn Hoffmeister
2017-08-23 10:00-12:00 Poster 4 Speech intelligibility Poster Wed-P-6-4-1 36 Predicting Automatic Speech Recognition Performance over Communication Channels from Instrumental Speech Quality and Intelligibility Scores Laura Fernández Gallardo, Sebastian Möller, John Beerends
2017-08-23 10:00-12:00 Poster 4 Speech intelligibility Poster Wed-P-6-4-2 105 Speech intelligibility in cars: the effect of speaking style, noise and listener age Cassia Valentini-Botinhao, Junichi Yamagishi
2017-08-23 10:00-12:00 Poster 4 Speech intelligibility Poster Wed-P-6-4-3 170 Predicting Speech Intelligibility Using a Gammachirp Envelope Distortion Index Based on the Signal-to-Distortion Ratio Katsuhiko Yamamoto, Toshio Irino, Toshie Matsui, Shoko Araki, Keisuke Kinoshita, Tomohiro Nakatani
2017-08-23 10:00-12:00 Poster 4 Speech intelligibility Poster Wed-P-6-4-4 281 Intelligibilities of Mandarin Chinese Sentences with Spectral “Holes” Yafan Chen, Yong Xu, Jun Yang
2017-08-23 10:00-12:00 Poster 4 Speech intelligibility Poster Wed-P-6-4-5 500 The effect of situation-specific non-speech acoustic cues on the intelligibility of speech in noise Lauren Ward, Ben Shirley, Yan Tang, William Davies
2017-08-23 10:00-12:00 Poster 4 Speech intelligibility Poster Wed-P-6-4-6 1043 On the use of Band Importance Weighting in the Short-Time Objective Intelligibility Measure Asger Heidemann Andersen, Jan Mark de Haan, Zheng-Hua Tan, Jesper Jensen
2017-08-23 10:00-12:00 Poster 4 Speech intelligibility Poster Wed-P-6-4-7 1168 Listening in the dips: Comparing relevant features for speech recognition in humans and machines Constantin Spille, Bernd T. Meyer
2017-08-23 10:00-12:00 Poster 3 Spoken document processing Poster Wed-P-6-3-1 1592 Query-by-Example Search with Discriminative Neural Acoustic Word Embeddings Shane Settle, Keith Levin, Herman Kamper, Karen Livescu
2017-08-23 10:00-12:00 Poster 3 Spoken document processing Poster Wed-P-6-3-2 634 Constructing Acoustic Distances between Subwords and States Obtained from a Deep Neural Network for Spoken Term Detection Daisuke Kaneko
2017-08-23 10:00-12:00 Poster 3 Spoken document processing Poster Wed-P-6-3-3 1367 Fast and Accurate OOV Decoder on High-Level Features Yuri Khokhlov, Natalia Tomashenko, Ivan Medennikov, Aleksei Romanenko
2017-08-23 10:00-12:00 Poster 3 Spoken document processing Poster Wed-P-6-3-4 612 Exploring the Use of Significant Words Language Modeling for Spoken Document Retrieval Ying-Wen Chen, Kuan-Yu Chen, Hsin-Min Wang, Berlin Chen
2017-08-23 10:00-12:00 Poster 3 Spoken document processing Poster Wed-P-6-3-5 893 Incorporating Acoustic Features for Spontaneous Speech driven Content Retrieval Hiroto Tasaki, Tomoyosi Akiba
2017-08-23 10:00-12:00 Poster 3 Spoken document processing Poster Wed-P-6-3-6 862 Order-Preserving Abstractive Summarization for Spoken Content Based on Connectionist Temporal Classification Bo Ru Lu, Frank Shyu, Yun-Nung Chen, Hung-yi Lee, Lin-shan Lee
2017-08-23 10:00-12:00 Poster 3 Spoken document processing Poster Wed-P-6-3-7 1752 Automatic Alignment between Classroom Lecture Utterances and Slide Components Masatoshi Tsuchiya, Ryo Minamiguchi
2017-08-23 10:00-12:00 Poster 3 Spoken document processing Poster Wed-P-6-3-8 1183 Compensating Gender Variability in Query-by-Example Search on Speech Using Voice Conversion Paula Lopez-Otero, Laura Docio-Fernandez, Carmen Garcia-Mateo
2017-08-23 10:00-12:00 Poster 3 Spoken document processing Poster Wed-P-6-3-9 516 Zero-Shot Learning across Heterogeneous Overlapping Domains Anjishnu Kumar, Pavankumar Muddireddy, Markus Dreyer, Bjorn Hoffmeister
2017-08-23 10:00-12:00 Poster 3 Spoken document processing Poster Wed-P-6-3-10 392 Hierarchical Recurrent Neural Network for Story Segmentation Emiru Tsunoo, Peter Bell, Steve Renals
2017-08-23 10:00-12:00 Poster 3 Spoken document processing Poster Wed-P-6-3-11 1231 Evaluating automatic topic segmentation as a segment retrieval task Abdessalam Bouchekif, Delphine Charlet, Geraldine Damnati, Nathalie Camelin, Yannick Estève
2017-08-23 10:00-12:00 Poster 3 Spoken document processing Poster Wed-P-6-3-12 650 Improving Speech Recognizers by Refining Broadcast Data with Inaccurate Subtitle Timestamps Jeong-Uk Bang, Mu-Yeol Choi, Sang-Hun Kim, Oh-Wook Kwon
2017-08-23 10:00-12:00 Poster 3 Spoken document processing Poster Wed-P-6-3-13 1087 A relevance score estimation for spoken term detection based on RNN-generated pronunciation embeddings Jan Švec, Josef V. Psutka, Luboš Šmídl, Jan Trmal
2017-08-23 10:00-10:20 Main hall Speech Production and Physiology Oral Wed-O-6-1-1 285 Aerodynamic features of French fricatives Rosario Signorello, Sergio Hassid, Didier Demolin
2017-08-23 10:20-10:40 Main hall Speech Production and Physiology Oral Wed-O-6-1-2 1126 Inter-speaker variability: speaker normalisation and quantitative estimation of articulatory invariants in speech production for French Antoine Serrurier, Pierre Badin, Louis-Jean Boe, Laurent Lamalle, Christiane Neuschaefer-Rube
2017-08-23 10:40-11:00 Main hall Speech Production and Physiology Oral Wed-O-6-1-3 1190 Comparison of Basic Beatboxing Articulations between Expert and Novice Artists using Real-Time Magnetic Resonance Imaging Nimisha Patil, Timothy Greer, Reed Blaylock, Shrikanth Narayanan
2017-08-23 11:00-11:20 Main hall Speech Production and Physiology Oral Wed-O-6-1-4 1576 Speaker-specific Biomechanical Model-based Investigation of a Simple Speech Task based on Tagged-MRI Keyi Tang, Negar Mohaghegh Harandi, Jonghye Woo, Georges El Fakhri, Maureen Stone, Sidney Fels
2017-08-23 11:20-11:40 Main hall Speech Production and Physiology Oral Wed-O-6-1-5 1631 Sounds of the Human Vocal Tract Reed Blaylock, Nimisha Patil, Timothy Greer, Shrikanth Narayanan
2017-08-23 11:40-12:00 Main hall Speech Production and Physiology Oral Wed-O-6-1-6 1675 A simulation study on the effect of glottal boundary conditions on vocal tract formants Yasufumi Uezu, Tokihiko Kaburagi
2017-08-23 16:00-18:00 Poster 3 Language understanding and generation Poster Wed-P-8-3-1 638 Zero-shot Learning for Natural Language Understanding using Domain-Independent Sequential Structure and Question Types Kugatsu Sadamitsu, Yukinori Homma, Ryuichiro Higashinaka, Yoshihiro Matsuo
2017-08-23 16:00-18:00 Poster 3 Language understanding and generation Poster Wed-P-8-3-2 269 Parallel Hierarchical Attention Networks with Shared Memory Reader for Multi-Stream Conversational Document Classification Naoki Sawada, Ryo Masumura, Hiromitsu Nishizaki
2017-08-23 16:00-18:00 Poster 3 Language understanding and generation Poster Wed-P-8-3-3 357 Internal Memory Gate for Recurrent Neural Networks with Application to Spoken Language Understanding Mohamed Morchid
2017-08-23 16:00-18:00 Poster 3 Language understanding and generation Poster Wed-P-8-3-4 422 Character-based Embedding Models and Reranking Strategies for Understanding Natural Language Meal Descriptions Mandy Korpusik, Zachary Collins, James Glass
2017-08-23 16:00-18:00 Poster 3 Language understanding and generation Poster Wed-P-8-3-5 1029 Quaternion Denoising Encoder-Decoder for Theme Identification of Telephone Conversations Titouan Parcollet, Mohamed Morchid, Georges Linares
2017-08-23 16:00-18:00 Poster 3 Language understanding and generation Poster Wed-P-8-3-6 1178 ASR error management for improving spoken language understanding Edwin Simonnet, Sahar Ghannay, Nathalie Camelin, Yannick Estève, Renato de Mori
2017-08-23 16:00-18:00 Poster 3 Language understanding and generation Poster Wed-P-8-3-7 1321 Jointly Trained Sequential Labeling and Classification by Sparse Attention Neural Networks Mingbo Ma, Kai Zhao, Liang Huang, Bing Xiang, Bowen Zhou
2017-08-23 16:00-18:00 Poster 3 Language understanding and generation Poster Wed-P-8-3-8 1525 To Plan or not to Plan? Discourse planning in slot-value informed sequence to sequence models for language generation Neha Nayak, Dilek Hakkani-Tur, Marilyn Walker, Larry Heck
2017-08-23 16:00-18:00 Poster 3 Language understanding and generation Poster Wed-P-8-3-9 921 Online adaptation of an attention-based neural network for natural language generation Matthieu Riou, Bassam Jabaian, Stéphane Huet, Fabrice Lefevre
2017-08-23 16:00-18:00 Poster 3 Language understanding and generation Poster Wed-P-8-3-10 275 Spanish Sign Language Recognition with Different Topology Hidden Markov Models Carlos-D. Martínez-Hinarejos, Zuzanna Parcheta
2017-08-23 16:00-18:00 Poster 3 Language understanding and generation Poster Wed-P-8-3-11 1382 OpenMM: An Open-source Multimodal Feature Extraction Tool Michelle Morales, Stefan Scherer, Rivka Levitan
2017-08-23 16:00-18:00 Poster 3 Language understanding and generation Poster Wed-P-8-3-12 1496 Speaker Dependency Analysis, Audiovisual Fusion Cues and A Multimodal BLSTM for Conversational Engagement Recognition Yuyun Huang, Emer Gilmartin, Nick Campbell
2017-08-23 16:00-18:00 Poster 3 Language understanding and generation Poster Wed-P-8-3-13 1413 Cross-Subject Continuous Emotion Recognition using Speech and Body Motion in Dyadic Interactions Syeda Narjis Fatima, Engin Erzin
2017-08-23 13:30-13:50 E10 Language Recognition Oral Wed-O-7-10-1 1334 Spoken Language Identification using LSTM-based Angular Proximity Gregory Gelly, Jean-Luc Gauvain
2017-08-23 13:50-14:10 E10 Language Recognition Oral Wed-O-7-10-2 44 End-to-End Language Identification Using High-Order Utterance Representation with Bilinear Pooling Ma Jin, Yan Song, Ian McLoughlin, Wu Guo, Lirong Dai
2017-08-23 14:10-14:30 E10 Language Recognition Oral Wed-O-7-10-3 576 Dialect Recognition Based on Unsupervised Bottleneck Features Qian Zhang, John H.L. Hansen
2017-08-23 14:30-14:50 E10 Language Recognition Oral Wed-O-7-10-4 596 Investigating Scalability in Hierarchical Language Identification System Saad Irtza, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Haizhou Li
2017-08-23 14:50-15:10 E10 Language Recognition Oral Wed-O-7-10-5 245 Improving Sub-phone Modeling for Better Native Language Identification with Non-native English Speech Yao Qian, Keelan Evanini, Xinhao Wang, David Suendermann-Oeft, Robert A Pugh, Patrick L Lange, Hillary R Molloy, Frank K Soong
2017-08-23 15:10-15:30 E10 Language Recognition Oral Wed-O-7-10-6 1391 QMDIS: QCRI-MIT Advanced Dialect Identification System Sameer Khurana, Maryam Najafian, Ahmed Ali, Tuka Al Hanai, Yonatan Belinkov, James Glass
2017-08-23 10:00-12:00 A2 Special Session: Digital Revolution for Under-resourced Languages I Special Session Wed-SS-6-2-1 1009 Transfer Learning and Distillation Techniques to Improve the Acoustic Modeling of Low Resource Languages Basil Abraham, Tejaswi Seeram, Srinivasan Umesh
2017-08-23 10:00-12:00 A2 Special Session: Digital Revolution for Under-resourced Languages I Special Session Wed-SS-6-2-2 268 Machine Assisted Analysis of Vowel Length Contrasts in Wolof Elodie Gauthier, Laurent Besacier, Sylvie Voisin
2017-08-23 10:00-12:00 A2 Special Session: Digital Revolution for Under-resourced Languages I Special Session Wed-SS-6-2-3 582 Deep Autoencoder Based Multi-task Learning Using Probabilistic Transcriptions Amit Das, Mark Hasegawa-Johnson, Karel Vesely
2017-08-23 10:00-12:00 A2 Special Session: Digital Revolution for Under-resourced Languages I Special Session Wed-SS-6-2-4 1398 Nativization of foreign names in TTS for automatic reading of world news in Swahili Joseph Mendelson, Pilar Oplustil, Oliver Watts, Simon King
2017-08-23 10:00-12:00 A2 Special Session: Digital Revolution for Under-resourced Languages I Special Session Wed-SS-6-2-5 226 Extracting Situation Frames from non-English Speech: Evaluation Framework and Pilot Results Nikolaos Malandrakis, Ondrej Glembek, Shrikanth Narayanan
2017-08-23 10:00-12:00 A2 Special Session: Digital Revolution for Under-resourced Languages I Special Session Wed-SS-6-2-6 37 Uniform Multilingual Multi-Speaker Acoustic Model for Statistical Parametric Speech Synthesis of Low-Resourced Languages Alexander Gutkin
2017-08-23 10:00-12:00 A2 Special Session: Digital Revolution for Under-resourced Languages I Special Session Wed-SS-6-2-7 903 Building an ASR corpus using Althingi’s Parliamentary Speeches Inga Rún Helgadóttir, Róbert Kjaran, Anna Björk Nikulásdóttir, Jon Gudnason
2017-08-23 10:00-12:00 A2 Special Session: Digital Revolution for Under-resourced Languages I Special Session Wed-SS-6-2-8 1407 The ABAIR initiative: Bringing Spoken Irish into the Digital Space Ailbhe Ní Chasaide, Neasa Ní Chiaráin, Christoph Wendler, Harald Berthelsen, Andy Murphy, Christer Gobl
2017-08-23 10:00-12:00 A2 Special Session: Digital Revolution for Under-resourced Languages I Special Session Wed-SS-6-2-9 1476 Unsupervised Speech Signal to Symbol Transformation for Zero Resource Speech Applications Saurabhchand Bhati, Shekhar Nayak, Sri Rama Murty Kodukula
2017-08-23 10:00-12:00 A2 Special Session: Digital Revolution for Under-resourced Languages I Special Session Wed-SS-6-2-10 1262 Leveraging Text Data for Word Segmentation for Underresourced Languages Thomas Glarner, Benedikt Boenninghoff, Oliver Walter, Reinhold Haeb-Umbach
2017-08-23 10:00-12:00 A2 Special Session: Digital Revolution for Under-resourced Languages I Special Session Wed-SS-6-2-11 215 The motivation and development of MPAi, a Māori Pronunciation Aid Catherine Watson, Peter Keegan, Margaret Maclagan, Ray Harlow, Jeanette King
2017-08-23 10:00-12:00 A2 Special Session: Digital Revolution for Under-resourced Languages I Special Session Wed-SS-6-2-12 928 Implementation of a Radiology Speech Recognition System for Estonian using Open Source Software Tanel Alumäe, Andrus Paats, Ivo Fridolin, Einar Meister
2017-08-23 10:00-12:00 A2 Special Session: Digital Revolution for Under-resourced Languages I Special Session Wed-SS-6-2-13 1352 Building ASR corpora using Eyra Jon Gudnason, Matthías Pétursson, Róbert Kjaran, Simon Kluepfel, Anna Nikulásdóttir
2017-08-23 10:00-12:00 A2 Special Session: Digital Revolution for Under-resourced Languages I Special Session Wed-SS-6-2-14 1139 Rapid development of TTS corpora for four South African languages Daniel Van Niekerk, Charl Van Heerden, Marelie Davel, Neil Kleynhans, Oddur Kjartansson, Martin Jansche, Linne Ha
2017-08-23 10:00-12:00 A2 Special Session: Digital Revolution for Under-resourced Languages I Special Session Wed-SS-6-2-15 880 Very low resource radio browsing for agile developmental and humanitarian monitoring Armin Saeb, Raghav Menon, Hugh Cameron, William Kibira, John Quinn, Thomas Niesler
2017-08-23 10:00-12:00 A2 Special Session: Digital Revolution for Under-resourced Languages I Special Session Wed-SS-6-2-16 160 Areal and Phylogenetic Features for Multilingual Speech Synthesis Alexander Gutkin, Richard Sproat
2017-08-23 10:00-12:00 A2 Special Session: Digital Revolution for Under-resourced Languages I Special Session Wed-SS-6-2-17 855 Eliciting meaningful units from speech Daniil Kocharov, Tatiana Kachkovskaia, Pavel Skrelin
2017-08-23 10:00-12:00 A2 Special Session: Digital Revolution for Under-resourced Languages I Special Session Wed-SS-6-2-18 1558 First Results in Developing a Medieval Latin Language Charter Dictation System for the East-Central Europe Region Peter Mihajlik, Lili Szabo, Balazs Tarjan, Andras Balog, Krisztina Rabai
2017-08-23 10:00-12:00 A2 Special Session: Digital Revolution for Under-resourced Languages I Special Session Wed-SS-6-2-19 300 On the Linguistic Relevance of Speech Units Learned by Unsupervised Acoustic Modeling Siyuan Feng, Tan Lee
2017-08-23 10:00-12:00 A2 Special Session: Digital Revolution for Under-resourced Languages I Special Session Wed-SS-6-2-20 180 Team ELISA System for DARPA LORELEI Speech Evaluation 2016 Pavlos Papadopoulos, Ruchir Travadi, Colin Vaz, Nikolaos Malandrakis, Ulf Hermjakob, Nima Pourdamghani, Michael Pust, Boliang Zhang, Xiaoman Pan, Di Lu, Ying Lin, Ondrej Glembek, Murali Karthick B, Martin Karafiat, Lukas Burget, Mark Hasegawa-Johnson, Heng Ji, Jonathan May, Kevin Knight, Shrikanth Narayanan
2017-08-23 10:00-12:00 A2 Special Session: Digital Revolution for Under-resourced Languages I Special Session Wed-SS-6-2-21 1028 Joint Estimation of Articulatory Features and Acoustic models for Low-Resource Languages Basil Abraham, Srinivasan Umesh, Neethu Mariam Joy
2017-08-23 10:00-12:00 A2 Special Session: Digital Revolution for Under-resourced Languages I Special Session Wed-SS-6-2-22 1129 Improving DNN Bluetooth Narrowband Acoustic Models by Cross-bandwidth and Cross-lingual Initialization Xiaodan Zhuang, Arnab Ghoshal, Antti-Veikko Rosti, Matthias Paulik, Daben Liu
2017-08-23 16:00-16:20 D8 Speech Recognition: Applications in medical practice Oral Wed-O-8-8-1 280 Acoustic Assessment of Disordered Voice with Continuous Speech Based on Utterance-level ASR Posterior Features Yuanyuan Liu, Tan Lee, P.C. Ching, Thomas K.T. Law, Kathy Y.S. Lee
2017-08-23 16:20-16:40 D8 Speech Recognition: Applications in medical practice Oral Wed-O-8-8-2 303 Multi-Stage DNN Training for Automatic Recognition of Dysarthric Speech Emre Yilmaz, Mario Ganzeboom, Catia Cucchiarini, Helmer Strik
2017-08-23 16:40-17:00 D8 Speech Recognition: Applications in medical practice Oral Wed-O-8-8-3 455 Improving child speech disorder assessment by incorporating out-of-domain adult speech Daniel Smith, Alex Sneddon, Lauren Ward, Andreas Duenser, Jill Freyne, David Silvera-Tawil, Angela Morgan
2017-08-23 17:00-17:20 D8 Speech Recognition: Applications in medical practice Oral Wed-O-8-8-4 878 On Improving Acoustic Models For TORGO Dysarthric Speech Database Neethu Mariam Joy, Srinivasan Umesh, Basil Abraham
2017-08-23 17:20-17:40 D8 Speech Recognition: Applications in medical practice Oral Wed-O-8-8-5 1251 Glottal Source Features for Automatic Speech-based Depression Assessment Olympia Simantiraki, Paulos Charonyktakis, Anastasia Pampouchidou, Manolis Tsiknakis, Martin Cooke
2017-08-23 17:40-18:00 D8 Speech Recognition: Applications in medical practice Oral Wed-O-8-8-6 1712 Speech Processing Approach for Diagnosing Dementia in an Early Stage Roozbeh Sadeghian, J. David Schaffer, Stephen Zahorian
2017-08-23 10:00-10:20 B4 Speech and harmonic analysis Oral Wed-O-6-4-1 1172 A robust and alternative approach to zero frequency filtering method for epoch extraction Gangamohan Paidi, Bayya Yegnanarayana
2017-08-23 10:20-10:40 B4 Speech and harmonic analysis Oral Wed-O-6-4-2 21 Improving YANGsaf F0 Estimator with Adaptive Kalman Filter Kanru Hua
2017-08-23 10:40-11:00 B4 Speech and harmonic analysis Oral Wed-O-6-4-3 1138 A Spectro-Temporal Demodulation Technique for Pitch Estimation Jitendra Dhiman, Nagaraj Adiga, Chandra Sekhar Seelamantula
2017-08-23 11:00-11:20 B4 Speech and harmonic analysis Oral Wed-O-6-4-4 1061 Robust method for estimating F0 of complex tone based on pitch perception of amplitude modulated signal Kenichiro Miwa, Masashi Unoki
2017-08-23 11:20-11:40 B4 Speech and harmonic analysis Oral Wed-O-6-4-5 1254 Low-Complexity Pitch Estimation Based on Phase Differences Between Low-Resolution Spectra Simon Graf, Tobias Herbig, Markus Buck, Gerhard Schmidt
2017-08-23 11:40-12:00 B4 Speech and harmonic analysis Oral Wed-O-6-4-6 68 Harvest: A high-performance fundamental frequency estimator from speech signals Masanori Morise
2017-08-23 16:00-18:00 Poster 1 Prosody Poster Wed-P-8-1-1 291 Trisyllabic tone 3 sandhi patterns in Mandarin produced by Cantonese speakers Jung-Yueh Tu, Janice Wing-Sze Wong, Jih-Ho Cha
2017-08-23 16:00-18:00 Poster 1 Prosody Poster Wed-P-8-1-2 840 Intonation of contrastive topic in Estonian Heete Sahkai, Meelis Mihkla
2017-08-23 16:00-18:00 Poster 1 Prosody Poster Wed-P-8-1-3 1235 Reanalyze Fundamental Frequency Peak Delay in Mandarin Lixia Hao, Wei Zhang, Yanlu Xie, Jinsong Zhang
2017-08-23 16:00-18:00 Poster 1 Prosody Poster Wed-P-8-1-4 1430 How does the absence of shared knowledge between interlocutors affect the production of French prosodic forms? Amandine Michelas, Cécile Cau, Maud Champagne-Lavau
2017-08-23 16:00-18:00 Poster 1 Prosody Poster Wed-P-8-1-5 1500 Three Dimensions of Sentence Prosody and their (Non-)Interactions Michael Wagner, Michael McAuliffe
2017-08-23 16:00-18:00 Poster 1 Prosody Poster Wed-P-8-1-6 710 Using Prosody to Classify Discourse Relations Janine Kleinhans, Mireia Farrús, Agustin Gravano, Juan Manuel Pérez, Catherine Lai, Leo Wanner
2017-08-23 16:00-18:00 Poster 1 Prosody Poster Wed-P-8-1-7 1585 Canonical Correlation Analysis and Prediction of Perceived Rhythmic Prominences and Pitch Tones in Speech Elizabeth Godoy, James Williamson, Thomas Quatieri
2017-08-23 16:00-18:00 Poster 1 Prosody Poster Wed-P-8-1-8 1237 Evaluation of Spectral Tilt Measures for Sentence Prominence Under Different Noise Conditions Sofoklis Kakouros, Okko Räsänen, Paavo Alku
2017-08-23 16:00-18:00 Poster 1 Prosody Poster Wed-P-8-1-9 1578 Creaky voice as a function of tonal categories and prosodic boundaries Jianjing Kuang
2017-08-23 16:00-18:00 Poster 1 Prosody Poster Wed-P-8-1-10 417 The Acoustics of Word Stress in Czech as a Function of Speaking Style Radek Skarnitzl, Anders Eriksson
2017-08-23 16:00-18:00 Poster 1 Prosody Poster Wed-P-8-1-11 177 What You See Is What You Get Prosodically Less – Visibility Shapes Prosodic Prominence Production in Spontaneous Interaction Petra Wagner, Nataliya Bryhadyr
2017-08-23 16:00-18:00 Poster 1 Prosody Poster Wed-P-8-1-12 1167 Focus Acoustics in Mandarin Nominals Yu-Yin Hsu, Anqi Xu
2017-08-23 16:00-18:00 Poster 1 Prosody Poster Wed-P-8-1-13 1502 Exploring multidimensionality: Acoustic and articulatory correlates of Swedish word accents Malin Svensson Lundmark, Gilbert Ambrazaitis, Otto Ewald
2017-08-23 16:00-18:00 Poster 1 Prosody Poster Wed-P-8-1-14 1279 The Perception of English Intonation Patterns by German L2 speakers of English Karin Puga, Robert Fuchs, Jane Setter, Peggy Mok
2017-08-23 13:30-15:30 E397 Show & Tell 6 Show&Tell Wed-S&T-7-B-1 10029 Integrating the Talkamatic Dialogue Manager with Alexa Staffan Larsson, Fredrik Kronlid, Andreas Krona, Alex Berman
2017-08-23 13:30-15:30 E397 Show & Tell 6 Show&Tell Wed-S&T-7-B-2 10031 A Robust Medical Speech-to-Speech/Speech-to-Sign Phraselator Farhia Ahmed, Pierrette Bouillon, Chelle Destefano, Johanna Gerlach, Sonia Halimi, Angela Hooper, Manny Rayner, Hervé Spechbach, Irene Strasly, Nikos Tsourakis
2017-08-23 13:30-15:30 E397 Show & Tell 6 Show&Tell Wed-S&T-7-B-3 10041 Towards an Autarkic Embedded Cognitive User Interface Frank Duckhorn, Markus Huber, Werner Meyer, Oliver Jokisch, Constanze Tschöpe, Matthias Wolff
2017-08-23 13:30-15:30 E397 Show & Tell 6 Show&Tell Wed-S&T-7-B-4 10050 Nora the Empathetic Psychologist Genta Indra Winata, Onno Kampman, Yang Yang, Anik Dey, Pascale Fung
2017-08-23 13:30-15:30 E397 Show & Tell 6 Show&Tell Wed-S&T-7-B-5 10057 Modifying Amazon’s Alexa ASR Grammar and Lexicon – A Case Study Aman Kumar, Hassan Alam, Manan Vyas, Tina Werner, Rachmat Hartono
2017-08-23 16:00-16:20 Main hall Speaker Database and Anti-spoofing Oral Wed-O-8-1-1 256 Detection of Replay Attacks using Single Frequency Filtering Cepstral Coefficients K N R K Raju Alluri, Sivanand Achanta, Sudarsana Reddy Kadiri, Suryakanth V Gangashetty, Anil Kumar Vuppala
2017-08-23 16:20-16:40 Main hall Speaker Database and Anti-spoofing Oral Wed-O-8-1-2 1393 Unsupervised Representation Learning Using Convolutional Restricted Boltzmann Machine for Spoof Speech Detection Hardik Sailor, Madhu Kamble, Hemant Patil
2017-08-23 16:40-17:00 Main hall Speaker Database and Anti-spoofing Oral Wed-O-8-1-3 836 Independent Modelling of High and Low Energy Speech Frames for Spoofing Detection Gajan Suthokumar, Kaavya Sriskandaraja, Vidhyasaharan Sethu, Chamith Wijenayake, Eliathamby Ambikairajah
2017-08-23 17:00-17:20 Main hall Speaker Database and Anti-spoofing Oral Wed-O-8-1-4 1758 Improving Speaker Verification Performance in Presence of Spoofing Attacks Using Out-of-Domain Spoofed Data Achintya Sarkar, Md Sahidullah, Zheng-Hua Tan, Tomi Kinnunen
2017-08-23 17:20-17:40 Main hall Speaker Database and Anti-spoofing Oral Wed-O-8-1-5 950 VoxCeleb: A large-scale speaker identification dataset Arsha Nagrani, Joon Son Chung, Andrew Zisserman
2017-08-23 17:40-18:00 Main hall Speaker Database and Anti-spoofing Oral Wed-O-8-1-6 1521 Call My Net Corpus: A Multilingual Corpus for Evaluation of Speaker Recognition Technology Karen Jones, Stephanie Strassel, Kevin Walker, David Graff, Jonathan Wright
2017-08-23 13:30-13:50 B4 Topic spotting, entity extraction and semantic analysis Oral Wed-O-7-4-1 518 Towards Zero-Shot Frame Semantic Parsing for Domain Scaling Ankur Bapna, Gokhan Tur, Dilek Hakkani-Tur, Larry Heck
2017-08-23 13:50-14:10 B4 Topic spotting, entity extraction and semantic analysis Oral Wed-O-7-4-2 1075 ClockWork-RNN based architectures for Slot Filling Despoina Georgiadou, Vassilios Diakoloukas, Vassilios Tsiaras, Vassilios Digalakis
2017-08-23 14:10-14:30 B4 Topic spotting, entity extraction and semantic analysis Oral Wed-O-7-4-3 1482 Investigating the Effect of ASR tuning on Named Entity Recognition Mohamed Ben Jannet, Olivier Galibert, Martine Adda-Decker, Sophie Rosset
2017-08-23 14:30-14:50 B4 Topic spotting, entity extraction and semantic analysis Oral Wed-O-7-4-4 1480 Label-dependency coding in Simple Recurrent Networks for Spoken Language Understanding Marco Dinarelli, Vedran Vukotic, Christian Raymond
2017-08-23 14:50-15:10 B4 Topic spotting, entity extraction and semantic analysis Oral Wed-O-7-4-5 590 Minimum Semantic Error Cost Training of Deep Long Short-Term Memory Networks for Topic Spotting on Conversational Speech Zhong Meng, Biing-Hwang (Fred) Juang
2017-08-23 15:10-15:30 B4 Topic spotting, entity extraction and semantic analysis Oral Wed-O-7-4-6 1093 Topic Identification for Speech without ASR Chunxi Liu, Jan Trmal, Matthew Wiesner, Craig Harman, Sanjeev Khudanpur
2017-08-23 13:30-15:30 Poster 4 Disorders related to Speech and Language Poster Wed-P-7-4-1 112 Manual and Automatic Transcriptions in Dementia Detection from Speech Jochen Weiner, Mathis Engelbart, Tanja Schultz
2017-08-23 13:30-15:30 Poster 4 Disorders related to Speech and Language Poster Wed-P-7-4-2 120 An Affect Prediction Approach through Depression Severity Parameter Incorporation in Neural Networks Rahul Gupta, Saurabh Sahu, Carol Espy-Wilson, Shrikanth Narayanan
2017-08-23 13:30-15:30 Poster 4 Disorders related to Speech and Language Poster Wed-P-7-4-3 216 Cross-Database Models for the Classification of Dysarthria Presence Stephanie Gillespie, Yash-Yee Logan, Elliot Moore, Jacqueline Laures-Gore, Scott Russell, Rupal Patel
2017-08-23 13:30-15:30 Poster 4 Disorders related to Speech and Language Poster Wed-P-7-4-4 381 Acoustic evaluation of nasality in cerebellar syndromes Michal Novotný, Jan Rusz, Karel Spálenka, Jiří Klempíř, Dana Horáková, Evžen Růžička
2017-08-23 13:30-15:30 Poster 4 Disorders related to Speech and Language Poster Wed-P-7-4-5 409 Emotional Speech of Mentally and Physically Disabled Individuals: Introducing The EmotAsS Database and First Findings Simone Hantke, Hesam Sagha, Nicholas Cummins, Björn Schuller
2017-08-23 13:30-15:30 Poster 4 Disorders related to Speech and Language Poster Wed-P-7-4-6 621 Phonological markers of Oxytocin and MDMA ingestion Carla Agurto, Raquel Norel, Rachel Ostrand, Gillinder Bedi, Harriet de Wit, Matthew J. Baggott, Matthew G. Kirkpatrick, Margaret Wardle, Guillermo Cecchi
2017-08-23 13:30-15:30 Poster 4 Disorders related to Speech and Language Poster Wed-P-7-4-7 690 An avatar-based system for identifying individuals likely to develop dementia Bahman Mirheidari, Daniel Blackburn, Kirsty Harkness, Traci Walker, Annalena Venneri, Markus Reuber, Heidi Christensen
2017-08-23 13:30-15:30 Poster 4 Disorders related to Speech and Language Poster Wed-P-7-4-8 1015 Cross-Domain Classification of Drowsiness in Speech: The Case of Alcohol Intoxication and Sleep Deprivation Yue Zhang, Felix Weninger, Björn Schuller
2017-08-23 13:30-15:30 Poster 4 Disorders related to Speech and Language Poster Wed-P-7-4-9 1201 Depression Detection Using Automatic Transcriptions of De-Identified Speech Paula Lopez-Otero, Laura Docio-Fernandez, Alberto Abad, Carmen Garcia-Mateo
2017-08-23 13:30-15:30 Poster 4 Disorders related to Speech and Language Poster Wed-P-7-4-10 1572 An N-Gram Based Approach to the Automatic Diagnosis of Alzheimer’s Disease from Spoken Language Sebastian Wankerl, Elmar Noeth, Stefan Evert
2017-08-23 13:30-15:30 Poster 4 Disorders related to Speech and Language Poster Wed-P-7-4-11 1599 Exploiting Intra-annotator Rating Consistency through Copeland’s Method for Estimation of Ground Truth Labels in Couples’ Therapy Karel Mundnich, Md Nasir, Panayiotis Georgiou, Shrikanth Narayanan
2017-08-23 13:30-15:30 Poster 4 Disorders related to Speech and Language Poster Wed-P-7-4-12 850 Rhythmic Characteristics of Parkinsonian Speech: A Study on Mandarin and Polish Massimo Pettorino, Wentao Gu, Paweł Półrola, Ping Fan
2017-08-23 13:30-15:30 F11 Special Session: Data Collection, Transcription and Annotation Issues in Child Language Acquisition Special Session Wed-SS-7-11-1 636 SLPAnnotator: Tools for implementing Sign Language Phonetic Annotation Kathleen Currie Hall, Scott Mackie, Michael Fry, Oksana Tkachman
2017-08-23 13:30-15:30 F11 Special Session: Data Collection, Transcription and Annotation Issues in Child Language Acquisition Special Session Wed-SS-7-11-2 1287 The LENA system applied to Swedish: Reliability of the Adult Word Count estimate Iris-Corinna Schwarz, Noor Botros, Alekzandra Lord, Amelie Marcusson, Henrik Tidelius, Ellen Marklund
2017-08-23 13:30-15:30 F11 Special Session: Data Collection, Transcription and Annotation Issues in Child Language Acquisition Special Session Wed-SS-7-11-3 1409 What do babies hear? Analyses of child- and adult-directed speech Marisa Casillas, Andrei Amatuni, Amanda Seidl, Melanie Soderstrom, Anne Warlaumont, Elika Bergelson
2017-08-23 13:30-15:30 F11 Special Session: Data Collection, Transcription and Annotation Issues in Child Language Acquisition Special Session Wed-SS-7-11-4 1418 A New Workflow for Semi-automatized Annotations: Tests with Long-Form Naturalistic Recordings of Children’s Language Environments Marisa Casillas, Elika Bergelson, Anne S. Warlaumont, Alejandrina Cristia, Melanie Soderstrom, Mark VanDam, Han Sloetjes
2017-08-23 13:30-15:30 F11 Special Session: Data Collection, Transcription and Annotation Issues in Child Language Acquisition Special Session Wed-SS-7-11-5 1443 Top-down versus bottom-up theories of phonological acquisition: A big data approach Christina Bergmann, Sho Tsuji, Alejandrina Cristia
2017-08-23 13:30-15:30 F11 Special Session: Data Collection, Transcription and Annotation Issues in Child Language Acquisition Special Session Wed-SS-7-11-6 1468 Which acoustic and phonological factors shape infants’ vowel discrimination? Exploiting natural variation in InPhonDB Sho Tsuji, Alejandrina Cristia
2017-08-23 16:00-18:00 F11 Special Session: Voice Attractiveness Special Session Wed-SS-8-11-1 130 Personalized Quantification of Voice Attractiveness in Multidimensional Merit Space Yasunari Obuchi
2017-08-23 16:00-18:00 F11 Special Session: Voice Attractiveness Special Session Wed-SS-8-11-2 142 The role of temporal amplitude modulations in the political arena: Hillary Clinton vs. Donald Trump Hans Rutger Bosker
2017-08-23 16:00-18:00 F11 Special Session: Voice Attractiveness Special Session Wed-SS-8-11-3 326 Perceptual Ratings of Voice Likability Collected through In-Lab Listening Tests vs. Mobile-Based Crowdsourcing Laura Fernández Gallardo, Rafael Zequeira Jiménez, Sebastian Möller
2017-08-23 16:00-18:00 F11 Special Session: Voice Attractiveness Special Session Wed-SS-8-11-4 367 Attractiveness of French voices for German listeners – results from native and non-native read speech Juergen Trouvain, Frank Zimmerer
2017-08-23 16:00-18:00 F11 Special Session: Voice Attractiveness Special Session Wed-SS-8-11-5 833 Social Attractiveness in Dialogs Antje Schweitzer, Natalie Lewandowski, Daniel Duran
2017-08-23 16:00-18:00 F11 Special Session: Voice Attractiveness Special Session Wed-SS-8-11-6 1349 A gender bias in the acoustic-melodic features of charismatic speech? Eszter Novak-Tot, Oliver Niebuhr, Aoju Chen
2017-08-23 16:00-18:00 F11 Special Session: Voice Attractiveness Special Session Wed-SS-8-11-7 1520 Pitch convergence as an effect of perceived attractiveness and likability Jan Michalsky, Heike Schoormann
2017-08-23 16:00-18:00 F11 Special Session: Voice Attractiveness Special Session Wed-SS-8-11-8 1691 Does Posh English Sound Attractive? Li Jiao, Chengxia Wang, Cristiane Hsu, Peter Birkholz, Yi Xu
2017-08-23 16:00-18:00 F11 Special Session: Voice Attractiveness Special Session Wed-SS-8-11-9 1697 Large-scale Speaker Ranking from Crowdsourced Pairwise Listener Ratings Timo Baumann
2017-08-23 16:00-16:20 B4 Speech Translation Oral Wed-O-8-4-1 503 Sequence-to-Sequence Models Can Directly Translate Foreign Speech Ron Weiss, Jan Chorowski, Navdeep Jaitly, Yonghui Wu, Zhifeng Chen
2017-08-23 16:20-16:40 B4 Speech Translation Oral Wed-O-8-4-2 944 Structured-based Curriculum Learning for End-to-end English-Japanese Speech Translation Takatomo Kano, Sakriani Sakti, Satoshi Nakamura
2017-08-23 16:40-17:00 B4 Speech Translation Oral Wed-O-8-4-3 1690 Assessing the tolerance of Neural Machine Translation systems against Speech Recognition Errors Nicholas Ruiz, Mattia Antonino Di Gangi, Nicola Bertoldi, Marcello Federico
2017-08-23 17:00-17:20 B4 Speech Translation Oral Wed-O-8-4-4 896 Toward Expressive Speech Translation: A Unified Sequence-to-Sequence LSTMs Approach for Translating Words and Emphasis Quoc Truong Do, Sakriani Sakti, Satoshi Nakamura
2017-08-23 17:20-17:40 B4 Speech Translation Oral Wed-O-8-4-5 1320 NMT-based Segmentation and Punctuation Insertion for Real-time Spoken Language Translation Eunah Cho, Jan Niehues, Alex Waibel
2017-08-23 13:30-15:30 Poster 2 Speaker States and Traits Poster Wed-P-7-2-1 104 The Perception of Emotions in Noisified Nonsense Speech Emilia Parada-Cabaleiro, Alice Baird, Anton Batliner, Nicholas Cummins, Simone Hantke, Björn Schuller
2017-08-23 13:30-15:30 Poster 2 Speaker States and Traits Poster Wed-P-7-2-2 218 Attention Networks for Modeling Behavior in Addiction Counseling James Gibson, Dogan Can, Panayiotis Georgiou, David Atkins, Shrikanth Narayanan
2017-08-23 13:30-15:30 Poster 2 Speaker States and Traits Poster Wed-P-7-2-3 466 Computational Analysis of Acoustic Descriptors in Psychotic Patients Torsten Wörtwein, Tadas Baltrušaitis, Eugene Laksana, Luciana Pennant, Elizabeth Liebson, Dost Öngür, Justin Baker, Louis-Philippe Morency
2017-08-23 13:30-15:30 Poster 2 Speaker States and Traits Poster Wed-P-7-2-4 562 Modeling Perceivers Neural-Responses using Lobe-dependent Convolutional Neural Network to Improve Speech Emotion Recognition Ya-Tse Wu, Hsuan-Yu Chen, Yu-Hsien Liao, Li-Wei Kuo, Chi-Chun Lee
2017-08-23 13:30-15:30 Poster 2 Speaker States and Traits Poster Wed-P-7-2-5 887 Implementing gender-dependent vowel-level analysis for boosting speech-based depression recognition Bogdan Vlasenko, Hesam Sagha, Nicholas Cummins, Björn Schuller
2017-08-23 13:30-15:30 Poster 2 Speaker States and Traits Poster Wed-P-7-2-6 1379 Bilingual Word Embeddings for Cross-Lingual Personality Recognition Using Convolutional Neural Nets Farhad Bin Siddique, Pascale Fung
2017-08-23 13:30-15:30 Poster 2 Speaker States and Traits Poster Wed-P-7-2-7 994 Emotion category mapping to emotional space by cross-corpus emotion labeling Yoshiko Arimoto, Hiroki Mori
2017-08-23 13:30-15:30 Poster 2 Speaker States and Traits Poster Wed-P-7-2-8 1194 Big Five vs. Prosodic Features as Cues to Detect Abnormality in SSPNET-Personality Corpus Cédric Fayet, Arnaud Delhay, Damien Lolive, Pierre-Francois Marteau
2017-08-23 13:30-15:30 Poster 2 Speaker States and Traits Poster Wed-P-7-2-9 1584 Speech Rate Comparison when Talking to a System and Talking to a Human: A study from a Speech-to-Speech, Machine Translation mediated Map Task Akira Hayakawa, Carl Vogel, Saturnino Luz, Nick Campbell
2017-08-23 13:30-15:30 Poster 2 Speaker States and Traits Poster Wed-P-7-2-10 1621 Approaching Human Performance in Behavior Estimation in Couples Therapy Using Deep Sentence Embeddings Shao-Yen Tseng, Brian Baucom, Panayiotis Georgiou
2017-08-23 13:30-15:30 Poster 2 Speaker States and Traits Poster Wed-P-7-2-11 1641 Complexity in speech and its relation to emotional bond in therapist-patient interactions during suicide risk assessment interviews Md Nasir, Brian Baucom, Craig J. Bryan, Shrikanth Narayanan, Panayiotis Georgiou
2017-08-23 13:30-15:30 Poster 2 Speaker States and Traits Poster Wed-P-7-2-12 1707 An Investigation of Emotion Dynamics and Kalman Filtering for Speech-based Emotion Prediction Zhaocheng Huang, Julien Epps
2017-08-23 10:00-12:00 F11 Special Session: Computational Models in Child Language Acquisition Special Session Wed-SS-6-11-1 520 Multi-Task Learning for Mispronunciation Detection on Singapore Children’s Mandarin Speech Rong Tong, Nancy Chen, Bin Ma
2017-08-23 10:00-12:00 F11 Special Session: Computational Models in Child Language Acquisition Special Session Wed-SS-6-11-2 937 Relating unsupervised word segmentation to reported vocabulary acquisition Elin Larsen, Alejandrina Cristia, Emmanuel Dupoux
2017-08-23 10:00-12:00 F11 Special Session: Computational Models in Child Language Acquisition Special Session Wed-SS-6-11-3 1143 Modelling the Informativeness of Non-Verbal Cues in Parent–Child Interaction Mats Wirén, Kristina N. Björkenstam, Robert Östling
2017-08-23 10:00-12:00 F11 Special Session: Computational Models in Child Language Acquisition Special Session Wed-SS-6-11-4 1289 Computational simulations of temporal vocalization behavior in adult-child interaction Ellen Marklund, David Pagmar, Tove Gerholm, Lisa Gustavsson
2017-08-23 10:00-12:00 F11 Special Session: Computational Models in Child Language Acquisition Special Session Wed-SS-6-11-5 1634 Approximating phonotactic input in children’s linguistic environments from orthographic transcripts Sofia Strömbergsson, Jens Edlund, Jana Götze, Kristina Nilsson Björkenstam
2017-08-23 10:00-12:00 F11 Special Session: Computational Models in Child Language Acquisition Special Session Wed-SS-6-11-6 1689 Learning weakly-supervised multimodal phoneme embeddings Rahma Chaabouni, Ewan Dunbar, Neil Zeghidour, Emmanuel Dupoux
2017-08-23 10:00-12:00 Poster 2 Speaker and Language Recognition Applications Poster Wed-P-6-2-1 530 Calibration Approaches for Language Detection Mitchell McLaren, Luciana Ferrer, Diego Castan, Aaron Lawson
2017-08-23 10:00-12:00 Poster 2 Speaker and Language Recognition Applications Poster Wed-P-6-2-2 286 Bidirectional Modelling for Short Duration Language Identification Sarith Fernando, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Julien Epps
2017-08-23 10:00-12:00 Poster 2 Speaker and Language Recognition Applications Poster Wed-P-6-2-3 553 Conditional Generative Adversarial Nets Classifier for Spoken Language Identification Peng Shen, Xugang Lu, Sheng Li, Hisashi Kawai
2017-08-23 10:00-12:00 Poster 2 Speaker and Language Recognition Applications Poster Wed-P-6-2-4 1314 Tied Hidden Factors in Neural Networks for End-to-End Speaker Recognition Antonio Miguel, Jorge Llombart, Alfonso Ortega, Eduardo Lleida Solano
2017-08-23 10:00-12:00 Poster 2 Speaker and Language Recognition Applications Poster Wed-P-6-2-5 923 Speaker Clustering by Iteratively Finding Discriminative Feature Space and Cluster Labels Sungrack Yun, Hye Jin Jang, Taesu Kim
2017-08-23 10:00-12:00 Poster 2 Speaker and Language Recognition Applications Poster Wed-P-6-2-6 84 Domain Adaptation of PLDA models in Broadcast Diarization by means of Unsupervised Speaker Clustering Ignacio Viñals, Alfonso Ortega, Jesus Villalba, Antonio Miguel, Eduardo Lleida Solano
2017-08-23 10:00-12:00 Poster 2 Speaker and Language Recognition Applications Poster Wed-P-6-2-7 407 LSTM Neural Network-based Speaker Segmentation using Acoustic and Language Modelling Miquel Angel India Massana, José A. R. Fonollosa, Javier Hernando
2017-08-23 10:00-12:00 Poster 2 Speaker and Language Recognition Applications Poster Wed-P-6-2-8 1311 Acoustic Pairing of Original and Dubbed Voices in the Context of Video Game Localization Adrien Gresse, Mickael Rouvier, Richard Dufour, Vincent Labatut, Jean-Francois Bonastre
2017-08-23 10:00-12:00 Poster 2 Speaker and Language Recognition Applications Poster Wed-P-6-2-9 152 Homogeneity Measure Impact on Target and Non-target Trials in Forensic Voice Comparison Moez Ajili, Jean-Francois Bonastre, Waad Ben Kheder, Solange Rossato, Juliette Kahn
2017-08-23 10:00-12:00 Poster 2 Speaker and Language Recognition Applications Poster Wed-P-6-2-10 1023 Null-Hypothesis LLR: A proposal for Forensic Automatic Speaker Recognition Yosef A. Solewicz, Michael Jessen, David van der Vloed
2017-08-23 10:00-12:00 Poster 2 Speaker and Language Recognition Applications Poster Wed-P-6-2-11 997 The Opensesame NIST 2016 Speaker Recognition Evaluation System Gang Liu, Qi Qian, Zhibin Wang, Qingen Zhao, Tianzhou Wang, Hao Li, Jian Xue, Shenghuo Zhu, Rong Jin, Tuo Zhao
2017-08-23 10:00-12:00 Poster 2 Speaker and Language Recognition Applications Poster Wed-P-6-2-12 1307 IITG-Indigo System for NIST 2016 SRE Challenge Nagendra Kumar, Rohan Kumar Das, Sarfaraz Jelil, Dhanush B K, Harish Kashyap, Sri Rama Murty Kodukula, Sriram Ganapathy, Rohit Sinha, S R Mahadeva Prasanna
2017-08-23 10:00-12:00 Poster 2 Speaker and Language Recognition Applications Poster Wed-P-6-2-13 581 Locally Weighted Linear Discriminant Analysis for Robust Speaker Verification Abhinav Misra, Shivesh Ranjan, John H.L. Hansen
2017-08-23 16:00-18:00 A2 Special Session: Digital Revolution for Under-resourced Languages III Special Session Wed-SS-8-2-1 1009 Transfer Learning and Distillation Techniques to Improve the Acoustic Modeling of Low Resource Languages Basil Abraham, Tejaswi Seeram, Srinivasan Umesh
2017-08-23 16:00-18:00 A2 Special Session: Digital Revolution for Under-resourced Languages III Special Session Wed-SS-8-2-2 268 Machine Assisted Analysis of Vowel Length Contrasts in Wolof Elodie Gauthier, Laurent Besacier, Sylvie Voisin
2017-08-23 16:00-18:00 A2 Special Session: Digital Revolution for Under-resourced Languages III Special Session Wed-SS-8-2-3 582 Deep Autoencoder Based Multi-task Learning Using Probabilistic Transcriptions Amit Das, Mark Hasegawa-Johnson, Karel Vesely
2017-08-23 16:00-18:00 A2 Special Session: Digital Revolution for Under-resourced Languages III Special Session Wed-SS-8-2-4 1398 Nativization of foreign names in TTS for automatic reading of world news in Swahili Joseph Mendelson, Pilar Oplustil, Oliver Watts, Simon King
2017-08-23 16:00-18:00 A2 Special Session: Digital Revolution for Under-resourced Languages III Special Session Wed-SS-8-2-5 226 Extracting Situation Frames from non-English Speech: Evaluation Framework and Pilot Results Nikolaos Malandrakis, Ondrej Glembek, Shrikanth Narayanan
2017-08-23 16:00-18:00 A2 Special Session: Digital Revolution for Under-resourced Languages III Special Session Wed-SS-8-2-6 37 Uniform Multilingual Multi-Speaker Acoustic Model for Statistical Parametric Speech Synthesis of Low-Resourced Languages Alexander Gutkin
2017-08-23 16:00-18:00 A2 Special Session: Digital Revolution for Under-resourced Languages III Special Session Wed-SS-8-2-7 903 Building an ASR corpus using Althingi’s Parliamentary Speeches Inga Rún Helgadóttir, Róbert Kjaran, Anna Björk Nikulásdóttir, Jon Gudnason
2017-08-23 16:00-18:00 A2 Special Session: Digital Revolution for Under-resourced Languages III Special Session Wed-SS-8-2-8 1407 The ABAIR initiative: Bringing Spoken Irish into the Digital Space Ailbhe Ní Chasaide, Neasa Ní Chiaráin, Christoph Wendler, Harald Berthelsen, Andy Murphy, Christer Gobl
2017-08-23 16:00-18:00 A2 Special Session: Digital Revolution for Under-resourced Languages III Special Session Wed-SS-8-2-9 1476 Unsupervised Speech Signal to Symbol Transformation for Zero Resource Speech Applications Saurabhchand Bhati, Shekhar Nayak, Sri Rama Murty Kodukula
2017-08-23 16:00-18:00 A2 Special Session: Digital Revolution for Under-resourced Languages III Special Session Wed-SS-8-2-10 1262 Leveraging Text Data for Word Segmentation for Underresourced Languages Thomas Glarner, Benedikt Boenninghoff, Oliver Walter, Reinhold Haeb-Umbach
2017-08-23 16:00-18:00 A2 Special Session: Digital Revolution for Under-resourced Languages III Special Session Wed-SS-8-2-11 215 The motivation and development of MPAi, a Māori Pronunciation Aid. Catherine Watson, Peter Keegan, Margaret Maclagan, Ray Harlow, Jeanette King
2017-08-23 16:00-18:00 A2 Special Session: Digital Revolution for Under-resourced Languages III Special Session Wed-SS-8-2-12 928 Implementation of a Radiology Speech Recognition System for Estonian using Open Source Software Tanel Alumäe, Andrus Paats, Ivo Fridolin, Einar Meister
2017-08-23 16:00-18:00 A2 Special Session: Digital Revolution for Under-resourced Languages III Special Session Wed-SS-8-2-13 1352 Building ASR corpora using Eyra Jon Gudnason, Matthías Pétursson, Róbert Kjaran, Simon Kluepfel, Anna Nikulásdóttir
2017-08-23 16:00-18:00 A2 Special Session: Digital Revolution for Under-resourced Languages III Special Session Wed-SS-8-2-14 1139 Rapid development of TTS corpora for four South African languages Daniel Van Niekerk, Charl Van Heerden, Marelie Davel, Neil Kleynhans, Oddur Kjartansson, Martin Jansche, Linne Ha
2017-08-23 16:00-18:00 A2 Special Session: Digital Revolution for Under-resourced Languages III Special Session Wed-SS-8-2-15 880 Very low resource radio browsing for agile developmental and humanitarian monitoring Armin Saeb, Raghav Menon, Hugh Cameron, William Kibira, John Quinn, Thomas Niesler
2017-08-23 16:00-18:00 A2 Special Session: Digital Revolution for Under-resourced Languages III Special Session Wed-SS-8-2-16 160 Areal and Phylogenetic Features for Multilingual Speech Synthesis Alexander Gutkin, Richard Sproat
2017-08-23 16:00-18:00 A2 Special Session: Digital Revolution for Under-resourced Languages III Special Session Wed-SS-8-2-17 855 Eliciting meaningful units from speech Daniil Kocharov, Tatiana Kachkovskaia, Pavel Skrelin
2017-08-23 16:00-18:00 A2 Special Session: Digital Revolution for Under-resourced Languages III Special Session Wed-SS-8-2-18 1558 First Results in Developing a Medieval Latin Language Charter Dictation System for the East-Central Europe Region Peter Mihajlik, Lili Szabo, Balazs Tarjan, Andras Balog, Krisztina Rabai
2017-08-23 16:00-18:00 A2 Special Session: Digital Revolution for Under-resourced Languages III Special Session Wed-SS-8-2-19 300 On the Linguistic Relevance of Speech Units Learned by Unsupervised Acoustic Modeling Siyuan Feng, Tan Lee
2017-08-23 16:00-18:00 A2 Special Session: Digital Revolution for Under-resourced Languages III Special Session Wed-SS-8-2-20 180 Team ELISA System for DARPA LORELEI Speech Evaluation 2016 Pavlos Papadopoulos, Ruchir Travadi, Colin Vaz, Nikolaos Malandrakis, Ulf Hermjakob, Nima Pourdamghani, Michael Pust, Boliang Zhang, Xiaoman Pan, Di Lu, Ying Lin, Ondrej Glembek, Murali Karthick B, Martin Karafiat, Lukas Burget, Mark Hasegawa-Johnson, Heng Ji, Jonathan May, Kevin Knight, Shrikanth Narayanan
2017-08-23 16:00-18:00 A2 Special Session: Digital Revolution for Under-resourced Languages III Special Session Wed-SS-8-2-21 1028 Joint Estimation of Articulatory Features and Acoustic models for Low-Resource Languages Basil Abraham, Srinivasan Umesh, Neethu Mariam Joy
2017-08-23 16:00-18:00 A2 Special Session: Digital Revolution for Under-resourced Languages III Special Session Wed-SS-8-2-22 1129 Improving DNN Bluetooth Narrowband Acoustic Models by Cross-bandwidth and Cross-lingual Initialization Xiaodan Zhuang, Arnab Ghoshal, Antti-Veikko Rosti, Matthias Paulik, Daben Liu
2017-08-23 10:00-10:20 D8 Social Signals, Styles, and Interaction Oral Wed-O-6-8-1 87 Emotional Features for Speech Overlaps Classification Olga Egorow, Andreas Wendemuth
2017-08-23 10:20-10:40 D8 Social Signals, Styles, and Interaction Oral Wed-O-6-8-2 563 Computing Multimodal Dyadic Behaviors during Spontaneous Diagnosis Interviews toward Automatic Categorization of Autism Spectrum Disorder Chin-Po Chen, Xian-Hong Tseng, Susan Shur-Fen Gau, Chi-Chun Lee
2017-08-23 10:40-11:00 D8 Social Signals, Styles, and Interaction Oral Wed-O-6-8-3 569 Deriving Dyad-Level Interaction Representation using Interlocutors Structural and Expressive Multimodal Behavior Features Yun-Shao Lin, Chi-Chun Lee
2017-08-23 11:00-11:20 D8 Social Signals, Styles, and Interaction Oral Wed-O-6-8-4 635 Spotting Social Signals in Conversational Speech over IP: A Deep Learning Perspective Raymond Brueckner, Maximilian Schmitt, Maja Pantic, Björn Schuller
2017-08-23 11:20-11:40 D8 Social Signals, Styles, and Interaction Oral Wed-O-6-8-5 932 Optimized Time Series Filters for Detecting Laughter and Filler Events Gábor Gosztolya
2017-08-23 11:40-12:00 D8 Social Signals, Styles, and Interaction Oral Wed-O-6-8-6 1633 Visual, Laughter, Applause and Spoken Expression Features for Predicting Engagement within TED Talks. Fasih Haider, Fahim A. Salim, Saturnino Luz, Carl Vogel, Owen Conlan, Nick Campbell
2017-08-23 13:30-13:50 Main hall Cognition and Brain Studies Oral Wed-O-7-1-1 73 An entrained rhythm’s frequency, not phase, influences temporal sampling of speech Hans Rutger Bosker, Anne Kösem
2017-08-23 13:50-14:10 Main hall Cognition and Brain Studies Oral Wed-O-7-1-2 658 Context regularity indexed by auditory N1 and P2 event-related potentials Xiao Wang, Yanhui Zhang, Gang Peng
2017-08-23 14:10-14:30 Main hall Cognition and Brain Studies Oral Wed-O-7-1-3 842 Discovering Language in Marmoset Vocalization Sakshi Verma, Lok Prateek Kotha, Karthik Pandia D S, Nauman Dawalatabad, Rogier Landman, Jitendra Sharma, Mriganka Sur, Hema Murthy
2017-08-23 14:30-14:50 Main hall Cognition and Brain Studies Oral Wed-O-7-1-4 854 Subject-independent Classification of Japanese Spoken Sentences by Multiple Frequency Bands Phase Pattern of EEG Response during Speech Perception Hiroki Watanabe, Hiroki Tanaka, Sakriani Sakti, Satoshi Nakamura
2017-08-23 14:50-15:10 Main hall Cognition and Brain Studies Oral Wed-O-7-1-5 934 The phonological status of the French Initial Accent and its role in semantic processing: an Event-Related Potentials study Noemie te Rietmolen, Radouane El Yagoubi, Alain Ghio, Corine Astésano
2017-08-23 15:10-15:30 Main hall Cognition and Brain Studies Oral Wed-O-7-1-6 1741 A Neuro-Experimental Evidence for the Motor Theory of Speech Perception Bin Zhao, Jianwu Dang, Gaoyan Zhang
2017-08-24 10:00-12:00 E10 Special Session: Interspeech 2017 Computational Paralinguistics ChallengE (ComParE) I Special Session Thu-SS-9-10-1 43 The INTERSPEECH 2017 Computational Paralinguistics Challenge: Addressee, Cold & Snoring Björn Schuller, Stefan Steidl, Anton Batliner, Elika Bergelson, Jarek Krajewski, Christoph Janott, Andrei Amatuni, Marisa Casillas, Amanda Seidl, Melanie Soderstrom, Anne Warlaumont, Guillermo Hidalgo, Sebastian Schnieder, Clemens Heiser, Winfried Hohenhorst, Michael Herzog, Maximilian Schmitt, Kun Qian, Yue Zhang, George Trigeorgis, Panagiotis Tzirakis, Stefanos Zafeiriou
2017-08-24 10:00-12:00 E10 Special Session: Interspeech 2017 Computational Paralinguistics ChallengE (ComParE) I Special Session Thu-SS-9-10-2 173 An ‘End-to-Evolution’ Hybrid Approach for Snore Sound Classification Michael Freitag, Shahin Amiriparian, Nicholas Cummins, Maurice Gerczuk, Björn Schuller
2017-08-24 10:00-12:00 E10 Special Session: Interspeech 2017 Computational Paralinguistics ChallengE (ComParE) I Special Session Thu-SS-9-10-3 434 Snore Sound Classification Using Image-based Deep Spectrum Features Shahin Amiriparian, Maurice Gerczuk, Sandra Ottl, Nicholas Cummins, Michael Freitag, Sergey Pugachevskiy, Alice Baird, Björn Schuller
2017-08-24 10:00-12:00 E10 Special Session: Interspeech 2017 Computational Paralinguistics ChallengE (ComParE) I Special Session Thu-SS-9-10-4 653 Introducing Weighted Kernel Classifiers for Handling Imbalanced Paralinguistic Corpora: Snoring, Addressee and Cold Heysem Kaya, Alexey Karpov
2017-08-24 10:00-12:00 E10 Special Session: Interspeech 2017 Computational Paralinguistics ChallengE (ComParE) I Special Session Thu-SS-9-10-5 905 DNN-based Feature Extraction and Classifier Combination for Child-Directed Speech, Cold and Snoring Identification Gábor Gosztolya, Róbert Busa-Fekete, Tamás Grósz, László Tóth
2017-08-24 10:00-12:00 E10 Special Session: Interspeech 2017 Computational Paralinguistics ChallengE (ComParE) I Special Session Thu-SS-9-10-6 1066 Infected Phonemes: How a Cold Impairs Speech on a Phonetic Level Johannes Wagner, Thiago Fraga-Silva, Yvan Josse, Dominik Schiller, Andreas Seiderer, Elisabeth André
2017-08-24 10:00-12:00 E10 Special Session: Interspeech 2017 Computational Paralinguistics ChallengE (ComParE) I Special Session Thu-SS-9-10-7 1211 A dual source-filter model of snore audio for snorer group classification Achuth Rao MV, Shivani Yadav, Prasanta Ghosh
2017-08-24 10:00-12:00 E10 Special Session: Interspeech 2017 Computational Paralinguistics ChallengE (ComParE) I Special Session Thu-SS-9-10-8 1261 It sounds like you have a cold! Testing voice features for the Interspeech 2017 Computational Paralinguistics Cold Challenge Mark Huckvale, András Beke
2017-08-24 10:00-12:00 E10 Special Session: Interspeech 2017 Computational Paralinguistics ChallengE (ComParE) I Special Session Thu-SS-9-10-9 1378 Exploring Fusion Methods and Feature Space for the Classification of Paralinguistic Information David Tavarez, Xabier Sarasola, Agustin Alonso, Jon Sanchez, Luis Serrano, Eva Navas, Inma Hernáez
2017-08-24 10:00-12:00 E10 Special Session: Interspeech 2017 Computational Paralinguistics ChallengE (ComParE) I Special Session Thu-SS-9-10-10 1445 End-to-End Deep Learning Framework for Speech Paralinguistics Detection Based on Perception Aware Spectrum Danwei Cai, Zhidong Ni, Wenbo Liu, Weicheng Cai, Gang Li, Ming Li
2017-08-24 10:00-12:00 E10 Special Session: Interspeech 2017 Computational Paralinguistics ChallengE (ComParE) I Special Session Thu-SS-9-10-11 1550 Phoneme state posteriorgram features for speech based automatic classification of speakers in cold and healthy conditions Akshay Kalkunte Suresh, Srinivasa Raghavan K M, Prasanta Ghosh
2017-08-24 10:00-12:00 E10 Special Session: Interspeech 2017 Computational Paralinguistics ChallengE (ComParE) I Special Session Thu-SS-9-10-12 1794 An Integrated Solution for Snoring Sound Classification Using Bhattacharyya Distance based GMM Supervectors with SVM, Feature Selection with Random Forest and Spectrogram with CNN Tin Lay Nwe, Tran Huy Dat, Ng Wen Zheng Terence, Bin Ma
2017-08-24 13:30-15:30 E306 Show & Tell 7 Show&Tell Thu-S&T-10-A-1 10002 Soundtracing for realtime speech adjustment to environmental conditions in 3D simulations Szymon Pałka, Tomasz Pędzimąż, Bartosz Ziolko
2017-08-24 13:30-15:30 E306 Show & Tell 7 Show&Tell Thu-S&T-10-A-2 10027 Vocal-tract Model with Static Articulators: Lips, Teeth, Tongue, and More Takayuki Arai
2017-08-24 13:30-15:30 E306 Show & Tell 7 Show&Tell Thu-S&T-10-A-3 10038 Remote articulation test system based on WebRTC Ikuyo Masuda-Katsuse
2017-08-24 13:30-15:30 E306 Show & Tell 7 Show&Tell Thu-S&T-10-A-4 10054 The ModelTalker Project: A web-based voice banking pipeline for ALS/MND patients H Timothy Bunnell, Jason Lilley, Kathleen McGrath
2017-08-24 13:30-15:30 E306 Show & Tell 7 Show&Tell Thu-S&T-10-A-5 10055 Visible Vowels: a Tool for the Visualization of Vowel Variation Wilbert Heeringa, Hans Van de Velde
2017-08-24 10:00-10:20 C6 Noise reduction Oral Thu-O-9-6-1 57 Deep Recurrent Neural Network based Monaural Speech Separation using Recurrent Temporal Restricted Boltzmann Machines Suman Samui, Indrajit Chakrabarti, Soumya Kanti Ghosh
2017-08-24 10:20-10:40 C6 Noise reduction Oral Thu-O-9-6-2 109 Improved Codebook-based Speech Enhancement based on MBE Model Qizheng Huang, Changchun Bao, Xianyun Wang
2017-08-24 10:40-11:00 C6 Noise reduction Oral Thu-O-9-6-3 515 Improving mask learning based speech enhancement system with restoration layers and residual connection Zhuo Chen, Jinyu Li, Yan Huang, Yifan Gong
2017-08-24 11:00-11:20 C6 Noise reduction Oral Thu-O-9-6-4 611 Exploring Low-Dimensional Structures of Modulation Spectra for Robust Speech Recognition Bi-Cheng Yan, Chin-Hong Shih, Shih-Hung Liu, Berlin Chen
2017-08-24 11:20-11:40 C6 Noise reduction Oral Thu-O-9-6-5 1428 SEGAN: Speech Enhancement Generative Adversarial Network Santiago Pascual, Antonio Bonafonte, Joan Serrà
2017-08-24 11:40-12:00 C6 Noise reduction Oral Thu-O-9-6-6 1653 Concatenative resynthesis using twin networks Soumi Maiti, Michael Mandel
2017-08-24 10:00-10:20 A2 Speaker Diarization Oral Thu-O-9-2-1 51 Speaker Diarization Using Convolutional Neural Network for Statistics Accumulation Refinement Zbynek Zajic, Marek Hruz, Ludek Muller
2017-08-24 10:20-10:40 A2 Speaker Diarization Oral Thu-O-9-2-2 1650 Speaker2Vec: Unsupervised Learning and Adaptation of a Speaker Manifold using Deep Neural Networks with an Evaluation on Speaker Segmentation Arindam Jati, Panayiotis Georgiou
2017-08-24 10:40-11:00 A2 Speaker Diarization Oral Thu-O-9-2-3 270 A Triplet Ranking-based Neural Network for Speaker Diarization and Linking Gaël Le Lan, Delphine Charlet, Anthony Larcher, Sylvain Meignier
2017-08-24 11:00-11:20 A2 Speaker Diarization Oral Thu-O-9-2-4 492 Estimating Speaker Clustering Quality Using Logistic Regression Yishai Cohen, Itshak Lapidot
2017-08-24 11:20-11:40 A2 Speaker Diarization Oral Thu-O-9-2-5 1067 Combining speaker turn embedding and incremental structure prediction for low-latency speaker diarization Guillaume Wisniewski, Hervé Bredin, Gregory Gelly, Claude Barras
2017-08-24 11:40-12:00 A2 Speaker Diarization Oral Thu-O-9-2-6 411 pyannote.metrics: a toolkit for reproducible evaluation, diagnostic, and error analysis of speaker diarization systems Hervé Bredin
2017-08-24 13:30-13:50 D8 Forensic phonetics and sociophonetic varieties Oral Thu-O-10-8-1 1368 What is the relevant population? Considerations for the computation of likelihood ratios in forensic voice comparison Vincent Hughes, Paul Foulkes
2017-08-24 13:50-14:10 D8 Forensic phonetics and sociophonetic varieties Oral Thu-O-10-8-2 1080 Voice disguise vs. Impersonation: Acoustic and perceptual measurements of vocal flexibility in non experts Veronique Delvaux, Lise Caucheteux, Kathy Huet, Myriam Piccaluga, Bernard Harmegnies
2017-08-24 14:10-14:30 D8 Forensic phonetics and sociophonetic varieties Oral Thu-O-10-8-3 470 Schwa Realization in French: Using Automatic Speech Processing to Study Phonological and Socio-linguistic Factors in Large Corpora Yaru Wu, Martine Adda-Decker, Cecile Fougeron, Lori Lamel
2017-08-24 14:30-14:50 D8 Forensic phonetics and sociophonetic varieties Oral Thu-O-10-8-4 922 The Social Life of Tswana Ejectives Daniel Duran, Jagoda Bruni, Grzegorz Dogil, Justus Roux
2017-08-24 14:50-15:10 D8 Forensic phonetics and sociophonetic varieties Oral Thu-O-10-8-5 50 How long is too long? How pause features after requests affect the perceived willingness of affirmative answers Lea S. Kohtz, Oliver Niebuhr
2017-08-24 15:10-15:30 D8 Forensic phonetics and sociophonetic varieties Oral Thu-O-10-8-6 1433 Shadowing Synthesized Speech – Segmental Analysis of Phonetic Convergence Iona Gessinger, Eran Raveh, Sébastien Le Maguer, Bernd Möbius, Ingmar Steiner
2017-08-24 10:00-12:00 Poster 1 Noise robust and Far-field ASR Poster Thu-P-9-1-1 1096 Improved Automatic Speech Recognition using Subband Temporal Envelope Features and Time-delay Neural Network Denoising Autoencoder Cong-Thanh Do, Yannis Stylianou
2017-08-24 10:00-12:00 Poster 1 Noise robust and Far-field ASR Poster Thu-P-9-1-2 225 Factored deep convolutional neural networks for noise robust speech recognition Masakiyo Fujimoto
2017-08-24 10:00-12:00 Poster 1 Noise robust and Far-field ASR Poster Thu-P-9-1-3 230 Global SNR Estimation of Speech Signals for Unknown Noise Conditions using Noise Adapted Non-linear Regression Pavlos Papadopoulos, Ruchir Travadi, Shrikanth Narayanan
2017-08-24 10:00-12:00 Poster 1 Noise robust and Far-field ASR Poster Thu-P-9-1-4 579 Joint Training of Multi-channel-condition Dereverberation and Acoustic Modeling of Microphone Array Speech for Robust Distant Speech Recognition Fengpei Ge, Kehuang Li, Bo Wu, Sabato Marco Siniscalchi, Yonghong Yan, Chin-Hui Lee
2017-08-24 10:00-12:00 Poster 1 Noise robust and Far-field ASR Poster Thu-P-9-1-5 793 Uncertainty decoding with adaptive sampling for noise robust DNN-based acoustic modeling Tien Dung Tran, Marc Delcroix, Atsunori Ogawa, Tomohiro Nakatani
2017-08-24 10:00-12:00 Poster 1 Noise robust and Far-field ASR Poster Thu-P-9-1-6 805 Attention-based LSTM with Multi-task Learning for Distant Speech Recognition Yu Zhang, Pengyuan Zhang, Yonghong Yan
2017-08-24 10:00-12:00 Poster 1 Noise robust and Far-field ASR Poster Thu-P-9-1-7 1315 To Improve the Robustness of LSTM-RNN Acoustic Models Using Higher-order Feedback From Multiple Histories Hengguan Huang, Brian Mak
2017-08-24 10:00-12:00 Poster 1 Noise robust and Far-field ASR Poster Thu-P-9-1-8 1536 End-to-End Speech Recognition with Auditory Attention for Multi-Microphone Distance Speech Recognition Suyoun Kim, Ian Lane
2017-08-24 10:00-12:00 Poster 1 Noise robust and Far-field ASR Poster Thu-P-9-1-9 1665 Robust Speech Recognition Based on Binaural Auditory Processing Anjali Menon, Chanwoo Kim, Richard Stern
2017-08-24 10:00-12:00 Poster 1 Noise robust and Far-field ASR Poster Thu-P-9-1-10 1791 Adaptive Multichannel Dereverberation for Automatic Speech Recognition Joe Caroselli, Izhak Shafran, Arun Narayanan, Richard Rose
2017-08-24 10:00-12:00 Poster 2 Multi-lingual models and Adaptation for ASR Poster Thu-P-9-2-1 111 Multilingual Recurrent Neural Networks with Residual Learning for Low-Resource Speech Recognition Shiyu Zhou, Yuanyuan Zhao, Shuang Xu, Bo Xu
2017-08-24 10:00-12:00 Poster 2 Multi-lingual models and Adaptation for ASR Poster Thu-P-9-2-2 505 CTC Training of Multi-Phone Acoustic Models for Speech Recognition Olivier Siohan
2017-08-24 10:00-12:00 Poster 2 Multi-lingual models and Adaptation for ASR Poster Thu-P-9-2-3 1242 An Investigation of Deep Neural Networks for Multilingual Speech Recognition Training and Adaptation Sibo Tong, Philip N. Garner, Herve Bourlard
2017-08-24 10:00-12:00 Poster 2 Multi-lingual models and Adaptation for ASR Poster Thu-P-9-2-4 1775 2016 BUT Babel system: Multilingual BLSTM acoustic model with i-vector based adaptation Martin Karafiat, Murali Karthick Baskar, Pavel Matejka, Karel Vesely, Frantisek Grezl, Lukas Burget, Jan Černocký
2017-08-24 10:00-12:00 Poster 2 Multi-lingual models and Adaptation for ASR Poster Thu-P-9-2-5 755 Optimizing DNN Adaptation for Recognition of Enhanced Speech Marco Matassoni, Alessio Brutti, Daniele Falavigna
2017-08-24 10:00-12:00 Poster 2 Multi-lingual models and Adaptation for ASR Poster Thu-P-9-2-6 783 Deep Least Squares Regression for Speaker Adaptation Younggwan Kim, Hyungjun Lim, Jahyun Goo, Hoirin Kim
2017-08-24 10:00-12:00 Poster 2 Multi-lingual models and Adaptation for ASR Poster Thu-P-9-2-7 788 Multi-task Learning using Mismatched Transcription for Under-resourced Speech Recognition Van Hai Do, Nancy F. Chen, Boon Pang Lim, Mark Hasegawa-Johnson
2017-08-24 10:00-12:00 Poster 2 Multi-lingual models and Adaptation for ASR Poster Thu-P-9-2-8 874 Generalized Distillation Framework For Speaker Normalization Neethu Mariam Joy, Sandeep Reddy Kothinti, Srinivasan Umesh, Basil Abraham
2017-08-24 10:00-12:00 Poster 2 Multi-lingual models and Adaptation for ASR Poster Thu-P-9-2-9 1136 Learning Factorized Transforms for Unsupervised Adaptation of LSTM-RNN Acoustic Models Lahiru Samarakoon, Brian Mak, Khe Chai Sim
2017-08-24 10:00-12:00 Poster 2 Multi-lingual models and Adaptation for ASR Poster Thu-P-9-2-10 1365 Factorised representations for neural network adaptation to diverse acoustic environments Joachim Fainberg, Steve Renals, Peter Bell
2017-08-24 10:00-12:00 Poster 4 Speech Synthesis: Data, Evaluation, and novel paradigms Poster Thu-P-9-4-1 171 Principles for learning controllable TTS from annotated and latent variation Gustav Eje Henter, Jaime Lorenzo-Trueba, Xin Wang, Junichi Yamagishi
2017-08-24 10:00-12:00 Poster 4 Speech Synthesis: Data, Evaluation, and novel paradigms Poster Thu-P-9-4-2 362 Sampling-based speech parameter generation using moment-matching networks Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari
2017-08-24 10:00-12:00 Poster 4 Speech Synthesis: Data, Evaluation, and novel paradigms Poster Thu-P-9-4-3 428 Unit selection with Hierarchical Cascaded Long Short Term Memory Bidirectional Recurrent Neural Nets Vincent Pollet, Enrico Zovato, Sufian Irhimeh, Pier Batzu
2017-08-24 10:00-12:00 Poster 4 Speech Synthesis: Data, Evaluation, and novel paradigms Poster Thu-P-9-4-4 465 Utterance Selection for Optimizing Intelligibility of TTS Voices Trained on ASR Data Erica Cooper, Xinyue Wang, Alison Chang, Yocheved Levitan, Julia Hirschberg
2017-08-24 10:00-12:00 Poster 4 Speech Synthesis: Data, Evaluation, and novel paradigms Poster Thu-P-9-4-5 479 Bias and Statistical Significance in Evaluating Speech Synthesis with Mean Opinion Scores Andrew Rosenberg, Bhuvana Ramabhadran
2017-08-24 10:00-12:00 Poster 4 Speech Synthesis: Data, Evaluation, and novel paradigms Poster Thu-P-9-4-6 587 Phase Modeling using Integrated Linear Prediction Residual for Statistical Parametric Speech Synthesis Nagaraj Adiga, S R Mahadeva Prasanna
2017-08-24 10:00-12:00 Poster 4 Speech Synthesis: Data, Evaluation, and novel paradigms Poster Thu-P-9-4-7 802 Evaluation of a Silent Speech Interface based on Magnetic Sensing and Deep Learning for a Phonetically Rich Vocabulary Jose A. Gonzalez, Lam A. Cheah, Phil D. Green, James M. Gilbert, Stephen R. Ell, Roger Moore, Ed Holdsworth
2017-08-24 10:00-12:00 Poster 4 Speech Synthesis: Data, Evaluation, and novel paradigms Poster Thu-P-9-4-8 894 Predicting Head Pose from Speech with a Conditional Variational Autoencoder David Greenwood, Stephen Laycock, Iain Matthews
2017-08-24 10:00-12:00 Poster 4 Speech Synthesis: Data, Evaluation, and novel paradigms Poster Thu-P-9-4-9 1250 Real-time reactive speech synthesis: incorporating interruptions Mirjam Wester, David Braude, Blaise Potard, Matthew Aylett, Francesca Shaw
2017-08-24 10:00-12:00 Poster 4 Speech Synthesis: Data, Evaluation, and novel paradigms Poster Thu-P-9-4-10 1420 A Neural Parametric Singing Synthesizer Merlijn Blaauw, Jordi Bonada
2017-08-24 10:00-12:00 Poster 4 Speech Synthesis: Data, Evaluation, and novel paradigms Poster Thu-P-9-4-11 1452 Tacotron: Towards End-To-End Speech Synthesis Yuxuan Wang, RJ Skerry-Ryan, Daisy Stanton, Yonghui Wu, Ron Weiss, Navdeep Jaitly, Zongheng Yang, Ying Xiao, Zhifeng Chen, Samy Bengio, Quoc Le, Yannis Agiomyrgiannakis, Rob Clark, Rif A. Saurous
2017-08-24 10:00-12:00 Poster 4 Speech Synthesis: Data, Evaluation, and novel paradigms Poster Thu-P-9-4-12 1798 Siri On-Device Deep Learning-Guided Unit Selection Text-to-Speech System Tim Capes, Alistair Conkie, Ladan Golipour, Abie Hadjitarkhani, Qiong Hu, Nancy Huddleston, Jiangchuan Li, Matthias Neeracher, Kishore Prahallad, Tuomo Raitio, Ramya Rasipuram, Greg Townsend, David Winarsky, Zhizheng Wu, Hepeng Zhang
2017-08-24 10:00-12:00 Poster 4 Speech Synthesis: Data, Evaluation, and novel paradigms Poster Thu-P-9-4-13 402 An Expanded Taxonomy of Semiotic Classes for Text Normalization Daan van Esch, Richard Sproat
2017-08-24 10:00-12:00 Poster 3 Styles, varieties, forensics & tools Poster Thu-P-9-3-1 1579 The effects of real and placebo alcohol on deaffrication Urban Zihlmann
2017-08-24 10:00-12:00 Poster 3 Styles, varieties, forensics & tools Poster Thu-P-9-3-2 1390 Polyglot and Speech Corpus Tools: a system for representing, integrating, and querying speech corpora Michael McAuliffe, Elias Stengel-Eskin, Michaela Socolof, Morgan Sonderegger
2017-08-24 10:00-12:00 Poster 3 Styles, varieties, forensics & tools Poster Thu-P-9-3-3 1508 Mapping across feature spaces in forensic voice comparison: the contribution of auditory-based voice quality to (semi-)automatic system testing Vincent Hughes, Philip Harrison, Paul Foulkes, Peter French, Colleen Kavanagh, Eugenia San Segundo
2017-08-24 10:00-12:00 Poster 3 Styles, varieties, forensics & tools Poster Thu-P-9-3-4 449 Effect of Language, Speaking Style and Speaker on Long-term F0 Estimation Pablo Arantes, Anders Eriksson, Suska Gutzeit
2017-08-24 10:00-12:00 Poster 3 Styles, varieties, forensics & tools Poster Thu-P-9-3-5 1503 Stability of prosodic characteristics across age and gender groups Jan Volín, Tereza Tykalova, Tomáš Bořil
2017-08-24 10:00-12:00 Poster 3 Styles, varieties, forensics & tools Poster Thu-P-9-3-6 1392 Electrophysiological correlates of familiar voice recognition Julien Plante-Hebert, Victor Boucher, Boutheina Jemel
2017-08-24 10:00-12:00 Poster 3 Styles, varieties, forensics & tools Poster Thu-P-9-3-7 1280 Developing an Embosi (Bantu C25) Speech Variant Dictionary to Model Vowel Elision and Morpheme Deletion Jamison Cooper-Leavitt, Lori Lamel, Annie Rialland, Martine Adda-Decker, Gilles Adda
2017-08-24 10:00-12:00 Poster 3 Styles, varieties, forensics & tools Poster Thu-P-9-3-8 1448 Rd as a control parameter to explore affective correlates of the tense-lax continuum Andy Murphy, Irena Yanushevskaya, Ailbhe Ní Chasaide, Christer Gobl
2017-08-24 10:00-12:00 Poster 3 Styles, varieties, forensics & tools Poster Thu-P-9-3-9 7 Cross-linguistic Distinctions between Professional and Non-Professional Speaking Styles Plinio Barbosa, Sandra Madureira, Philippe Boula de Mareüil
2017-08-24 10:00-12:00 Poster 3 Styles, varieties, forensics & tools Poster Thu-P-9-3-10 990 Perception and production of word-final /ʁ/ in broadcast and spontaneous French Cedric Gendrot
2017-08-24 10:00-12:00 Poster 3 Styles, varieties, forensics & tools Poster Thu-P-9-3-11 882 Glottal source estimation from coded telephone speech using a deep neural network Narendra N P, Manu Airaksinen, Paavo Alku
2017-08-24 10:00-12:00 Poster 3 Styles, varieties, forensics & tools Poster Thu-P-9-3-12 971 Automatic Labelling of Prosodic Prominence, Phrasing and Disfluencies in French Speech by Simulating the Perception of Naïve and Expert Listeners George Christodoulides, Mathieu Avanzi, Anne Catherine Simon
2017-08-24 10:00-12:00 Poster 3 Styles, varieties, forensics & tools Poster Thu-P-9-3-13 164 Don’t Count on ASR to Transcribe for You: Breaking Bias with Two Crowds Michael Levit, Yan Huang, Shuangyu Chang, Yifan Gong
2017-08-24 10:00-12:00 Poster 3 Styles, varieties, forensics & tools Poster Thu-P-9-3-14 363 Effects of training data variety in generating glottal pulses from acoustic features with DNNs Manu Airaksinen, Paavo Alku
2017-08-24 10:00-12:00 Poster 3 Styles, varieties, forensics & tools Poster Thu-P-9-3-15 406 Towards Intelligent Crowdsourcing for Audio Data Annotation: Integrating Active Learning in the Real World Simone Hantke, Zixing Zhang, Björn Schuller
2017-08-24 13:30-13:50 A2 Robust Speaker Recognition Oral Thu-O-10-2-1 430 CNN-based joint mapping of short and long utterance i-vectors for speaker verification using short utterances Jinxi Guo, Usha Nookala, Abeer Alwan
2017-08-24 13:50-14:10 A2 Robust Speaker Recognition Oral Thu-O-10-2-2 1199 Curriculum Learning based Probabilistic Linear Discriminant Analysis for Noise Robust Speaker Recognition Shivesh Ranjan, Abhinav Misra, John H.L. Hansen
2017-08-24 14:10-14:30 A2 Robust Speaker Recognition Oral Thu-O-10-2-3 731 I-vector Transformation Using a Novel Discriminative Denoising Autoencoder for Noise-robust Speaker Recognition Shivangi Mahto, Hitoshi Yamamoto, Takafumi Koshinaka
2017-08-24 14:30-14:50 A2 Robust Speaker Recognition Oral Thu-O-10-2-4 727 Unsupervised Discriminative Training of PLDA for Domain Adaptation in Speaker Verification Qiongqiong Wang, Takafumi Koshinaka
2017-08-24 14:50-15:10 A2 Robust Speaker Recognition Oral Thu-O-10-2-5 1240 Speaker Verification Under Adverse Conditions Using I-vector Adaptation and Neural Networks Md Jahangir Alam, Patrick Kenny, Gautam Bhattacharya, Marcel Kockmann
2017-08-24 15:10-15:30 A2 Robust Speaker Recognition Oral Thu-O-10-2-6 605 Improving Robustness of Speaker Recognition to New Conditions Using Unlabeled Data Diego Castan, Mitchell McLaren, Luciana Ferrer, Aaron Lawson, Alicia Lozano-Diez
2017-08-24 13:30-15:30 E10 Special Session: Interspeech 2017 Computational Paralinguistics ChallengE (ComParE) II Special Session Thu-SS-10-10-1 43 The INTERSPEECH 2017 Computational Paralinguistics Challenge: Addressee, Cold & Snoring Björn Schuller, Stefan Steidl, Anton Batliner, Elika Bergelson, Jarek Krajewski, Christoph Janott, Andrei Amatuni, Marisa Casillas, Amanda Seidl, Melanie Soderstrom, Anne Warlaumont, Guillermo Hidalgo, Sebastian Schnieder, Clemens Heiser, Winfried Hohenhorst, Michael Herzog, Maximilian Schmitt, Kun Qian, Yue Zhang, George Trigeorgis, Panagiotis Tzirakis, Stefanos Zafeiriou
2017-08-24 13:30-15:30 E10 Special Session: Interspeech 2017 Computational Paralinguistics ChallengE (ComParE) II Special Session Thu-SS-10-10-2 173 An ‘End-to-Evolution’ Hybrid Approach for Snore Sound Classification Michael Freitag, Shahin Amiriparian, Nicholas Cummins, Maurice Gerczuk, Björn Schuller
2017-08-24 13:30-15:30 E10 Special Session: Interspeech 2017 Computational Paralinguistics ChallengE (ComParE) II Special Session Thu-SS-10-10-3 434 Snore Sound Classification Using Image-based Deep Spectrum Features Shahin Amiriparian, Maurice Gerczuk, Sandra Ottl, Nicholas Cummins, Michael Freitag, Sergey Pugachevskiy, Alice Baird, Björn Schuller
2017-08-24 13:30-15:30 E10 Special Session: Interspeech 2017 Computational Paralinguistics ChallengE (ComParE) II Special Session Thu-SS-10-10-4 653 Introducing Weighted Kernel Classifiers for Handling Imbalanced Paralinguistic Corpora: Snoring, Addressee and Cold Heysem Kaya, Alexey Karpov
2017-08-24 13:30-15:30 E10 Special Session: Interspeech 2017 Computational Paralinguistics ChallengE (ComParE) II Special Session Thu-SS-10-10-5 905 DNN-based Feature Extraction and Classifier Combination for Child-Directed Speech, Cold and Snoring Identification Gábor Gosztolya, Róbert Busa-Fekete, Tamás Grósz, László Tóth
2017-08-24 13:30-15:30 E10 Special Session: Interspeech 2017 Computational Paralinguistics ChallengE (ComParE) II Special Session Thu-SS-10-10-6 1066 Infected Phonemes: How a Cold Impairs Speech on a Phonetic Level Johannes Wagner, Thiago Fraga-Silva, Yvan Josse, Dominik Schiller, Andreas Seiderer, Elisabeth André
2017-08-24 13:30-15:30 E10 Special Session: Interspeech 2017 Computational Paralinguistics ChallengE (ComParE) II Special Session Thu-SS-10-10-7 1211 A dual source-filter model of snore audio for snorer group classification Achuth Rao MV, Shivani Yadav, Prasanta Ghosh
2017-08-24 13:30-15:30 E10 Special Session: Interspeech 2017 Computational Paralinguistics ChallengE (ComParE) II Special Session Thu-SS-10-10-8 1261 It sounds like you have a cold! Testing voice features for the Interspeech 2017 Computational Paralinguistics Cold Challenge Mark Huckvale, András Beke
2017-08-24 13:30-15:30 E10 Special Session: Interspeech 2017 Computational Paralinguistics ChallengE (ComParE) II Special Session Thu-SS-10-10-9 1378 Exploring Fusion Methods and Feature Space for the Classification of Paralinguistic Information David Tavarez, Xabier Sarasola, Agustin Alonso, Jon Sanchez, Luis Serrano, Eva Navas, Inma Hernáez
2017-08-24 13:30-15:30 E10 Special Session: Interspeech 2017 Computational Paralinguistics ChallengE (ComParE) II Special Session Thu-SS-10-10-10 1445 End-to-End Deep Learning Framework for Speech Paralinguistics Detection Based on Perception Aware Spectrum Danwei Cai, Zhidong Ni, Wenbo Liu, Weicheng Cai, Gang Li, Ming Li
2017-08-24 13:30-15:30 E10 Special Session: Interspeech 2017 Computational Paralinguistics ChallengE (ComParE) II Special Session Thu-SS-10-10-11 1550 Phoneme state posteriorgram features for speech based automatic classification of speakers in cold and healthy conditions Akshay Kalkunte Suresh, Srinivasa Raghavan K M, Prasanta Ghosh
2017-08-24 13:30-15:30 E10 Special Session: Interspeech 2017 Computational Paralinguistics ChallengE (ComParE) II Special Session Thu-SS-10-10-12 1794 An Integrated Solution for Snoring Sound Classification Using Bhattacharyya Distance based GMM Supervectors with SVM, Feature Selection with Random Forest and Spectrogram with CNN Tin Lay Nwe, Tran Huy Dat, Ng Wen Zheng Terence, Bin Ma
2017-08-24 13:30-13:50 B4 Multimodal resources and annotation Oral Thu-O-10-4-1 1305 CALYOU: A Comparable Spoken Algerian Corpus Harvested from YouTube Karima Abidi, Mohamed amine Menacer, Kamel Smaili
2017-08-24 13:50-14:10 B4 Multimodal resources and annotation Oral Thu-O-10-4-2 242 PRAV: A Phonetically Rich Audio Visual Corpus Abhishek Avinash Narwekar, Prasanta Ghosh
2017-08-24 14:10-14:30 B4 Multimodal resources and annotation Oral Thu-O-10-4-3 860 NTCD-TIMIT: A New Database and Baseline for Noise-robust Audio-visual Speech Recognition Ahmed Hussen Abdelaziz
2017-08-24 14:30-14:50 B4 Multimodal resources and annotation Oral Thu-O-10-4-4 1555 The Extended SPaRKy Restaurant Corpus: designing a corpus with variable information density David M. Howcroft, Dietrich Klakow, Vera Demberg
2017-08-24 14:50-15:10 B4 Multimodal resources and annotation Oral Thu-O-10-4-5 1115 Automatic Construction of the Finnish Parliament Speech Corpus André Mansikkaniemi, Peter Smit, Mikko Kurimo
2017-08-24 15:10-15:30 B4 Multimodal resources and annotation Oral Thu-O-10-4-6 1357 Building audio-visual phonetically annotated Arabic corpus for expressive text to speech Omnia Abdo, Sherif Abdou, Mervat Fashal
2017-08-24 10:00-12:00 F11 Special Session: State of the art in physics-based voice simulation Special Session Thu-SS-9-11-1 107 Acoustic analysis of detailed three-dimensional shape of the human nasal cavity and paranasal sinuses Tatsuya Kitamura, Hironori Takemoto, Hisanori Makinae, Tetsutaro Yamaguchi, Kotaro Maki
2017-08-24 10:00-12:00 F11 Special Session: State of the art in physics-based voice simulation Special Session Thu-SS-9-11-2 448 A semi-polar grid strategy for the three-dimensional finite element simulation of vowel-vowel sequences Marc Arnela, Saeed Dabbaghchian, Oriol Guasch, Olov Engwall
2017-08-24 10:00-12:00 F11 Special Session: State of the art in physics-based voice simulation Special Session Thu-SS-9-11-3 844 A Fast Robust 1D Flow Model for a Self-Oscillating Coupled 2D FEM Vocal Fold Simulation Arvind Vasudevan, Victor Zappi, Peter Anderson, Sidney Fels
2017-08-24 10:00-12:00 F11 Special Session: State of the art in physics-based voice simulation Special Session Thu-SS-9-11-4 875 Waveform patterns in pitch glides near a vocal tract resonance Tiina Murtola, Jarmo Malinen
2017-08-24 10:00-12:00 F11 Special Session: State of the art in physics-based voice simulation Special Session Thu-SS-9-11-5 1239 A unified numerical simulation of vowel production that comprises phonation and the emitted sound Niyazi Cem Degirmenci, Johan Jansson, Johan Hoffman, Marc Arnela, Patricia Sanchez-Martin, Oriol Guasch, Pr. Sten Ternström
2017-08-24 10:00-12:00 F11 Special Session: State of the art in physics-based voice simulation Special Session Thu-SS-9-11-6 1614 Synthesis of VV Utterances from Muscle Activation to Sound with a 3D Model Saeed Dabbaghchian, Marc Arnela, Olov Engwall, Oriol Guasch
2017-08-24 13:30-13:50 Main hall Neural Network Acoustic Models for ASR III Oral Thu-O-10-1-1 892 Deep Neural Factorization for Speech Recognition Jen-Tzung Chien, Chen Shen
2017-08-24 13:50-14:10 Main hall Neural Network Acoustic Models for ASR III Oral Thu-O-10-1-2 1385 Semi-supervised DNN training with word selection for ASR Karel Vesely, Lukas Burget, Jan Černocký
2017-08-24 14:10-14:30 Main hall Neural Network Acoustic Models for ASR III Oral Thu-O-10-1-3 751 Gaussian Prediction based Attention for Online End-to-End Speech Recognition Junfeng Hou, ShiLiang Zhang, Lirong Dai
2017-08-24 14:30-14:50 Main hall Neural Network Acoustic Models for ASR III Oral Thu-O-10-1-4 614 Efficient knowledge distillation from an ensemble of teachers Takashi Fukuda, Masayuki Suzuki, Gakuto Kurata, Samuel Thomas, Jia Cui, Bhuvana Ramabhadran
2017-08-24 14:50-15:10 Main hall Neural Network Acoustic Models for ASR III Oral Thu-O-10-1-5 232 An Analysis of “Attention” in Sequence-to-Sequence Models Rohit Prabhavalkar, Tara Sainath, Bo Li, Kanishka Rao, Navdeep Jaitly
2017-08-24 15:10-15:30 Main hall Neural Network Acoustic Models for ASR III Oral Thu-O-10-1-6 1566 Neural Speech Recognizer: Acoustic-to-Word LSTM Model for Large Vocabulary Speech Recognition Hagen Soltau, Hank Liao, Hasim Sak
2017-08-24 13:30-13:50 F11 Speech and audio segmentation and classification 1 Oral Thu-O-10-11-1 524 Occupancy Detection in Commercial and Residential Environments Using Audio Signal Shabnam Ghaffarzadegan, Attila Reiss, Mirko Ruhs, Robert Duerichen, Zhe Feng
2017-08-24 13:50-14:10 F11 Speech and audio segmentation and classification 1 Oral Thu-O-10-11-2 685 Data Augmentation, Missing Feature Mask and Kernel Classification for Through-The-Wall Acoustic Surveillance Tran-Huy Dat, Wen Zheng Terence Ng, Yi Ren Leng
2017-08-24 14:10-14:30 F11 Speech and audio segmentation and classification 1 Oral Thu-O-10-11-3 284 Endpoint detection using grid long short-term memory network for streaming speech recognition Shuo-Yiin Chang, Bo Li, Tara Sainath, Gabor Simko, Carolina Parada
2017-08-24 14:30-14:50 F11 Speech and audio segmentation and classification 1 Oral Thu-O-10-11-4 666 Deep Learning Techniques in Tandem with Signal Processing Cues for Phonetic Segmentation for Text to Speech Synthesis in Indian Languages Arun Baby, Jeena Prakash, Rupak Vignesh, Hema Murthy
2017-08-24 14:50-15:10 F11 Speech and audio segmentation and classification 1 Oral Thu-O-10-11-5 877 Gate Activation Signal Analysis for Gated Recurrent Neural Networks and Its Correlation with Phoneme Boundaries Yu-Hsuan Wang, Cheng-Tao Chung, Hung-yi Lee
2017-08-24 15:10-15:30 F11 Speech and audio segmentation and classification 1 Oral Thu-O-10-11-6 65 Speaker Change Detection in Broadcast TV using Bidirectional Long Short-Term Memory Networks Ruiqing Yin, Hervé Bredin, Claude Barras
2017-08-24 10:00-10:20 Main hall Discriminative training for ASR Oral Thu-O-9-1-1 1118 Multitask Learning with Low-Level Auxiliary Tasks for Encoder-Decoder Based Speech Recognition Shubham Toshniwal, Hao Tang, Liang Lu, Karen Livescu
2017-08-24 10:20-10:40 Main hall Discriminative training for ASR Oral Thu-O-9-1-2 639 Optimizing expected word error rate via sampling for speech recognition Matt Shannon
2017-08-24 10:40-11:00 Main hall Discriminative training for ASR Oral Thu-O-9-1-3 231 Annealed F-smoothing as a Mechanism to Speed up Neural Network Training Tara Sainath, Vijay Peddinti, Olivier Siohan, Arun Narayanan
2017-08-24 11:00-11:20 Main hall Discriminative training for ASR Oral Thu-O-9-1-4 583 Non-Uniform MCE Training of Deep Long Short-Term Memory Recurrent Neural Networks for Keyword Spotting Zhong Meng, Biing-Hwang (Fred) Juang
2017-08-24 11:20-11:40 Main hall Discriminative training for ASR Oral Thu-O-9-1-5 1784 Exploiting Eigenposteriors for Semi-supervised Training of DNN Acoustic Models with Sequence Discrimination Pranay Dighe, Afsaneh Asaei, Herve Bourlard
2017-08-24 11:40-12:00 Main hall Discriminative training for ASR Oral Thu-O-9-1-6 221 Discriminative Autoencoders for Acoustic Modeling Ming-Han Yang, Hung-Shin Lee, Yu-Ding Lu, Kuan-Yu Chen, Yu Tsao, Berlin Chen, Hsin-Min Wang
2017-08-24 10:00-10:20 D8 Speech Recognition: Multimodal systems Oral Thu-O-9-8-1 85 Combining Residual Networks with LSTMs for Lipreading Themos Stafylakis, Georgios Tzimiropoulos
2017-08-24 10:20-10:40 D8 Speech Recognition: Multimodal systems Oral Thu-O-9-8-2 106 Improving computer lipreading via DNN sequence discriminative training techniques Kwanchiva Thangthai, Richard Harvey
2017-08-24 10:40-11:00 D8 Speech Recognition: Multimodal systems Oral Thu-O-9-8-3 421 Improving Speaker-Independent Lipreading with Domain-Adversarial Training Michael Wand, Jürgen Schmidhuber
2017-08-24 11:00-11:20 D8 Speech Recognition: Multimodal systems Oral Thu-O-9-8-4 799 Turbo Decoders for Audio-visual Continuous Speech Recognition Ahmed Hussen Abdelaziz
2017-08-24 11:20-11:40 D8 Speech Recognition: Multimodal systems Oral Thu-O-9-8-5 939 DNN-based Ultrasound-to-Speech Conversion for a Silent Speech Interface Tamás Gábor Csapó, Tamás Grósz, Gábor Gosztolya, László Tóth, Alexandra Markó
2017-08-24 11:40-12:00 D8 Speech Recognition: Multimodal systems Oral Thu-O-9-8-6 502 Visually grounded learning of keyword prediction from untranscribed speech Herman Kamper, Shane Settle, Gregory Shakhnarovich, Karen Livescu
2017-08-24 13:30-15:30 E397 Show & Tell 8 Show&Tell Thu-S&T-10-B-1 10033 Reading validation for pronunciation evaluation in the Digitala project Aku Rouhe, Reima Karhila, Peter Smit, Mikko Kurimo
2017-08-24 13:30-15:30 E397 Show & Tell 8 Show&Tell Thu-S&T-10-B-2 10040 Visual Learning 2: Pronunciation app using ultrasound, video, and MRI Kyori Suzuki, Ian Wilson, Hayato Watanabe
2017-08-24 13:30-15:30 E397 Show & Tell 8 Show&Tell Thu-S&T-10-B-3 10046 SIAK – A Game for Foreign Language Pronunciation Learning Reima Karhila, Sari Ylinen, Seppo Enarvi, Kalle Palomäki, Aleksander Nikulin, Olli Rantula, Vertti Viitanen, Krupakar Dhinakaran, Anna-Riikka Smolander, Heini Kallio, Maria Uther, Katja Junttila, Perttu Hämäläinen, Mikko Kurimo
2017-08-24 13:30-15:30 E397 Show & Tell 8 Show&Tell Thu-S&T-10-B-4 10053 MetaLab: A repository for meta-analyses on language development, and more Sho Tsuji, Christina Bergmann, Molly Lewis, Mika Braginsky, Page Piccinini, Michael C. Frank, Alejandrina Cristia
2017-08-24 13:30-15:30 E397 Show & Tell 8 Show&Tell Thu-S&T-10-B-5 10058 MoPAReST – Mobile Phone Assisted Remote Speech Therapy Platform Chitralekha Bhat, Anjali Kant, Bhavik Vachhani, Sarita Rautara, Ashok Kumar Sinha, Sunil Kumar Kopparapu
2017-08-24 10:00-10:20 B4 Spoken Term Detection Oral Thu-O-9-4-1 1328 A Rescoring Approach for Keyword Search Using Lattice Context Information Zhipeng Chen, Ji Wu
2017-08-24 10:20-10:40 B4 Spoken Term Detection Oral Thu-O-9-4-2 601 The Kaldi OpenKWS System: Improving Low Resource Keyword Search Jan Trmal, Matthew Wiesner, Vijayaditya Peddinti, Xiaohui Zhang, Pegah Ghahremani, Vimal Manohar, Yiming Wang, Hainan Xu, Dan Povey, Sanjeev Khudanpur
2017-08-24 10:40-11:00 B4 Spoken Term Detection Oral Thu-O-9-4-3 1212 The STC Keyword Search System For OpenKWS 2016 Evaluation Yuri Khokhlov, Ivan Medennikov, Aleksei Romanenko, Valentin Mendelev, Maxim Korenevsky, Alexey Prudnikov, Natalia Tomashenko, Alexander Zatvornitskiy
2017-08-24 11:00-11:20 B4 Spoken Term Detection Oral Thu-O-9-4-4 480 Compressed time delay neural network for small-footprint keyword spotting Ming Sun, David Snyder, Yixin Gao, Varun Nagaraja, Mike Rodehorst, Sankaran Panchapagesan, Nikko Strom, Spyros Matsoukas, Shiv Vitaladevuni
2017-08-24 11:20-11:40 B4 Spoken Term Detection Oral Thu-O-9-4-5 904 Symbol sequence search from telephone conversation Masayuki Suzuki, Gakuto Kurata, Abhinav Sethy, Bhuvana Ramabhadran, Kenneth Church, Mark Drake
2017-08-24 11:40-12:00 B4 Spoken Term Detection Oral Thu-O-9-4-6 1273 Similarity Learning Based Query Modeling for Keyword Search Batuhan Gundogdu, Murat Saraclar
2017-08-24 08:30-09:30 Main hall Keynote 3: Björn Lindblom Keynote Thu-Keynote-1-1 3004 3004 Re-inventing speech – the biological way

 

The mapping of the Speech Chain has so far been focused on the experimentally more accessible links – e.g., acoustics – whereas the brain’s activity during speaking and listening has understandably received less attention. That state of affairs is about to change now thanks to the new sophisticated tools offered by brain imaging technology. At present many key questions concerning human speech processes remain incompletely understood despite the significant research efforts of the past half century. As speech research goes neuro we could do with some better answers. In this paper I will attempt to shed some light on some of the issues. I will do so by heeding the advice that Tinbergen once gave his fellow biologists on explaining behavior. I paraphrase: Nothing in biology makes sense unless you simultaneously look at it with the following questions at the back of your mind: How did it evolve? How is it acquired? How does it work here and now? Applying the Tinbergen strategy to speech I will, in broad strokes, trace a path from the small and fixed innate repertoires of non-human primates to the open-ended vocal systems that humans learn today. Such an agenda will admittedly identify serious gaps in our present knowledge but, importantly, it will also bring an overarching possibility: It will strongly suggest the feasibility of bypassing the traditional linguistic operational approach to speech units and replacing it by a first-principles account anchored in biology. I will argue that this is the road-map we need for a more profound understanding of the fundamental nature of spoken language and for educational, medical and technological applications.


 

2017-08-22 08:30-09:30 Main hall Keynote 1: James Allen Keynote Tue-Keynote-1-1 3002 3002 Dialogue as Collaborative Problem Solving

 

I will describe the current status of a long-term effort at developing dialogue systems that go beyond simple task execution models to systems that involve collaborative problem solving. Such systems involve open-ended discussion and the tasks cannot be accomplished without extensive interaction (e.g., 10 turns or more). The key idea is that dialogue itself arises from an agent’s ability for collaborative problem solving (CPS). In such dialogues, agents may introduce, modify and negotiate goals; propose and discuss the merits of possible paths to solutions; explicitly discuss progress as the two agents work towards the goals; and evaluate how well a goal was accomplished. To complicate matters, user utterances in such settings are much more complex than those seen in simple task execution dialogues and require full semantic parsing. A key question we have been exploring in the past few years is how much of dialogue can be accounted for by domain-independent mechanisms. I will discuss these issues and draw examples from a dialogue system we have built that, except for the specialized domain reasoning required in each case, uses the same architecture to perform three different tasks: collaborative blocks world planning, in which the system and user build structures and may have differing goals; biocuration, in which a biologist and the system interact in order to build executable causal models of biological pathways; and collaborative composition, in which the user and system collaborate to compose simple pieces of music.


 

2017-08-23 08:30-09:30 Main hall Keynote 2: Catherine Pelachaud Keynote Wed-Keynote-1-1 3003 3003 Conversing with social agents that smile and laugh

 

Our aim is to create virtual conversational partners. As such we have developed computational models to enrich virtual characters with socio-emotional capabilities that are communicated through multimodal behaviors. The approach we follow to build interactive and expressive interactants relies on theories from human and social sciences as well as data analysis and user-perception based design. We have explored specific social signals such as smile and laughter, capturing their variation in production but also their different communicative functions and their impact in human-agent interaction. Lately we have been interested in modeling agents with social attitudes. Our aim is to model how social attitudes color the multimodal behaviors of the agents. We have gathered a corpus of dyads that was annotated along two layers: social attitudes and nonverbal behaviors. By applying sequence mining methods we have extracted behavior patterns involved in the change of perception of an attitude. We are particularly interested in capturing the behaviors that correspond to a change of perception of an attitude. In this talk I will present the GRETA/VIB platform where our research is implemented.


 

2017-08-21 09:45-10:15 Main hall ISCA Medal 2017 Ceremony Keynote Mon-Keynote-1-1 3001 3001 ISCA Medal 2017 Ceremony

 

Fumitada Itakura was born in Toyokawa, Japan, in August 1940. He studied electronic engineering at Nagoya University, 1958-1963. He advanced to its graduate school and studied information engineering such as statistical optical character recognition and time series analysis of cardiac rhythmicity. Since finishing his master's degree in 1965, he has been working on speech signal processing using a statistical approach. He received the Doctor of Engineering degree from Nagoya University in 1971 for his work on a statistical method for speech analysis and synthesis. Itakura’s early work on speech spectral envelope and formant estimation using the maximum likelihood method (1967) laid the groundwork for much of the research work in speech signal processing in the three subsequent decades, ranging from vocoder designs for low bit-rate transmission to distance measures (Itakura-Saito distance) for speech pattern recognition. He introduced the concepts of the auto-regressive model and the partial auto-correlation to the speech area and developed a first mathematically tractable formulation of the speech recognition problem based on the minimum prediction residual principle, providing a solid framework for integrating speech analysis, representation, and pattern matching into a complete engineering system. His work on the autoregressive modeling of speech is used in almost every low-to-medium bit rate speech transmission system. The Line Spectral Pair (LSP) representation, which he developed in 1975, is now used in nearly every cellular phone system and handset. Itakura and Hong Wang’s recent work in sub-band dereverberation algorithms has also become the foundation of many new breakthroughs.
His singular and yet broad contributions to speech signal processing earned him the IEEE Morris Liebmann Award in 1986, the most prestigious Society Award from the IEEE Signal Processing Society in 1996, IEEE Fellow in 2003, the Purple Ribbon Medal from the Japanese government in 2003 and the Distinguished Achievement and Contributions Award from IEICE in 2003. These technical achievements were accomplished mainly at Nagoya University (1965-68), the 4th research section of the Musashino Electrical Communication Laboratory of NTT (1963-73, 1975-1983), the Acoustic Research Laboratory (1973-75) of Bell Telephone Laboratories, Murray Hill, Nagoya University (again, 1983-2003), and Meijo University (2003-2011).