ISCA Medal for Scientific Achievements 2017

Fumitada Itakura

 

Professor, Nagoya University, Japan

Biography

Fumitada Itakura was born in Toyokawa, Japan, in August 1940. He studied electronic engineering at Nagoya University from 1958 to 1963. He then advanced to its graduate school, where he studied information engineering topics such as statistical optical character recognition and time-series analysis of cardiac rhythmicity. After finishing his master's degree in 1965, he worked on speech signal processing using a statistical approach. He received the Doctor of Engineering degree from Nagoya University in 1971 for his work on a statistical method for speech analysis and synthesis.
Itakura’s early work on speech spectral envelope and formant estimation using maximum likelihood methods (1967) laid the groundwork for much of the research in speech signal processing over the three subsequent decades, ranging from vocoder designs for low bit-rate transmission to distance measures (the Itakura-Saito distance) for speech pattern recognition. He introduced the concepts of the autoregressive model and partial autocorrelation to the speech field and developed the first mathematically tractable formulation of the speech recognition problem based on the minimum prediction residual principle, providing a solid framework for integrating speech analysis, representation, and pattern matching into a complete engineering system. His work on autoregressive modeling of speech is used in almost every low- to medium-bit-rate speech transmission system. The Line Spectral Pair (LSP) representation, which he developed in 1975, is now used in nearly every cellular phone system and handset. Itakura and Hong Wang’s more recent work on sub-band dereverberation algorithms has also become the foundation of many new breakthroughs.

His singular and yet broad contributions to speech signal processing earned him the IEEE Morris Liebmann Award in 1986, the most prestigious Society Award from the IEEE Signal Processing Society in 1996, IEEE Fellow in 2003, the Purple Ribbon Medal from the Japanese government in 2003, and the Distinguished Achievement and Contributions Award from IEICE in 2003. These technical achievements were made mainly at Nagoya University (1965-68), the 4th research section of the Musashino Electrical Communication Laboratory of NTT (1963-73, 1975-83), the Acoustics Research Laboratory of Bell Telephone Laboratories, Murray Hill (1973-75), Nagoya University again (1983-2003), and Meijo University (2003-2011).

Keynote Speakers

James Allen

Professor of Computer Science, University of Rochester
Associate Director of the Institute for Human and Machine Cognition in Pensacola, Florida

Biography

James Allen is the John H. Dessauer Professor of Computer Science at the University of Rochester and Associate Director of the Institute for Human and Machine Cognition in Pensacola, Florida. He is a Founding Fellow of the American Association for Artificial Intelligence (AAAI) and a Fellow of the Cognitive Science Society. He was editor-in-chief of the journal Computational Linguistics from 1983 to 1993 and authored the well-known textbook “Natural Language Understanding”. His research concerns defining computational models of intelligent collaborative and conversational agents, with a strong focus on the connection between knowledge, reasoning, and language comprehension and dialogue.

 

Catherine Pelachaud

Director of Research, CNRS, at ISIR, Pierre and Marie Curie University

Abstract – Conversing with social agents that smile and laugh

Our aim is to create virtual conversational partners. To this end, we have developed computational models that enrich virtual characters with socio-emotional capabilities communicated through multimodal behaviors. The approach we follow to build interactive and expressive interactants relies on theories from the human and social sciences as well as on data analysis and user-perception-based design. We have explored specific social signals such as smiles and laughter, capturing not only their variation in production but also their different communicative functions and their impact on human-agent interaction. Lately we have been interested in modeling agents with social attitudes; our aim is to model how social attitudes color the multimodal behaviors of the agents. We have gathered a corpus of dyads annotated along two layers: social attitudes and nonverbal behaviors. By applying sequence-mining methods, we have extracted the behavior patterns involved in a change in the perceived attitude, which is what we are particularly interested in capturing. In this talk I will present the GRETA/VIB platform in which our research is implemented.

Biography

Catherine Pelachaud is a Director of Research at CNRS in the laboratory ISIR, Pierre and Marie Curie University. Her research interests include embodied conversational agents, nonverbal communication (face, gaze, and gesture), expressive behaviors, and socio-emotional agents. With her research team, she has been developing GRETA, an interactive virtual-agent platform that can display emotional and communicative behaviors. She has been, and still is, involved in several European projects related to believable embodied conversational agents, emotion, and social behaviors. She is an associate editor of several journals, among them IEEE Transactions on Affective Computing, ACM Transactions on Interactive Intelligent Systems, and the Journal on Multimodal User Interfaces. She has co-edited several books on virtual agents and emotion-oriented systems. She has participated in the organization of international conferences such as IVA, ACII, and the virtual-agent track of AAMAS. She is the recipient of the ACM SIGAI Autonomous Agents Research Award 2015.

 

Björn Lindblom

 

Björn Lindblom

Professor Emeritus, Stockholm University, Sweden
Professor Emeritus, University of Texas at Austin, USA

Biography

I began by studying for a medical degree, but gradually my focus shifted to music and languages. Planning to make a living as a foreign-language teacher, I attended classes that happened to include two lectures on acoustic phonetics by Gunnar Fant at KTH in Stockholm. ‘Anyone interested in a summer job? We could use people with a linguistics background.’ He then went on to describe the project. Although I cannot honestly say that I had understood much of the lectures, I volunteered and got lucky. I was completely blown away by the dynamics of the KTH lab and its research activities. This was the early sixties – the post-World War II era, with lavish funding for communications and computer technology.

Later in life, I came across an anecdote about Richard Feynman, the famous physicist, who is said to have left the following formulation permanently on the blackboard of his office: ‘What I cannot create I do not understand!’

Bingo! Was he referring to the acoustic theory of speech production and copy speech synthesis? In a way, he could have been. More importantly, I believe that in this short phrase he managed to capture the ultimate essence of good science – general knowledge based on first principles. It has been at the back of my mind for over fifty years as I have studied how spoken language works on-line, how it is learned, and how it came to be.

Applying the Feynman criterion to our own broad field shows that we still have a long way to go. There would be nothing wrong with embarking on that voyage equipped with the tools of Big Data and modern high-tech neuroscience – on the contrary. But ultimately the quality of our applications – e.g. clinical and educational – will be a function of how well we really understand how humans do it.

End of sermon. Chop, chop.