Speaker: Dr Li DENG, Microsoft Research, Redmond, USA
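Dr. Li Deng is a Principal Researcher at Microsoft Research (USA) and Principal Research Manager of its Deep Learning Technology Center. He is a renowned expert in speech recognition and deep learning; a Fellow of the IEEE, of the International Speech Communication Association (ISCA), and of the Acoustical Society of America (ASA); Editor-in-Chief of IEEE Transactions on Audio, Speech and Language Processing; and former Editor-in-Chief of IEEE Signal Processing Magazine (the journal with the highest impact factor among all IEEE journals).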
Title: Achievements and Challenges of Deep Learning - From Speech Analysis and Recognition to Language and Multimodal Processing
Abstract: Artificial neural networks have been around for over half a century, and their applications to speech processing for almost as long. Yet it was not until 2010 that their real impact was made by a deep form of such networks, building on part of the earlier work on (shallow) neural nets and (deep) graphical models developed by both the speech and machine learning communities. This talk will first reflect on the path to this transformative success, sparked by speech analysis using deep learning methods on spectrogram-like raw features and then progressing rapidly to speech recognition with increasingly large vocabularies and scale. The role of well-timed academic-industrial collaboration will be highlighted, as will the advances in big data and big compute and the seamless integration of application-domain knowledge of speech with the general principles of deep learning. Then, an overview will be given of the sweeping achievements of deep learning in speech recognition since its initial success in 2010 (as well as in image recognition and computer vision since 2012). Such achievements have resulted in across-the-board, industry-wide deployment of deep learning. The final part of the talk will look ahead towards stimulating new challenges for deep learning: making intelligent machines capable not only of hearing (speech) and seeing (vision), but also of thinking with a “mind”, i.e., reasoning and inference over complex, hierarchical relationships and knowledge sources that comprise a vast number of entities and semantic concepts in the real world, based in part on multi-sensory data from the user.
To this end, language and multimodal processing (joint exploitation of, and learning from, text, speech/audio, and image/video) is evolving into a new frontier of deep learning. It is beginning to be embraced by a mix of research communities including speech and spoken language processing, natural language processing, computer vision, machine learning, information retrieval, cognitive science, artificial intelligence, and data/knowledge management. A review of recently published studies on deep learning applied to selected language and multimodal processing tasks will be provided, tracing back to the relevant early connectionist modeling and neural network literature, and future directions in this exciting new frontier of deep learning will be discussed and analyzed.
Speaker biography: Li Deng received his Ph.D. from the University of Wisconsin-Madison. He was a tenured professor (1989-1999) at the University of Waterloo, Ontario, Canada, and then joined Microsoft Research, Redmond, where he is currently a Principal Research Manager of its Deep Learning Technology Center. Since 2000, he has also been an affiliate full professor at the University of Washington, Seattle, teaching computer speech processing. He has been granted over 60 US or international patents and has received numerous awards and honors from IEEE, ISCA, ASA, and Microsoft, including most recently the IEEE SPS Best Paper Award (2013) for work on deep neural nets for speech recognition. He has authored or co-authored 4 books, the latest being Deep Learning: Methods and Applications. He is a Fellow of the Acoustical Society of America, a Fellow of the IEEE, and a Fellow of the ISCA. He served as Editor-in-Chief of IEEE Signal Processing Magazine (2009-2011) and currently serves as Editor-in-Chief of IEEE Transactions on Audio, Speech and Language Processing. His recent research interests and activities have focused on deep learning and machine intelligence applied to large-scale text analysis and to speech/language/image multimodal processing, advancing the work on speech analysis and recognition using deep neural networks that he has carried out with collaborators since 2009.