Prof Mubarak Shah
Director, Center for Research in Computer Vision, University of Central Florida, USA
Biography: Dr. Mubarak Shah, Trustee Chair Professor of Computer Science, is the founding director of the Center for Research in Computer Vision at UCF. His research interests include video surveillance, visual tracking, human activity recognition, visual analysis of crowded scenes, video registration, and UAV video analysis. Dr. Shah is a fellow of IEEE, AAAS, IAPR and SPIE. In 2006, he received the Pegasus Professor Award, the highest award at UCF. He is an ACM Distinguished Speaker. He was an IEEE Distinguished Visitor speaker from 1997 to 2000 and received the IEEE Outstanding Engineering Educator Award in 1997. He received the Harris Corporation’s Engineering Achievement Award in 1999; TOKTEN awards from UNDP in 1995, 1997, and 2000; the Teaching Incentive Program award in 1995 and 2003; the Research Incentive Award in 2003 and 2009; Millionaires’ Club awards in 2005 and 2006; the University Distinguished Researcher award in 2007; and an honorable mention for the ICCV 2005 "Where Am I?" Challenge Problem. He was also nominated for the best paper award at the ACM Multimedia Conference in 2005. He is an editor of the international book series on Video Computing, editor-in-chief of the Machine Vision and Applications journal, and an associate editor of the ACM Computing Surveys journal. He was an associate editor of the IEEE Transactions on PAMI and a guest editor of the special issue of the International Journal of Computer Vision on Video Computing.
Title: Video Capsule Networks: Human Action Detection, Video Object Segmentation and Video Segmentation Conditioned on Text
Abstract: Deep learning has enabled tremendous progress on difficult computer vision problems in a very short time, with very impressive results. This has been achieved mainly with convolutional neural networks (CNNs). Traditional CNNs can extract a good set of features, but they do not explicitly model the entities present in the input data. A capsule is a group of neurons that can model different entities or parts of entities. A capsule network provides an effective way to model part-to-whole relationships between entities and allows viewpoint-invariant representations to be learned. Capsules have produced good results in classifying small images (MNIST). However, they have not been demonstrated on high-dimensional data such as large images or videos. In this talk, I will present Video Capsule Networks for solving three important problems: Human Action Detection, Video Object Segmentation and Video Segmentation Conditioned on Text.