The past decade has seen great strides in image recognition for video content. Artificial intelligence and machine learning have given us systems that can ‘watch’ video clips and identify activities taking place in them. AI can accurately identify simple activities such as cycling, doing a pull-up or ice skating in short video clips featuring a limited number of people. But there is as yet no way to reach the same level of accuracy in longer video streams where multiple people interact simultaneously and are involved in more complex causal activities. Professor Cees Snoek and his team at the University of Amsterdam are working on a range of projects in this area, from those that make the best use of current state-of-the-art systems to those that will help push the possibilities of this field into the future.
With so many potential applications for automated video understanding – for example in self-driving cars, cashier-free retail, or content moderation in social media – researchers around the world are hard at work on optimising these technologies, which are increasingly finding their way into our everyday lives. At the UvA, Snoek and his team are currently working on various projects that illustrate the way these technologies can be used in practice.
The team’s ‘healthcare’ project involves the UvA spin-off Kepler Vision Technologies. Kepler makes use of the world’s first-ever body language recognition software. The software analyses video streams and can recognise a person’s body language, poses and actions. Kepler uses this ability to produce applications intended for use in care for the elderly. Elderly care is demanding work, and skilled care workers are often thoroughly overworked. Kepler’s software helps by monitoring the clients and recognising when they need care. For example, the Kepler Night Nurse can recognise when a client is struggling or cannot get out of bed, or when a client has remained in the bathroom for a concerning amount of time. It can also distinguish between someone lying on the floor because of a fall and someone lying on a couch to rest. If the Night Nurse spots one of these potential problems it sends a notification, enabling care home nurses to avoid unnecessary check rounds. In the future, the software will also be able to monitor whether someone is eating and drinking enough or is at risk of becoming socially isolated.
Deep machine learning
Much of the progress made in image recognition so far has been driven by so-called deep machine learning. Loosely inspired by the human brain, deep neural networks learn to associate pixels with labels, so as to predict what happens in previously unseen pixels. Yet it has also become clear that deep learning is reaching the limits of its usefulness for video understanding because it depends heavily on labelled examples – in the worst case, per-pixel labelling – and huge amounts of computing power. Snoek: ‘The problem is that as video understanding becomes more and more specific, it becomes harder and harder to find examples to teach the systems new activities. For example, a person stealing a bike. This is a common enough occurrence in Amsterdam, but not one that is commonly filmed. So even if you had the manpower to label hundreds of videos and feed them into the system, the problem would be that the videos simply don’t exist.’
Safety at Schiphol
So, while systems based on deep learning can be taught to monitor one elderly client in one location, if you multiply the activities in the videos by thousands of people undertaking all sorts of actions, then label supervision becomes near impossible. The knowledge base of the system must therefore come from another source. The team’s ‘safety’ project, which is taking place at Schiphol, is a prime example of this.
Contrary to popular belief, fuelled by depictions in film and TV, today’s video surveillance systems still depend on expensive, daunting and error-prone manual inspection. Automation is challenging because activities of interest are rare, scenes are overcrowded, and the computing demands are enormous.
Snoek: ‘Our Schiphol project studies these research challenges by exploring AI tactics that are less demanding in terms of labelled examples, for example by leveraging voice commands instead of pixel labels or by learning from examples generated by computer graphics. We will also study new high-performance computing architectures that are scalable, privacy-preserving and secure in terms of their video processing capabilities. All of our research results will eventually be integrated into a real-time video surveillance search engine to support the human operators in the control room at Schiphol.’
Another of the team’s projects takes an entirely different approach to solving the input conundrum. Their ‘social distancing’ project uses a multi-disciplinary, mixed-methods approach to monitor the effectiveness of the government’s Covid-19 measures. The instrument the team has developed uniquely combines objective measurements of social distancing behaviour (captured by video surveillance and artificial intelligence technologies) with media content and survey analysis of social attitudes towards the measures. This allows for dynamic tracking and optimisation of the anti-corona measures and could be of use to policymakers, public health officials and disease-simulation scholars.
Snoek: ‘Our monitoring system can easily be applied internationally as well as in the Netherlands, basically in any public or semi-public environment where surveillance cameras are installed. We believe it could be extremely helpful in guiding government policy decisions, not only in the current pandemic, but in any similar emergencies in the future.’