SCORE is a broad initiative covering the North Sea region in the field of AI and Data Science for Smart Cities. Under this initiative, Amsterdam and other participating cities focus on sharing experiences and open data to solve metropolitan problems. Such projects could be about sustainability, logistics or some other area. The universities involved in SCORE are Bradford, Aarhus and UvA.
In her PhD thesis, Inske Groenen examines the application of deep learning networks to panoramic images to give us a better idea of the urban environment’s degree of liveability. Groenen works at the Multimedia Analytics Lab Amsterdam (MultiX) within the Informatics Institute (IvI). Deep learning networks are systems that learn by seeing examples. This differs from other systems, such as those based on decision trees, which can be pre-programmed so that specific actions are always carried out in the same way.
Training deep learning networks on image data
The data sets used by Groenen, called PanorAMS-gt and PanorAMS-noisy, are limited to images of Amsterdam, at least for now. ‘Amsterdam City Council has its own vehicle driving around, taking panoramic pictures. But, for other cities, we can essentially draw on images from Google Street View as well.’ An advantage of working with the Amsterdam data is that geographical information about objects in the streets, such as rubbish bins and lamp posts, is well documented. The goal is to train deep learning systems on the image data and to use the geographical information to label the images. ‘One of the things this will allow us to do is to combine the photos with a contour map to determine how tall a building is.’
As far as Amsterdam goes, a development pipeline has now been created for 24 types of objects together with their 3D world coordinates and a camera module to convert these to 2D coordinates in panorama images. ‘What you then get from the data are so-called “bounding boxes”, indicating the position of objects in the panorama images. You can do this on a large scale. For Amsterdam, we have a dataset with more than 14 million boxes, to which you can apply various deep learning methods. A challenge here, however, is that the automatic labelling of such a dataset results in many inaccuracies. Sometimes objects are missing because the information is not up to date, or there are errors because information was once incorrectly measured by the municipality. This gives an unclear signal and reduces the performance of the network trained on the data.’ To identify any weaknesses and discover how a network can, in fact, be trained properly, a subset with clean labels was needed. Such a subset, consisting of 7,000 images together with 150,000 boxes all labelled correctly, was created via crowdsourcing. This involved large numbers of people worldwide who helped to carry out the work in exchange for payment.
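To make the conversion step concrete, here is a minimal sketch of how a 3D world position could be mapped to a 2D position in a panorama and turned into a bounding box. It is not the project's actual camera module: it assumes an equirectangular panorama, a local coordinate system in metres (x east, y north, z up) and a known camera position and compass heading. The function names world_to_panorama and bounding_box are hypothetical, and boxes that straddle the left and right edge of the panorama are not handled.

```python
import math


def world_to_panorama(point, camera, heading_deg, width, height):
    """Project a 3D world point to pixel (u, v) in an equirectangular panorama.

    Assumptions (not from the article): coordinates are metres in a local
    frame (x east, y north, z up); heading_deg is the compass direction
    shown at the horizontal centre of the image.
    """
    dx = point[0] - camera[0]
    dy = point[1] - camera[1]
    dz = point[2] - camera[2]
    # Compass bearing from camera to point, clockwise from north.
    bearing = math.atan2(dx, dy)
    # Angle relative to the image centre, wrapped to [-pi, pi).
    yaw = (bearing - math.radians(heading_deg) + math.pi) % (2 * math.pi) - math.pi
    # Elevation above (positive) or below (negative) the horizon.
    pitch = math.atan2(dz, math.hypot(dx, dy))
    u = (yaw / (2 * math.pi) + 0.5) * width
    v = (0.5 - pitch / math.pi) * height
    return u, v


def bounding_box(corners, camera, heading_deg, width, height):
    """Axis-aligned 2D box around the projected 3D corners of an object.

    Simplification: objects crossing the panorama seam are not handled.
    """
    pixels = [world_to_panorama(c, camera, heading_deg, width, height) for c in corners]
    us = [p[0] for p in pixels]
    vs = [p[1] for p in pixels]
    return min(us), min(vs), max(us), max(vs)


# Hypothetical example: a 6 m lamp post, 10 m north of a camera mounted at 2 m.
camera = (0.0, 0.0, 2.0)
lamp_post = [(x, 10.0, z) for x in (-0.2, 0.2) for z in (0.0, 6.0)]
box = bounding_box(lamp_post, camera, heading_deg=0.0, width=2000, height=1000)
```

With the camera facing north, the lamp post ends up around the horizontal centre of the image, with its top above the horizon line and its base below it.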
With the aid of the Amsterdam council, a crowdsourcing campaign was initiated with an annotation application on Amazon Mechanical Turk. ‘That’s quite standard in this line of work. It basically needs to be set up properly, people need to be given clear instructions and the execution of tasks should be monitored continuously.’ The subset of correctly labelled images enables many applications. ‘This opens up new research possibilities. From a council perspective, the accurate recording of the positioning of objects is an important task and efforts are being made to find smart ways of optimising the process. But from a scientific perspective, it would also be worthwhile to train specific networks that can better handle the noise in the PanorAMS-noisy dataset and the distortion in panoramic images.’
‘We use our dataset in the context of improving quality of life through public services. But the possible applications go further.’ Inske Groenen
Starting shot for innovation
So, what’s next? ‘The large PanorAMS-noisy dataset with over 14 million noisy bounding boxes in nearly 800,000 panorama images can be used to train deep learning networks, for example in object recognition and localisation. The smaller PanorAMS-gt dataset that was manually labelled through crowdsourcing helped us evaluate how well existing image recognition systems can learn from noisy bounding boxes. This dataset can also be used separately as a stand-alone dataset for supervised object recognition and localisation, for example to train specific networks that can better deal with the panorama distortion. We use our dataset in the context of improving quality of life through public services. But the possible applications go further. The datasets can also be used, for example, in the context of autonomous driving.’
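One common way to compare an automatically generated box with a manually labelled one is intersection over union (IoU): the area shared by both boxes divided by the area they cover together. The article does not say which measure was used in this research, so the sketch below only illustrates the general idea. The function name iou and the (left, top, right, bottom) pixel format are assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (left, top, right, bottom)."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    # Width and height of the overlap; zero if the boxes do not intersect.
    overlap_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    overlap_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    intersection = overlap_w * overlap_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical boxes: a noisy label shifted sideways from the clean label.
clean_box = (0.0, 0.0, 10.0, 10.0)
noisy_box = (5.0, 0.0, 15.0, 10.0)
score = iou(clean_box, noisy_box)  # half of each box overlaps
```

A score of 1 means the two boxes coincide exactly, and a score of 0 means they do not overlap at all.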
Groenen will now embark on what she considers to be the most interesting part of her thesis. ‘We can now unleash experiments and various networks on our dataset. SCORE and Amsterdam City Council would also like us to zoom in on the quality of city life. One approach would be an analysis of specific objects to identify characteristics that affect a neighbourhood’s liveability, like the presence of parks or particular architectural styles. We are now investigating exactly how we’re going to do that.’
Groenen’s PhD thesis supervisors are Dr Stevan Rudinac and Professor Marcel Worring.