Statistical Analysis
Training the model requires a labeled dataset, which must be generated by visually identifying instances of plumes in the images. Because the dataset is large, labeling is very time-consuming. To make this process more efficient, statistical heuristics will be used to separate images that are likely to contain plumes from those that are not.
The polluting plumes emitted by buildings evolve in shape and optical depth on short time scales, and this evolution can be exploited to identify them. Background subtraction removes the static components of an image set, that is, everything that does not change through the image series, and leaves only the pixels that have changed in color. Generating a background image is complicated in an ever-evolving cityscape, where illumination changes caused by clouds and by sunlight reflecting off buildings, as well as moving objects such as cars, also vary throughout an image series. We plan to create the background image for each target image as a Gaussian-weighted average of the surrounding images in the series, and then to take the difference between the target image and this averaged baseline. We expect plumes to be visible in the resulting difference frame, along with cars, clouds, and other features that constitute our contaminants (potential false positives).
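The background subtraction described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the final implementation: the function names, the default `sigma` (measured in frames), and the choice to exclude the target frame from its own baseline are all hypothetical.

```python
import numpy as np


def gaussian_weights(n_frames, target, sigma):
    """Temporal Gaussian weights centred on the target frame.

    Assumption: the target frame is excluded, so only the surrounding
    images contribute to the baseline.
    """
    if n_frames < 2:
        raise ValueError("need at least one frame besides the target")
    idx = np.arange(n_frames)
    weights = np.exp(-0.5 * ((idx - target) / sigma) ** 2)
    weights[target] = 0.0
    return weights / weights.sum()


def background_subtract(frames, target, sigma=2.0):
    """Return the signed difference between the target frame and its baseline.

    `frames` is a stack of shape (T, H, W) or (T, H, W, C).
    """
    frames = np.asarray(frames, dtype=np.float64)
    weights = gaussian_weights(frames.shape[0], target, sigma)
    # Weighted average over the time axis gives the background estimate.
    background = np.tensordot(weights, frames, axes=(0, 0))
    return frames[target] - background
```

Static pixels cancel in the difference, while a region that appears only in the target frame keeps its full contrast. A wider `sigma` averages over more frames, which suppresses short-lived contaminants such as cars at the cost of being slower to follow changes in illumination.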
To remove small-scale noise caused by small perturbations in the images, a low-pass filter, such as a Gaussian kernel, can be used. The main sources of confusion for the algorithm will be clouds and glare reflecting off buildings. Blob detection will be used to pick out discrete regions of motion. Blob size can be a reasonable indicator for discriminating larger clouds from plumes, but smaller clouds that are similar in size to plumes will be very difficult to separate out. To address this, the training data will also include clouds (a significant source of false alarms) and non-activity (true negatives), collected through statistical methods, human vetting, and simulations.
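A rough sketch of the smoothing and blob detection steps follows. It assumes a single-channel difference image and uses connected-component labeling from `scipy.ndimage` as a simple stand-in for blob detection. The function name, threshold, kernel width, and minimum area are illustrative placeholders rather than tuned values.

```python
import numpy as np
from scipy import ndimage


def detect_blobs(diff, sigma=1.5, threshold=10.0, min_area=4):
    """Smooth a 2-D difference image, threshold it, and list the blobs.

    Each blob is a dict with 'bbox' = (row0, row1, col0, col1) and
    'area' in pixels. All default parameters are assumptions.
    """
    # Low-pass filter: a Gaussian kernel suppresses isolated noisy pixels.
    smoothed = ndimage.gaussian_filter(
        np.abs(diff).astype(np.float64), sigma=sigma
    )
    mask = smoothed > threshold

    # Connected regions of changed pixels are treated as blobs.
    labeled, n_blobs = ndimage.label(mask)
    areas = np.bincount(labeled.ravel(), minlength=n_blobs + 1)

    blobs = []
    for label_id, slc in enumerate(ndimage.find_objects(labeled), start=1):
        area = int(areas[label_id])
        if area < min_area:
            continue
        rows, cols = slc
        blobs.append(
            {"bbox": (rows.start, rows.stop, cols.start, cols.stop),
             "area": area}
        )
    return blobs
```

The `area` field gives a size cut for rejecting large clouds (for example, discarding blobs above some maximum area), but, as noted above, it cannot separate clouds that are similar in size to plumes.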
Image Labeling
To apply labels to the images and extract plumes for the training set, a web-based tool, shown in Figure \ref{623440}, was built using d3.js, a data visualization library in JavaScript \cite{Bostock:2011:DDD:2068462.2068631}, and Flask, a web application framework in Python \cite{microframework2010}. The tool allows the user to browse consecutive images in small batches and draw bounding boxes around any plumes that are observed. The bounding boxes are then saved to disk and can be reviewed later. Because the images are served by Flask, they can be manipulated in Python with OpenCV \cite{opencv_library} before being displayed on screen. This makes it possible to apply image filters that make plumes easier to spot, such as background subtraction. Other heuristics found in further exploratory analysis could easily be added to the web app. Depending on the success of the statistical separation methods, their results could be used to query for the images most likely to contain plumes, further reducing the amount of image labeling required.
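As an illustration of how the drawn bounding boxes might be saved to disk and reloaded for review, the sketch below stores them as JSON. The record layout (`image`, `x`, `y`, `width`, `height`, `label`) and the function names are hypothetical and are not taken from the tool itself.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class BoundingBox:
    """One user-drawn box; the field names are assumed, not the tool's schema."""
    image: str      # file name of the image the box was drawn on
    x: float        # left edge, in pixels
    y: float        # top edge, in pixels
    width: float
    height: float
    label: str = "plume"


def save_boxes(path, boxes):
    """Write a list of BoundingBox records to a JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump([asdict(box) for box in boxes], fh, indent=2)


def load_boxes(path):
    """Read the records back for later review."""
    with open(path, encoding="utf-8") as fh:
        return [BoundingBox(**record) for record in json.load(fh)]
```

Keeping a `label` field on each record would also allow the same format to hold boxes for clouds or other contaminants, should those be labeled for the training set.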