OverFeat

Who

Oliver Kanders (okanders), John Manning (jmannin2), Tobias Tettamanti (ttettama), Ding Ding Wei (dwei5)

Introduction

We are implementing the keystone paper OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks. The paper is one of the well-known references in the field of object localization. Its objective is to produce an algorithm that can recognize the presence of objects of interest in an image, classify those objects, and output where they are located via bounding boxes (that is, output the coordinates defining a box on the image together with the class of the item contained in the box). This objective can be broken into three smaller objectives: classification, localization, and recognition. Recognition is the ability to detect the presence of objects of interest, which is not guaranteed for every input. Localization is the identification of where the objects of interest are. Classification determines what an object is, given that it exists. The paper takes a supervised training approach.

Related Work

Other object detection papers that followed OverFeat:

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks by Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun, which introduces the Region Proposal Network (RPN) to improve the speed and accuracy of object detection. SSD: Single Shot MultiBox Detector by Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C. Berg, which introduces a single-shot method for object detection that is both accurate and computationally efficient.

Precursor Paper: Pierre Sermanet, David Eigen, Xiang Zhang, Michael Mathieu, Rob Fergus, Yann LeCun: "OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks", presented at the International Conference on Learning Representations (ICLR 2014) in April 2014.

Data

We will use data from Microsoft COCO (https://cocodataset.org/#home). The dataset contains a large collection of images of common objects. It includes segmentation annotations, which can be processed to produce bounding box representations.
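
As a rough sketch of that processing step, the snippet below derives a box from a segmentation. It assumes the polygon form of COCO segmentation (each polygon is a flat list x1, y1, x2, y2, ...) and outputs the COCO box convention [x, y, width, height]. The function name polygons_to_bbox is our own, and run-length-encoded masks would need to be decoded before anything like this applies.

```python
def polygons_to_bbox(polygons):
    """Return [x, y, width, height] enclosing every polygon of one object.

    polygons: list of flat coordinate lists [x1, y1, x2, y2, ...]
    (assumed input format; one object may have several polygons).
    """
    # Even positions hold x coordinates, odd positions hold y coordinates.
    xs = [v for poly in polygons for v in poly[0::2]]
    ys = [v for poly in polygons for v in poly[1::2]]
    if not xs or not ys:
        raise ValueError("no polygon coordinates given")
    x_min, y_min = min(xs), min(ys)
    return [x_min, y_min, max(xs) - x_min, max(ys) - y_min]


# A single rectangular polygon with corners (10, 20) and (50, 80).
example = [[10.0, 20.0, 50.0, 20.0, 50.0, 80.0, 10.0, 80.0]]
print(polygons_to_bbox(example))  # [10.0, 20.0, 40.0, 60.0]
```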

Methodology

The model is built from convolutional and max-pooling layers, following the architecture described in the OverFeat paper. We will first train the classifier and subsequently train the regressor. We will train the model using Colab and other computing resources (OSCAR, depending on availability). The most difficult aspect will probably be coordinating the training stages: the feature extractor is first trained on the classification problem and is then used to train the bounding box regressor.

Metrics

Success will be informally defined as the "ability to detect the presence of an object of interest and correctly locate and classify the object." Formally, a prediction for localization and classification will be deemed successful if the PASCAL overlap (Intersection over Union) between the correct bounding box and the predicted bounding box exceeds 50% AND the classification is correct. The metric for recognition will be the logical generalization of this criterion to the possibility of having multiple (or 0) objects of interest. The accuracy metric is therefore well-defined and appropriate for our project.
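
The single-object criterion can be sketched as follows. This is a minimal illustration, assuming boxes are given as corner coordinates (x1, y1, x2, y2) with x2 > x1 and y2 > y1. The function names iou and is_correct are ours, not taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero if disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_correct(pred_box, pred_label, true_box, true_label, threshold=0.5):
    """True if the label matches AND the overlap exceeds the threshold."""
    return pred_label == true_label and iou(pred_box, true_box) > threshold


truth = (0, 0, 10, 10)
print(is_correct((0, 0, 10, 8), "cat", truth, "cat"))  # True: IoU is 0.8
print(is_correct((0, 0, 10, 8), "dog", truth, "cat"))  # False: wrong class
print(is_correct((0, 0, 10, 5), "cat", truth, "cat"))  # False: IoU is exactly 0.5
```

The strict comparison reflects the wording "exceeds 50%" above. If overlaps of exactly 50% should count, the comparison would become >=.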

Our base goal: Ability to recognize the existence of one object of interest 30% of the time (that is, for N copies of the same object, correctly producing a bounding box as defined above for at least 0.3N of them)

Our meet goal: Ability to recognize the existence of one object of interest >50% of the time

Our stretch goal: Ability to recognize the existence of two objects of interest >70% of the time (that is, for N copies of object 1 and M copies of object 2, correctly producing bounding boxes as defined above for L objects, where L / (N+M) >= 0.7. Note that classification is required as well)
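
One way the rate L / (N+M) could be computed is sketched below. It assumes each prediction and each ground-truth object is a (label, box) pair with boxes as (x1, y1, x2, y2), and it uses a simple greedy matching in which each prediction can account for at most one object. The function names and the matching strategy are our own assumptions, not the paper's evaluation protocol.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def recognition_rate(predictions, ground_truth, threshold=0.5):
    """Fraction of ground-truth objects matched by a correct prediction.

    Both arguments are lists of (label, box) pairs. A ground-truth object
    counts as found if an unused prediction has the same label and an
    IoU above the threshold with it.
    """
    if not ground_truth:
        raise ValueError("rate is undefined without ground-truth objects")
    used = set()  # indices of predictions already matched
    hits = 0
    for true_label, true_box in ground_truth:
        best_i, best_overlap = None, threshold
        for i, (label, box) in enumerate(predictions):
            if i in used or label != true_label:
                continue
            overlap = iou(box, true_box)
            if overlap > best_overlap:
                best_i, best_overlap = i, overlap
        if best_i is not None:
            used.add(best_i)
            hits += 1
    return hits / len(ground_truth)


# N = 2 cats and M = 1 dog; one cat is missed, so L = 2.
ground_truth = [("cat", (0, 0, 10, 10)), ("cat", (20, 20, 30, 30)),
                ("dog", (50, 50, 60, 60))]
predictions = [("cat", (1, 1, 10, 10)), ("dog", (50, 50, 60, 58)),
               ("cat", (100, 100, 110, 110))]
rate = recognition_rate(predictions, ground_truth)
print(rate >= 0.7)  # False: 2 of 3 objects found, below the stretch goal
```

This sketch does not penalize spurious predictions, so a fuller evaluation would also need to account for false positives.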

Ethics

The OverFeat paper does not explicitly discuss ethical issues related to object detection and localization, but issues of privacy may arise. The power of object detection can extend into the realm of surveillance, where any image-capturing device has the capacity to recognize your identity. Along these lines, if the model is trained on biased data, there may be concerns of discrimination against protected groups in the deployment of these models.