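Purpose: survey material relevant to the UAV network project.
TOPICS: Efficient methods for object detection, object recognition, and event/scene understanding,
including dangerous objects/events such as fire and heavy smoke.
Requirements: by the afternoon of 2015-8-18, prepare a document in English that uses brief text and illustrative figures to present the representative methods and the best methods in the relevant fields, with at least 15 references.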

OBJECT DETECTION & RECOGNITION

1. HISTORY & OVERVIEW [1] [2]

Targets: There are roughly 10,000 to 30,000 visual object categories. Recognition tasks include scene categorization (e.g. city, outdoor), image-level annotation (are there people?), object detection (where are the people?), and image parsing (labeling regions such as people and buildings).

P. Perona [3] distinguishes five levels of recognition tasks of increasing difficulty:
1. Verification: Is a particular item present in an image patch?
2. Detection and Localization: Given a complex image, decide whether a particular exemplar object is located somewhere in this image, and provide accurate location information for this object.
3. Classification: Given an image patch, decide which of multiple possible categories are present in that patch.
4. Naming: Given a large complex image (instead of an image patch as in the classification problem), determine the location and labels of the objects present in that image.
5. Description: Given a complex image, name all the objects present in the image, and describe the actions and relationships of the various objects within the context of this image. As the author indicates, this is also sometimes referred to as scene understanding.
A typical object recognition system consists of feature extraction, followed by feature grouping, object hypothesis generation, and an object verification stage. Recent methods, however, have blurred the distinction between these components.
Timeline of recognition
- Late 1980s: Alignment, Geometric Primitives
- Early 1990s: Invariants, Appearance-based Methods
- Mid-late 1990s: Sliding Window Approaches
- Late 1990s: Feature-based Methods
- Early 2000s: Parts-and-shape Models
- 2003-late 2000s: Bags of Features
- Present Trends: Machine Learning, Deep Learning, Combination of Local and Global Methods, Modeling Context, Emphasis on “Image Parsing”
  (see the slides for details of the methods)
2. Efficient Methods & Related Papers
Alignment & Geometric Primitives
1. Alignment: Find the transformation between pairs of feature matches in two images.

e.g. 《Object Recognition Using Alignment》[4] is based on the assumption that the position, orientation, and scale of an object in three-space can be determined from three pairs of corresponding model and image points.
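
The idea that three point correspondences fix a transformation can be illustrated in a simplified setting. The sketch below assumes a planar 2D affine model, which is not the 3D pose recovery described in [4]: three non-collinear point pairs determine the affine map exactly. The function name `affine_from_three_pairs` and the sample points are hypothetical.

```python
import numpy as np

def affine_from_three_pairs(model_pts, image_pts):
    """Solve for the 2x3 affine matrix A such that image = A @ [x, y, 1]."""
    model = np.asarray(model_pts, dtype=float)   # shape (3, 2)
    image = np.asarray(image_pts, dtype=float)   # shape (3, 2)
    # Homogeneous model coordinates: one row [x, y, 1] per point.
    M = np.hstack([model, np.ones((3, 1))])
    # Solve M @ A.T = image; raises LinAlgError if the points are collinear.
    return np.linalg.solve(M, image).T

model = [(0, 0), (1, 0), (0, 1)]
image = [(2, 3), (2, 5), (0, 3)]   # rotated 90 degrees, scaled by 2, shifted by (2, 3)
A = affine_from_three_pairs(model, image)
aligned = A @ np.array([1.0, 1.0, 1.0])   # map a fourth model point into the image
```

Once the transformation is known, every remaining model point can be mapped into the image and compared with the observed features, which is the verification step of an alignment approach.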

2. Geometric Primitives: Model-based systems that describe objects with volume models.

e.g. 《Symbolic reasoning among 3-D models and 2-D images》[5] describes model-based systems in terms of models, prediction of image features, description of image features, and interpretation, which relates image features to models.
Invariants & Appearance-based
1. Geometric invariants: Used to provide an efficient indexing mechanism for object recognition systems.
e.g. 《Geometric hashing: an overview》[6] The typical deformations discussed in the literature include 2D translations, rotations, and scalings.

Limits: The above method only suits invariants under a monocular viewpoint.
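
A minimal sketch of invariant-based indexing in the spirit of geometric hashing [6], assuming 2D points and similarity transformations (translation, rotation, scaling). Points are stored as complex numbers, and the coordinates of a point in the frame of a basis pair serve as the hash key. The bin size, function names, and sample points are assumptions for illustration only.

```python
from collections import defaultdict
from itertools import permutations
import cmath

BIN = 0.25  # hypothetical quantization step for the hash keys

def invariant_key(p, a, b):
    """Coordinates of p in the basis (a, b); unchanged by translation, rotation, scaling."""
    z = (p - a) / (b - a)
    return (round(z.real / BIN), round(z.imag / BIN))

def build_table(model):
    """Offline stage: store every point under every ordered basis pair."""
    table = defaultdict(list)
    for i, j in permutations(range(len(model)), 2):
        for k, p in enumerate(model):
            if k not in (i, j):
                table[invariant_key(p, model[i], model[j])].append((i, j))
    return table

def vote(table, scene, a, b):
    """Online stage: pick one scene basis and count votes per model basis."""
    votes = defaultdict(int)
    for p in scene:
        if p not in (a, b):
            for basis in table.get(invariant_key(p, a, b), []):
                votes[basis] += 1
    return votes

# Points are complex numbers x + yj.
model = [0 + 0j, 4 + 0j, 1 + 3j, 5 + 2j, 2 - 1j]
transform = 1.7 * cmath.exp(0.6j)            # scale by 1.7, rotate by 0.6 rad
scene = [transform * p + (10 - 4j) for p in model]
votes = vote(build_table(model), scene, scene[0], scene[1])
```

Here the model basis (0, 1) collects one vote from each of the three remaining scene points, even though the scene is a rotated, scaled, and shifted copy of the model.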

2. Appearance-based: Includes eigenfaces, color histograms, and appearance manifolds.
e.g. 《Face Recognition Using Eigenfaces》[7] treats face recognition as a two-dimensional recognition problem: face images are projected onto a feature space that best encodes the variation among known face images. (implementation principle)
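
The projection onto a face feature space can be sketched with a small PCA. This is an illustration under stated assumptions, not the implementation in [7]: the SVD is used for the PCA step, random arrays stand in for face images, and recognition is a plain nearest neighbor in the projected space. All function names are hypothetical.

```python
import numpy as np

def fit_eigenfaces(faces, n_components):
    """faces: (n_images, n_pixels) array of flattened face images."""
    mean_face = faces.mean(axis=0)
    centered = faces - mean_face
    # Rows of vt are the principal directions ("eigenfaces") of the training set.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean_face, vt[:n_components]

def project(face, mean_face, eigenfaces):
    """Coordinates of one face in the eigenface feature space."""
    return eigenfaces @ (face - mean_face)

def nearest_face(face, gallery_weights, mean_face, eigenfaces):
    """Index of the known face whose weights are closest to the probe's."""
    w = project(face, mean_face, eigenfaces)
    return int(np.argmin(np.linalg.norm(gallery_weights - w, axis=1)))

rng = np.random.default_rng(0)
faces = rng.random((6, 64))                    # stand-in data: 6 "images" of 8x8 pixels
mean_face, eigenfaces = fit_eigenfaces(faces, n_components=4)
gallery = (faces - mean_face) @ eigenfaces.T   # weights of the known faces
probe = faces[2] + 0.01 * rng.standard_normal(64)
match = nearest_face(probe, gallery, mean_face, eigenfaces)
```

Each 64-pixel image is reduced to 4 weights, and a slightly noisy copy of a known face is matched by comparing weights rather than raw pixels.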

e.g. 《Color Indexing》[8] demonstrates that color histograms of multicolored objects provide a robust, efficient cue for indexing into a large database of models.
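
Color histograms are compared in [8] with histogram intersection, which can be sketched as follows. The coarse RGB grid with 4 bins per channel and the sample pixels are assumptions for illustration, not the color space or binning used in the paper.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Count RGB pixels (values 0-255) in a coarse 3D color grid, stored as a dict."""
    step = 256 // bins_per_channel
    hist = {}
    for r, g, b in pixels:
        key = (r // step, g // step, b // step)
        hist[key] = hist.get(key, 0) + 1
    return hist

def histogram_intersection(image_hist, model_hist):
    """Fraction of the model's pixel counts that are also present in the image."""
    overlap = sum(min(image_hist.get(k, 0), n) for k, n in model_hist.items())
    return overlap / sum(model_hist.values())

red, blue, green = (250, 10, 10), (10, 10, 250), (10, 250, 10)
model_hist = color_histogram([red] * 6 + [blue] * 4)
image_hist = color_histogram([red] * 5 + [blue] * 4 + [green] * 3)
score = histogram_intersection(image_hist, model_hist)   # (5 + 4) / 10
```

Background pixels that do not occur in the model (green here) do not lower the score, which is part of why the cue tolerates clutter around the object.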

e.g. 《Visual learning and recognition of 3-D objects from appearance》[9] uses appearance manifolds for object detection.

Limits:
1. Require global registration of patterns
2. Not robust to clutter, occlusion, or geometric transformations
Sliding Window Approaches
e.g. 《Rapid Object Detection using a Boosted Cascade of Simple Features》[10] is a prominent milestone in face detection, with more than 11,500 citations, and a widely used solution for real-time object detection. A very detailed description of the method is available at the linked page (strongly recommended).
Limits: Cannot handle clutter and occlusion well.
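
The detector in [10] evaluates its simple rectangle features quickly by first building an integral image. The sketch below shows that idea on a tiny grayscale array; the function names and the sample image are hypothetical, and boosting and the cascade are not shown.

```python
def integral_image(img):
    """ii[y][x] = sum of img over rows < y and columns < x (padded with a zero row/column)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of any rectangle from four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]

def two_rect_feature(ii, top, left, height, width):
    """Horizontal two-rectangle Haar-like feature: left half minus right half."""
    half = width // 2
    return (rect_sum(ii, top, left, height, half)
            - rect_sum(ii, top, left + half, height, half))

img = [
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
]
ii = integral_image(img)
feature = two_rect_feature(ii, top=0, left=0, height=3, width=4)
```

Because each rectangle sum costs four lookups regardless of its size, the same feature can be evaluated at every window position and scale at constant cost.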
Feature-based Methods
e.g. 《Distinctive Image Features from Scale-Invariant Keypoints》[11] presents the SIFT feature by Lowe; objects are detected by matching feature points. The keypoints have been shown to be invariant to image rotation and scale, and robust across a substantial range of affine distortion, addition of noise, and change in illumination.
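
Feature-point matching in [11] accepts a match only when the nearest descriptor is clearly closer than the second nearest. A minimal sketch of that ratio test follows, assuming the descriptors have already been extracted; the toy 4-D vectors and the function name are hypothetical.

```python
import math

def match_keypoints(query_desc, train_desc, ratio=0.8):
    """Return (query_index, train_index) pairs that pass the nearest/second-nearest ratio test."""
    matches = []
    for qi, q in enumerate(query_desc):
        # Distances from this query descriptor to every candidate, smallest first.
        ranked = sorted((math.dist(q, t), ti) for ti, t in enumerate(train_desc))
        if len(ranked) >= 2 and ranked[0][0] < ratio * ranked[1][0]:
            matches.append((qi, ranked[0][1]))
    return matches

# Toy 4-D descriptors; real SIFT descriptors have 128 dimensions.
train = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0)]
query = [(0.9, 0.1, 0, 0),      # clearly closest to train[0]
         (0.5, 0.5, 0, 0)]      # ambiguous between train[0] and train[1], rejected
matches = match_keypoints(query, train)
```

The brute-force search over all candidates is used here only for clarity; it is one reason matching becomes expensive on large descriptor sets.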

Limits: The heavy computation makes real-time use difficult.
Part-based Methods
- Object as a set of parts
- Relative locations between parts
- Appearance of part
  e.g. 《Object Detection with Discriminatively Trained Part-Based Models》[12] uses HOG features, a part model, and a latent SVM.
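
The three ingredients listed above (a set of parts, their relative locations, and their appearance) can be combined into one detection score. The sketch below is a simplified illustration, not the model of [12]: it searches part positions by brute force and uses a single quadratic deformation weight per part. The response grids, anchors, and weights are hypothetical.

```python
def best_part_placement(response, anchor, deform_weight):
    """Maximize appearance score minus a quadratic penalty for moving away from the anchor."""
    ay, ax = anchor
    best_score, best_pos = float("-inf"), None
    for y, row in enumerate(response):
        for x, appearance in enumerate(row):
            cost = deform_weight * ((y - ay) ** 2 + (x - ax) ** 2)
            if appearance - cost > best_score:
                best_score, best_pos = appearance - cost, (y, x)
    return best_score, best_pos

def detection_score(root_score, parts):
    """Root score plus the best deformed placement of every part."""
    return root_score + sum(best_part_placement(*p)[0] for p in parts)

# Hypothetical appearance responses for two parts on a 3x3 grid of positions.
part_a = [[0.0, 0.2, 0.0],
          [0.1, 1.0, 3.0],
          [0.0, 0.1, 0.0]]
part_b = [[0.0, 0.0, 0.0],
          [0.0, 2.0, 0.0],
          [0.0, 0.0, 0.0]]
parts = [(part_a, (1, 1), 0.5), (part_b, (1, 1), 0.5)]
score = detection_score(1.5, parts)
```

In this example part_a moves one cell away from its anchor because the stronger appearance response (3.0) outweighs the deformation cost (0.5), while part_b stays at its anchor.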

Bag-of-features Models

e.g. 《Local features and kernels for classification of texture and object categories: A comprehensive study》[13] achieved very impressive results in the PASCAL Visual Object Classes Challenge.

Limits: Ignores the spatial relationships among the patches.
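
A bag-of-features representation can be sketched as a histogram of visual words. The example assumes a codebook is already available (in practice it would typically come from clustering training descriptors); the 2-D descriptors, codebook, and function names are hypothetical.

```python
import math

def nearest_word(descriptor, codebook):
    """Index of the closest visual word (codebook entry) to a local descriptor."""
    return min(range(len(codebook)), key=lambda i: math.dist(descriptor, codebook[i]))

def bag_of_features(descriptors, codebook):
    """Normalized histogram of visual-word counts; patch positions are discarded."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[nearest_word(d, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist

# Hypothetical 2-D codebook and local descriptors.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
descriptors = [(0.1, 0.0), (0.9, 0.1), (1.1, -0.1), (0.0, 0.8)]
hist = bag_of_features(descriptors, codebook)
```

Reordering the descriptors leaves the histogram unchanged, which makes the loss of spatial relationships among patches explicit.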
Neural-network models
e.g. 《ImageNet Classification with Deep Convolutional Neural Networks》[14]

e.g. 《Rich feature hierarchies for accurate object detection and semantic segmentation》[15]
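
Region-based detectors such as [15] score many overlapping candidate boxes and then prune them with greedy non-maximum suppression. The sketch below shows only that pruning step, assuming boxes and scores are already available; the threshold value and sample boxes are placeholders, and the CNN feature extraction and classification are not shown.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(boxes, scores, iou_threshold=0.3):
    """Greedily keep the highest-scoring box and drop boxes that overlap it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
```

The second box overlaps the first with an IoU of 81/119 and is suppressed, while the distant third box is kept as a separate detection.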
Scene/Event Recognition
Scene: 《Modeling the shape of the scene: a holistic representation of the spatial envelope》[16] performs well at scene recognition.
Event: 《Video-based event recognition: activity representation and probabilistic recognition methods》[17]

3. Current Applications

4. References
[1] Fei-Fei Li, Rob Fergus, Antonio Torralba, and Jean Ponce. “Object Recognition: History and Overview.” CS.UNC.EDU, 2011.
[2] Andreopoulos, Alexander, and John K. Tsotsos. “50 Years of object recognition: Directions forward.” Computer Vision and Image Understanding 117.8 (2013): 827-891.
[3] Perona, Pietro. “Visual recognition circa 2008.” Object Categorization: Computer and Human Vision Perspectives. Cambridge University Press, 2009. 55-68.
[4] Huttenlocher, Daniel P., and Shimon Ullman. “Object recognition using alignment.” Proc. ICCV. Vol. 87. 1987.
[5] Brooks, Rodney A. “Symbolic reasoning among 3-D models and 2-D images.” Artificial intelligence 17.1 (1981): 285-348.
[6] Wolfson, Haim J., and Isidore Rigoutsos. “Geometric hashing: An overview.” Computing in Science & Engineering 4 (1997): 10-21.
[7] Turk, Matthew, and Alex P. Pentland. “Face recognition using eigenfaces.” Computer Vision and Pattern Recognition, 1991. Proceedings CVPR’91., IEEE Computer Society Conference on. IEEE, 1991.
[8] Swain, Michael J., and Dana H. Ballard. “Color indexing.” International journal of computer vision 7.1 (1991): 11-32.
[9] Murase, Hiroshi, and Shree K. Nayar. “Visual learning and recognition of 3-D objects from appearance.” International journal of computer vision 14.1 (1995): 5-24.
[10] Viola, Paul, and Michael Jones. “Rapid object detection using a boosted cascade of simple features.” Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on. Vol. 1. IEEE, 2001.
[11] Lowe, David G. “Distinctive image features from scale-invariant keypoints.” International journal of computer vision 60.2 (2004): 91-110.
[12] Felzenszwalb, Pedro F., et al. “Object detection with discriminatively trained part-based models.” Pattern Analysis and Machine Intelligence, IEEE Transactions on 32.9 (2010): 1627-1645.
[13] Zhang, Jianguo, et al. “Local features and kernels for classification of texture and object categories: A comprehensive study.” International journal of computer vision 73.2 (2007): 213-238.
[14] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. “Imagenet classification with deep convolutional neural networks.” Advances in neural information processing systems. 2012.
[15] Girshick, Ross, et al. “Rich feature hierarchies for accurate object detection and semantic segmentation.” Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on. IEEE, 2014.
[16] Oliva, Aude, and Antonio Torralba. “Modeling the shape of the scene: A holistic representation of the spatial envelope.” International journal of computer vision 42.3 (2001): 145-175.
[17] Hongeng, Somboon, Ram Nevatia, and Francois Bremond. “Video-based event recognition: activity representation and probabilistic recognition methods.” Computer Vision and Image Understanding 96.2 (2004): 129-162.
