Evaluating Classification Strategies in Bag of SIFT Feature Method for Animal Recognition

Abstract: Automatic image annotation is an important topic, and several efforts have been made to solve the semantic gap problem, which is still an open issue. Content Based Image Retrieval (CBIR) alone cannot solve this problem. One of the efficient and effective models for narrowing the semantic gap in visual recognition and retrieval is the Bag of Feature (BoF) model, which can quantize local visual features such as SIFT well. In this study our aim is to investigate the potential usage of Bag of SIFT Feature in animal recognition. We also identify which classification method is better for animal pictures.


INTRODUCTION
In Content Based Image Retrieval (CBIR) (Qi and Snyder, 1999), proposed in the early 1990s, images are automatically indexed by extracting low-level features such as texture, color and shape. The semantic gap is a well-known problem among CBIR systems. It is caused by the human tendency to use concepts, such as keywords and text definitions, to understand images and measure their resemblance. Although low-level features (texture, color, spatial relationship, shape, etc.) are extracted automatically by computer vision techniques, CBIR often fails to describe the high-level semantic concepts in the user's mind (Zhou and Huang, 2000). These systems cannot effectively model image semantics and have many restrictions when dealing with wide-ranging image databases (Liu et al., 2007).
Another problem caused by using low-level features like texture, color and shape is that they need image digestion. In contrast, the Scale-Invariant Feature Transform (SIFT) (Lowe, 1999) is a feature that is robust to scaling, rotation, translation and illumination changes and partially invariant to affine distortion. There is also no need to digest images. The only thing we need to do is to quantize SIFT features with the well-known Bag of Feature (BoF) technique.
Furthermore, in most of the previous works we observed that there is no appropriate investigation of animal annotation and animal picture recognition, because animals often resemble their environments, which causes low accuracy. For this reason, our objective in this study is to investigate the potential usage of bag of SIFT feature in animal recognition and to find out which kind of classification is more suitable for our animal recognition system.

LITERATURE REVIEW
At the starting point of the BoF methodology we must identify local interest regions or points. Then we can extract features from these points. Both steps are described in the following section.
Interest point detection: There are several well-known methods, which are listed below (Mikolajczyk et al., 2005).

Harris-Laplace regions: In this method corners are detected with the Harris measure and their scale is selected by using the Laplacian-of-Gaussian operator in scale-space.

Hessian-Laplace regions: These are localized in space at the local maxima of the Hessian determinant and in scale at the local maxima of the Laplacian-of-Gaussian.

Maximally Stable Extremal Regions (MSERs): These are components of connected pixels in a thresholded image. A watershed-like segmentation algorithm is applied to image intensities, and segment boundaries which are stable over a wide range of thresholds define the region.

DoG regions: This detector is appropriate for searching blob-like structures at local scale-space maxima of the difference-of-Gaussian. It is also faster and more compact (fewer feature points per image) than other detectors.

Salient regions: In circular regions of various sizes, the entropy of pixel intensity histograms is measured at each image position.

In our study we used Harris-Laplace for finding key points.
SIFT feature descriptors: After interest points are detected we can describe them by features such as SIFT. SIFT is an algorithm published by Lowe (1999) for detecting and describing local features in images. Each SIFT key point is a circular image region with an orientation. It is described by four parameters: the key point center (x and y coordinates), its scale (the radius of the region) and its orientation (an angle expressed in radians). The SIFT detector is invariant and robust to translation, rotation and scaling, and partially invariant to affine distortion and illumination changes. Four steps are involved in the SIFT algorithm.
Scale-space extrema detection: Identify those locations and scales that are identifiable from different views (Gaussian blurring and sigma) of the same object.
Keypoint localization: Eliminate points from the list of keypoints by finding those that have low contrast or are poorly localized on an edge.
Orientation assignment: Assign a consistent orientation to the keypoints based on local image properties.
Keypoint descriptor: Keypoint descriptors typically use a set of 16 histograms, aligned in a 4×4 grid, each with 8 orientation bins, one for each of the main compass directions and one for each of the mid-points of these directions. This results in a feature vector containing 16 × 8 = 128 elements.
In other words, during scale-space extrema detection each pixel in an image is compared with its 8 neighbors in the same scale as well as 9 pixels in the next scale and 9 pixels in the previous scale. If that pixel is a local extremum, the keypoint is best represented at that scale.
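The 26-neighbor comparison described above can be illustrated with a minimal sketch. It assumes a difference-of-Gaussian stack stored as nested lists indexed as `dog[scale][row][col]`; the function name and data layout are hypothetical and are not taken from the authors' MATLAB implementation.

```python
def is_scale_space_extremum(dog, s, y, x):
    """Return True if dog[s][y][x] is strictly larger or strictly smaller
    than all 26 neighbours: 8 in its own scale, 9 in the next scale and
    9 in the previous scale. The caller must keep (s, y, x) away from the
    borders of the stack."""
    centre = dog[s][y][x]
    neighbours = []
    for ds in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if ds == 0 and dy == 0 and dx == 0:
                    continue  # skip the centre pixel itself
                neighbours.append(dog[s + ds][y + dy][x + dx])
    is_max = all(centre > v for v in neighbours)
    is_min = all(centre < v for v in neighbours)
    return is_max or is_min


def make_stack(scales=3, size=3, fill=0.0):
    """Build a small constant stack for experimenting (illustrative helper)."""
    return [[[fill for _ in range(size)] for _ in range(size)]
            for _ in range(scales)]


toy = make_stack()
toy[1][1][1] = 5.0  # a single peak in the middle scale
print(is_scale_space_extremum(toy, 1, 1, 1))
```

A full SIFT detector would additionally reject low-contrast and edge responses, as in the keypoint localization step; that part is not shown here.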
Figure 1 shows 2 examples of SIFT features of Harris-Laplace key points which are generated by our experiment.
Visual word quantization: After extracting features, images can be represented by sets of keypoint descriptors, but these sets are not meaningful on their own. To fix this problem, Vector Quantization (VQ) techniques are used to cluster the keypoint descriptors into a large number of clusters by using the K-means clustering algorithm and then to replace each keypoint by the index of the cluster to which it belongs. By using the Bag of Feature (BoF) method we can cluster similar features into visual words and represent each picture by counting the occurrences of each visual word. This representation is similar to the bag-of-words document representation in terms of semantics. A complete definition of BoW is given in the next part.
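As a rough illustration of the quantization step, the sketch below clusters toy descriptors with a plain K-means loop and then counts visual words per image. It is a simplified stand-in under stated assumptions: the descriptors are 2-dimensional instead of 128-dimensional, the vocabulary has 2 words instead of 1500, and all function names are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest_index(descriptor, centres):
    """Index of the visual word (cluster centre) closest to a descriptor."""
    return min(range(len(centres)),
               key=lambda k: squared_distance(descriptor, centres[k]))


def kmeans(descriptors, k, iterations=20, seed=0):
    """Plain K-means: returns k cluster centres (the visual vocabulary)."""
    rng = random.Random(seed)
    centres = [list(d) for d in rng.sample(descriptors, k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for d in descriptors:
            groups[nearest_index(d, centres)].append(d)
        for j, members in enumerate(groups):
            if members:  # keep the old centre if a cluster is empty
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return centres


def bof_histogram(descriptors, centres):
    """Normalized count of visual words for one image."""
    counts = [0] * len(centres)
    for d in descriptors:
        counts[nearest_index(d, centres)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


# Toy descriptors pooled from all training images (two obvious groups).
pool = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
vocabulary = kmeans(pool, k=2)
image_descriptors = [[0, 0], [1, 0], [0, 1], [10, 10]]
print(bof_histogram(image_descriptors, vocabulary))
```

Each image is thus reduced to a fixed-length histogram regardless of how many keypoints it contains, which is what makes the representation usable by standard classifiers.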
Bag of Words (BoW) model: The Bag of Words (BoW) model is a popular technique for document classification. In this method a document is represented as the bag of its words, and features are extracted from the frequency of occurrence of each word. Recently, the Bag of Words model has also been used for computer vision (Perona, 2005). Therefore, instead of the document version name (BoW), Bag of Feature (BoF) will be used, which is described below.
Bag of Feature (BoF) model: The Bag of Feature (BoF) model is now widely used for image classification and object recognition because of its excellent performance.
Steps of the BoF method are listed as follows:
• Extract blobs and features (e.g., SIFT) on training and test blobs of images
• Build a visual vocabulary using a clustering method (e.g., K-means) and descriptor quantization
• Represent images with BoF histograms
• Image classification (e.g., SVM)
Among the related works in this area, Choi et al. (2010) presented a method for creating fuzzy multimedia ontologies automatically. They used SIFT for feature extraction and BoF for feature quantization. Zhang et al. (2012) analyzed key aspects of the various Automatic Image Annotation (AIA) methods, including both feature extraction and semantic learning methods. Major methods are also discussed and illustrated in detail. Tousch et al. (2012) reviewed structures in the field of demonstration and analyzed how the structure is used. They first presented works without structured vocabulary and then showed how structured vocabulary started with introducing links between categories or between features. They then reviewed works which used structured vocabularies as an input and analyzed how the structure is exploited. Jiang et al. (2012) proposed a Semantic Diffusion (SD) approach which enhances previous annotations (made manually or with machine learning techniques) by using a graph diffusion formulation to improve the stability of concept annotation. Hong et al. (2014) proposed a Multiple-Instance Learning (MIL) method that applies feature mapping to convert MIL into a single-instance learning problem. This method is able to explore both positive and negative concept correlations. It can also select the effective features from a large and diverse set of low-level features for each concept under MIL settings. Liu et al. (2014) presented a Multi-view Hessian Discriminative Sparse Coding (MHDSC) model which combines Hessian regularization and discriminative sparse coding to address multi-view difficulties. Chiang (2013) offered a semi-automatic tool, called IGAnn (interactive Image Annotation), that assists users in annotating images with textual labels. By collecting related and unrelated images over iterations, a hierarchical classifier related to the specified label is built by using the proposed semi-supervised approach. Dimitrovski et al. (2011) presented a Hierarchical Multi-label Classification (HMC) system for medical image annotation, where each case can belong to multiple classes and these classes/labels are organized in a hierarchy. In most of the reviewed literature, BoF with SIFT features plays the key role in feature extraction and quantization and shows better results than using other low-level features like color or texture alone (Tsai, 2012).
Figures 2 and 3 depict the stages of animal recognition using the BoF model for training and testing, respectively.

METHODOLOGY
In this study we investigate the potential and accuracy of the BoF model with SIFT features, the K-means method for clustering and quantization of words and 6 different kinds of classification (NN L2, NN Chi-square, SVM linear, SVM LLC, SVM IK and SVM Chi-square) in a special domain (animal) to find which one is more effective.
Because of the variety of animal pictures and natural environments, our dataset is Caltech 256 (Griffin et al., 2007). We investigate 20 different animals or 20 concepts from different kinds of animals (bear, butterfly, camel, dog, house fly, frog, giraffe, goose, gorilla, horse, humming bird, ibis, iguana, octopus, ostrich, owl, penguin, starfish, swan and zebra) in different environments (lake, desert, sea, sand, jungle, bushy, etc.). For each animal, 40 images are randomly selected for training and 10 images are randomly selected for testing. The total number of images is 800 for training and 200 for testing. The number of extracted code words is 1500 and, for evaluating the accuracy of each concept, we used the well-known measures Precision, Recall and Accuracy (Tousch et al., 2012; Chiang, 2013; Fakhari and Moghadam, 2013; Lee et al., 2011).
Although we have focused on only 20 different animals, this method can be used for other animals or for categories other than animals. All we need to do is to create a separate folder for the new concept and change its name. Then all the stages can be done automatically by our algorithm.

Fig. 4: Visual word example in animal BOF
Appl. Sci. Eng. Technol., 10(11): 1266-1272, 2015

False negatives (fn): Items which were not labeled as belonging to this class but should have been.

True negatives (tn): The number of items correctly not labeled as belonging to this class.
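The formulas referred to as (1), (2) and (3) are not reproduced in this text. Assuming they follow the standard definitions (an assumption, since the original equations are not available here), they would read as follows, with tp, fp, fn and tn denoting the numbers of true positives, false positives, false negatives and true negatives for a given class:

```latex
\begin{align}
\text{Precision} &= \frac{tp}{tp + fp} \tag{1} \\
\text{Recall}    &= \frac{tp}{tp + fn} \tag{2} \\
\text{Accuracy}  &= \frac{tp + tn}{tp + tn + fp + fn} \tag{3}
\end{align}
```

As a worked example under these definitions, a class with 10 test images of which 8 are recognized (tp = 8, fn = 2), with 4 images of other classes wrongly assigned to it (fp = 4) out of 200 test images (tn = 186), would have Precision = 8/12 ≈ 0.67, Recall = 8/10 = 0.80 and Accuracy = 194/200 = 0.97. These numbers are illustrative only and are not results from this study.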

DISCUSSION
The normalized confusion matrix is an n×n matrix showing how many test images of each class are correctly classified and how many are misclassified into other classes. In other words, for each concept it shows how many of its images are assigned to each of the other concepts. By using this matrix we can analyze the reasons for the misclassification of some pictures and look for a remedy. Figure 5 shows our final experimental results for 20 concepts (bear, butterfly, camel, dog, house fly, frog, giraffe, goose, gorilla, horse, humming bird, ibis, iguana, octopus, ostrich, owl, penguin, starfish, swan and zebra), with 40 images randomly selected for training and 10 images randomly selected for testing per concept, that is, 800 images for training and 200 for testing in total. The number of extracted code words is 1500 and, for computing the accuracy of each concept, we used the well-known measures Precision, Recall and Accuracy in six kinds of image classification methods (NN L2, NN Chi-square, SVM linear, SVM LLC, SVM IK and SVM Chi-square). These are depicted in Fig. 6 to 8, respectively. Although we have focused on only 20 different animals, this method can be scaled to other concepts. All that is needed is to create a separate folder for the new concept and name it accordingly. Then all the stages can be done automatically by our experiment.
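To make the definition concrete, a normalized confusion matrix can be computed as in the following minimal sketch. The function name and the toy labels are hypothetical; each row is divided by the number of test images of that true class, so the diagonal holds the per-class recognition rate.

```python
def normalized_confusion_matrix(true_labels, predicted_labels, classes):
    """Return an n x n matrix whose entry [i][j] is the fraction of test
    images of class i that were classified as class j."""
    index = {name: i for i, name in enumerate(classes)}
    n = len(classes)
    counts = [[0] * n for _ in range(n)]
    for t, p in zip(true_labels, predicted_labels):
        counts[index[t]][index[p]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Rows with no test images stay all-zero instead of dividing by zero.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


classes = ["zebra", "horse"]
truth = ["zebra", "zebra", "horse", "horse"]
predicted = ["zebra", "horse", "horse", "horse"]
for name, row in zip(classes, normalized_confusion_matrix(truth, predicted, classes)):
    print(name, row)
```

Large off-diagonal entries point to pairs of concepts that the classifier confuses, which is the kind of analysis described above.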
As shown in Fig. 5, the results of SVM Chi-square are better than those of the other methods, so SVM Chi-square is the better classifier in our experiment. Our code gives its best results for three specific animals: zebra, horse and starfish. This is probably because these animals have more distinctive patterns. It also suggests that if we can omit the unimportant parts of the dataset pictures, we will get more accurate results.
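For readers unfamiliar with the chi-square measure, the sketch below shows one common definition of the chi-square distance between two BoF histograms next to the L2 distance, together with a nearest-neighbor rule that can use either. This is an illustrative assumption, not the authors' code: the exact distance and kernel variants used in the experiment are not specified in the text, and the SVM classifiers, which would use a kernel derived from such a distance, are not shown.

```python
def chi_square_distance(h1, h2):
    """One common chi-square distance: 0.5 * sum((a - b)^2 / (a + b)).
    Bins where both histograms are zero contribute nothing."""
    total = 0.0
    for a, b in zip(h1, h2):
        if a + b > 0:
            total += (a - b) ** 2 / (a + b)
    return 0.5 * total


def l2_distance(h1, h2):
    """Euclidean distance between two histograms."""
    return sum((a - b) ** 2 for a, b in zip(h1, h2)) ** 0.5


def nearest_neighbour_label(query, train_histograms, train_labels, distance):
    """Label of the training histogram closest to the query histogram."""
    best = min(range(len(train_histograms)),
               key=lambda i: distance(query, train_histograms[i]))
    return train_labels[best]


# Toy 3-word histograms (hypothetical values).
train = [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]
labels = ["zebra", "starfish"]
query = [0.7, 0.2, 0.1]
print(nearest_neighbour_label(query, train, labels, chi_square_distance))
print(nearest_neighbour_label(query, train, labels, l2_distance))
```

Compared with L2, the chi-square distance divides each squared difference by the bin mass, so differences in rarely occurring visual words weigh more heavily.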

CONCLUSION
Our objective in this research was to investigate the potential usage of bag of feature in animal recognition and in other concepts within the recognition category. Our experiment gave reasonable results, which show that BoF is a good choice for finding animals in nature. Also, SVM Chi-square has better accuracy than NN L2, NN Chi-square, SVM linear, SVM LLC and SVM IK. However, many animals resemble their environment, because camouflage protects them against enemies. If the background parts are omitted, we expect to get better results. Therefore, in future work we want to extract regions to address the location of objects and to extract other features as well (color, texture, shape, spatial location, etc.) to improve the results.

Fig. 1: Detected SIFT features of Harris-Laplace key points as circles

Fig. 2: Animal recognition using BoF model training stages
Figure 2 illustrates the training model of Bag of SIFT Feature on animal pictures, which was implemented in MATLAB 2014. We then tested Bag of SIFT Feature with the test model shown in Fig. 3. All the pictures for both models are generated by our experiment.
Accuracy: For measuring the accuracy we used three well-known measures, Precision, Recall and Accuracy, which are used in Tousch et al. (2012), Chiang (2013), Fakhari and Moghadam (2013) and Lee et al. (2011). Their formulas are in (1), (2) and (3) and the definitions of tp, tn, fp and fn are as follows.
True positives (tp): The number of items correctly labeled as belonging to this class.
False positives (fp): Items incorrectly labeled as belonging to this class.