Image Segmentation with the EM and the BYY Learning

The aim of this study is to present an image segmentation method based on feature space clustering with the Gaussian Mixture Model (GMM) and the Expectation Maximization (EM) algorithm. To address the problem of determining the number of clusters, Bayesian Ying-Yang (BYY) learning is employed. Moreover, to improve segmentation performance on noisy images, both intensity and spatial position information are used as features to describe each pixel in the image. Simulation results on images with and without noise validate the performance of the proposed method.


INTRODUCTION
One crucial step in image processing and pattern recognition is segmentation, which remains a challenging research area and largely determines the quality of many existing techniques, including object tracking, image recognition and object-based image compression. Image segmentation is the process of dividing an image into different regions such that each region is homogeneous, but the union of any two adjacent regions is not (Cheng et al., 2001). In the past decades, many techniques have been proposed to deal with the image segmentation problem, including thresholding, feature space clustering, edge-based techniques, region-based techniques, graph-based techniques and hybrid techniques (Cheng et al., 2001; Tao et al., 2007).
Image segmentation methods based on feature space clustering are unsupervised classification methods that organize unlabeled feature vectors into clusters or "natural groups" such that samples within a cluster are more similar to each other than to samples belonging to different clusters (Tang et al., 2009). One popular technique for feature space clustering is the model-based technique, which has received much attention during the last decades. The most commonly used statistical models are Markov random field models (Cohen et al., 1991; Zoltan and Pong, 2006; Li, 2009), hidden Markov models (Ibrahim et al., 2006) and Gaussian mixture models (Deng and Clausi, 2004; Carreira-Perpinan, 2007; Kim and Tang, 2007; Liu et al., 2008; Tang et al., 2009; Nguyen and Wu, 2012).
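In this study the Gaussian mixture model (GMM) is adopted and discussed. Let $x$ be a $d$-dimensional random variable. The goal of clustering is to assign $x$ to a cluster according to the criterion of minimizing a measure between $x$ and the center of the cluster, where $x$ has the following probability density:

$$p(x) = \sum_{y=1}^{k} \alpha_y \, p(x \mid \theta_y) \qquad (1)$$

where the weights satisfy $\alpha_y > 0$ and $\sum_{y=1}^{k} \alpha_y = 1$, and each mixture component $p(x \mid \theta_y)$, $y = 1, 2, \ldots, k$, is a Gaussian density given by:

$$p(x \mid \theta_y) = \frac{1}{(2\pi)^{d/2} \, |\Sigma_y|^{1/2}} \exp\left(-\frac{1}{2}(x - m_y)^T \Sigma_y^{-1} (x - m_y)\right) \qquad (2)$$

where $m_y$ and $\Sigma_y$ are the mean and the covariance, respectively, and $\theta_y = (m_y, \Sigma_y)$.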
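As an illustration of Eq. (1) and Eq. (2), the following minimal NumPy sketch evaluates a GMM density at a set of points. The function names and the two-component example values are hypothetical and chosen only for illustration.

```python
import numpy as np

def gaussian_density(x, mean, cov):
    """Eq. (2): d-dimensional Gaussian density evaluated at each row of x."""
    x = np.atleast_2d(x)
    d = x.shape[1]
    diff = x - mean
    # Squared Mahalanobis distance (x - m)^T Sigma^{-1} (x - m), row by row.
    maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * maha) / norm

def gmm_density(x, weights, means, covs):
    """Eq. (1): weighted sum of the k Gaussian components."""
    return sum(a * gaussian_density(x, m, c)
               for a, m, c in zip(weights, means, covs))

# Hypothetical two-component mixture in d = 2 (illustrative values only).
weights = [0.3, 0.7]
means = [np.zeros(2), np.array([3.0, 3.0])]
covs = [np.eye(2), 0.5 * np.eye(2)]
p = gmm_density(np.array([[0.0, 0.0], [3.0, 3.0]]), weights, means, covs)
```

Each entry of `p` is the mixture density at the corresponding query point; near a component mean, the density is dominated by that component's weighted Gaussian.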
It can be clearly seen from Eq. (1) and Eq. (2) that two key issues in GMM-based image segmentation are the estimation of the parameters $\theta_y = (m_y, \Sigma_y)$ and the selection of the number of clusters. Generally speaking, the first issue can be solved by an approximate Maximum A Posteriori (MAP) estimation or by maximum likelihood estimation, which can be obtained with the Expectation Maximization (EM) algorithm (Dempster et al., 1977; Redner and Walker, 1984). However, the second problem has not yet been solved efficiently. In addition, an image must be mapped into a feature space before clustering. Thus, besides the above two issues, another problem that has to be solved for GMM-based segmentation is the selection of features to represent an image.
First proposed by Xu (1995) and developed for over a decade, Bayesian Ying-Yang (BYY) learning provides a general framework that accommodates typical learning algorithms from a unified perspective and offers improved model selection criteria. BYY learning consists of two subcategories. One is Ying-Yang best matching, used for developing typical learning algorithms, which is a major focus of Xu (1995) and Xu (1998). The other is Ying-Yang best harmony, which is well suited to model selection (Xu, 2004).
Since studies have shown that BYY learning is capable of selecting the number of clusters of a GMM (Xu, 1997), this study adopts BYY learning to solve the problem of selecting the number of clusters in GMM-based image segmentation. Considering that one important difference between the objects in an image may be the intensity (or color), using intensity information in image segmentation is useful and effective. In addition, spatial position is an important factor when human beings perform segmentation. Thus, this study adopts features consisting of both intensity (color) and spatial coordinates to represent a pixel.

IMAGE SEGMENTATION BASED ON EM
Before performing the segmentation, we have to map a gray-level image to a $d$-dimensional feature space by representing each pixel with a feature vector $x \in R^d$. Assume $x$ has the probability density given by Eq. (1) and Eq. (2). To estimate the parameters $\theta_y = (m_y, \Sigma_y)$, the following EM algorithm, which updates the parameters of the GMM from the sample data, can be used. The EM algorithm consists of two iterative steps, i.e., the E-Step and the M-Step.

M-Step: maximize the likelihood function to obtain the new value of the parameter set $\Theta$. For a GMM, the new values of $\Theta$ are given by: (4) where $N$ is the number of pixels in the image.
The EM algorithm iteratively runs the E-Step and the M-Step until the following condition is satisfied: (5) where $\varepsilon > 0$ is a small real number. After obtaining the optimal parameter set $\Theta^* = (\theta_1^*, \theta_2^*, \ldots, \theta_k^*)$ of the GMM with the EM algorithm described above, each pixel, represented by a feature vector $x$, is classified into a class $\hat{y}(x) \in \{1, 2, \ldots, k\}$ according to: (6) Finally, the image is segmented into different regions by labeling each pixel according to the class it is assigned.
Though the above EM-based segmentation algorithm is theoretically efficient and easy to implement, one critical parameter that has to be predefined is the number of clusters (or classes), i.e., the parameter $k$. A bad estimate of $k$ will lead to serious problems (Xu et al., 1993).
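The E-Step/M-Step iteration for a fixed $k$ can be sketched in NumPy as follows. This is a minimal sketch that assumes the textbook EM updates for a GMM. The initialization, the stopping rule (change in mean log-likelihood below `eps`), the small ridge added to each covariance and the labeling rule (largest posterior) are assumptions of the sketch, not details taken from Eq. (4), Eq. (5) or Eq. (6). All function names are hypothetical.

```python
import numpy as np

def log_gauss(X, mean, cov):
    """Log of the Gaussian density in Eq. (2) for each row of X."""
    d = X.shape[1]
    diff = X - mean
    maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (maha + logdet + d * np.log(2.0 * np.pi))

def em_gmm(X, k, eps=1e-6, max_iter=200):
    """Fit a k-component GMM with EM; returns (weights, means, covs, posteriors)."""
    N, d = X.shape
    weights = np.full(k, 1.0 / k)
    # Assumed initialization: k samples evenly spaced along the sorted first feature.
    order = np.argsort(X[:, 0])
    means = X[order[np.linspace(0, N - 1, k).astype(int)]].copy()
    covs = np.array([np.diag(X.var(axis=0) + 1e-6) for _ in range(k)])
    prev_ll = -np.inf
    for _ in range(max_iter):
        # E-step: posterior p(y | x_i) of each component for each sample.
        log_joint = np.stack(
            [np.log(weights[y]) + log_gauss(X, means[y], covs[y]) for y in range(k)],
            axis=1,
        )
        log_px = np.logaddexp.reduce(log_joint, axis=1)
        post = np.exp(log_joint - log_px[:, None])
        # M-step: standard maximum-likelihood updates of weights, means, covariances.
        Nk = post.sum(axis=0)
        weights = Nk / N
        means = (post.T @ X) / Nk[:, None]
        for y in range(k):
            diff = X - means[y]
            covs[y] = (post[:, y, None] * diff).T @ diff / Nk[y] + 1e-6 * np.eye(d)
        # Assumed stopping rule: change in mean log-likelihood below eps.
        ll = log_px.mean()
        if abs(ll - prev_ll) < eps:
            break
        prev_ll = ll
    # post holds the posteriors from the last E-step that was run.
    return weights, means, covs, post

def classify(post):
    """Assumed labeling rule: component with the largest posterior (labels 0..k-1)."""
    return post.argmax(axis=1)

# Synthetic, well-separated two-cluster data (illustrative only).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, size=(200, 2)),
               rng.normal(4.0, 0.3, size=(200, 2))])
weights, means, covs, post = em_gmm(X, k=2)
labels = classify(post)
```

Working with log-densities and `np.logaddexp` avoids underflow when a sample lies far from every component, which is common once spatial coordinates are added to the feature vector.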

Selection of the optimal number of clusters with BYY learning:
In BYY learning (Xu, 1997), there are two primary elements: the external observation $x$ and the output action $y$. The $x$ is known (visible), but the $y$ is unknown (invisible). Both elements are treated as random variables, and the joint distribution $p(x, y)$ can be calculated in two ways:

$$p(x, y) = p(y \mid x)\, p(x) \quad \text{and} \quad p(x, y) = p(x \mid y)\, p(y) \qquad (7)$$

In practice, the results of these two equations are generally not equal unless we can find the optimal solution of $p(y)$, $p(x \mid y)$, $p(x)$ and $p(y \mid x)$. The core of BYY learning is that the specification of the Ying-Yang pair above best enhances the so-called Ying-Yang harmony by minimizing a harmony measure, $F_s(M_{yang}, M_{ying})$, which is defined as follows: (8) Here, the Kullback divergence is adopted, giving: (9) Since the architecture and the parameter set $\Theta$ of the EM-based GMM have been fixed, the remaining task is to find the optimal number of clusters (or classes). According to BYY learning, the number of clusters, i.e., the parameter $k$, can be determined in two ways. The first is according to the following function: (10) Since the architecture has been fixed, $F_s(M_{yang}, M_{ying})$ is a function only of $k$ and the parameter set $\Theta$. Assume the parameter set obtained by the EM algorithm is $\Theta^*$; then $F_s(M_{yang}, M_{ying}) = F_s(k, \Theta^*)$. Thus Eq. (10) can be rewritten as: (11) The other way to get the optimal $k$ is: (12) In the special case of the GMM, the above functions $J_1$ and $J_2$ reduce to (13) and (14), where $\alpha_y^*$, $\Sigma_y^*$ and $\Theta^*$ are the parameters of the GMM determined with the EM algorithm.
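To make the selection rule concrete, the sketch below scores each candidate $k$ and keeps the minimizer. It assumes the form of $J_2$ commonly attributed to Xu (1997) for Gaussian mixtures, $J_2(k) = \sum_y \alpha_y^* \ln\sqrt{|\Sigma_y^*|} - \sum_y \alpha_y^* \ln \alpha_y^*$. This form is an assumption of the sketch and should be checked against Eq. (14). The function names and the fitted parameters in the example are hypothetical.

```python
import numpy as np

def j2(weights, covs):
    """Assumed form of J_2(k): sum_y a_y * ln sqrt|Sigma_y| - sum_y a_y * ln a_y."""
    weights = np.asarray(weights, dtype=float)
    # slogdet returns (sign, log|Sigma|); only the log-determinant is needed.
    logdets = np.array([np.linalg.slogdet(c)[1] for c in covs])
    return float(np.sum(weights * 0.5 * logdets) - np.sum(weights * np.log(weights)))

def select_k(fits):
    """fits maps each candidate k to (weights, covs) from EM; returns (best k, scores)."""
    scores = {k: j2(w, c) for k, (w, c) in fits.items()}
    return min(scores, key=scores.get), scores

# Hypothetical EM results for k = 1, 2, 3 on 2-D data (made-up values).
tight = 0.09 * np.eye(2)
fits = {
    1: ([1.0], [4.0 * np.eye(2)]),
    2: ([0.5, 0.5], [tight, tight]),
    3: ([0.5, 0.25, 0.25], [tight, tight, tight]),
}
best_k, scores = select_k(fits)
```

In this made-up example the score drops from $k = 1$ to $k = 2$ because the components become much tighter, then rises at $k = 3$ because the extra component adds to the weight-entropy term without tightening the fit.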

Image segmentation with EM and BYY:
Before performing the segmentation, we have to map a gray-scale image into a $d$-dimensional feature space, i.e., represent each pixel by a vector $x \in R^d$. Considering that one major difference between the objects in an image may be the intensity (or color), using intensity (or color) information in image segmentation is useful and effective. In addition, spatial position is an important factor when human beings perform segmentation. Thus, this study adopts features consisting of both intensity (color) and spatial coordinates to represent a pixel. Given a gray-scale image $I$, we use the following function to map $I$ into a 3-dimensional feature space: (15) where $I(m, n)$ is the intensity of the gray-scale image $I$ at position $(m, n)$, $N = W \times H$ is the total number of pixels of $I$, and $W$ and $H$ are the width and height of the image, respectively. The feature vector $x_i$ is constructed by: (16) where the weight $\lambda_i > 0$ is used to emphasize the contribution of the $i$th feature component to the classification. The criterion function $J_2(k)$ is U-shaped, and the $k$ corresponding to the bottom of the U is the optimal number of clusters.
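The mapping from an image to weighted intensity-plus-position feature vectors can be sketched as follows. Scaling each raw feature to [0, 1] before weighting is an assumption of this sketch, since the exact normalization depends on Eq. (15). The default weights (10, 5, 5) are the values reported in the simulation section, and the function name is hypothetical.

```python
import numpy as np

def image_to_features(img, lambdas=(10.0, 5.0, 5.0)):
    """Map an H x W gray-scale image to N = W * H feature vectors in R^3.

    Columns: weighted intensity, weighted y-position, weighted x-position.
    Scaling each raw feature to [0, 1] is an assumption of this sketch.
    """
    img = np.asarray(img, dtype=float)
    H, W = img.shape
    rows, cols = np.mgrid[0:H, 0:W]
    span = img.max() - img.min()
    intensity = (img - img.min()) / span if span > 0 else np.zeros_like(img)
    raw = np.stack(
        [intensity.ravel(),
         rows.ravel() / max(H - 1, 1),
         cols.ravel() / max(W - 1, 1)],
        axis=1,
    )
    # Each column is multiplied by its weight lambda.
    return raw * np.asarray(lambdas, dtype=float)

# Tiny 2 x 3 example image (illustrative values only).
img = np.array([[0, 0, 255],
                [0, 255, 255]])
X = image_to_features(img)
```

Pixels are flattened in row-major order, so a label vector produced by clustering `X` can be turned back into a segmentation map with `labels.reshape(img.shape)`.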
With Eq. (15) and Eq. (16), a gray-scale image $I$ is represented by $\{x_i \in R^3 \mid i = 1, 2, \ldots, N\}$. Assume the probability density of $x_i$ is the GMM given by Eq. (1) and Eq. (2). Then, the following iterative procedure is employed to perform the segmentation.
Step 1: Let the number of classes k = 1.
Step 2: Employ the EM algorithm described by Eq.

SIMULATION RESULTS AND ANALYSIS
Results on noise-free images: The proposed image segmentation method based on EM and BYY learning has been implemented in MATLAB R2007b. Two images have been used to evaluate the performance of the proposed method. The first is a synthetic image consisting of three gray levels, as shown in Fig. 1a. The second is the benchmark "House" picture, as shown in Fig. 1b. The image size is 128 by 128.
The feature weights $\lambda_i$ are 10, 5 and 5 for the intensity, y-position and x-position features, respectively. The $J_2(k)$ curves obtained from Eq. (14) for the three images shown in Fig. 1 are plotted in Fig. 2. From Fig. 2 one can see that $J_2(k)$ is U-shaped, so there is a $k^*$ where $J_2(k)$ has a minimum value. From Fig. 3, one can see that with the proposed method given by Eq. (17) and Eq. (18), satisfactory segmentation results can be obtained.

Results on noisy images: As with the noise-free images, $J_2(k)$ is also U-shaped for the noisy images, meaning that Eq. (17) and Eq. (18) are still effective. The segmentation results with $k^*$ corresponding to the minimum of $J_2(k)$ are shown in Fig. 6, which clearly shows that the proposed method is robust to noise.

Evaluation of the proposed feature representation method: One highlight of the proposed method is that not only the intensity of a pixel but also its x-position and y-position are used to construct the feature vector, as described in Eq. (15) and Eq. (16).
Figure 7 shows the segmentation results for the images shown in Fig. 5 with the optimal $k^*$ obtained from the proposed method but using only the intensity feature, i.e., the feature vector for each pixel is $x_i = (x_{i1})$. Comparing Fig. 6 with Fig. 7, one can see that although the method employing only the intensity feature can find the correct $k^*$, it is not robust to noise.

CONCLUSION
An image segmentation method based on EM and BYY learning has been proposed. Two contributions can be claimed. The first is that BYY learning is introduced into EM-based image segmentation to address the selection of the number of clusters. The second is that both the intensity and the spatial position information are employed as features to describe a pixel in the image, which gives the proposed method better performance on noisy images.
Figure 3 shows the segmentation results corresponding to $k^*$.

Fig. 7: Segmentation results on noisy images with the intensity feature: (a) local variance is 0.001 and (b) local variance is 0.01