Quality Enhancement of Synthesized Video by Improvement of VLC Using Artificial Neural Network



INTRODUCTION
Today, the growing use of wireless channels and mobile telecommunication, and the trend toward using modern telecommunication systems for real-time multimedia transmission such as video, call for advanced research in this area (Flierl and Girod, 2004). In wireless and mobile telecommunication, the nature of the wireless environment and channel errors degrade the quality of the video at the receiver. Several methods have been proposed to reduce the errors caused by the channel (Moreira and Farrell, 2006). Channel coding is an efficient method that is generally used to detect and correct these errors. However, channel coding has traditionally been performed without considering the source coding rate. In recent years, methods that combine channel and source coding have been introduced. These methods are known as Joint Source and Channel Coding (JSCC). JSCC is generally based on channel estimation (Bystrom and Modestino, 2000; Cheung and Zakhor, 2000; Kondi et al., 2002; Zhai et al., 2006). This means that a higher PSNR requires a higher channel coding rate. In this study, we propose a new JSCC method that increases the channel coding rate while the total transmission rate is kept constant. Increasing the channel coding rate makes the video frames more robust. One of the most important problems of conventional methods is that channel information is required to adjust the source and channel coding rates. The proposed method does not need channel information. In other words, it can be applied independently at any source coding rate to improve the quality of the video frames reconstructed at the receiver. In this study, Peak Signal to Noise Ratio (PSNR) and Bit Error Rate (BER) are used as metrics of the quality of the reconstructed video.

VIDEO FRAMES CODING
In this section, the Group of Pictures (GOP) is briefly introduced. Then video frame coding and decoding using MPEG-4 are explained. The MPEG standard defines three types of pictures, denoted I, P and B. A GOP is a combination of these pictures. As an example, a GOP structure is shown in Fig. 1.
A picture that is coded using only its own information is called an I-picture. The JPEG compression algorithm is used for coding I-pictures (Salomon, 2004). A picture that is estimated from the nearest previous I- or P-picture is called a P-picture. For P-pictures, a motion compensation predictor is also used. A picture that is estimated from the nearest previous and next I- or P-pictures is called a B-picture. For B-pictures, a motion compensation predictor is used as well, as for P-pictures. In the simulation we use only two picture types in the GOP, I-pictures and P-pictures, because using B-pictures (bidirectional pictures) introduces a delay in the system. The delay arises because prediction is performed from both the next and the previous frames, so the system is forced to wait until the next frames have arrived and been buffered before a B-picture can be estimated. Thus we do not use B-pictures in the proposed method.

Fig. 1: An example of the GOP

Overview of the MPEG-4 encoding method:
In this study, we use MPEG-4 Part 2. The structures of the encoder and decoder for I- and P-pictures are shown in Fig. 2 and 3, respectively. Some of the operations used in the encoder and decoder are described in the following.
Reorder: After quantization, the DCT coefficients contain many zeros, so they are arranged in zigzag order. In this way, the values related to different frequencies are grouped (Richardson, 2003). The zigzag arrangement is shown in Fig. 4.
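The zigzag reordering can be illustrated with a minimal sketch. The function names `zigzag_order` and `zigzag_scan` are hypothetical, and the scan direction (first step to the right, as in the common JPEG/MPEG pattern) is an assumption, since Fig. 4 is not reproduced here.

```python
def zigzag_order(n=8):
    """Return the (row, col) index pairs of an n x n block in zigzag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col identifies one anti-diagonal
        cells = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Even diagonals are walked from bottom-left to top-right,
        # odd diagonals from top-right to bottom-left.
        order.extend(reversed(cells) if s % 2 == 0 else cells)
    return order


def zigzag_scan(block):
    """Flatten a square block (list of rows) into a zigzag-ordered list."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


# Quantized blocks tend to have non-zero values only at low frequencies,
# so the zigzag list ends in a long run of zeros.
example = [[9, 3, 0, 0],
           [2, 0, 0, 0],
           [1, 0, 0, 0],
           [0, 0, 0, 0]]
print(zigzag_scan(example))
```

Grouping the low-frequency coefficients at the front is what makes the later run-length stage effective, because the zeros end up adjacent to one another.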

Data packetization:
Errors in data transmission may cause loss of synchronization between transmitted video frames. To overcome this problem, a resynchronization mechanism is required. One method is data packetization: the encoder divides each frame into packets and inserts a synchronization marker at the beginning of each packet. The encoder and decoder resynchronize whenever the marker at the beginning of a packet is received. This procedure keeps the video frames synchronized and prevents errors from accumulating (Shi and Sun, 1999).

Data partitioning:
In this mode, each data packet is divided into two partitions. The main idea is to separate the more important data (DC coefficients of the DCT matrix, coding mode information and motion vectors) from the less important data (AC coefficients of the DCT matrix and residual errors). For I-picture transmission, the first partition contains the source coding information and the DC coefficients, while the second partition contains the less important AC coefficients. For P-picture transmission, the first partition contains the source coding information and motion vectors, whereas the second partition contains the DCT data (texture, DC and AC coefficients). In data partitioning, unique markers are used for synchronization.

Motion compensation block:
The motion compensation algorithm is based on the Minimum Mean Square matching Error (MMSE) between two successive frames: the motion vector of a block is the displacement, searched over the window $W$, that minimizes the mean square error between the block in the current frame and the displaced block in the previous frame (Eqs. (1) and (2)). In Eq. (1), the third argument of $b$, which takes the values $k$ and $k-1$, indexes the two consecutive frames, and the variables $m$ and $n$ give the pixel location in the two-dimensional frame. The variable $W$ represents the search window, and the motion prediction vector is obtained by searching within this range. $W = 15$ is used in this study, which means that the search window size is 16*16. The prediction window size is 8*8. In other words, the two consecutive frames are compared within the search window and the displacement giving the least error is stored as an element of the motion prediction matrix. Once a frame has been predicted completely and the motion prediction matrix has been constructed, the difference between the predicted values and the previous frame is formed. This difference is applied to the DCT block, and the DCT output is quantized with 64 levels. At the receiver, as shown in Fig. 4, the received data is first dequantized and then passed through the IDCT block. Finally, using the motion prediction matrix, the decoded data is added to the previous frame and the transmitted frame is thereby reconstructed.

In this study, we propose a new method that estimates and sends the Probability Density Function (PDF) on which the Huffman code is based, and we use a neural network to memorize the transmitted bit stream in order to increase the compression after the VLC block. Next, we apply a secondary channel encoder whose rate depends on the difference between the number of bits produced by the VLC block and the number of bits produced by the compression method proposed in this study. By making the data stream memorizable, we can apply the Huffman algorithm to the data stream several times, and can therefore compress the data and still reconstruct the bit stream at the receiver.
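The block-matching search described above can be sketched as follows. This is an illustrative exhaustive search under stated assumptions, not the authors' implementation: the function names, the default block size and search range, and the rule of skipping candidates that fall outside the frame are all hypothetical choices.

```python
def block_mse(cur, prev, top, left, dy, dx, size):
    """Mean square error between a block of `cur` and a displaced block of `prev`."""
    total = 0
    for m in range(size):
        for n in range(size):
            diff = cur[top + m][left + n] - prev[top + dy + m][left + dx + n]
            total += diff * diff
    return total / (size * size)


def best_motion_vector(cur, prev, top, left, size=8, search=7):
    """Exhaustively search displacements in [-search, search] for the least MSE."""
    rows, cols = len(prev), len(prev[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            r, c = top + dy, left + dx
            if r < 0 or c < 0 or r + size > rows or c + size > cols:
                continue  # candidate block would fall outside the previous frame
            err = block_mse(cur, prev, top, left, dy, dx, size)
            if best is None or err < best[0]:
                best = (err, dy, dx)
    return (best[1], best[2]), best[0]


# Synthetic check: the current frame is the previous frame shifted by one row
# and two columns, so the block at (4, 4) should be matched at offset (1, -2).
prev = [[31 * r + 17 * c for c in range(16)] for r in range(16)]
cur = [[prev[r + 1][c - 2] if r + 1 < 16 and c - 2 >= 0 else 0
        for c in range(16)] for r in range(16)]
print(best_motion_vector(cur, prev, 4, 4, size=4, search=3))
```

An exhaustive search is the simplest way to find the minimum; faster search patterns exist but would not change the quantity being minimized.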

Variable Length Coding (VLC):
Variable length coding is the final step before the information of each frame is transmitted. VLC aims to reduce the transmission rate. It is performed by two procedures. The first procedure exploits the fact that many values equal to ±1 remain after quantization of the DCT matrix (see, for instance, Fig. 5), so only a few bits are assigned to them. For example, only 3 bits are assigned to send a value of magnitude one, which is represented as 11s. This code, 11s, is taken from the table specified in the MPEG standard (Kondi et al., 2002). The value of s is determined by the sign: if the value is positive (+1) then s = 0, otherwise s = 1. On the other hand, since sample values other than ±1 occur with low probability, more bits are allocated to them than to ±1. For example, sending the sample value 7 requires 11 bits (Kondi et al., 2002).
The second procedure in VLC exploits the fact that the number of zeros in the quantized DCT matrix is quite high (see, for instance, Fig. 5). After the DCT matrix elements have been arranged in zigzag order, this procedure checks whether the next element is zero. If it is, the following element is checked. At each stage, the zeros are counted until a non-zero element is reached or the DCT matrix elements are exhausted. Once the number of zeros between two non-zero elements is known, a binary code is taken from the table established for the MPEG standard (Kondi et al., 2002). For example, to send a value of one that is accompanied by a run of 5 zeros, instead of allocating 8 bits to each element (48 bits in total), only 7 bits are sent, represented by 000111s. The variable 's' is as described before. This procedure is based on two parameters used in the MPEG standard table (Kondi et al., 2002), named "Run" and "Level". Level is the value of the non-zero DCT matrix element, and Run is the number of zeros between the selected element and the next non-zero element (MPEG Video Group, 1998).
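The Run/Level pairing can be sketched in a few lines. The function name `run_level_pairs` is hypothetical, and the sketch assumes that Run counts the zeros preceding each non-zero Level, which is the usual MPEG convention; the mapping of each pair to its bit pattern requires the standard's VLC table and is deliberately left out.

```python
def run_level_pairs(coeffs):
    """Convert a zigzag-ordered coefficient list into (run, level) pairs."""
    pairs, run = [], 0
    for value in coeffs:
        if value == 0:
            run += 1  # extend the current run of zeros
        else:
            pairs.append((run, value))
            run = 0  # a non-zero level closes the run
    return pairs  # trailing zeros are not emitted here


# A value of one preceded by five zeros becomes the single pair (5, 1).
print(run_level_pairs([12, 0, 0, 0, 0, 0, 1, 0, -1, 0, 0]))
```

Each pair then needs only one table lookup, which is why a run of five zeros followed by a one costs 7 bits instead of 48 in the example given in the text.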

CHANNEL CODING METHOD
In this study, Reed-Solomon coding is used. Reed-Solomon codes are a class of cyclic BCH codes (Shannon, 1986; Carlson, 1986) that are linear and non-binary and are constructed over the field GF(q) (Shannon, 1948). This type of coding is chosen for its non-binary nature: the information of each block can be coded directly without conversion to binary, which increases the speed of video frame coding. Assume that the encoded message $c(X)$ is transmitted and the noisy message $r(X)$ is received, where:

$r(X) = r_0 + r_1 X + \dots + r_{n-1} X^{n-1}$ (3)

The error polynomial $e(X)$ is related to the transmitted and received messages by:

$r(X) = c(X) + e(X)$ (4)

Thus, according to Eq. (4), and because the code roots $\alpha^i$ satisfy $c(\alpha^i) = 0$, it can be shown that $r(\alpha^i) = c(\alpha^i) + e(\alpha^i) = e(\alpha^i)$. In this type of coding, both the error locations and the error values are calculated. With $t$ errors and $\beta_i = \alpha^{j_i}$, the error locations and values are obtained from Eq. (5). This type of coding can correct any pattern of $t$ or fewer errors. In the simplest case, $t = 1$, the error value and location are calculated using Eq. (6). The channel coding rate is given by:

$R_c = k / n$ (7)

In Eq. (7), $k$ is the number of coded bits carrying frame information and $n$ is the length of the stream that contains the frame information together with the information added by the channel encoder. The channel code can correct up to $(n - k)/2$ errors. Therefore, more parity bits ($n - k$) allow more message errors to be corrected.
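A toy example may make the single-error case concrete. The sketch assumes a small prime field GF(7) with primitive element 3 and the generator $g(X) = (X - \alpha)(X - \alpha^2)$; none of these parameters come from the paper. The relations $\beta = S_2 / S_1$ and $e = S_1^2 / S_2$ used here are the standard solution for $t = 1$ and may differ in form from Eq. (6).

```python
Q = 7        # field size; prime, so arithmetic is plain modulo Q (assumption)
ALPHA = 3    # a primitive element of GF(7)
N = Q - 1    # code length in symbols


def code_rate(n, k):
    return k / n


def correctable_symbols(n, k):
    return (n - k) // 2


def poly_eval(coeffs, x):
    """Evaluate coeffs[0] + coeffs[1]*x + ... over GF(Q) using Horner's rule."""
    result = 0
    for c in reversed(coeffs):
        result = (result * x + c) % Q
    return result


def inv(a):
    return pow(a, Q - 2, Q)  # multiplicative inverse via Fermat's little theorem


def correct_single_error(received):
    """Correct at most one symbol error using the syndromes r(alpha), r(alpha^2)."""
    s1 = poly_eval(received, ALPHA)
    s2 = poly_eval(received, pow(ALPHA, 2, Q))
    if s1 == 0 and s2 == 0:
        return list(received)  # no error detected
    if s1 == 0 or s2 == 0:
        raise ValueError("more than one error; beyond t = 1")
    beta = s2 * inv(s1) % Q        # beta = alpha^j locates the error
    value = s1 * s1 * inv(s2) % Q  # magnitude of the error
    position = next(j for j in range(N) if pow(ALPHA, j, Q) == beta)
    fixed = list(received)
    fixed[position] = (fixed[position] - value) % Q
    return fixed


# g(X) = X^2 + 2X + 6 over GF(7), stored lowest degree first, is itself a codeword.
codeword = [6, 2, 1, 0, 0, 0]
noisy = [6, 2, 1, 0, 4, 0]  # error of value 4 at position 4
print(correct_single_error(noisy), code_rate(N, N - 2), correctable_symbols(N, N - 2))
```

With two parity symbols the code corrects one symbol error, in line with the $(n - k)/2$ bound stated in the text.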

PDF FOR A STREAM WITH ARBITRARY LENGTH
Obtaining and representing the PDF requires many bits. For this purpose, we first separate a block of bits of arbitrary length from the information stream. Next, we place every 8 bits of this stream into another buffer (Fig. 6). The buffer therefore holds numbers between 0 and 255 in decimal instead of single bits. In the next stage, the number of occurrences of each value in the stream is counted, and a histogram is created from these counts (Fig. 7). The PDF is then obtained from the histogram, for which conditions (8) to (10) must be satisfied: the PDF must be non-negative (condition (8)) and properly normalized (conditions (9) and (10)). Condition (8) is satisfied by the histogram itself, because the number of occurrences of a value is always positive and, if a value does not occur, its histogram entry (Fig. 7) is zero. To satisfy conditions (9) and (10), the number of occurrences of each value is divided by the total number of occurrences:

$p(x_i) = n_i / \sum_{l=0}^{255} n_l$ (11)

where $n_i$ is the number of occurrences of the value $x_i$. Conditions (9) and (10) are then satisfied, and the PDF curve (Fig. 8) is obtained from the histogram (Fig. 7). So far we have obtained the PDF of the values in a stream. Since the shape of the PDF depends on the stream values, coding a stream with the Huffman algorithm requires that information describing the PDF of the stream also be sent. To send the PDF, we fit a curve to it and send the information describing the curve together with the stream. Since the PDF changes rapidly and nonlinearly (Fig. 8), a question arises: which kind of curve can be fitted with minimum error? To answer this question, we use the artificial neural network detailed in the following section.
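The byte grouping, histogram and normalization steps can be sketched as follows. The function names are hypothetical, and dropping an incomplete trailing byte is an assumption, since the text does not say how leftover bits are handled.

```python
def bits_to_bytes(bits):
    """Group a string of '0'/'1' characters into integers 0..255, 8 bits each."""
    usable = len(bits) - len(bits) % 8  # drop an incomplete trailing byte (assumption)
    return [int(bits[i:i + 8], 2) for i in range(0, usable, 8)]


def histogram(values, levels=256):
    """Count how often each value 0..levels-1 occurs."""
    counts = [0] * levels
    for v in values:
        counts[v] += 1
    return counts


def pdf_from_histogram(counts):
    """Divide each count by the total so that the result sums to one."""
    total = sum(counts)  # must be non-zero, i.e. the stream must not be empty
    return [c / total for c in counts]


stream = "00000001" * 3 + "11111111"
pdf = pdf_from_histogram(histogram(bits_to_bytes(stream)))
print(pdf[1], pdf[255], sum(pdf))
```

Because every count is non-negative and the counts are divided by their sum, the result is non-negative and sums to one by construction.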

ARTIFICIAL NEURAL NETWORK
An artificial neural network consists of a series of layers and nodes. Every network necessarily has an input layer and an output layer (Vemuri, 1988; Gardner and Dorling, 1998), and an arbitrary number of hidden layers can be added. Every layer consists of an arbitrary number of nodes, known as neurons. Layers are related through links that connect the neurons; these links are called the network weights. Fig. 9 shows the structure of a neural network with 7 inputs, 1 hidden layer and 3 outputs. In addition to the weights, each neuron is affected by another independent quantity called the bias. The way in which the weights and biases of the neurons in one layer act on the subsequent layer is specified by a function called the transfer function. This function can be linear or nonlinear in each layer and is chosen by the user. The important factors in a network are therefore the weights and biases of each neuron, whose values determine the outputs of the neural network. Thus, the various methods used to improve the accuracy of a neural network aim to choose weights and biases that yield the least error.

Neural network for description of PDF:
In this part, we fit a function to the probability density curve (Fig. 8) using a neural network (Cobourn et al., 2000). For this purpose, an untrained neural network with one hidden layer is first created. Then, all weights and biases are introduced as optimization variables, and the objective function is the Mean Square Error (MSE) between the network output for the input matrix (integers between 0 and 255) and the target matrix (probabilities of occurrence of the samples). In every optimization step, the weights and biases are updated such that the network error decreases. The resulting weights and biases are then used to construct the network. We have used a perceptron neural network with one hidden layer containing 60 neurons. The transfer function of the hidden layer neurons is "tansig" and the output function is "purelin". The input and output layers each have a single neuron, and all neurons in each layer have biases. Fig. 10 shows an example of the inputs and outputs of the neural network. The network inputs are the values on the horizontal axis, and the targets are the vertical values corresponding to them. The network weights are updated until the error is lower than $10^{-10}$. The transfer function was selected by the leave-one-out method (Fukunaga and Hummels, 1989; Stone, 1947) after testing on several PDFs. Our training differs from the usual training procedure. In usual training, some of the data is set aside as test data. At the end of each training round the test data is given to the network as input, and the output is calculated and compared with the correct values. If the resulting error is below a specified limit, the calculated weights and biases are assigned to the network; if it is above the limit, training is repeated. This procedure prevents overlearning (overtraining). In this study, however, we keep the transfer function fixed and allow overlearning to happen: all points are given to the network as training points, so that the best possible fit is obtained at the end.

Fig. 10: Display of the inputs and outputs used in the neural network

Implementation of neural network for PDF:
Since the PDF is time-varying and its curve contains many rises and falls, a function representing these curves cannot be obtained with classical methods. We have therefore used a neural network for this purpose, which makes the PDFs memorizable. For example, suppose we train a neural network with 60 neurons, 120 weights and 61 biases (60 biases for the hidden layer and a single bias for the output layer) on an information stream whose PDF is shown in Fig. 8. The outputs in the best case, with an average error below $10^{-10}$, are presented in Fig. 11 and 12. In Fig. 11, the target function and the output values of the neural network are plotted. Fig. 12 shows the dispersion of the output values around the target values; the outputs fit the targets closely, which confirms that the error in the output data is negligible. However, when the curve descends steeply and its value tends to zero, the estimated function may produce values below these points, so that condition (8) is violated. In this situation, the negative values are replaced by the least positive value of the estimated function, which is justified because the estimated points with negative values are negligible in magnitude.
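The network shape and the handling of negative estimates can be sketched as follows. The sketch covers only the forward pass of a one-input, one-output network with a tanh hidden layer (mathematically the same function as "tansig") and a linear output ("purelin"); training is omitted, and all function names are hypothetical.

```python
import math


def mlp_forward(x, w_in, b_hidden, w_out, b_out):
    """Forward pass of a 1-input, 1-output network with one tanh hidden layer."""
    hidden = [math.tanh(w * x + b) for w, b in zip(w_in, b_hidden)]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


def parameter_counts(n_hidden):
    """Number of weights and biases for a single-input, single-output network."""
    weights = 2 * n_hidden  # one input weight and one output weight per hidden neuron
    biases = n_hidden + 1   # one per hidden neuron plus the output bias
    return weights, biases


def replace_negatives(estimates):
    """Replace negative estimates by the least positive estimated value."""
    floor = min(v for v in estimates if v > 0)
    return [v if v >= 0 else floor for v in estimates]


print(parameter_counts(60))
print(replace_negatives([0.2, -1e-6, 0.05, 0.0]))
```

For 60 hidden neurons the count gives 120 weights and 61 biases, matching the figures quoted in the text.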

THE CONCEPT OF MEMORIZE ABLE INFORMATION STREAM
As described above, the PDF can be represented by an artificial neural network or, in cases where most of the probability mass is concentrated on a finite number of samples, by those samples. It is therefore possible to describe the PDF using the weights and biases obtained from training, and to store the behavior of the PDF as a series of neural network weights and biases or as a finite set of samples. Only 2 bytes of memory are required to store each weight, bias or sample. In each compression iteration, we place the PDF information at the end of the resulting bit stream, so that it is stored and then transmitted. In effect, in each iteration we memorize the PDF information of the bit stream, which is required during reconstruction. By making the data stream memorizable, we can apply the Huffman algorithm to the data string several times. Each time, the PDF is approximated and Huffman coding is applied using the calculated PDF. Repeating this method reduces the size of the information string and therefore the amount of information to be transmitted.
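Because both sides must derive the same code from the transmitted PDF, it helps to see how a Huffman code follows deterministically from a probability table. The sketch below is a generic Huffman construction, not the authors' implementation; the function names and the tie-breaking rule are assumptions.

```python
import heapq
from itertools import count


def huffman_code(pdf):
    """Build a prefix code {symbol: bitstring} from {symbol: probability}."""
    tie = count()  # tie-breaker so that the heap never has to compare dicts
    heap = [(p, next(tie), {s: ""}) for s, p in pdf.items() if p > 0]
    heapq.heapify(heap)
    if len(heap) == 1:  # a single symbol still needs a one-bit code
        return {s: "0" for s in heap[0][2]}
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)  # the two least probable subtrees
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c0.items()}
        merged.update({s: "1" + b for s, b in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]


def encode(symbols, code):
    return "".join(code[s] for s in symbols)


def decode(bits, code):
    inverse = {b: s for s, b in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # prefix-free, so the first match is the symbol
            out.append(inverse[current])
            current = ""
    return out


pdf = {0: 0.5, 1: 0.25, 2: 0.125, 3: 0.125}
code = huffman_code(pdf)
message = [0, 1, 0, 3, 2, 0]
print(code, decode(encode(message, code), code) == message)
```

Since the construction depends only on the probability table, a decoder that reconstructs the same table (from network weights or from stored samples) obtains the same code, provided every symbol in the stream has a non-zero probability in that table.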

PROPOSED ALGORITHM
The proposed algorithm encoder: The flowchart of the proposed encoder algorithm is shown in Fig. 13. The proposed method is detailed as follows. After preprocessing, the VLC output data is coded using the Huffman code. Then, secondary channel coding is applied, with a rate obtained from the difference between the number of bits produced by VLC and the number of bits produced by Huffman coding.

The next step is performed according to the following characteristics of Huffman coding. Huffman coding encodes the data stream such that the maximum number of zeros is produced (Huffman, 1952; Proakis, 1995). If the operation explained in step 1 is performed on the bit stream at the output of Huffman coding, then, as shown in Fig. 14 and 15 (Fig. 15 is a magnification of Fig. 14), low-valued numbers occur very often whereas larger numbers occur only a few times. This observation was obtained experimentally; therefore, the condition given in Eq. (12) must be checked, and the next step is performed only if this condition is satisfied:
• Step 7: The bit stream created in step 6 is converted to a byte stream and then from binary to decimal again (see PDF for a stream with arbitrary length).
• Step 8: The histogram of the number of occurrences of the values 0 to 255 in the byte stream of step 7 is calculated (see PDF for a stream with arbitrary length).
• Step 9: The probability density curve is calculated from the histogram so as to satisfy conditions 8, 9 and 10 (section 4).
• Step 10: If condition 12 is true, the process continues to the next step; otherwise, it returns to step 4:
$\sum_{i=0}^{j} p(i) \geq 0.9; \quad j = 1, 2, 3, 4$ (12)
where $p(i)$ is the probability of the byte value $i$.
• Step 11: The probability values 0 to j are saved.
• Step 12: In this stage, a new PDF is constructed based on the fact that most of the probability mass appears in the low-value range. Since the other PDF values are negligible, all remaining samples are assumed to have the same PDF value, which is given by Eq. (13).
• Step 13: Huffman coding is performed using the PDF created in step 12.
• Step 14: The first 'j' PDF values stored in step 11, and the value of 'j' itself, are converted from decimal to binary and added to the end of the compressed data stream for transmission.
• Step 15: If the PDF has been constructed without the neural network more than 4 times, the process continues to step 16; otherwise it returns to step 7.
• Step 17: In this stage, the secondary channel encoder codes the bit stream using the coding rate calculated by Eq. (14), and the coding rate is then added to the end of the stream. In Eq. (14), $R'_c$ is the secondary channel coding rate, $N_b$ is the number of bits generated by the VLC block (the number of final bits should be equal to $N_b$) and $N'_c$ is the number of bits after compression using Huffman coding.
• Step 18: The constructed bit stream is transmitted.
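Steps 10 to 12 and step 17 can be sketched as follows. Three points are assumptions, because the corresponding equations are not legible in the source: the cutoff is taken as the smallest j in 1..4 whose cumulative probability reaches 0.9, the remaining mass is spread evenly over the other values, and the secondary rate is taken as compressed bits divided by VLC bits so that the final length equals $N_b$. The function names are hypothetical.

```python
def find_cutoff(pdf, threshold=0.9, max_j=4):
    """Smallest j in 1..max_j with pdf[0] + ... + pdf[j] >= threshold, else None."""
    for j in range(1, max_j + 1):
        if sum(pdf[:j + 1]) >= threshold:
            return j
    return None  # condition not met; the neural network path would be used


def approximate_pdf(pdf, j):
    """Keep pdf[0..j] and give every other value the same share of the rest (assumed form)."""
    head = list(pdf[:j + 1])
    tail_count = len(pdf) - (j + 1)
    tail_value = (1.0 - sum(head)) / tail_count
    return head + [tail_value] * tail_count


def secondary_rate(n_compressed, n_vlc):
    """Assumed form of the secondary coding rate: compressed bits over VLC bits."""
    return n_compressed / n_vlc


pdf = [0.6, 0.25, 0.1] + [0.05 / 253] * 253
j = find_cutoff(pdf)
approx = approximate_pdf(pdf, j)
print(j, len(approx), sum(approx), secondary_rate(600, 1000))
```

Only the first j+1 probabilities and j itself need to be sent in this case, which is much less side information than a full set of network weights.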
The proposed algorithm decoder: The flowchart of the proposed decoder algorithm is shown in Fig. 16. The process is described as follows:
• Step 1: The secondary channel coding rate is read from the received bit stream, and the channel decoder then decodes the data using this coding rate.
• Step 2: The values of 'f' and 'd' and the sequence of execution are read from the end of the decoded bit stream.
• Step 3: If the PDF was estimated using the neural network (which means that condition 12 was not satisfied), Q bits, obtained from Eq. (15), are separated from the end of the data stream:
$Q = (3 N_n + 2) \times 2\,\text{bytes} \times 8\,\text{bits/byte}$ (15)
In Eq. (15), $N_n$ is the number of hidden layer neurons. Because the output layer has a single neuron, the number of weights in the hidden layer is equal to 2*$N_n$. Also, since each hidden neuron has a bias, there are $N_n$ biases in the hidden layer. Thus, 3*$N_n$ numbers must be sent for the hidden layer weights and biases. Since the output layer is a single neuron, only 2 further numbers are sent, one for the output weight and one for the bias. Each weight and bias occupies 2 bytes.
• Step 4: Using the weights and biases recovered in step 3 and applying the neural network, the PDF is reconstructed.
• Step 5: According to the PDF obtained in step 4, the data stream is decoded using Huffman coding.
• Step 6: We check whether the PDF was estimated using the neural network during the encoding process. If the neural network was used, the decoding process jumps to step 3; otherwise, it goes to the next step.
• Step 7: P bits, calculated by Eq. (16), are separated from the end of the bit stream constructed in step 6. In Eq. (16), 'j' is the number of values that satisfies the condition given by Eq. (12):
$P = j \times 2\,\text{bytes} \times 8\,\text{bits/byte}$ (16)
• Step 8: In this stage, the other PDF values are calculated using Eq. (13) and thus the PDF is constructed.
• Step 9: According to the PDF obtained in step 8, the data stream is decoded using Huffman coding.
• Step 10: Steps 1 to 9 are repeated 'f+d' times. If the number of iterations is less than 'f+d', the decoding process returns to step 6. Otherwise, the decoder algorithm ends (Fig. 16).
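The amount of side information separated in steps 3 and 7 is simple arithmetic. The sketch below follows the counts stated in the text (3*N_n + 2 numbers for the network, j numbers for the stored samples, 2 bytes each); the function names are hypothetical.

```python
BYTES_PER_VALUE = 2  # each weight, bias or sample occupies 2 bytes
BITS_PER_BYTE = 8


def network_side_info_bits(n_hidden):
    """Bits needed to send the network description, following the count in the text."""
    numbers = 3 * n_hidden + 2  # hidden weights and biases, plus two output numbers
    return numbers * BYTES_PER_VALUE * BITS_PER_BYTE


def sample_side_info_bits(j):
    """Bits needed to send j stored PDF values."""
    return j * BYTES_PER_VALUE * BITS_PER_BYTE


print(network_side_info_bits(60), sample_side_info_bits(4))
```

For the 60-neuron network used in the paper this gives 2912 bits per iteration, against at most 64 bits when the PDF can be described by four stored samples.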

EVALUATION OF THE PROPOSED METHOD
First, we introduce the parameters used in our simulation, and then we compare the experimental results of the proposed method with those of the method of Farooq et al. (2009). In this study, the channel is a Binary Symmetric Channel (BSC), the noise is Additive White Gaussian Noise (AWGN) and the fading has a Rayleigh PDF. We have used two video files named Foreman and Walk (Video Test Media, https://media.xiph.org/video/derf). The Foreman sequence has 123 frames, of which 3 are coded as I-pictures; the remaining frames are divided into three parts of 40 frames each, which are coded and transmitted as P-pictures. One I-picture is transmitted at the beginning of each part. The Walk sequence has 105 frames, of which 5 are coded as I-pictures; the remaining frames are divided into five parts of 20 frames each, which are coded as P-pictures. Again, one I-picture is transmitted at the beginning of each part. The frame size is 288*352 and the frame rate is 25 frames/sec. The simulation has been performed for three source transmission rates. Since the number of transmitted frames is constant for a fixed source coding rate, a higher transmission rate corresponds to a higher channel coding rate and therefore makes the video frames more robust against noise. The channel encoder rate, $R_c$, and the secondary channel encoder rate, $R'_c$, for the two video files at different rates are shown in Table 1.
As shown in Table 1, owing to the high compression of the bit sequences, the secondary channel coding rate is lower than the original channel coding rate. This means that the secondary channel encoder uses more bits than the original channel encoder. Therefore, the transmitted frames are significantly more robust against channel errors.
As shown in Table 1, the channel encoding rate increases with the transmission rate, which means that the energy of the received frames increases. After simulation of the proposed method, the obtained results have been compared with the results of Farooq et al. (2009).
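The two quality metrics used in the comparison can be computed as in the following sketch. It assumes 8-bit samples (peak value 255) and frames given as lists of rows; the function names are hypothetical and no values from the paper's experiments are reproduced.

```python
import math


def psnr(original, reconstructed, peak=255):
    """PSNR in dB between two equally sized frames given as lists of rows."""
    squared, n = 0, 0
    for row_o, row_r in zip(original, reconstructed):
        for a, b in zip(row_o, row_r):
            squared += (a - b) ** 2
            n += 1
    mse = squared / n
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


def bit_error_rate(sent, received):
    """Fraction of positions at which two equally long bit sequences differ."""
    errors = sum(1 for a, b in zip(sent, received) if a != b)
    return errors / len(sent)


print(psnr([[10, 20], [30, 40]], [[12, 20], [30, 38]]))
print(bit_error_rate("10110", "10011"))
```

A higher PSNR and a lower BER both indicate a reconstruction closer to the transmitted frames.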

CONCLUSION
In this study, we proposed a novel method in which the received video frames have a higher PSNR than with the method of Farooq et al. (2009), so that the reconstructed frames have higher quality. In the proposed method, the bits produced by the VLC block are compressed, and the bits saved (the difference between the bits produced by the VLC block encoder and the bits produced by the Huffman method) are replaced by bits created by the secondary channel encoder. In this way, the quality of the received video frames is increased in comparison with recent methods. By increasing the channel coding rate, and thus the protection of the transmitted bits, while keeping the overall transmission rate constant, the proposed method improves the quality of the reconstructed video frames.

Fig. 6: Separating bits from the information stream to form a stream of bytes

Fig. 11: Target function values and output values obtained using the neural network

Fig. 14: New probability density function curves obtained from the curves in Fig. 8 after applying Huffman coding to the data strings

Fig. 16: Proposed method decoder

Fig. 17: Comparison of PSNR for 123 frames of Foreman between the proposed method and the Farooq algorithm, with source coding rate = 384 Kbps

• Step 16 (encoder): The values 'f' and 'd', which indicate the number of repetitions of Huffman coding using the neural network for PDF estimation, and the sequence of execution of 'f' or 'd' are added to the end of the constructed bit stream. The sequence of execution indicates whether the neural network has been used. The value of 'd' or 'f' and the corresponding sequence are represented by 3 bits and 2 bits, respectively.

Table 1: Channel encoder rate $R_c$ and secondary channel encoder rate $R'_c$ for the two video sequences Foreman and Walk at different transmission rates