Reinforcement Learning with FCMAC for TRMS Control

This study proposes an intelligent control scheme that integrates reinforcement learning into a fuzzy CMAC (FCMAC) for a Twin Rotor Multi-Input Multi-Output System (TRMS). In the control design, the fuzzy CMAC controller compensates the PID control signal, and reinforcement learning refines that compensation. CMAC with a fuzzy system performs better than the conventional CMAC in TRMS attitude tracking control. With reinforcement learning, the proposed control scheme further improves the tracking performance of the TRMS.


INTRODUCTION
Reinforcement learning addresses the problem faced by an agent that must learn an action selection function leading to an optimal reward through trial-and-error interactions with a dynamic environment (Yeh, 2009). The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them (Sutton and Barto, 1998). In the most interesting and challenging cases, actions may affect not only the immediate reward but also the next situation. Reinforcement learning is defined not by characterizing learning methods, but by characterizing a learning problem. The basic idea is simply to capture the most important aspects of the problem facing a learning agent interacting with its environment to achieve a goal. The agent must be able to sense the state of the environment to some extent and must be able to take actions that affect the state. To obtain a better reward, a reinforcement learning agent must prefer actions that it has tried in the past and found to be effective in producing reward. Reinforcement learning has been successfully applied to many control problems. It is well suited to systems for which desired output values are not available (Anderson et al., 2007; Valasek et al., 2008).
In recent years, intelligent system research has become mainstream in the field of control engineering. Control design that uses concepts from artificial intelligence is often called intelligent control. Although intelligent controllers have been widely applied, the PID controller is still the most common controller in industry. The PID control structure is simple and its parameters are easy to adjust (Kuo, 1995; Krohling and Rey, 2001). Its disadvantage is that suitable values for the control parameters are not easy to obtain, and the optimization process is sometimes very time consuming. With the development of intelligent control, these shortcomings of PID control can be overcome. There are many types of intelligent algorithms, such as neural networks, fuzzy logic systems and genetic algorithms (Juang et al., 2011). Among artificial neural networks, CMAC (Albus, 1975) has been used in many applications. CMAC differs from conventional neural networks in its weight updating rule. It converges quickly, requires little computation time and is easy to implement in hardware. In this study, a fuzzy reasoning mechanism is applied to obtain the memory mapped values of the CMAC; the resulting network is called FCMAC. An adaptive learning rate for the CMAC can be derived from Lyapunov stability theory (Yang and Juang, 2009). This study presents several control methods for a TRMS (Feedback Co, 1998). The controller design is based on reinforcement learning, CMAC, FCMAC and PID control. The controlled plant, the TRMS, is a high-order nonlinear system with cross-coupling effects. The goal of this study is to move the TRMS quickly, accurately and stably to the desired attitude.

METHODOLOGY
motors. The horizontal and vertical angles and the two corresponding angular velocities are measured by position sensors in the pivot. The two propellers are perpendicular to one another and are mounted on a beam whose pivot allows it to rotate freely in the horizontal plane and the vertical plane. By changing the input voltage, we can control the propeller's rotational speed and drive the beam to the desired attitude.
Reinforcement learning is applied to compute a compensation for the FCMAC command. It supervises the FCMAC controller using the error between the system output and the reference, and then gives rewards to the FCMAC command. This study presents two ways of obtaining the rewards.

Addition and subtraction:
where e is the system error. In method A, an error threshold is set: if the system error is greater than the preset threshold, the control gain is punished by ρ; conversely, if the error is less than the threshold, the gain is rewarded by ρ. This method is based on the relationship between the reference and the system output. If the control command is too small, the addition operation is applied to it; if the control command is too large, the subtraction operation is applied. In method B, an RGA is used to search for an appropriate gain value. The most important issue is how to determine the fitness function. With appropriate initial settings for the RGA (pattern number, generation number, mating and mutation), the genetic algorithm can replace the trial-and-error procedure used in A to obtain the optimal control gain.
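The addition and subtraction method can be illustrated with a minimal sketch. The function names, the reading of "punished" as decreasing the gain and "rewarded" as increasing it, and the sign convention e = reference − output are assumptions made for illustration; they are not taken from the study.

```python
def reward_gain(gain, error, threshold, rho):
    """Adjust the compensation gain by rho according to the error size.

    Assumption: "punished" decreases the gain, "rewarded" increases it.
    """
    if abs(error) > threshold:
        return gain - rho  # error above threshold: punish
    return gain + rho      # error within threshold: reward


def compensate(command, gain, error):
    """Add or subtract the gain from the control command.

    Assumption: error = reference - output, so a positive error means
    the command was too small and needs the addition operation.
    """
    if error > 0:
        return command + gain
    if error < 0:
        return command - gain
    return command
```

In use, `reward_gain` would be called once per evaluation of the tracking error and `compensate` once per control step; the threshold and ρ are tuning parameters that method A leaves to trial and error.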
The cerebellar model articulation controller (Glanz et al., 1991) expresses a complex nonlinear function in the form of a look-up table and can be considered a kind of neural network. The contents of the table change through the learning algorithm. CMAC uses the system's input state as an index and stores the related information in a group of memory cells. It essentially maps a complex nonlinear function into a look-up table. Specifically, the input space is divided into many sub-blocks, and each block specifies a real memory location. The information learned for each block is distributed over the storage positions of the adjacent blocks. CMAC has been recognized as a class of associative memory neural networks. It can learn any multidimensional nonlinear mapping, and it can be used effectively for nonlinear function approximation, dynamic modeling, control system design, etc. Compared with other neural networks, CMAC is based on local learning: its information is stored in a local structure and only a few weights are updated at each iteration, which makes it appropriate for real-time control. As a nonlinear approximator, it is insensitive to the order of the learning data.
FCMAC includes two kinds of operations: calculating the output and adjusting the weights. The mapping A→Y computes the output as a weighted sum, where W_j is the weight of the j-th storage hypercube and a_j(x) is a factor obtained from the fuzzy reasoning mechanism indicating whether the j-th storage hypercube is addressed by the input x. In the learning stage, FCMAC modifies the weights of the storage hypercubes according to the error between the desired output and the real output; in the weight updating rule, i denotes the i-th iteration. To overcome the shortcoming of the crisp relation in the conventional CMAC, an adaptive learning rate is introduced. The adaptive CMAC uses a set of adaptive learning rates derived from Lyapunov stability theory, where a is the vector of a_j(x).
The conventional CMAC performs well on the decoupled TRMS, but it cannot control the cross-coupled system. Therefore, this study presents an FCMAC and PID control scheme to achieve cross-coupling control. The FCMAC realizes the feed-forward control, which compensates the control signal applied to the TRMS. The PID controller realizes the feedback control, which guarantees the system's stability and disturbance rejection.
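The division of labor between the feedback and feed-forward paths can be sketched as follows. This is a minimal illustration that assumes a textbook discrete PID and a purely additive compensation term; the class, the function name and the gains are hypothetical and do not come from the study.

```python
class PID:
    """Textbook discrete PID controller (rectangular integration)."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error):
        # Accumulate the integral and take a backward difference.
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return (self.kp * error
                + self.ki * self.integral
                + self.kd * derivative)


def control_signal(pid, error, feedforward):
    """Total command: PID feedback plus a feed-forward compensation.

    Assumption: the compensation (for example the FCMAC output) is
    simply added to the PID signal.
    """
    return pid.step(error) + feedforward
```

Because `prev_error` starts at zero, the derivative term reacts strongly to the first sample; a practical implementation would usually filter or initialize it.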

EXPERIMENTAL RESULTS
In the cross-coupled condition, the control task is to overcome the cross-coupling influence between the vertical and horizontal axes. Figures 5 to 7 show the step, sine-wave and square-wave responses, respectively, in the cross-coupled condition using the CMAC controller.

CONCLUSION
This study presents the use of conventional CMAC, fuzzy CMAC and reinforcement fuzzy CMAC controllers to control a twin rotor MIMO system. Simulation results show that the proposed reinforcement FCMAC controller has better performance in system stability and attitude tracking control. In reinforcement learning control, the compensation gain obtained by the addition and subtraction method provides worse compensation to the system, even worse than the CMAC control. Using the genetic algorithm to search for the reward improves the control system performance dramatically.

System description:
Fig. 1: The laboratory set-up of the TRMS

Figure 2 shows a block diagram of the TRMS model, where:
α_h : The horizontal position of the TRMS beam
α_v : The vertical position of the TRMS beam
Ω_h : The horizontal angular velocity of the TRMS beam
Ω_v : The vertical angular velocity of the TRMS beam
U_h : The horizontal DC-motor voltage input
U_v : The vertical DC-motor voltage input
G_h : The linear transfer function of the tail rotor DC-motor
G_v : The linear transfer function of the main rotor DC-motor
[symbol missing]_h : The nonlinear part of the DC-motor with tail rotor
[symbol missing]_v : The nonlinear part of the DC-motor with main rotor
w_h : The rotational speed of the tail rotor
w_v : The rotational speed of the main rotor
F_h : The nonlinear function of the aerodynamic force from the tail rotor
F_v : The nonlinear function of the aerodynamic force from the main rotor
l_h : The effective arm of aerodynamic force from the tail rotor
l_v : The effective arm of aerodynamic force from the main rotor
J_h : The nonlinear function of the moment of inertia with respect to the horizontal axis
J_v : The nonlinear function of the moment of inertia with respect to the vertical axis
M_h : The horizontal turning torque
M_v : The vertical turning torque
K_h : The horizontal angular momentum
K_v : The vertical angular momentum
f_h : The moment of friction force in the horizontal plane
f_v : The moment of friction force in the vertical plane
J_hv : The horizontal angular momentum from the main rotor
J_vh : The vertical angular momentum from the tail rotor
f : The function of the reaction turning moment

Controller design: Figure 3 shows the proposed reinforcement learning with FCMAC control scheme.

Fig. 4: The conceptual diagram of FCMAC

to the actual memory of the N units. Each stored unit has a corresponding weight value, and the CMAC output is the weighted sum of the N actual memory cells. The difference between FCMAC and CMAC lies in the mapping part, as shown in Fig. 4. The mapping of FCMAC uses a fuzzy reasoning mechanism to obtain the mapping value, which helps the CMAC controller reduce its computation. The fuzzy reasoning mechanism does not increase the updating time or computing time of the CMAC; on the contrary, it reduces both.

y_d : The desired output
m : The number of addressed hypercubes
α : The learning rate
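The output computation and weight adjustment can be illustrated with a minimal one-input sketch using the symbols W_j, a_j(x), y_d, m and α. It assumes the commonly used CMAC update W_j ← W_j + (α/m) a_j(x) (y_d − y) and triangular membership functions for the fuzzy reasoning mechanism; both are assumptions, as are the class name, the cell layout and the sine target. The Lyapunov-based adaptive learning rate is not modeled.

```python
import math


class FuzzyCMAC:
    """One-input FCMAC sketch with triangular membership functions."""

    def __init__(self, n_cells=20, x_min=-1.0, x_max=1.0):
        spacing = (x_max - x_min) / (n_cells - 1)
        self.centers = [x_min + j * spacing for j in range(n_cells)]
        self.width = spacing  # half-width of each triangle (assumed)
        self.weights = [0.0] * n_cells

    def activations(self, x):
        # a_j(x): graded degree to which cell j is addressed by x.
        return [max(0.0, 1.0 - abs(x - c) / self.width)
                for c in self.centers]

    def output(self, x):
        # y = sum_j a_j(x) * W_j
        a = self.activations(x)
        return sum(a_j * w_j for a_j, w_j in zip(a, self.weights))

    def update(self, x, y_d, alpha=0.5):
        # Only the m addressed cells are updated (local learning).
        a = self.activations(x)
        y = sum(a_j * w_j for a_j, w_j in zip(a, self.weights))
        m = sum(1 for a_j in a if a_j > 0.0)
        if m == 0:
            return y  # input outside the covered range
        for j, a_j in enumerate(a):
            if a_j > 0.0:
                self.weights[j] += (alpha / m) * a_j * (y_d - y)
        return y


def train_demo(epochs=200):
    """Fit sin(pi * x) on [-1, 1]; return the largest training error."""
    net = FuzzyCMAC()
    samples = [-1.0 + 2.0 * k / 40 for k in range(41)]
    for _ in range(epochs):
        for x in samples:
            net.update(x, math.sin(math.pi * x))
    return max(abs(net.output(x) - math.sin(math.pi * x))
               for x in samples)
```

With these triangles at most two neighbouring cells respond to any input, so each update touches only a couple of weights, which is the local-learning property that makes CMAC attractive for real-time control.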

As shown in Table 1, the proposed reinforcement FCMAC control scheme has better average performance than the other controllers.