Cardio Vascular Detection with Neuro Computing and Genetic Algorithm

For humans, the most fundamental requirement is a healthy life, which is becoming harder to maintain day to day as technology advances. Among the possible causes of unnatural death, heart disease accounts for a very significant share. The diagnosis of heart disease is a vital and intricate task. Recognizing heart disease from diverse features or signs is a multi-layered problem that is highly sensitive to diagnostic tests, and establishing the relationships among multiple parameters is very difficult. As a result, decisions are not free from false assumptions and are frequently accompanied by unpredictable effects. This motivates a more reliable and cost-effective knowledge-based algorithmic approach to detecting heart disease. From an engineering point of view, this study develops a solution for detecting the presence of heart disease using concepts of artificial intelligence in data mining. A feedforward neural network architecture is taken as the computational platform for generating the intelligence, in association with the well-established field of genetic algorithms (GA). The performance of the two learning approaches is compared across network architectures of several different sizes.


INTRODUCTION
Cardiovascular diseases (CVDs) are a group of disorders of the heart and blood vessels and include: coronary heart disease (disease of the blood vessels supplying the heart muscle); cerebrovascular disease (disease of the blood vessels supplying the brain); peripheral arterial disease (disease of the blood vessels supplying the arms and legs); rheumatic heart disease (damage to the heart muscle and heart valves from rheumatic fever, caused by streptococcal bacteria); congenital heart disease (malformations of heart structure existing at birth); and deep vein thrombosis and pulmonary embolism (blood clots in the leg veins, which can dislodge and move to the heart and lungs). Globally, almost 2% of deaths from cardiovascular diseases are related to rheumatic heart disease, while 42% are related to ischemic heart disease and 34% to cerebrovascular disease. Heart attacks and strokes are usually acute events and are mainly caused by a blockage that prevents blood from flowing to the heart or brain. The most common reason for this is a build-up of fatty deposits on the inner walls of the blood vessels that supply the heart or brain. Strokes can also be caused by bleeding from a blood vessel in the brain or by blood clots.
The most important behavioural risk factors of heart disease and stroke are unhealthy diet, physical inactivity, tobacco use and harmful use of alcohol.
Behavioural risk factors are responsible for about 80% of coronary heart disease and cerebrovascular disease. The effects of unhealthy diet and physical inactivity may show up in individuals as raised blood pressure, raised blood glucose, raised blood lipids, and overweight and obesity; these are called 'intermediate risk factors' or metabolic risk factors. There are also a number of underlying determinants of CVDs, or "the causes of the causes". These reflect the major forces driving social, economic and cultural change: globalization, urbanization and population ageing. Other determinants of CVDs include poverty, stress and hereditary factors.
Often, there are no symptoms of the underlying disease of the blood vessels. A heart attack or stroke may be the first warning of underlying disease. Symptoms of a heart attack include pain or discomfort in the centre of the chest, and pain or discomfort in the arms, the left shoulder, elbows or jaw. In addition, the person may experience difficulty in breathing or shortness of breath; feeling sick or vomiting; feeling light-headed or faint; breaking into a cold sweat; and becoming pale. Women are more likely to have shortness of breath, nausea, vomiting and back or jaw pain. The most common symptom of a stroke is sudden weakness of the face, arm, or leg, most often on one side of the body. Other symptoms include sudden onset of: numbness of the face, arm, or leg, especially on one side of the body; confusion, difficulty speaking or understanding speech; difficulty seeing with one or both eyes; difficulty walking, dizziness, loss of balance or coordination; severe headache with no known cause; and fainting or unconsciousness.

Cardiovascular diseases, a development issue in low- and middle-income countries:
• Over 80% of the world's deaths from CVDs occur in low- and middle-income countries. At the macro-economic level, CVDs place a heavy burden on the economies of low- and middle-income countries. Heart disease, stroke and diabetes are estimated to reduce GDP by between 1 and 5% in low- and middle-income countries experiencing rapid economic growth, as many people die prematurely.

Key statistical facts published by WHO in 2011:
Some statistics on the present status of heart disease and related information are presented in the report of the World Health Organisation (2011):
• CVDs are the number one cause of death globally: more people die annually from CVDs than from any other cause.
Fig. 1 shows, in general terms, how different sectors can be involved in improving health.

Human experts vs. machine experts in health care diagnosis:
Almost all physicians are confronted during their training with the task of learning to diagnose. Here, they have to solve the problem of deducing certain diseases or formulating a treatment based on more or less specified observations and knowledge. The diagnosis of disease is a significant and tedious task in medicine. The detection of heart disease from various factors or symptoms is a multi-layered issue which is not free from false presumptions and is often accompanied by unpredictable effects. Given the complexity of the information available in the health care domain, human intelligence alone is not sufficient to ascertain the proper diagnosis. The problems associated with human experts in the diagnosis procedure can be considered broadly as:
• Limited accuracy of decisions
• Shortage of expertise
• Difficulty in keeping knowledge up to date
• Time-dependent performance
Because of these problems, there is a need to develop expert systems that provide an assistance mechanism in the diagnosis procedure. The conclusion is clear: humans cannot analyze complex data ad hoc without errors.

LITERATURE REVIEW
Using compressed ECG is crucial for fast and efficient telecardiology applications, as ECG signals are enormously large in size (Sufi and Khalil, 2011). In Hedeshi and Abadeh (2011), an approach to extract a set of rules for the diagnosis of coronary artery disease is presented. The boosting method considers the cooperation between fuzzy rules generated with the PSO meta-heuristic. However, detecting heart disease using ECG alone has some disadvantages, so drawing on other resources as well is a better approach. To support a detection system that uses several resources, the knowledge of each resource domain should be defined (Ki-Hyeon and Ho-Jin, 2007). Mahmood and Kuppa (2010) proposed a pruning method that combines pre-pruning and post-pruning, aiming at both classification accuracy and tree size, and used it to induce a decision tree. A fused hierarchical neural network (FHNN) is proposed for applications mainly related to diagnosis and fault detection in Sekar et al. (2012). The benefit of such a model is demonstrated by applying FHNNs to cardiovascular disease (CVD) diagnosis hierarchically, using hemodynamic parameters (HDPs) derived from a noninvasive sphygmogram (SPG). Variance analysis is used to categorize HDPs by importance/relevance, based on their influence in discriminating diseases. Different neural network structures are tested in diagnosing CVD so as to choose the optimal sub-neural networks (sub-NNs) for the proposed FHNNs. Finally, FHNNs with fused sub-NNs for CVD diagnosis are presented.

In Belloni et al. (2007), a medical instrument for the acquisition and analysis of phonocardiac signals is described. The proposed system is based on an electronic stethoscope with enhanced environmental noise reduction and a software tool able to reproduce, visualize and analyze the heart sounds. The recorded data can be collected and stored in patient-associated files, to build up a clinical history and to allow further consultation. Alptekin and Akan (2010) and Oresko et al. (2010) unite the portability of Holter monitors and the real-time processing capability of state-of-the-art resting ECG machines to provide an assistive diagnosis solution using a smartphone. Specifically, they developed two smartphone-based wearable CVD-detection platforms capable of real-time ECG acquisition and display, feature extraction and beat classification. Furthermore, the same statistical summaries available on resting ECG machines are provided.

For detecting cardiovascular diseases (CVDs) hierarchically via hemodynamic parameters (HDPs) derived from the sphygmogram, a hierarchical fuzzy neural network (HFNN) scheme is proposed in Shi et al. (2010), which provides a noninvasive way to detect CVDs. To draw conclusions via HFNNs using HDPs as evidence, variance analysis is used to categorize the HDPs and reduce the dimension of the feature space. A unique aspect of this work is the introduction of an age factor to adjust the fuzzy membership function. The categorized HDP sets are fed into different sub-FNNs of the HFNNs according to their importance and necessity, so that the HFNNs achieve higher accuracy, especially in dealing with mixed CVDs. It is imperative to develop novel computational tools to mine quantitative parameters from imaging data for early detection and diagnosis of asymptomatic cardiovascular disease. In Kakadiaris et al. (2009), the authors present progress towards a computational framework to mine cardiac imaging data and provide quantitative measures for a new risk assessment method. Specifically, they present computational methods for the detection of coronary calcification and segmentation of the thoracic aorta in non-contrast cardiac computed tomography, and for the detection of neovessels in plaques in intravascular ultrasound imaging data.

Problem solving as search: AI and creativity:
There are three main areas of science commonly involved in AI-based creative research: knowledge-based systems, grammars and search. Knowledge-based systems (KBSs) come in many forms, but their characteristic feature is the incorporation of expert knowledge in some domain, usually in the form of rules. Since a KBS needs experts' rules about a domain, the rules themselves must be acquired from experts. According to many definitions, this would seem to rule out the possibility of a KBS being creative.
Grammars are an alternative way of representing knowledge in a particular domain. A grammar embodies rules about a language. Grammars have played a part in the science of AI for various reasons, usually concerning attempts to computationally understand natural languages or to treat compositions as statements in a language.
Finally, search is a foundational concept in AI and refers to the journey (made faster via computation) through an immense space of possibilities in search of something suitable. Quite different approaches to search have long been a mainstay of AI research. The most explored of these is heuristic search, in which a solution (perhaps a design, perhaps a schedule and so forth) is constructed gradually, bit by bit, with heuristics (rules of thumb) employed to decide how to choose each successive part. Heuristic searches tend to be quite fussy about the next move, concentrating on exploring areas sanctioned by the heuristics in use. Although this may seem limiting from the viewpoint of creativity, search-based systems can still aid the creative process by helping designers and artists see more of the space of possibilities than they otherwise would.
Evolutionary computation is all about search. In computer science and artificial intelligence, when we use a search algorithm, we define a computational problem in terms of a search space, which can be viewed as a massive collection of potential solutions to the problem. Any position, or point, in the search space defines a particular solution, and the process of search can be viewed as the task of navigating the space. A commonly used term in this context is "optimization", which just means "finding the best".
Evolutionary search and most other search methods all make use, in some way, of previously visited solutions to help decide where to look next. Sometimes the space includes a massive collection of terrible solutions that must be slowly waded through until we eventually find ourselves in a decent neighbourhood. In other cases, we may already have a good start.
There are many types of search algorithms. Evolutionary search is a recent and rapidly growing subset, distinguished from other methods in that the way it works is both inspired by and based upon the mechanism of evolution in nature. These algorithms typically use an analogy with natural evolution to perform search by evolving solutions to problems. Hence, instead of working with one solution at a time in the search space, they consider a large collection, or population, of solutions at once. In local search, at any point in time we have just a single "current" solution, which we gradually update whenever we find a better solution nearby; evolutionary algorithms, in contrast, work with populations of solutions. At any point in time, we have in mind a population of potential solutions, and we use the population as a whole (or at least more than one of its constituents) to help us determine where to go next in the space.
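To make the contrast between local search and population-based search concrete, here is a minimal Python sketch. The test function f(x) = (x^2 - 4)^2 + x, the step sizes, and the "keep the better half, perturb it" population rule are illustrative assumptions, not part of the method studied here. The initial population is spread evenly over [-5, 5] rather than drawn at random so that the comparison is reproducible.

```python
import random


def f(x):
    """Test function with two basins: a local minimum near x = 2, the global one near x = -2."""
    return (x * x - 4) ** 2 + x


def hill_climb(func, x0, step=0.1, iters=1000, seed=0):
    """Local search: one current solution, replaced only by a better nearby candidate."""
    rng = random.Random(seed)
    current, current_val = x0, func(x0)
    for _ in range(iters):
        candidate = current + rng.uniform(-step, step)  # a neighbouring solution
        cand_val = func(candidate)
        if cand_val < current_val:  # lower is better (minimisation)
            current, current_val = candidate, cand_val
    return current, current_val


def population_search(func, pop_size=20, iters=200, sigma=0.1, seed=0):
    """Population-based search: the better half survives and produces perturbed children."""
    rng = random.Random(seed)
    # Spread the initial population evenly over [-5, 5] so the run is reproducible.
    pop = [-5 + 10 * i / (pop_size - 1) for i in range(pop_size)]
    for _ in range(iters):
        pop.sort(key=func)  # best (lowest value) first
        parents = pop[: pop_size // 2]
        children = [p + rng.gauss(0.0, sigma) for p in parents]
        pop = parents + children
    best = min(pop, key=func)
    return best, func(best)
```

Started at x = 3, the hill climber settles in the local minimum near x = 2, because any move toward the other basin first makes the solution worse and is rejected. The population already has members in the left basin, and keeping the better half means it never loses them.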
Although evolutionary algorithms do make computers evolve solutions, this evolution process is not explicitly specified in an EA; it is an emergent property of the algorithm. In fact, the computers are not instructed to evolve anything, and it is currently not possible for us to explicitly "program in" evolution, since we do not fully understand how evolution works. Instead, the computers are instructed to maintain populations of solutions, allow better solutions to "have children" and allow worse solutions to "die". The child solutions inherit their parents' characteristics with some small random variation; the better of these solutions are then allowed to have children themselves, while the worse die, and so on. This process causes evolution to occur, and after a number of generations the computer will have evolved solutions that are substantially better than their distant ancestors at the start. By considering the search space itself, it is possible to get an idea of how evolution manages to find good solutions.
All EAs require some form of guidance to direct evolution toward better areas of the search space. In general, EAs receive guidance by evaluating every solution in the population to determine its fitness. The fitness of a solution is a score, calculated by a fitness function, based on how well the solution fulfils the problem objectives. Typically, fitness values are non-negative real numbers, where a fitness of zero is a perfect score. EAs are then used to minimize the fitness values of solutions, by allowing fitter solutions to have more children than less fit solutions. Fitness values are often plotted over the search space, giving a mountainous fitness landscape, where a high peak corresponds to solutions in that part of the search space that have optimal fitness. If the problem has many separate optima (i.e., if the fitness function is multimodal), finding a globally optimal solution in the landscape can be extremely difficult.
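The minimisation convention (zero is a perfect score) and the idea of a multimodal landscape can be illustrated with the Rastrigin function. It is a standard optimisation benchmark chosen here purely as an assumption for illustration; it is not used in this study. Its fitness is zero only at the origin, and it has a local optimum near every integer point.

```python
import math


def rastrigin(x):
    """Multimodal test fitness: 0 at the origin (perfect), many local optima elsewhere."""
    return 10 * len(x) + sum(xi * xi - 10 * math.cos(2 * math.pi * xi) for xi in x)


def fitter(a, b):
    """Under the minimisation convention, the solution with the lower fitness is fitter."""
    return a if rastrigin(a) <= rastrigin(b) else b


print(rastrigin([0.0, 0.0]))  # global optimum
print(rastrigin([1.0, 1.0]))  # near a local optimum
print(rastrigin([0.5, 0.5]))  # between the two: much worse than either
```

The point (1, 1) scores far better than (0.5, 0.5), which lies between it and the global optimum. A search that only accepts improvements can therefore stall at (1, 1); this is the difficulty the paragraph above describes.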
There are four main families of evolutionary algorithm in use today: the genetic algorithm (GA), created by John Holland and made famous by David Goldberg; evolutionary programming (EP), created by Lawrence Fogel and developed further by David Fogel; evolution strategies (ES), created by Ingo Rechenberg and today strongly promoted by Thomas Bäck; and genetic programming (GP), a variation of the GA created by John Koza. The field of evolutionary computation has grown up around these techniques, with its roots still firmly in evolutionary biology and computer science. Today researchers examine every conceivable aspect of EAs, often using knowledge from evolutionary biology in their algorithms.
Evolution-based algorithms have been found to be some of the most flexible, efficient and robust of all search algorithms known to computer science.Because of these properties, these methods are now becoming widely used to solve a broad range of different problems.
Artificial neural networks: Advantages and present challenges:
Neural networks are usually seen as a method to implement complex non-linear mappings using simple elementary units interconnected through connections with adaptive weights. Neural networks are non-linear systems whose structure is based on principles observed in biological neuronal systems. A neural network may be considered a system capable of answering queries, that is, of providing outputs for given inputs. The input/output relation, i.e., the transfer function of the network, is not programmed but obtained through a "training" process on empirical datasets. The network builds the function that relates "input" to "output" by processing correct input/output pairs. For each input the network returns an output that is generally not exactly the desired output, so the training algorithm modifies some parameters of the network in the desired direction. Hence, every time an example is presented, the algorithm adjusts the network parameters toward the optimal values for that example; in this way the algorithm tries to reach the optimal solution for all the examples. These parameters are essentially the weights, or linking factors, between the neurons that form the network. Neural networks are typically applied in fields where classic algorithms fail because of their inflexibility (they need precise input datasets). Problems with imprecise input datasets are usually those whose number of possible input datasets is so large that they cannot be classified exhaustively, as in image recognition.
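The idea that training "modifies some parameters of the network in the desired direction" can be made concrete with a single sigmoid neuron updated by gradient descent on the squared error (the delta rule). This is a minimal illustrative sketch; the learning rate and the toy example are assumptions, and it is not the GA-based training procedure studied in this paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Output of one sigmoid neuron with weights w and bias b for input vector x."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_step(w, b, x, target, lr=0.5):
    """One gradient-descent step on E = 0.5 * (y - target)**2 for a single sigmoid neuron."""
    y = predict(w, b, x)
    delta = (y - target) * y * (1.0 - y)  # dE/dz, using sigmoid'(z) = y * (1 - y)
    new_w = [wi - lr * delta * xi for wi, xi in zip(w, x)]
    new_b = b - lr * delta
    return new_w, new_b


w, b, x = [0.0, 0.0], 0.0, [1.0, 0.5]
print(predict(w, b, x))  # starts at 0.5
w, b = train_step(w, b, x, target=1.0)
print(predict(w, b, x))  # moves toward the target of 1.0
```

Repeating the step over all input/output pairs is the "adjust for each example, aim for all examples" loop described above.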

Genetic algorithm based learning:
The genetic algorithm is perhaps the most well-known and popularized of all evolution-based search algorithms, although this is partly because the term "genetic algorithm" is often used to denote any of the four main families of methods. It is today probably the most widely used of the four main kinds of EA. More experimental and theoretical analyses have been made of the workings of the GA than of other EAs. Also, the genetic algorithm resembles natural evolution more closely than most other methods do, but the extent to which this matters in applications varies greatly.
The GA has been described as a "search algorithm with some of the innovative flair of human search." GAs are also very forgiving algorithms: even if they are badly implemented or poorly applied, they will often still produce acceptable results. GAs are today renowned for their ability to tackle a huge variety of optimization problems and for their consistent ability, given at least some thought about a suitable setup for the application at hand, to provide excellent results; that is, they are robust. It is fruitful to view a GA as making use of two separate spaces: the search space and the solution space. The search space is now a space of coded solutions to the problem, and the solution space is the space of actual solutions. Coded solutions, or genotypes, must be mapped onto actual solutions, or phenotypes, before the quality or fitness of each solution can be evaluated. A GA maintains a population of individuals, where each individual consists of a genotype and its corresponding phenotype. Phenotypes usually consist of collections of parameters, and a genotype consists of coded versions of these parameters. A coded parameter is normally referred to as a gene, with the values a gene can take being known as alleles. The collection of genes in one genotype is often held internally as a string and is known as a chromosome.
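For the weight-learning problem in this study, one natural genotype is a flat chromosome of real-valued genes, and the phenotype is the set of weights of a [13 h 1] network. The sketch below decodes such a chromosome into one row of incoming weights per node. Two points are assumptions not stated in the text: a bias gene per hidden and output node, and real-valued rather than bit-string genes. The function names are hypothetical.

```python
def n_genes(n_in, n_hidden, n_out=1):
    """Chromosome length: incoming weights plus one bias (assumed) for every node."""
    return (n_in + 1) * n_hidden + (n_hidden + 1) * n_out


def decode(chromosome, n_in, n_hidden, n_out=1):
    """Map a flat genotype onto per-node weight rows (the phenotype).

    Row k of `hidden` holds the n_in incoming weights of hidden node k followed by
    its bias; rows of `output` are laid out the same way over the hidden layer.
    """
    if len(chromosome) != n_genes(n_in, n_hidden, n_out):
        raise ValueError("chromosome length does not match the architecture")
    genes = iter(chromosome)
    hidden = [[next(genes) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    output = [[next(genes) for _ in range(n_hidden + 1)] for _ in range(n_out)]
    return hidden, output


# Chromosome sizes for the three architectures used later ([13 6 1], [13 4 1], [13 2 1]).
print([n_genes(13, h) for h in (6, 4, 2)])
```

Grouping genes by node in this way keeps each node's incoming weights contiguous, which makes node-level genetic operators straightforward to apply.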
The algorithm works as follows. The genotype of every individual in the population is initialized with random alleles. The main loop of the algorithm then begins: the corresponding phenotype of every individual in the population is evaluated and given a fitness value according to how well it fulfils the problem objective, or fitness function. These scores are then used to determine how many copies of each individual are placed into a temporary area, often termed the "mating pool" (i.e., the higher the fitness, the more copies of an individual are made). Two parents are then randomly picked from this pool. Offspring are generated by the crossover operator, which randomly allocates genes from each parent's genotype to each offspring's genotype. Mutation is then occasionally applied (with low probability) to offspring; when it is used to mutate an individual, typically a single allele is changed randomly. Using crossover and mutation, offspring are generated until they fill the population. This entire process of evolution and reproduction then continues until either a satisfactory solution emerges or the GA has run for a specified maximum number of generations.
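The loop just described can be sketched in a few lines of Python. The toy problem (OneMax: maximise the number of 1s in a bit string), the population size, the mutation probability and the "+1" added to the roulette weights are all assumptions for illustration. Uniform crossover is used as one concrete reading of "randomly allocates genes from each parent".

```python
import random


def one_max(bits):
    """Toy fitness: number of 1s (higher is better, as in the mating-pool description)."""
    return sum(bits)


def simple_ga(n_bits=20, pop_size=30, generations=60, p_mut=0.1, seed=1):
    rng = random.Random(seed)
    # Initialise every genotype with random alleles.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=one_max)
    for _ in range(generations):
        scores = [one_max(ind) for ind in pop]
        # Mating pool: fitness-proportional copies (roulette wheel); +1 avoids all-zero weights.
        pool = rng.choices(pop, weights=[s + 1 for s in scores], k=pop_size)
        offspring = []
        while len(offspring) < pop_size:
            p1, p2 = rng.sample(pool, 2)  # two parents picked at random from the pool
            # Uniform crossover: each gene is taken from one of the two parents at random.
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            if rng.random() < p_mut:  # occasional mutation of a single allele
                i = rng.randrange(n_bits)
                child[i] = 1 - child[i]
            offspring.append(child)
        pop = offspring
        gen_best = max(pop, key=one_max)
        if one_max(gen_best) > one_max(best):
            best = gen_best  # remember the best individual seen so far
    return best


best = simple_ga()
print(one_max(best), best)
```

Only the best-so-far individual is remembered outside the population; the population itself is fully replaced each generation, matching the description above.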
The randomness involved in the genetic operators can give the illusion that the GA, like other EAs, is nothing more than a parallel random search algorithm, but this is far from the case. Evolutionary search has a random element in its exploration of the search space, but the search is unquestionably directed by the "survival of the fittest" principle. In the simple GA just described, this principle comes into play when we decide which chromosomes can join the mating pool and hence be parents for the next generation. This decision process is called selection, and as long as the decision is made in such a way that better fitness gives more chance of being selected, the survival of the fittest principle is in operation and there is "selection pressure" towards areas of the search space that contain better solutions. The flow of steps involved in the genetic algorithm is shown in Fig. 2. Pseudocode for applying the genetic algorithm to optimize the connection weights of a three-layer feedforward architecture is shown in Fig. 3. It uses the concept of node crossover, in which all incoming weights of a node from one parent solution are exchanged with all incoming weights of a node from the other parent solution. The mutation operator adds a Gaussian-distributed random number to all incoming weights of a node. Tournament selection is applied to obtain an unbiased selection of solutions for the next generation.

EXPERIMENTAL SETUP
Summaries of data: The data set is taken from the publicly available UCI repository (http://archive.ics.uci.edu). It contains 270 patient records, and each patient's condition is defined by 13 parameters; 100 patient records are taken for the training data set and 150 for the test data set. A linear normalization, Eq. (5), is applied at the preprocessing stage to keep the data in the range [0, 1] so that the neural network functions properly. Without this preprocessing, the sigmoid function would operate in its saturation region and proper learning would not happen.
Training: For both algorithms, three feedforward network architectures of size [13 6 1], [13 4 1] and [13 2 1] are used separately (Fig. 4). Each architecture has one hidden layer, with 6, 4 and 2 hidden nodes, respectively. Ten independent learning trials are run for each architecture size; the trials differ in the random seed used to draw the initial solutions. Thus, there is a total of 10 trained networks for each architecture. The transfer function in all cases is the standard sigmoid.
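Eq. (5) itself is not reproduced in this text. Assuming it is the usual per-parameter min-max scaling, x' = (x - x_min) / (x_max - x_min), a minimal sketch follows. Two details are assumptions added here: a constant column is mapped to 0, and the minima and maxima are fitted once (for example on the training records) and then reused for other records. The function names are hypothetical.

```python
def fit_min_max(records):
    """Per-column minimum and maximum of a list of equal-length numeric rows."""
    columns = list(zip(*records))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_min_max(records, mins, maxs):
    """Linear rescaling x' = (x - x_min) / (x_max - x_min), column by column."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0  # constant column -> 0 (assumption)
         for v, lo, hi in zip(row, mins, maxs)]
        for row in records
    ]


train = [[29, 94], [77, 200], [53, 147]]  # illustrative values only
mins, maxs = fit_min_max(train)
print(apply_min_max(train, mins, maxs))
```

If training minima and maxima are reused on test records, values outside the training range fall slightly outside [0, 1]. Clipping them is one option if the [0, 1] range must hold strictly.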

RESULTS
The performance of both learning algorithms in all the different cases is shown in Tables 1 and 2; in these cases, the other learning algorithm outperforms GA-based learning. Even though GA-based learning seems superior on the training cases, technically this represents memorization rather than generalization. In practice, a new patient record is not expected to match any training record, so the test results obtained by GA (Table 3) are the relevant measure.

CONCLUSION
Detecting heart disease from a computational point of view is a challenging task for scientists and researchers. No algorithm can be universally accepted as the final solution, but efforts have been made here to develop a more realistic solution with the help of a neural network and a genetic algorithm. It is never easy to explore data to acquire hidden knowledge, but the concepts presented in this study can offer a good alternative for designing solutions for automated decision support systems. The present status of heart disease, with its root causes and some possible remedies, is also discussed in this study to show the importance and necessity of a solution for heart disease.

Fig. 1: Possible involvement of different sectors in health care

In this study, the authors demonstrate a data mining technique that performs real-time classification of CVD. ECG signals, which reflect the electrical activity of the heart, are the most important data used for the diagnosis and treatment of heart diseases. The study also provides a perspective on detecting Premature Atrial Contraction (PAC) and Premature Ventricular Contraction (PVC) disorders in ECG signals. In Palaniappan and Awang (2008), advanced data mining techniques, namely Decision Trees, Naïve Bayes and Neural Network, were applied to develop a prototype Intelligent Heart Disease Prediction System (IHDPS). Particle Swarm Optimization (PSO) has been successfully applied in the data mining field to extract rule-based classification systems. An ensemble PSO-based approach to extracting rules for the diagnosis of coronary artery disease is presented in Hedeshi and Abadeh (2011).

Fig. 3: Pseudocode of GA-based learning of the feedforward architecture

In Eq. (5), x_max and x_min indicate, respectively, the maximum value and the minimum value of the corresponding parameter.
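The node crossover and Gaussian mutation operators described in the experimental setup can be sketched as follows. A network is represented as a list of node rows, each holding all incoming weights of one node. Several details are assumptions: crossover swaps the node at the same position in both parents, a single node is chosen uniformly at random, and the mutation uses a standard deviation of 0.1. The function names are hypothetical.

```python
import random


def node_crossover(parent_a, parent_b, rng):
    """Exchange all incoming weights of one randomly chosen node between two parents.

    Each parent is a list of node rows; row k holds every incoming weight of node k.
    The parents are left unchanged; two new children are returned.
    """
    child_a = [row[:] for row in parent_a]
    child_b = [row[:] for row in parent_b]
    k = rng.randrange(len(child_a))  # same node position in both parents (assumption)
    child_a[k], child_b[k] = child_b[k], child_a[k]
    return child_a, child_b


def gaussian_node_mutation(parent, rng, sigma=0.1):
    """Add Gaussian noise N(0, sigma^2) to all incoming weights of one randomly chosen node."""
    child = [row[:] for row in parent]
    k = rng.randrange(len(child))
    child[k] = [w + rng.gauss(0.0, sigma) for w in child[k]]
    return child


rng = random.Random(0)
a = [[0.0, 0.0], [0.0, 0.0, 0.0]]
b = [[1.0, 1.0], [1.0, 1.0, 1.0]]
print(node_crossover(a, b, rng))
print(gaussian_node_mutation(a, rng))
```

Treating a node's incoming weights as one unit keeps each hidden node's learned feature intact when it moves between parents. Crossover at the level of individual weights would tend to break such features apart.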

Fig. 4: Performance of learning in connection weights by GA for the [13 2 1] architecture

Steps 23 to 32 of the GA pseudocode (Fig. 3):
23. For r = 1 : 2*popsize
24.     Pick P challengers randomly, where P = 10% of popsize;
25.     Arrange a tournament, with respect to fitness, between the r-th solution and the P selected challengers;
26.     Define the tournament score of the r-th solution;
27. End
28. Arrange the scores of all solutions in ascending order;
29. sp = positions of the best half of the scores;
30. Select the next-generation solutions as the solutions at positions sp;
31. Repeat the process from step 5 until the terminating criterion is satisfied;
32. Final solution = solution with maximum fitness in the last generation.
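Steps 23 to 30 above can be read as an evolutionary-programming-style tournament. Each of the 2*popsize solutions (parents plus offspring) meets P random challengers, and the popsize solutions with the most wins survive. The sketch below follows that reading. The score definition (wins or ties against challengers), the exclusion of a solution from its own challenger set, and the "at least one challenger" rule are assumptions. Higher fitness is treated as better, following step 32.

```python
import random


def tournament_select(solutions, fitness, popsize, rng):
    """Keep popsize of the 2*popsize solutions using a win-count tournament."""
    n = len(solutions)
    if n != 2 * popsize:
        raise ValueError("expects parents plus offspring, i.e. 2*popsize solutions")
    p = max(1, round(0.1 * popsize))  # P = 10% of popsize, at least one challenger
    fit = [fitness(s) for s in solutions]
    scores = []
    for r in range(n):
        others = [i for i in range(n) if i != r]
        challengers = rng.sample(others, p)
        # Score of the r-th solution: number of challengers it beats or ties.
        scores.append(sum(1 for c in challengers if fit[r] >= fit[c]))
    order = sorted(range(n), key=lambda i: scores[i])  # ascending by score (step 28)
    sp = order[popsize:]  # positions of the best-scoring half (step 29)
    return [solutions[i] for i in sp]  # next generation (step 30)


survivors = tournament_select(list(range(20)), lambda s: s, popsize=10, rng=random.Random(0))
print(sorted(survivors))
```

Because a solution only has to beat a small random sample rather than the whole population, weaker solutions occasionally survive. This keeps the selection pressure moderate, which is presumably what the text means by an "unbiased" selection.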

Table 1: GA-ANN performance with 6 hidden nodes (training data)