Secure Data Aggregation Using Reliable Nodes for Wireless Sensor Networks

Generally, aggregation techniques in Wireless Sensor Networks (WSNs) are defenseless against various attacks. The aggregator and aggregated data has to be secured to assure integrity and confidentiality. In this study, we propose a secure data aggregation technique with reliable nodes using key predicate test protocol for sensor network. This technique specialize some nodes as Reliable nodes (R-nodes) to monitor the process of aggregation. Initially, for each node, a secret key is shared between base station and neighboring nodes. Then, an aggregation tree is constructed for transmitting data to the base station in a hierarchical fashion. The aggregator encrypts the data using secret key and forwards to a level up aggregator in aggregation tree. By enhancing broadcasting feature of R-nodes, the aggregated value is verified for ensuring integrity. As keys are shared between neighboring nodes, the nodes are validated using self-key predicate test. The proposed technique is proved through simulation results. It increases the throughput by reducing the packet drops significantly.


INTRODUCTION
Wireless sensor networks: A wireless sensor network is an upcoming technology, which is being given major consideration by the research community.Several small, low cost devices constitute the sensor network, which is actually a self-organizing ad-hoc system.Its main function is to monitor the physical environment and consequently collects and dispatch information to one or more sink nodes (Dorottya and Attila, 2007).In wireless sensor network, the main operations performed are related to monitoring the physical environment, sensed information processing and result delivery to the particular sink nodes.To perform these tasks, the sensor nodes are powered by the batteries, which are resources of limited energy.Hence, designing an energy efficient protocol for increasing the network lifetime is the major dispute in this energy-constrained system (Cunqing and Tak-Shing, 2008).
Because of optimistic features of WSN, it is widely utilized in more applications such as home automation, medical applications, environmental monitoring, wildfire detection, traffic regulation and so on (Tamer and Daehun, 2009;Rodrigo and Javier, 2009;Jacques et al., 2010a;Suat and Yang, 2009).

Data aggregation:
A common function of sensor networks is data gathering.In data gathering, the information sampled at sensor nodes desires to be transported to the central base station for further processing and analysis.An important topic mentioned by the wireless sensor networks community is the innetwork data aggregation while focusing on the severe energy constraints of the sensor nodes and the limited transport capacity of multi-hop wireless networks.One of the basic distributed data processing procedures in the wireless sensor networks is data aggregation.It is used to save the energy and to reduce the medium access layer contention (Zhenzhen et al., 2007).
It involves merging the data from various sources along the route avoiding the redundancy, reducing the transmission numbers and hence saving the energy (Bhaskar et al., 2002).The efficiency and effectiveness of the sensor network can be improved considering data aggregation (Bartoli et al., 2010).Availability, confidentiality and flexibility are the important virtues offered by data aggregation (Tamer and Daehun, 2009).
In the dynamic scenarios, the benefits of the data aggregation can be compensated by the overhead of construction and maintenance of the structure.Some distributed approaches assume that there is a welldefined centre of event and the measured signal strength indicates the distance to the centre of the event.However, such approaches are not applicable for the applications with unstructured events like biological hazard, chemical hazard or fire detection, absence of an explicit center or any evident point of optimal aggregation (Kai-Wei et al., 2007).

Kinds of attacks on WSN aggregation:
• Denial of Service attack (DoS): It is familiar attack in wireless sensor networks that impede the radio frequencies by transmitting the radio signals in transmitting medium.This is generally referred as jamming.From the context of aggregation, DoS can take the form of aggregator that decline to aggregate the sensed data.Thereby, it makes the node not to reach the destination.• Sybil attack: In Sybil attack, the attacker disseminates multiple identities of the compromised node.By generating multiple identities, the attacker makes a way to give additional votes for malicious aggregator in aggregator selection process and elects the malicious node as an aggregator.This attack induces to the worst condition of network.
• Selective forwarding attack: In this attack, an adversary controls the forwarded and received messages in WSN by compromising a node.• Replay attack: In this kind of attack, an invader keeps track of network traffic at some part and later replays them in different part of the network.This attack misinforms the aggregator and thereby result will be exaggerated (Padmavathi and Shanmugapriya, 2009;Hani et al., 2008) Security requirements in sensor networks: • Data integrity: Reliability of data in sensor network can be assured by data integrity.In other words, data integrity is the process of authorizing that the transmitted message is not altered, changed or tampered by the third party.(Padmavathi and Shanmugapriya, 2009;Hani et al., 2008;Zhijun and Guang, 2008).

Problem identification:
In Wireless Sensor Networks (WSNs), the process of data aggregation is vulnerable to more security threats.An ideal data aggregation technique must provide reliability and fault tolerance at hand.Further, in our previous study (Mohamed Yacoab and Sundaram, 2010) we have proposed a cost effective compressive data gathering technique to enhance the traffic load, by using structured data aggregation scheme.The use of compressive data gathering provides a compressed sensor reading to reduce global data traffic and distributes energy consumption evenly to prolong network lifetime.However, our cost effective data aggregation technique does not provide any insight for reliability and fault tolerance.
In order to provide reliability and fault tolerance in data aggregation technique, as an extension to our previous work (Mohamed Yacoab and Sundaram, 2012), in this study we propose to develop a secure data aggregation technique with reliable nodes using key predicate test protocol for sensor networks.Prakash et al. (2009) have proposed privacypreserving data aggregation scheme for additive aggregation functions.The Cluster-based Private Data Aggregation (CPDA) leverages clustering protocol and algebraic properties of polynomials.The objective of their approach is to bridge the gap between collaborative data collection by wireless sensor networks and data privacy.And their approach has the advantage of incurring less communication overhead.

LITERATURE REVIEW
Jacques et al. (2010b) have proposed a secure endto-end encrypted-data aggregation scheme, which is based on elliptic curve cryptography that exploits a smaller key size.Additionally, their scheme allows the use of higher number of operations on cypher-texts and prevents the distinction between two identical texts from their cryptograms.Further, their proposed approach permits the generation of shorter encryption asymmetric keys, which is so important in the case of sensor networks.Claude et al. (2009) have introduced a simple and provably secure encryption scheme that allows efficient additive aggregation of encrypted data.The security of their scheme is based on the indistinguishability property of a Pseudorandom Function (PRF), a standard cryptographic primitive.To protect the integrity of the aggregated data, they have constructed an end-to-end aggregate authentication scheme that is secure against outsider-only attacks.Their proposed technique is well suitable for computing statistical values, such as mean, variance and standard deviation of sensed data, while achieving significant bandwidth savings.Shih-I et al. (2009) have proposed a secure encrypted-data aggregation scheme for wireless sensor networks.Their proposed approach for data aggregation eliminates redundant sensor readings without using encryption and maintains data secrecy and privacy during transmission.And the advantage of their proposed scheme provides security and privacy and duplicate instances of original readings will be aggregated into a single packet.Suat and Hasan (2010) have proposed a Data Aggregation and Authentication protocol, called DAA.Their DAA is proposed to integrate false data detection with data aggregation and confidentiality.And to support data aggregation along with false data detection, the monitoring nodes of every data aggregator conduct data aggregation and compute the corresponding small-size message authentication codes for data verification at their pair mates.And supports the confidential data transmission, the sensor nodes between two consecutive data aggregators verify the data integrity on the encrypted data rather than the plain data.
Xiaodong et al. ( 2010) have proposed a multidimensional privacy-preserving data aggregation mechanism for improving security and saving energy consumption in Wireless Sensor Networks (WSNs).Their mechanism integrates the super-increasing sequence and perturbation techniques into compressed data aggregation and has the ability to combine more than one aggregated data.
Zhijun and Guang (2010) have proposed a succinct and practical secure aggregation protocol by combining HMAC (associated with a cryptographic hash function) with Bloom filter, which then is defined as secure Bloom filter.Their approach is an effective aggregation protocol that is suitable for a specific but popular class of aggregation in wireless sensor networks.The advantage from secure Bloom filter, the protocol, without any unrealistic assumptions, fulfills the fundamental security objective of preventing outside adversaries and compromised inside nodes from harming the overall network result.Roberto et al. (2009) have proposed a protocol, which is based on the concept of delayed aggregation and peer monitoring and requires local interactions only.And this protocol provides both confidentiality and integrity of the aggregated data so that for any compromised sensor in the WSN the information acquired could only reveal the readings performed by a small, constant number of neighboring sensors of the compromised one and detects bogus data injection attempts and provides high resilience to sensor failures.
Haifeng ( 2009) has proposed a tree-sampling algorithm that directly uses sampling to answer aggregation queries and provides qualitatively improved functionality compared to existing secure aggregation protocols.The main advantage is that it solves a key challenge sampling, reducing the linear overhead to logarithmic overhead.Their protocol then leverages the nice/clean security property of sampling to achieve end goal.In addition, they have proposed a set sampling technique to overcome a key and wellknown obstacle in sampling.However, their technique is only effective when the predicate count or sum is large.

METHODOLOGY
Overview: In this study, we propose to implement a secure data aggregation technique with reliable nodes using key predicate test protocol for sensor networks.Our technique specialize some nodes as reliable nodes (R-nodes) to verify the process of aggregation.Initially, each node generates a pair wise key using stream cipher technique and shares between the base station and neighboring nodes.The aggregation tree is constructed to transmit data to the base station in a hierarchical fashion.The aggregated data is encrypted using secret key and forwards to a level up in an aggregation tree.Before forwarding data to the next aggregator, the aggregated value is compared with locally aggregated data by R-nodes.If compared data is same then the data is valid and the nodes are not malicious.Conversely, when the compared data is not equal, then the data is not valid and we can conclude that nodes are compromised by the attacker.As secret key is shared between neighboring nodes, there is a chance of nodes to be compromised by the attacker.Hence, nodes with key are secured using self-key predicate test.Node that satisfies predicate test are marked as red and other nodes as yellow.The marked yellow nodes are isolated from data aggregation and transmission.
Network architecture: Consider the sensor network distributed with n number of sensing nodes.Each node senses different events.The sensed data are aggregated and transmitted to the Base Station (BS) hierarchically.Tree based approach is used for data aggregation and tree is constructed as per tree construction algorithm given in our previous study (Mohamed Yacoab and Sundaram, 2010).The tree structure is shown in Fig. 1.Presume that sensors are equipped with loosely synchronized clocks; since, data aggregation requires all nodes to transmit data to BS within some random sample period.
During the deployment of nodes in the network, we have marked a set of nodes as R-nodes (Reliable nodes).We assume that the selected R-nodes are error free and cannot be compromised by any adversaries.In this study, the selected R-nodes are responsible for monitoring the neighbor nodes for any malicious activity.Further, R-nodes are employed to validate the aggregation process.
Secure key generation and distribution: In this phase, assume that each node generates a key stream using As sensor nodes are vulnerable to more security attacks, data that are transmitted over the network are protected by means of homomorphic encryption scheme.Consider PE () as probabilistic encryption scheme.Let D be the data or message that is to be transmitted and SK i be the secure key.Then, for any event in sensor network, the encryption scheme PE () can be given using modulus function m as: where, m is modulus function on which the sum is computed and the modular function is relatively small.This encryption scheme supports additive homomorphic encryption operation.Thus, the probabilistic encryption PE () can be described as follows: Aggregation with R-nodes: In our hierarchical tree based aggregation technique, each node can be either an aggregator (A i ) or an R-node.An A i collects data from many sensor nodes (S i ) in level a of aggregation tree and forwards to an A i of level a+1.In simple, the sensed data is transmitted from low level to the high level of the aggregation tree to reach the root.The Base Station (BS) is the root.
In addition to an aggregator (A i ) and R-node, we define another kind of node known as provider nodes (P i ), It is the set of nodes from where the aggregators (A i ) collects the information.Each provider node (P i ) encrypts the sensed data using its secret key SK i and forwards it to the aggregator.
At time interval t, from every provider node P i (i = 1, 2 … n), the data received by the aggregator will be in the format of three fold as (E Pi , R Pi , N Pi ).
where, E Pi = Encrypted sensed value of provider P i R Pi = Received aggregate value of P i N Pi = Number of provider nodes that generates corresponding aggregate value After the expiration of time interval t, the aggregator A i, discloses the three fold aggregated data received from every provider.By adding the data received from every provider node, it generates the new aggregated value NA i .The NA i is generated as follows: where, n denotes the number of providers and can be represented as n = n p1 + n p2 +…+ n pn .During the computation of NA i at time interval t, there is possible of taking place two circumstances as follows: • All provider nodes (P i ) at level a can transmit data to aggregator A i • Some provider nodes at level a will not transmit its data to aggregator A i The second circumstance can happen only when communication had not taken place during that particular time interval.In such cases, the aggregator A i adds Secret Keys (SK i ) of those sensors with new aggregate function.Then the NA i is encrypted using nonce value (N) and number of providers (n).Ultimately, the aggregator A i forwards NA i to the aggregator up in the hierarchy.

Construction of R-node set:
The process of R-nodes is to verify the correctness of aggregation function performed by the aggregators.The snap shot of network with R-nodes is shown in Fig. 2. R-node set is constructed during the construction of aggregation tree topology.The R-node set of aggregator A i is denoted by ℜ(A i ).It includes the neighboring nodes of A i , which are not take part in aggregation tree.The set ℜ (A i ) is made known to A i and all R-nodes.

Fig. 2: Network with R-nodes
Our technique presupposes that the provider nodes of A i are also made known to all R-nodes.
As one of the features of sensor node is radio broadcast nature, each R-node can overhear the operation done by its neighboring node.When we make use of this feature in aggregators, each R-node can aggregate the same data aggregated by its neighboring aggregators.Therefore, this overhearing feature help Rnodes to verify the aggregation process.
When A i is aggregating data, its corresponding Rnode performs the same aggregation and after time interval t, the aggregated value of A i and R i are compared.If the value aggregated by both A i and R i are same, then the verification is successful and the nodes are not malicious.On the other hand, if aggregated values are not equal then verification is unsuccessful and we can conclude that some nodes are malicious.

Algorithm-1:
Let NA i be the aggregated value of A i Let R i be the reliable node of A i and NR i be the aggregated value of R i (1) If (NA i = NR i ) Then (i) Verification is successful (ii) Nodes are not malicious Else (i) Verification is unsuccessful (ii) Nodes are malicious End if In the case when provider nodes of A i are within the range of R i , then aggregation verification process will follow the procedure given above.When R-node does not have direct communication (within transmission range) with provider nodes of A i , then R i performs the aggregation within atmost two hops.
Self-key predicate test: As we discussed in section 'secure key generation and distribution', each node includes SK i and it is shared between base station and neighboring nodes.Since, the key SK i is shared with neighboring nodes, there is possible of an attacker to compromise a node and exploits other nodes key.Thus, the key must be verified periodically by the BS.To achieve this, the BS makes use of predicate test termed as self-key predicate test.
A predicate is a function that is stably computable, it can be denoted as P: χ → {p, q}.Our technique considers the non-semi linear predicate to predicates the keys of nodes.The common predicate can be given as: P (S p ) = P (S q ) (4) where, p & q : Input symbols S : The number of sensors initialized in the network with symbols p and q Our approach takes red and yellow colors as two input symbols.Based on predicate results, nodes are marked with colors.Node that satisfies the predicate are marked by red color and nodes that did not satisfy the predicate are marked with yellow color.Initially, nodes are unmarked and nodes holding key SK i must satisfy the predicate.Assume, each sensor has unique name in accordance with their key.Instead of using the key explicitly, the BS makes use of unique key name.As the first step of predicate test, the BS broadcasts predicate message to all nodes in the network.The predicate message includes unique name of key SK i , the predicate, Nonce (N) and Message Authentication Code (MAC) of N:

All nodes
The trust worthy sensors that holds key and satisfies predicate will forward back reply to the BS.Nodes that transmit reply are marked as Red color and these nodes are valid sensors.After broadcasting predicate message, the BS will wait for l time, after the expiration of timer, it mark nodes that do not send reply with yellow color.The snapshot of network after selfkey predicate test is given in Fig. 3.
In Fig. 3, we can see the network diagram after self-key predicate test.Nodes that are marked with red color are included in the process of aggregation and data transmission.On the other hand, nodes with yellow color are excluded from data aggregation and data transmission.

Merits of our proposed technique:
• By making use of R-nodes, our technique finds subversive activity and malicious nodes as early as possible.Performance metrics: The performance of proposed SDAR technique is compared with the Secure Encrypted Data Aggregation (Shih-I et al., 2009) protocol.The performance is evaluated mainly, according to the following metrics.
Average packet delivery ratio: It is the ratio of the number of packets received successfully and the total number of packets transmitted.
Energy consumption: It is the average energy consumed by all the nodes in sending, receiving and forwarding operations Packet drop: It is the average number of packets dropped at the receiver due to malicious attacks.
The simulation results are presented in the next section.

Results:
Based on attackers: In our initial experiment, we vary the number of attackers as 2, 4, 6, 8 and 10.
When the attackers are increased from 2 to 8, the packet drop due to the attacks will increase, as we can see from Fig. 4.But we can see that Packet drop of our proposed SDAR protocol is less than the SEDA protocol, since the data is monitored by R-nodes and protected by the aggregators.
As packet drop is increasing, the packet delivery ratio is decreasing, when the attackers are increased.From Fig. 5, we can see that the packet delivery ratio of out proposed SDAR protocol is higher than the SEDA protocol.
From Fig. 6, we can see that the Energy consumption of our proposed SDAR protocol is slightly less than that of SEDA protocol, because of the multiple sink based data gathering approach.

CONCLUSION
In this study, we have presented a secure data aggregation technique with reliable nodes using key predicate test protocol for sensor network.Our technique specialize some nodes as reliable nodes (Rnodes) to monitor the process of aggregation.Initially, for each node, a secret key is shared between base station and neighboring nodes.Then, an aggregation tree is constructed for transmitting data to the base station in a hierarchical fashion.The aggregator encrypts the data using secret key and forwards to a level up aggregator in aggregation tree.By enhancing broadcasting feature of R-nodes, the aggregated value is verified for ensuring integrity.As keys are shared between neighboring nodes, the nodes are validated as valid and invalid nodes using self-key predicate test.Our technique has proved through simulation results.It increases the packet delivery ratio by reducing the packet drops significantly.

Fig. 1 :
Fig. 1: Aggregation tree structure SNOW (Patrik and Thomas, 2002), which is a new stream cipher technique.The computed secret key SK i is shared with Base Station (BS) and also with neighboring nodes.The pair wise key distribution technique such as Dijiang and Deep (2007) is used to share the SK i with neighbors.As sensor nodes are vulnerable to more security attacks, data that are transmitted over the network are protected by means of homomorphic encryption scheme.Consider PE () as probabilistic encryption scheme.Let D be the data or message that is to be transmitted and SK i be the secure key.Then, for any event in sensor network, the encryption scheme PE () can be given using modulus function m as: