Trusted Heartbeat Framework for Cloud Computing

In a cloud computing environment, the infrastructure is not owned by its users, so its security and integrity should be protected and verified from time to time. In a Hadoop-based scalable computing setup, malfunctioning nodes generate wrong output at run time. To detect such nodes, we create a collaborative network between worker nodes (i.e., Hadoop data nodes) and the Master node (i.e., the Hadoop name node) with the help of a trusted heartbeat framework (THF). We propose procedures to register a node and to alter the status of a node based on the reputation provided by its co-worker nodes.


Introduction
Outsourcing computation to the cloud can reduce the IT expenditure of companies. Still, most of them are not willing to do so because of security concerns about cloud computing environments and services. A survey [1] found that, despite the huge benefits, fear persists about security threats such as loss of control over data and loss of system integrity. Computing nodes (virtual machines) can be tampered with or misconfigured to produce wrong results. For example, an assigned Hadoop task (related to financial data consolidation) may generate an incorrect result because of a few malfunctioning nodes [2]. Owing to the large size of the data and its processing, such an error is very hard to identify in the collective results, and it may result in a huge loss.

Malfunctioning Nodes and Infrastructure Attacks
In a public cloud infrastructure, malfunctioning nodes may infringe the security requirements specified by the service consumer. They may produce malicious outputs, which may violate the privacy and integrity of the computation. This may result in the disclosure of users' confidential data and in the profiling of users' behaviors (and preferences) for privacy analysis. Moreover, software flaws, bugs and misconfigurations can lead to incorrect results or unintended information leakage.
Malicious or tampered nodes may eavesdrop on the communication between other nodes in order to disclose confidential data, enforce malicious privacy profiling [3], and launch replay attacks [3] and Man-In-The-Middle attacks [2] [4] in the cloud system. They may also impersonate the Master to steal other nodes' data, or vice versa. Moreover, a malicious node can launch Denial of Service attacks [5]. The growing number of vulnerabilities uncovered in cloud platforms has prompted a move towards trust-based solutions with hardware support. The Trusted Computing (TC) initiative [6] and the adoption of the trusted platform module (TPM) [7] have been gaining attention from both industry and academia. Hardware manufacturers are also participating to accelerate the adoption of TC across various platforms.
We consider a cloud system that takes a user task, distributes it among computing nodes, and gathers the output, as shown in Figure 1. We propose a framework for integrity verification of the cloud. We use three procedures: registration, verification and detection of virtual machines. A TPM-based node registration process initially establishes trust. An attested heartbeat procedure periodically verifies the trustworthiness of every node in the system. Tampered or misconfigured nodes can be identified quickly by a reputation-based decision procedure.
The rest of the paper is organized as follows. In Section 2, we discuss background and related work on heartbeats and the TPM. In Section 3, we propose the Trusted Heartbeat infrastructure. Section 4 shows the usage model of the proposed framework, with the conclusion and references at the end.

Hadoop
Apache Hadoop [8] is a framework that facilitates data-intensive distributed processing of massive data sets across clusters of machines. It supports scaling of processing from a single machine to thousands of machines. It is designed with the fundamental assumption that hardware failure is common, which makes it the software's responsibility to identify and handle failures at the application layer. It replicates data across multiple nodes with a rapid data transfer facility. A Hadoop implementation essentially consists of two major components: (i) the Hadoop Distributed File System (HDFS) [9], a file system that manages all the nodes in a cluster for data storage, and (ii) MapReduce [10], the framework that allocates work to nodes in a cluster. A Hadoop cluster can be designed in various ways, one of which includes a single master and multiple worker nodes. The master node consists of a JobTracker, TaskTracker, NameNode and DataNode. A worker node acts as both a DataNode and a TaskTracker, depending on the availability of physical or virtual resources (Figure 2).

Heartbeat in Hadoop Environment
Heartbeat [11] is a communication mechanism that provides an efficient yet simple way for a Hadoop system to monitor performance and make that information available to external observers. Applications can use heartbeat information to automatically add or remove resources from their pool. HDFS replicates file blocks for fault tolerance. An application can specify the number of replicas of a file at the time it is created. The NameNode makes all decisions concerning block replication. Each DataNode periodically sends heartbeat messages to its NameNode, so the latter can identify a loss of connectivity when it stops receiving these messages. The NameNode marks such a node as a dead DataNode (not responding to heartbeats) and desists from sending requests to it. Data stored on such a node is no longer available to a client (Figure 3).
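The dead-node detection described above can be sketched as follows. This is a minimal illustration, not Hadoop's implementation: the class name `HeartbeatMonitor`, the `timeout_s` parameter and the node identifiers are assumptions introduced for the example, and timestamps are passed in explicitly so that the behavior is easy to follow.

```python
import time
from typing import Dict, List, Optional


class HeartbeatMonitor:
    """Hypothetical NameNode-side tracker: remembers when each DataNode was
    last heard from and reports the ones that have been silent too long."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s            # assumed silence limit, in seconds
        self.last_seen: Dict[str, float] = {}

    def record_heartbeat(self, node_id: str, now: Optional[float] = None) -> None:
        # Store the arrival time of the latest heartbeat from this node.
        self.last_seen[node_id] = time.monotonic() if now is None else now

    def dead_nodes(self, now: Optional[float] = None) -> List[str]:
        # A node counts as dead once its silence exceeds the timeout.
        now = time.monotonic() if now is None else now
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )


monitor = HeartbeatMonitor(timeout_s=30.0)
monitor.record_heartbeat("datanode-1", now=0.0)
monitor.record_heartbeat("datanode-2", now=25.0)
print(monitor.dead_nodes(now=40.0))  # datanode-1 has been silent for 40 s
```

A master built along these lines would stop sending requests to every node returned by `dead_nodes`.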

Trusted Platform Module (TPM)
The Trusted Computing Group consortium has developed specifications for the trusted platform module. The TPM is a special-purpose microcontroller on a motherboard. By incorporating a physical facility for secure generation and storage of cryptographic keys, the TPM becomes the core building block for creating an interoperable "trusted computing" environment. The capabilities that every TPM provides include hashing with the SHA-1 algorithm, random number generation, asymmetric key generation, and encryption and decryption with the RSA algorithm. Table 1 lists the different types of keys that can be created with a TPM, together with their properties.
Integrity verification of software components supports the mitigation of security concerns related to cloud computing infrastructure. Although it does not provide absolute assurance, trusted computing raises the difficulty for attackers by operating at the hardware level. With a correct implementation, an attacker would need physical access to the hardware in order to subvert the TPM [13]. In our proposed work, we use the TPM to prevent Man-In-The-Middle attacks and to verify the integrity of virtual machines via attestation.
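To illustrate how hash-based integrity measurement underlies attestation, the sketch below emulates in software the extend operation that a TPM 1.2 applies to a Platform Configuration Register (PCR): the new register value is the SHA-1 hash of the old value concatenated with the measurement of the loaded component. It is a software-only stand-in, not a TPM driver, and the function names and component contents are hypothetical.

```python
import hashlib
from typing import Iterable

PCR_SIZE = 20  # a SHA-1 digest, and hence a TPM 1.2 PCR, is 20 bytes long


def extend(pcr: bytes, component: bytes) -> bytes:
    """Return SHA-1(old PCR || SHA-1(component)), emulating a PCR extend."""
    measurement = hashlib.sha1(component).digest()
    return hashlib.sha1(pcr + measurement).digest()


def measure_chain(components: Iterable[bytes]) -> bytes:
    """Fold a sequence of loaded components into a single PCR value."""
    pcr = bytes(PCR_SIZE)  # the register starts as all zeros
    for component in components:
        pcr = extend(pcr, component)
    return pcr


# Hypothetical software stack of a worker node.
expected = measure_chain([b"bootloader", b"kernel", b"hadoop-tasktracker"])
tampered = measure_chain([b"bootloader", b"kernel", b"hadoop-tasktracker-modified"])
print(expected.hex())
print(expected != tampered)  # any change in a component changes the final value
```

Because each value depends on every earlier measurement and on their order, a verifier that knows the expected final value can detect a modified or reordered software stack.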

Related Work
There have been many attempts to enhance fault tolerance and trust-based mechanisms that preserve the integrity of cloud systems in open distributed environments [14]. Airavat [15] was developed for sensitive data in open distributed systems. It incorporates mandatory access control to detect privacy violations. The Verification-based Integrity Assurance Framework [16] is based on the idea of replication and quiz-related methods. It can distinguish malicious from normal task trackers in a Hadoop system with the help of a predefined set of questionnaires. The authors of [17] proposed an algorithm named Longest Approximate Time to End (LATE). LATE finds slow tasks in a heterogeneous environment. It first estimates the remaining time of each task, then launches speculative copies of the tasks with the longest remaining time to end, and maintains the integrity of the system. Terra [18] provides an attestation capability that allows a remote party to reliably detect whether the host is running a platform that the remote party trusts. As elaborated by Bercher et al. [19], for encrypted communication between all the nodes in the HDFS system, a key must be securely exchanged in advance. However, there are issues with how the key is shared. As seen in [20], key exchange is done frequently through heartbeat messages, and an attacker can pretend to be a data node and can obtain many chunks of data.

Table 1. TPM key types with their purpose [7].

Key Name | Purpose
Endorsement Key (EK) | An RSA key pair installed by the TPM manufacturer to uniquely identify the TPM.
Storage Root Key (SRK) | A non-transferable key generated by the platform owner to serve as the root key in the hierarchy of keys associated with the TPM.
Attestation Identity Key (AIK) | Used for attestation and identification of a TPM (i.e., in activated mode). A trusted third party can create an identity certificate by signing the public part of the AIK.
Signing Key | Used by the system to sign messages.
Storage Key | Used to encrypt and decrypt other keys (using RSA).
Identity Key | Used for operations that require the TPM identity.
Binding Key | Used for Unbind operations to decrypt data.

We propose a scheme to determine whether a particular VM is trustworthy. Only attested and trusted VMs can receive tasks and collaborate in the network. A negative reputation is assigned if a node does not generate output (or produces malicious output).

Trusted Heartbeat Framework
In this framework, we assume that TPM communication cannot be tampered with and that its storage is not exposed. Because the main purpose of the TPM is to repel most software attacks, we presume that the trusted platform can assess every software module loaded on the platform in terms of its hash code [7]. In addition, we assume that the Master node works as a trusted party that performs attestations as suggested by TC [21]. For a better understanding of our framework, we refer to the job tracker as the master node and to a task tracker as a simple node. The proposed framework is shown in Figure 4. The task scheduler present at every node executes the tasks assigned to it. A trust collector is attached to each node to manage assessments and support the attestation service. In the master node, the task scheduler deploys jobs to nodes and collects their outcomes. The task manager stores node information and information about their assigned tasks. In addition, the trust and reputation collector manages the trust information of nodes, and the trust verifier performs attestations of them. The collected security properties (Endorsement Key and Attestation Identity Key) are stored in the trust storage.
The trust manager binds the generated evidence, in accordance with TC's notation, as trusted data. Moreover, users can obtain this information to assess the security properties of a worker node at any time during the entire processing cycle. The trust and reputation collector collects such properties of nodes and stores them with their corresponding values. These values are kept in the trust storage for future score calculation. The three main procedures of our proposed system are as follows.

(a) Initial Node Registration
Node registration takes place when a data node first joins the network. It establishes the genuineness of the TPM and exchanges keys for sealing and binding operations. The genuineness of the TPM is established by its public EK.
Every time a worker node initiates a connection request to the master node, an initial attestation procedure is executed by the master node. The verifier has collected all the properties of each node whose information is kept in the trust storage. Therefore, only a registered node with allowed properties is included in the list of the task manager (for completing tasks). TC credentials and public session keys are stored in the trust storage.
In the Trusted Heartbeat framework, every node (N) is identified by its corresponding, unique AIK, and the Master (M) acts as the Privacy-CA defined by the TC infrastructure [22] for registering and identifying all these AIKs. Whenever a new node is added to the Hadoop-based cloud system, it is first registered at the Master and assigned AIK credentials. Figure 5 shows the steps of node registration. As stated earlier, only a registered node with genuine TC credentials can communicate with the master and receive tasks. The node registration procedure is elaborated in Figure 6. In our Trusted Heartbeat framework implementation, the trust and reputation collector is added to the Master node and a trust collector to every node. The messages they exchange are incorporated into the heartbeat protocol via the heartbeat manager. To simplify our protocol, we assign AIK credentials to nodes manually.
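The admission rule described so far (a genuine EK, allowed properties, and an AIK recorded in the trust storage before any task is assigned) can be sketched as below. This is a simplified illustration under stated assumptions rather than the procedure of Figure 6: the class `MasterRegistry`, its fields and the string placeholders for keys and measurements are all hypothetical, and no real cryptographic check is performed.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class MasterRegistry:
    """Hypothetical master-side registry for worker nodes."""
    genuine_eks: Set[str]           # assumed list of EK fingerprints of genuine TPMs
    allowed_properties: Set[str]    # assumed policy: acceptable platform measurements
    trust_storage: Dict[str, str] = field(default_factory=dict)  # node id -> public AIK
    task_list: Set[str] = field(default_factory=set)             # nodes eligible for tasks

    def register(self, node_id: str, ek: str, aik_public: str, properties: str) -> bool:
        # Reject nodes whose TPM cannot be identified as genuine.
        if ek not in self.genuine_eks:
            return False
        # Reject nodes whose reported properties are not allowed by the policy.
        if properties not in self.allowed_properties:
            return False
        # Record the AIK and make the node eligible for tasks.
        self.trust_storage[node_id] = aik_public
        self.task_list.add(node_id)
        return True

    def may_receive_tasks(self, node_id: str) -> bool:
        return node_id in self.task_list


master = MasterRegistry(genuine_eks={"ek-A"}, allowed_properties={"stack-v1"})
print(master.register("node-1", "ek-A", "aik-pub-1", "stack-v1"))  # admitted
print(master.register("node-2", "ek-X", "aik-pub-2", "stack-v1"))  # unknown EK
```

In a real deployment the string comparisons would be replaced by certificate and signature checks performed with the TPM credentials.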
From time to time, the collector and the trust storage update the nonce information and initiate the attestation procedure by invoking the TPM_Quote instruction of the TPM with the fresh nonce. As shown in Figure 9(a), the latest generated nonce and the reputation are then added to the request and sent to the Master. The verifier collects and maintains the public credentials of individual nodes. When a request with genuine AIK credentials is received in a heartbeat message, the verifier first performs verification and then registers that node. The properties indicated by the worker node's Stored Measurement Log (SML) are checked against the security policies that were defined earlier and stored in the trust storage (shown in Figure 9(c)), and only an expected worker node can be added in future. As shown in the node registration procedure (Figure 6), the trust verifier maintains nonce information in a cache (for faster execution) and revises the cached value by executing the SHA-1 hash operation to obtain the new value. The time interval between received messages and the information stored in the cache together determine the maximum time for which a heartbeat remains valid (Figure 7).
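One possible reading of the nonce cache is sketched below: the cached value is replaced by its own SHA-1 hash whenever a new nonce is needed, and a heartbeat is accepted only if it carries the current value within a validity window. The class `NonceCache`, the `max_age_s` parameter and the seed are assumptions made for illustration; the paper does not fix these details.

```python
import hashlib
import hmac


class NonceCache:
    """Hypothetical verifier-side cache holding the currently valid nonce."""

    def __init__(self, seed: bytes, max_age_s: float) -> None:
        self.current = seed         # last issued nonce (starts from an assumed seed)
        self.issued_at = 0.0        # time at which the current nonce was issued
        self.max_age_s = max_age_s  # assumed validity window of a heartbeat

    def advance(self, now: float) -> bytes:
        # Revise the cached value by hashing it with SHA-1.
        self.current = hashlib.sha1(self.current).digest()
        self.issued_at = now
        return self.current

    def is_valid(self, nonce: bytes, now: float) -> bool:
        # The nonce must match the cached value and must not be too old.
        fresh = (now - self.issued_at) <= self.max_age_s
        return fresh and hmac.compare_digest(nonce, self.current)


cache = NonceCache(seed=b"initial-seed", max_age_s=10.0)
nonce = cache.advance(now=0.0)
print(cache.is_valid(nonce, now=5.0))   # within the validity window
print(cache.is_valid(nonce, now=11.0))  # too old
```

Chaining the hash keeps the cache small, since only the latest value has to be stored, while a replayed older nonce no longer matches it.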

(b) Verification of Heartbeats
The verifier of the trust and reputation collector is invoked each time a heartbeat message with an attestation request reaches the master node. It looks up in the cache the nonce value received through the most recent heartbeat message (last_nonce). If the verifier does not find that nonce value, it invalidates the connection request carried by the heartbeat message. Once the worker node again sends a heartbeat message with a valid new_nonce, it can continue to communicate with the master and receive tasks. The trust verifier verifies the received signature and the quote of PCR values using the TPM_Verify [7] set of TPM instructions. If the verifier finds any mismatch in the hash value, it puts that node on the gray list, and the node has to start again with the node initialization procedure (as shown in Figure 9). A difference in PCR values indicates a change in the software state of the node, so a new assessment needs to be initiated. After verification completes successfully, the worker node's new assessment information is updated for further communication.
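The decision logic of this procedure can be sketched as follows. The sketch is illustrative only: a real TPM quote is an RSA signature made with the AIK, whereas here an HMAC over the nonce and PCR value stands in for it so that the example runs with the standard library alone. The names `Heartbeat`, `sign_quote` and `verify_heartbeat`, the shared key, and the choice to reject (rather than gray-list) a heartbeat with a bad signature are assumptions.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass
class Heartbeat:
    node_id: str
    nonce: bytes       # nonce the node claims to answer
    pcr: bytes         # reported PCR value
    signature: bytes   # stand-in for the TPM quote signature


def sign_quote(key: bytes, nonce: bytes, pcr: bytes) -> bytes:
    # Stand-in for a TPM quote: an HMAC replaces the AIK signature.
    return hmac.new(key, nonce + pcr, hashlib.sha1).digest()


def verify_heartbeat(hb: Heartbeat, cached_nonce: bytes,
                     expected_pcr: bytes, key: bytes) -> str:
    # 1. The nonce must be the one held in the verifier's cache.
    if not hmac.compare_digest(hb.nonce, cached_nonce):
        return "rejected"
    # 2. The signature over nonce and PCR must be authentic (assumed policy).
    if not hmac.compare_digest(hb.signature, sign_quote(key, hb.nonce, hb.pcr)):
        return "rejected"
    # 3. A PCR mismatch means the software state changed: gray-list the node.
    if not hmac.compare_digest(hb.pcr, expected_pcr):
        return "graylisted"
    return "accepted"


key = b"demo-key"
nonce = hashlib.sha1(b"fresh").digest()
good_pcr = hashlib.sha1(b"expected stack").digest()
hb = Heartbeat("node-1", nonce, good_pcr, sign_quote(key, nonce, good_pcr))
print(verify_heartbeat(hb, nonce, good_pcr, key))
```

A gray-listed node would then have to repeat the node initialization procedure before receiving further tasks.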

(c) Reputation based detection
Reputations are gathered with each heartbeat message received from the Master. When a calculated reputation is lower than a pre-defined threshold, the master node unregisters that node or marks it as lost. The threshold value can be computed from the number of nodes and the previously stored information available in the trust storage. The black list contains all such failed nodes. Similarly, the gray list contains suspect nodes, that is, nodes that have suffered some decrement in reputation (Figure 8). Every suspect node is placed on the gray list first and, after inactivity, moves to the black list (shown in Figure 9(d)). Since reputations are collected within the same cluster only, detecting a failed or malicious node is faster than collecting reputations from all nodes, as depicted in the reputation-based decision procedure.
A worker node can receive its reputation or penalties through heartbeat messages. The master node increases the reputation of a worker node each time it receives a heartbeat message with hash values from that node. The trust and reputation based detector has an upper bound on the maximum reputation; after that value is reached, the initialization process begins. However, when a node is on the gray list, it starts receiving penalties if it does not reply.
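The reputation rules above can be sketched as a small state holder. This is one interpretation, not the procedure of Figure 8: the class `ReputationDetector`, the unit reward and penalty, the numeric defaults, and the reading that reaching the upper bound resets the score and triggers re-initialization are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ReputationDetector:
    threshold: int = -3        # assumed: below this score the node is black-listed
    max_reputation: int = 10   # assumed upper bound on reputation
    scores: Dict[str, int] = field(default_factory=dict)
    gray_list: Set[str] = field(default_factory=set)
    black_list: Set[str] = field(default_factory=set)

    def reward(self, node: str) -> bool:
        """Credit a valid heartbeat; return True if re-initialization is due."""
        if node in self.black_list:
            return False
        score = self.scores.get(node, 0) + 1
        if score >= self.max_reputation:
            self.scores[node] = 0   # upper bound reached: start over
            return True
        self.scores[node] = score
        return False

    def penalize(self, node: str) -> str:
        """Debit a missing or bad reply; return the list the node is now on."""
        if node in self.black_list:
            return "black"
        self.scores[node] = self.scores.get(node, 0) - 1
        if self.scores[node] < self.threshold:
            self.gray_list.discard(node)
            self.black_list.add(node)   # unregistered / marked as lost
            return "black"
        self.gray_list.add(node)        # suspect, but still registered
        return "gray"


detector = ReputationDetector(threshold=-2, max_reputation=3)
print([detector.penalize("node-7") for _ in range(3)])
```

With the hypothetical settings in the usage lines, a node that repeatedly fails to reply passes through the gray list before it ends up on the black list.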
Figure 9 shows the general step-by-step working of the trust and reputation collector with these procedures. All tasks performed by the Master node are indicated in it.

Conclusion
In this paper, we propose the Trusted Heartbeat framework, which creates a collaborative network among virtual machines. With remote attestations and heartbeat messages, a Master node can determine the exact status (working or malfunctioning) of its nodes. The proposed framework identifies genuine worker nodes using trusted computing facilities. The heartbeat interval is a very important parameter in our system. The trust and reputation based detector helps Hadoop-like distributed systems detect malicious nodes quickly. This framework shows how common messages can be used to establish trust among all the corresponding nodes in a distributed environment.

Figure 6. Step-by-step node registration procedure.

Figure 8. Procedure for reputation based decision.

Figure 9. Working of our framework.