Grid Scheduling with QoS Satisfaction and Clustering

The objective of this study is to devise an Adaptive Machine Scoring Technique with Cluster (AMSTWC) for scheduling jobs/tasks in a grid environment that reduces the overall completion time (makespan) and increases resource utilization. It also minimizes the execution time of the algorithm while satisfying QoS requirements. Scheduling is done for computational as well as data grids. A grid contains many heterogeneous, geographically distributed Gridlets/machines, so searching for the Gridlet most suitable for a given job takes a long time. The algorithm clusters the Gridlets by configuration, which reduces the time needed to find Gridlets/machines that satisfy the QoS requirements. Task requirements are matched against the machine capabilities available in the grid, and AMSTWC selects the machine with the highest resource score. AMSTWC is compared with existing algorithms in terms of makespan, resource utilization, flow time and execution time, and it performs better than the existing algorithms in most cases.


INTRODUCTION
Real-world engineering and science problems are complex and involve complicated computations and the transfer of large volumes of data through the network. Solving these problems requires more powerful computers. Utilizing and combining resources scattered around the world is a good approach; hence, the concept of grid computing was proposed. Grid computing has emerged as the next generation of distributed computing, aggregating dispersed heterogeneous resources under different administrative domains to solve various kinds of computational and data-intensive applications. A grid forms a virtual organization by grouping heterogeneous computers for solving a specific problem. The underlying network plays a major role in completing jobs scheduled on different machines, so high network bandwidth and a reliable network connection are needed.
Matching resources to user requests and scheduling jobs on the matched resources is an NP-complete problem (Taura and Chien, 2000). Monitoring the progress of an assigned job is also difficult, since the resources span different administrative domains (Khateeb et al., 2009) and are dynamic (Zomaya and Teh, 2001). Job scheduling and resource management in a grid is therefore challenging. Many heuristic algorithms adjust their scheduling strategies according to the nature of the job (Kobra and Naghibzadeh, 2007). This study concentrates on QoS satisfaction: task requirements such as RAM, budget, network bandwidth, operating system and deadline are obtained from the user, and a resource suitable for the user's task is searched for. QoS satisfaction is needed for the following reasons:
• Multimedia applications require resources with high network bandwidth and RAM to transfer bulk data; they do not need high computing power.
• Problems that involve solving partial differential equations need computing power.
• Scientific/engineering problems involving complex computations need more computing power.
Load balancing is also an important issue in grid scheduling. Its main purpose is to balance the load of each resource in order to enhance resource utilization and increase system throughput. Many load balancing algorithms have been proposed for grid environments (Cao et al., 2005; Suri and Singh, 2010), but they may not adapt well to changes in system status. Based on this opportunity for improvement, a new scheduling algorithm is proposed to balance the load of a grid system with adaptive machine scoring while trying to minimize the makespan and flow time of job execution. A job is assigned to a resource depending on the resource's characteristics while simultaneously considering the load of the machine and the execution time of the algorithm. The execution time of the algorithm is reduced by clustering machines with the same configuration, which speeds up the search for resources.
The objective of the proposed methodology is to minimize the overall completion time (makespan) of the tasks submitted to the Gridlets. It also maximizes resource utilization, for efficient usage of the available Gridlets, and minimizes the time spent searching for an appropriate resource (one that satisfies the task requirements of OS, budget, network bandwidth and RAM) for a given job.

LITERATURE REVIEW
Different types of scheduling are identified based on criteria such as static versus dynamic environment, multi-objectivity and adaptability, and heuristic and meta-heuristic methods for scheduling in grids are proposed. The study reveals the complexity of the scheduling problem in computational grids compared with scheduling in classical parallel and distributed systems and shows the usefulness of heuristic and meta-heuristic approaches for designing efficient grid schedulers. The requirements for modular grid scheduling and its integration with the grid architecture are also presented (Ajith and Fatos, 2010). Workflow scheduling has also been studied, where the problem is to satisfy the QoS requirements of the user while minimizing the cost of workflow execution. On-demand resource provisioning, homogeneous networks and the pay-as-you-go pricing model are considered. A two-phase algorithm is proposed that first distributes the overall deadline over the workflow tasks and then schedules each task based on its sub-deadline (Saeid et al., 2013).
Resource scheduling in grid computing has been proposed using a global optimization algorithm, bacterial foraging optimization, with the main objective of minimizing makespan and cost (Rajni, 2012). Bacterial Foraging Optimization (BFO) is also used to schedule grid resources for a practical application, a protein sequence analyzer that finds similar protein sequences in existing databases. Using BFO reduces the time a resource takes to execute the user's requests and also balances the resources utilized (Vivekanandan and Ramyachitra, 2012).
To utilize the power of the grid completely, an Adaptive Scoring Job Scheduling algorithm (ASJS) is proposed. The main objective is to minimize the makespan, and both computational and data-intensive applications were used for scheduling. ASJS selects the fittest resource to execute a job according to the status of the resources. Local and global update rules are applied to obtain the newest status of each resource. The local update rule updates the status of the resource and cluster selected to execute a job after the job is assigned, and the job scheduler uses the newest information to assign the next job. The global update rule updates the status of each resource and cluster after a job is completed by a resource. It supplies the job scheduler with the newest information on all resources and clusters so that the job scheduler can select the fittest resource for the next job. However, constructing the resource discovery tree for each attribute takes more time to schedule (Ruay-Shiung et al., 2012). A reliable, hierarchical-driven scheduling algorithm is proposed to overcome hardware failure, program failure and storage failure (Xiaoyong et al., 2012).
A fault-tolerant hybrid load balancing strategy is proposed that takes into account grid architecture, computer heterogeneity, communication delay, network bandwidth, resource availability, resource unpredictability and job characteristics. The objective is to arrive at job assignments that achieve minimum response time and optimal computing node utilization (Jasma and Nedunchezhian, 2012). Another study focuses on the computing grid. The system load is taken as a parameter in determining a balance threshold, and the scheduler adapts the balance threshold dynamically when the system's load changes. First, the scheduling algorithm balances the system load with an adaptive threshold; second, it minimizes the makespan of jobs (Yun-Han et al., 2011). A new Priority based Job Scheduling algorithm (PJSC) for cloud computing is proposed using a multiple criteria decision making model (Shamsollah and Mohamed, 2012).

Architectural diagram:
Users submit their jobs to the grid portal. The task requirement block collects the requirements and passes them to the resource score calculator. The resource score calculator obtains resource capability information from the Grid Information Service (GIS) through the Grid Broker and calculates the resource score of every resource for every task. The resource scores are then passed to the grid job scheduler through the Grid Broker for scheduling. Finally, the Grid Broker assigns the tasks to the resources, where execution is carried out. After a task completes, the resource manager reports the result to the requesting user (Fig. 1).

PROBLEM DEFINITION AND PROPOSED METHODOLOGY
The problem is to minimize the overall completion time and to increase resource utilization while allocating m resources to n tasks, where n > m. Allocation is done offline. To formulate the problem, define Ti, where i = {1, 2, 3, …, n}, as a permutation of n independent tasks and Rj, where j = {1, 2, 3, …, m}, as m computing resources. Suppose that the Expected Time to Complete, ETCi,j (Braun et al., 1999), which is the processing time of task i on resource j, is known. The minimal value of the completion time C(x) represents the length of the schedule of all tasks on the available resources.
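As a minimal sketch of how a schedule's completion time can be evaluated, the snippet below assumes the common makespan definition: each resource runs its assigned tasks one after another, and C(x) is the largest per-resource total. Eq. (1) is not reproduced in this text, so this form is an assumption, and the names `makespan`, `etc` and `assignment` are illustrative only.

```python
def makespan(etc, assignment, num_resources):
    """Return the largest per-resource total processing time.

    etc[i][j]     -- expected time to complete task i on resource j
    assignment[i] -- index of the resource that task i is scheduled on
    """
    load = [0.0] * num_resources
    for task, resource in enumerate(assignment):
        load[resource] += etc[task][resource]
    return max(load)


# Three tasks, two resources (n > m), hypothetical ETC values.
etc = [[4.0, 6.0], [3.0, 2.0], [5.0, 9.0]]
print(makespan(etc, [0, 1, 0], 2))  # resource 0: 4 + 5 = 9, resource 1: 2, prints 9.0
```

Minimizing this value over all possible assignments is the scheduling objective described above.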

Methodology:
The resource for a task is selected using the resource score, which is calculated for each task as in Eq. (2). If the application is computationally intensive, then α = 1; otherwise it is data intensive and α = 0. Whether an application is data intensive is determined from the network bandwidth parameter of the task requirements: if the requested network bandwidth is above a threshold, the application is data intensive. The threshold is taken as 70% of the maximum bandwidth requested by the tasks. Thus, if a task's network bandwidth request is above 70% of the maximum requested network bandwidth, α is 0; otherwise α is 1.
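The threshold rule for α can be sketched as follows. This is only an illustration of the 70% rule stated above; the function name `classify_tasks` and the sample bandwidth values are hypothetical.

```python
def classify_tasks(bandwidth_requests, threshold_ratio=0.70):
    """Return alpha for each task: 0 = data intensive, 1 = computationally intensive."""
    # Threshold is a fixed fraction of the largest bandwidth request among the tasks.
    threshold = threshold_ratio * max(bandwidth_requests)
    # A request strictly above the threshold marks the task as data intensive.
    return [0 if bw > threshold else 1 for bw in bandwidth_requests]


# Maximum request is 100, so the threshold is 70.
print(classify_tasks([100, 40, 75, 60]))  # prints [0, 1, 0, 1]
```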
The resource score depends on the makespan score (Eq. (3), for a computational grid), the network bandwidth satisfaction score (Eq. (4), for a data grid), the QoS satisfaction score (Eq. (5)), the load balance score (Eq. (6)) and resource availability. The makespan score is high if the expected time to complete is low, which in turn makes the resource score high. This can be avoided by replacing the term with the value 1, which is the highest possible value. LB Scorej depends on the load balancing parameter of Resourcej. The load balancing parameter of each resource is initially set to some value (for example, 100). Whenever a Taski is assigned to Resourcej, the load balancing parameter of Resourcej is decreased, so that its resource score is reduced in the next selection. For each task, the resource with the highest resource score is selected. The execution time of the algorithm is reduced because the resources are clustered into groups in a hierarchical tree, known as the resource discovery tree, as shown in Fig. 2 for the operating system attribute. The algorithm that uses this tree is known as AMST with Clustering (AMSTWC). The AMSTWOC algorithm does not use this clustering tree when searching for resources.

Bitmap representation for resource discovery tree:
Let {R1, R2, …, Rn} be the set of resources and {A1, A2, …, An} be the set of attributes of the resources (Ruay-Shiung and Min-Shuo, 2010). At each level of the tree, one attribute is checked against the task requirement, and the rest of the tree is pruned so that it is not checked for further attributes during the resource search. The bitmap data structure used for the tree is shown in Fig. 2.
In the search for QoS satisfaction, if the task requires the Unix OS, then the search of sub tree 2 and sub tree 3 is pruned, which reduces the search time. This in turn reduces the algorithm's execution time.
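The pruning idea can be illustrated with a minimal sketch that clusters resources by operating system and searches only the matching cluster. It does not reproduce the bitmap encoding of Fig. 2; the dictionary layout, the field names (`os`, `ram`, `bandwidth`, `cost`, `budget`) and both function names are assumptions made for illustration.

```python
from collections import defaultdict


def build_os_clusters(resources):
    """Group resource ids by operating system (one level of the tree)."""
    clusters = defaultdict(list)
    for rid, caps in resources.items():
        clusters[caps["os"]].append(rid)
    return clusters


def find_candidates(task, resources, clusters):
    """Search only the cluster that matches the task's OS, then check the other requirements."""
    return [
        rid
        for rid in clusters.get(task["os"], [])  # other OS clusters are never visited
        if resources[rid]["ram"] >= task["ram"]
        and resources[rid]["bandwidth"] >= task["bandwidth"]
        and resources[rid]["cost"] <= task["budget"]
    ]


resources = {
    "R1": {"os": "Unix", "ram": 8, "bandwidth": 100, "cost": 5},
    "R2": {"os": "Windows", "ram": 16, "bandwidth": 100, "cost": 5},
    "R3": {"os": "Unix", "ram": 2, "bandwidth": 100, "cost": 5},
}
clusters = build_os_clusters(resources)
task = {"os": "Unix", "ram": 4, "bandwidth": 50, "budget": 10}
print(find_candidates(task, resources, clusters))  # prints ['R1']
```

Building the clusters once and reusing them for every task is what saves search time compared with scanning all resources per task.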

Proposed AMSTWC algorithm:
Step 1: Generate the ETC matrix
Step 2: Get the task requirement matrix and the resource capability matrix from the Grid Information Service
Step 3: Construct the resource capability tree
Step 4: For each task i do
    Find whether task i is data intensive or computationally intensive
    Calculate the resource score of all resources for task i
    Select the resource j that has the highest resource score
    Assign the task to the selected resource
    Update the resource load
    Until all tasks are assigned
Step 5: Calculate makespan, flow time and resource utilization and record the algorithm execution time
Step 6: End
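The assignment loop of Step 4 can be sketched as below. The exact scoring equations are not reproduced in this text, so the score used here (an ETC-based term or a bandwidth term selected by α, plus a normalized load balancing term) is a stand-in assumption rather than the authors' formula; `schedule`, `bandwidth_score`, `qos_ok`, `lb_start` and `lb_step` are hypothetical names, and ETC values are assumed to be positive.

```python
def schedule(etc, alphas, bandwidth_score, qos_ok, lb_start=100.0, lb_step=1.0):
    """Greedy assignment: each task goes to its highest-scoring QoS-satisfying resource.

    etc[i][j]             -- expected time to complete task i on resource j (positive)
    alphas[i]             -- 1 = computationally intensive, 0 = data intensive
    bandwidth_score[i][j] -- assumed bandwidth satisfaction score in [0, 1]
    qos_ok[i][j]          -- True if resource j satisfies task i's QoS requirements
    """
    num_resources = len(etc[0])
    lb = [lb_start] * num_resources  # load balancing parameter of each resource
    assignment = []
    for i, row in enumerate(etc):
        best, best_score = None, float("-inf")
        for j in range(num_resources):
            if not qos_ok[i][j]:
                continue
            # Assumed form: 1.0 for the fastest resource, smaller for slower ones.
            makespan_score = min(row) / row[j]
            score = (alphas[i] * makespan_score
                     + (1 - alphas[i]) * bandwidth_score[i][j]
                     + lb[j] / lb_start)
            if score > best_score:
                best, best_score = j, score
        assignment.append(best)  # None if no resource satisfies the QoS requirements
        if best is not None:
            lb[best] -= lb_step  # lowers this resource's score for later tasks
    return assignment


etc = [[2.0, 4.0]] * 3
ones = [[1.0, 1.0]] * 3
ok = [[True, True]] * 3
print(schedule(etc, [1, 1, 1], ones, ok, lb_step=30.0))  # prints [0, 0, 1]
```

In the usage example the faster resource wins twice, after which its reduced load balancing parameter lets the slower resource take the third task.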

Performance metrics:
• Makespan: the overall completion time, as defined in Eq. (1).

In the instance labels, hi and lo denote high and low heterogeneity of jobs and machines. An ETC matrix is consistent when a machine that is faster than another machine for one job is faster for all jobs. It is inconsistent when a machine is faster for some jobs and slower for others, and semi-consistent (partially consistent) if it contains a consistent sub-matrix. The values are averaged over 100 runs for α = 1 (computational grid).
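The consistency property of an ETC matrix can be checked with a short sketch like the one below, which tests every pair of machines. The function name `is_consistent` and the sample matrices are illustrative assumptions; rows are jobs and columns are machines.

```python
from itertools import combinations


def is_consistent(etc):
    """True if, for every pair of machines, one is at least as fast on all jobs."""
    num_machines = len(etc[0])
    for a, b in combinations(range(num_machines), 2):
        a_never_slower = all(row[a] <= row[b] for row in etc)
        b_never_slower = all(row[b] <= row[a] for row in etc)
        if not (a_never_slower or b_never_slower):
            return False  # the two machines swap order between jobs
    return True


print(is_consistent([[1, 2, 3], [2, 4, 6]]))  # prints True
print(is_consistent([[1, 2], [3, 1]]))        # prints False
```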
For different values of α, for example 0.25, 0.50 and 0.75 (i.e., 75, 50 and 25% of data grids in the task requests), the comparison is given in Tables 5 to 8.
In Tables 5 to 8, our algorithm performs better than AMSTWOC in 100% of the cases, in all the metrics and for all combinations of heterogeneity, for the various proportions (75, 50 and 25%, respectively) of data grids. Figures 3 and 4 show the makespan comparison between the existing and the proposed algorithm (AMSTWC) for the high task and high machine heterogeneity consistent and inconsistent combinations. For both combinations, as the number of tasks increases, the proposed algorithm gives a reduced makespan. Compared with the consistent combination, the inconsistent combination gives better performance in terms of makespan. Figure 5 shows the flow time comparison for the high task and high machine heterogeneity consistent combination; our algorithm yields better results than FCFS. Figure 6 shows the comparison for the high task and high machine heterogeneity inconsistent combination; our algorithm has better resource utilization. Figure 7 shows the algorithm execution time comparison for the high task and high machine heterogeneity partially consistent combination; our method gives more or less the same performance as FCFS, even though our algorithm spends additional time matching an appropriate resource to each submitted task. Sample graphs are given in Fig. 3 to 7.

CONCLUSION AND RECOMMENDATIONS
The focus of this study is on QoS-satisfied task scheduling with load balancing of resources. The experimental results show that the proposed algorithm performs well in terms of execution time, resource utilization and makespan. The algorithm performs poorly in flow time because only QoS-satisfying resources are selected, so jobs wait until a QoS-satisfying resource becomes available.
In the future, we will adjust the score and add more parameters for realistic environments. In real environments, because of the more dynamic nature of the grid, many more factors, such as reliability, affect the makespan of the scheduling process. A reliability model may be included in future work.

Fig. 1: Architectural diagram

With the processing time of each task on each resource known, the completion time C(x) represents the total time of completion of all n tasks. The objective is to minimize C(x), as expressed in Eq. (1).

• Resource utilization: the degree of utilization of resources with respect to the schedule. It is defined in terms of the completion time of the last job on each machine i and the number of machines m. The objective is to maximize the resource utilization over all possible schedules.
• Flow time: the sum of the finishing times of jobs. The objective is to minimize the flow time.
• Algorithm execution time: the objective is to minimize the execution time of the proposed algorithm.

Benchmark description:
The benchmark by Braun et al. (1999) is a frequently used benchmark that is very effective in simulating grid systems and capturing the most important characteristics of the job scheduling problem. In it, instances are classified according to three parameters (job heterogeneity, machine heterogeneity and consistency) into 12 different types of ETC matrices, each consisting of 100 instances. All instances are composed of 512 jobs and 16 machines. They are labeled u_x_yyzz, where u means uniform distribution (in the matrix generation), x is the type of consistency (c: consistent, i: inconsistent, p: partially consistent), and yy and zz indicate the job and machine heterogeneity, respectively (hi: high, lo: low).
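The resource utilization and flow time metrics can be computed with a sketch such as the following. The defining equations are not reproduced in this text, so the utilization formula used here (mean machine completion time divided by the makespan) is an assumption based on a commonly used definition. Jobs on the same machine are assumed to run back to back in submission order, and all function names are hypothetical.

```python
def completion_times(etc, assignment, num_machines):
    """Return (completion time of each machine, finishing time of each job)."""
    machine_done = [0.0] * num_machines
    job_finish = []
    for job, machine in enumerate(assignment):
        machine_done[machine] += etc[job][machine]  # job runs after earlier jobs on this machine
        job_finish.append(machine_done[machine])
    return machine_done, job_finish


def resource_utilization(machine_done):
    """Assumed form: sum of machine completion times / (makespan * number of machines)."""
    return sum(machine_done) / (max(machine_done) * len(machine_done))


def flow_time(job_finish):
    """Sum of the finishing times of all jobs."""
    return sum(job_finish)


etc = [[4.0, 6.0], [3.0, 2.0], [5.0, 9.0]]
machine_done, job_finish = completion_times(etc, [0, 1, 0], 2)
print(flow_time(job_finish))               # 4 + 2 + 9, prints 15.0
print(resource_utilization(machine_done))  # (9 + 2) / (9 * 2)
```

A utilization of 1.0 under this definition means every machine finishes at the makespan, so no machine sits idle while another is still working.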

Table 5: Makespan comparison

Table 7: Resource utilization comparison