50120130406041 2

INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET)

ISSN 0976 – 6367(Print)
ISSN 0976 – 6375(Online)
Volume 4, Issue 6, November - December (2013), pp. 378-385
© IAEME: www.iaeme.com/ijcet.asp
Journal Impact Factor (2013): 6.1302 (Calculated by GISI)
www.jifactor.com

PRIORITY BASED DYNAMIC ADAPTIVE CHECKPOINTING STRATEGY
IN DISTRIBUTED ENVIRONMENT
Priya Deshpande, Assistant Professor, MITCOE Pune
Sunayna Giroti, ME IT Student

ABSTRACT
Handling faults and recovering from them in distributed data systems has been a matter of concern
for many years. Recovering lost data from checkpoints is one solution. However, how snapshots are
stored in checkpoints is a major concern, because working with a large number of checkpoints may
result in poor system performance. This paper presents a dynamic adaptive checkpointing strategy
for distributed systems that addresses an important issue: storing checkpoints on the namenode for
failure recovery on the basis of the priority given to the datanodes, so that the bandwidth
consumption of the network can be decreased. For this purpose we suggest a new architecture that
helps us define this strategy. The strategy aims to optimize the checkpointing process by consuming
less bandwidth than the usual approach.
Keywords: Dynamic checkpointing, Access calculator, Priority scheduler.
I. INTRODUCTION
With the increasing popularity of distributed environments, applications in areas such as life
science, telecommunication, nuclear research and many more are using such systems to perform
important tasks. The data of these applications must therefore be stored securely or be easily
recoverable at the time of failure. One of the well-known ways to recover faulty data in a distributed
system is to use checkpoints. Checkpoints give a system the ability to save its present state in
the form of snapshots, and to tolerate failure by enabling a failed datanode to recover to a previous
safe state [5]. Whenever a fault takes place in the system, the checkpoint is used to recover from it.
At present, most checkpointing strategies are periodic, i.e. checkpoints are stored either at a
constant time interval or at a variable interval that depends on the requirements of the system. In
both cases the bandwidth consumption of the system is high, because all data is stored whether it is
important or not. Checkpoints must be designed in such a way that they dynamically take snapshots
of the memory structures that form the important part of a datanode and save them first.
The difficulty is to recognize, within a large amount of data, which data is more important and which
is less. To answer this question we have designed the system on the basis of a priority given to
each datanode dynamically, so that the bandwidth consumption of the system is decreased. For this,
we calculate how many times the datanode is accessed by users, and on the basis of this
calculation we assign a priority to the data. These priorities decide which datanode is
checkpointed first and which second. This reduces the number of checkpoints, and hence the
bandwidth consumption of the whole system will be lower.
The rest of the paper is organized as follows. Section 2 describes the related work. Section 3 gives
an overview of the proposed architecture. Section 4 describes dynamic adaptive checkpointing.
Finally, Section 5 concludes the paper.
II. RELATED WORK
In distributed environments, checkpointing strategies are commonly used for handling faults.
Data is always a concern whenever fault tolerance is discussed. Whatever strategy is used to
perform checkpointing, the goal is always to recover the faulty data. However, performing
checkpointing reduces system performance, since a copy of the data must be kept in the form of
snapshots, which consume a lot of memory. A considerable amount of research addresses this
concern.
John DeVale [5] provided the basic ideas of checkpoints, faults, checkpointing and recovery
of a system using these methods.
One of the most researched approaches in the field of checkpointing is diskless checkpointing.
Ge-Ming et al. [1] proposed a neighbor-based scheme for diskless checkpointing to achieve good load
balancing, whereas Raphael Marcos et al. [2] used diskless checkpointing to increase system
performance by decreasing the number of checkpoints using a quasi-synchronous protocol. Similarly,
Leonardo et al. [9] overcame the drawbacks of the disk-based model by using a diskless model to
increase the scalability of the system.
In [6], two time-based checkpointing schemes, full checkpointing and incremental checkpointing,
were compared to find the better model. In [4], full checkpointing is used rather than incremental
checkpointing to introduce a dynamic adaptive fault tolerance model so that serviceability can be
maximized. In [8], two different checkpointing models are used, local checkpointing and global
checkpointing; coordination between these two models is used to increase system
performance and decrease the time interval between two checkpoints.
Maria Chtepen et al. [7] proposed periodic checkpointing with a heuristic approach to reduce system
load and increase the availability of the system.
III. PROPOSED ARCHITECTURE
The proposed architecture of our checkpointing strategy is shown in Figure 1.

Figure 1: Proposed Architecture [10]
(Diagram blocks: NameNode, Secondary NameNode, Priority Scheduler, and three sets of AC, CE and SE.)

The components of our architecture are as follows:
• NameNode: A master server that allows access to the data stored in it. It is responsible
  for operations such as opening, closing and renaming files and directories [10]. In short, it
  holds the metadata of the system. It is configured to support and maintain checkpoints, so that
  data can be recovered at the time of failure. Any update to the files takes place synchronously.
• Secondary NameNode: A copy of the namenode. Its only purpose is to provide a backup for the
  system. That is, when the namenode fails, data can be recovered from the checkpoints stored on
  the secondary namenode.
• DataNode: Manages the data stored at each node in the system. Datanodes are responsible for
  serving read and write requests from users [10]. A datanode consists of an access calculator
  (AC), which calculates the accesses to a particular data record in the datanode, a computing
  element (CE) and a storage element (SE).
• AC (Access Calculator): Calculates the total number of accesses to a data record in a datanode
  between times t1 and t2. It also calculates the time interval for sending snapshots to the
  namenode, using the value sent by the priority scheduler.
• CE (Computing Element): Each datanode contains one or more computing elements, which provide
  its computing capability.
• SE (Storage Element): Each datanode contains one or more storage elements, which represent its
  storage capacity.
• Priority Scheduler: Calculates the priority of a data record in a datanode on the basis of
  the result of the access calculator, i.e. the access count of a file in the datanode.
The whole process works as follows. All data, in the form of files, is stored in the datanodes. Each
datanode has an access calculator embedded in it. These access calculators calculate how many
times a particular data record is accessed by users. The access count of a data record is sent at
a constant time interval to the priority table. The priority table consists of a list of the records
most recently accessed by users. Depending on the results of the access calculator, the priority
scheduler calculates the priority of each data record. Details of the access calculator and priority
scheduler are given in Section 4.
The snapshot of the data record with the higher priority is checkpointed first. This checkpoint
is then stored on the namenode. The namenode updates the record of each checkpoint synchronously.
A copy of each checkpoint is also stored on the secondary namenode, so that data can be recovered
from there if the namenode fails. Checkpointing in this manner in our proposed system should
minimize the overall bandwidth consumption.
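The ordering step described above can be illustrated with a minimal Python sketch. It is not the implementation of the proposed system: the function name, the use of dictionaries as stand-ins for namenode storage, and the convention that rank 1 denotes the highest priority are all assumptions made for illustration.

```python
def checkpoint_in_priority_order(priority_table, snapshots, namenode, secondary_namenode):
    """Store snapshots on both namenodes, most important record first.

    priority_table: record id -> priority rank (assumption: 1 is the highest priority)
    snapshots: record id -> snapshot payload
    namenode, secondary_namenode: dicts standing in for checkpoint storage (hypothetical)
    Returns the record ids in the order in which they were checkpointed.
    """
    # Sort record ids by ascending rank so that rank 1 is handled first.
    order = sorted(priority_table, key=priority_table.get)
    for record_id in order:
        snapshot = snapshots[record_id]
        namenode[record_id] = snapshot            # primary copy of the checkpoint
        secondary_namenode[record_id] = snapshot  # backup copy for namenode failure
    return order


priority_table = {"r1": 2, "r2": 1, "r3": 3}
snapshots = {"r1": b"state-1", "r2": b"state-2", "r3": b"state-3"}
namenode, secondary_namenode = {}, {}
print(checkpoint_in_priority_order(priority_table, snapshots, namenode, secondary_namenode))
```

In this sketch record r2 is checkpointed first because it holds rank 1, and both stores end up with the same set of checkpoints.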
IV. DYNAMIC CHECKPOINTING
Here we make some assumptions: the failures that occur in the system are not transient, and
failures can be detected at run time. The reason for these assumptions is that the dynamic
checkpoints used in our system are always present to recover from a fault, and a failure can be
recovered from these checkpoints. A fault occurring in the system can always be recovered
as long as it is not permanent [2]. The checkpoints take snapshots of the data records with
the higher priority first. These priorities are taken from the priority table. The priority of a
datanode is assigned on the basis of the result calculated by the access calculator, which
calculates how many times the datanode is accessed by the system's users. In accordance with these
results, checkpoints receive their respective snapshots. These checkpoints are stored on the
namenode as well as on the secondary namenode.
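Because every checkpoint is kept on both the namenode and the secondary namenode, recovery can fall back to the secondary copy. The following Python sketch illustrates this idea only; the function name and the convention that a store of None models a failed node are assumptions, not part of the proposed system.

```python
def recover(record_id, namenode, secondary_namenode):
    """Return the stored snapshot for record_id, trying the namenode first.

    A store of None models a node that has failed (an assumption of this sketch).
    """
    for store in (namenode, secondary_namenode):
        if store is not None and record_id in store:
            return store[record_id]
    raise KeyError(f"no checkpoint available for {record_id!r}")


backup = {"r1": b"state-1"}
# The namenode has failed, so the snapshot is read from the secondary namenode.
print(recover("r1", None, backup))
```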
Whenever we talk about applying checkpoints dynamically so that less bandwidth is
consumed, there are two questions to keep in mind:
• How do we decide which data record is important?
• How do we assign a priority to that data record?

The answer to both questions lies in the two algorithms discussed below:
A. ACCESS CALCULATOR
It calculates the access count on a particular datanode, that is, how many times a data
record is accessed by users. The count is calculated for the most recent time period, i.e. between
the time when the access count was last generated and the current time. The access count of each
record is sent at a predefined constant time interval: after every predefined constant time S, an
updated access count is sent to the priority table. Every time a data record is accessed between
the last count set time and the current time, the access count is increased by 1. The time interval
for sending snapshots to the namenode is also calculated here.
The following terms are used in the algorithm below:
• t1: Last count set time.
• t2: Current time.
• tn: Predefined constant time for sending snapshots.
• p: Priority sent by the priority scheduler.
• tm: Time interval for sending snapshots to the namenode, computed from tn and p.
count = 0
For each access entry in datanode between t1 and t2
{
    count++;
}
Send (datanode id, count)
tm = tn * ((p + 1) / 2)
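The access calculator steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the AccessEntry log format, the function names, and the choice of counting accesses with t1 < timestamp <= t2 are hypothetical, since the paper does not specify them. Only the counting loop and the formula tm = tn * ((p + 1) / 2) come from the algorithm.

```python
from dataclasses import dataclass


@dataclass
class AccessEntry:
    """One logged access on a datanode (hypothetical log format)."""
    record_id: str
    timestamp: float  # seconds


def count_accesses(entries, t1, t2):
    """Count the access entries that fall after t1 and up to t2.

    The half-open interval (t1, t2] is an assumption of this sketch.
    """
    count = 0
    for entry in entries:
        if t1 < entry.timestamp <= t2:
            count += 1
    return count


def snapshot_interval(tn, p):
    """Compute tm = tn * ((p + 1) / 2) for priority p."""
    return tn * ((p + 1) / 2)


log = [AccessEntry("r1", 1.0), AccessEntry("r1", 5.0),
       AccessEntry("r2", 9.0), AccessEntry("r1", 12.0)]
print(count_accesses(log, t1=0.0, t2=10.0))            # 3 accesses in the window
print([snapshot_interval(60, p) for p in (1, 2, 3)])   # [60.0, 90.0, 120.0]
```

With tn = 60 seconds, priority 1 yields tm = 60 seconds, priority 2 yields 90 seconds and priority 3 yields 120 seconds, so records with a smaller priority number send their snapshots more often.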

B. PRIORITY SCHEDULER
The priority scheduler calculates the priority of the data records in a datanode, on the basis of
which checkpoints are assigned to the respective snapshots. The priority of a data record is decided
from the results of the access calculator, i.e. how many times the data record is accessed by users.
The priority scheduler works in two loops. The first for loop checks whether a data record is present
in the priority table. If not, it adds a new data record to the table. If the data record is
already present in the priority table, it updates its values using the new access count calculated
by the access calculator. The second for loop checks the data records already in the priority table,
i.e. whether the access count of a data record has stayed the same or is changing. If the access
count of a data record stays the same more than three times, that record is deleted from the table.
Otherwise the record is assigned the average priority of the data records in the table.
Priority scheduler (access info list)
new priority = 0, last access count = 0
For each record in access info list
{
    If record.access count != last access count
    {
        new priority++;
        last access count = record.access count
    }
    If record in priority table
    {
        Update record
        Set (absent count = 0, priority = new priority, access count = new access count value)
    }
    Else
    {
        Add new record in priority table
    }
}
For each record in priority table
{
    If record is in access info list
    {
        If record.absent count <= 3
        {
            Average priority = average (priorities in priority table)
            Update record in priority table
            Set (absent count = absent count + 1, priority = average priority)
        }
        Else
        {
            Remove record from priority table
        }
    }
}

Terms used in the algorithm:
• Access Info List: The list of all the data records in the datanodes, with the access count of
  each data record. These records are sorted in reverse order.
• Priority Table: This table contains the datanode id and absent count of each data
  record. It also contains the priority of each data record.
• Absent Count: Counts how many times a particular data record has not been accessed.
• Record Id: A unique identification number given to each data record in the system.
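The two loops of the priority scheduler can be sketched in Python as follows. This is an illustrative reading rather than a definitive implementation, and it rests on several assumptions: the access info list is taken to be sorted by descending access count, the second loop is taken to apply to records that are absent from the current access info list (as the definition of absent count suggests), and the average priority is computed once before the second loop. The TableEntry type and the schedule function are hypothetical names.

```python
from dataclasses import dataclass

MAX_ABSENT = 3  # threshold used in the pseudocode


@dataclass
class TableEntry:
    """One row of the priority table (hypothetical layout)."""
    priority: float
    access_count: int
    absent_count: int = 0


def schedule(access_info, table):
    """Update the priority table in place and return it.

    access_info: list of (record id, access count) pairs, assumed to be
                 sorted by descending access count.
    table: dict mapping record id -> TableEntry.
    """
    new_priority = 0
    last_count = 0
    reported = set()

    # First loop: rank the reported records, then add or update them.
    for record_id, count in access_info:
        if count != last_count:
            new_priority += 1
            last_count = count
        entry = table.get(record_id)
        if entry is None:
            table[record_id] = TableEntry(new_priority, count)
        else:
            entry.priority = new_priority
            entry.access_count = count
            entry.absent_count = 0
        reported.add(record_id)

    # Second loop: age the records that were not reported this round.
    if table:
        average = sum(e.priority for e in table.values()) / len(table)
    for record_id in list(table):
        if record_id in reported:
            continue
        entry = table[record_id]
        if entry.absent_count <= MAX_ABSENT:
            entry.absent_count += 1
            entry.priority = average
        else:
            del table[record_id]
    return table


table = {}
schedule([("a", 10), ("b", 10), ("c", 4)], table)
print({rid: e.priority for rid, e in table.items()})
```

In this sketch records with equal access counts share a rank, so a and b both receive priority 1 and c receives priority 2. A record that keeps being absent is dropped once its absent count has exceeded the threshold of three.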

V. CONCLUSION
In this paper, we proposed a dynamic adaptive checkpointing strategy that first calculates
how many times a data record is accessed. On the basis of its access count, a priority is
given to each data record. The data records with the higher priority send their snapshots first, and
their checkpoints are saved on the namenode. If a data record is not accessed for a predefined time
interval, it is removed from the priority table. Previously, snapshots were sent at a constant time
interval, which consumes a large amount of network bandwidth. Our strategy may reduce network
bandwidth consumption, as checkpoints are stored dynamically on the basis of priority. This should
also improve the overall performance of the system. There are still many areas that need to be
considered to improve performance in distributed environments.
VI. REFERENCES
[1] Ge-Ming Chiu and Jane-Ferng Chiu, A New Diskless Checkpointing Approach for Multiple
    Processor Failures. IEEE Transactions on Dependable and Secure Computing, Vol. 8, No. 4,
    July/August 2011.
[2] Raphael Marcos Menderico and Islene Calciolari Garcia, Diskless Checkpointing with
    Rollback-Dependency Trackability. 2010 29th IEEE International Symposium on Reliable
    Distributed Systems.
[3] Yibei Ling, Jie Mi and Xiaola Lin, A Variational Calculus Approach to Optimal Checkpoint
    Placement. IEEE Transactions on Computers, Vol. 50, No. 7, July 2001.
[4] Dawei Sun et al., Analyzing, Modeling and Evaluating Dynamic Adaptive Fault Tolerance
    Strategies in Cloud Computing Environments. Springer Science+Business Media New York,
    2013.
[5] John DeVale, Checkpoint/Recovery. 18-849b Dependable Embedded Systems, February 4,
    1999.
[6] N. Naksinehaboon, High Performance Computing Systems with Various Checkpointing
    Schemes. Int. J. of Computers, Communications & Control, ISSN 1841-9836,
    E-ISSN 1841-9844, Vol. IV (2009), No. 4, pp. 386-400.
[7] Maria Chtepen et al., Checkpointing and Replication: Toward Efficient Fault-Tolerant
    Grids. IEEE Transactions on Parallel and Distributed Systems, Vol. 20, No. 2, February
    2009.
[8] Mehdi Lofti and Seyed Ahmad Motamedi, Adaptive Two-Level Blocking Coordinated
    Checkpointing for High Performance Cluster Computing Systems. Journal of Information
    Science and Engineering 26, 951-966 (2010).
[9] Leonardo Bautista Gomez et al., Distributed Diskless Checkpoint for Large Scale Systems.
    2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing.
[10] Konstantin Shvachko, Hairong Kuang, Sanjay Radia and Robert Chansler, "The Hadoop
    Distributed File System", Yahoo!, Sunnyvale, California, USA,
    {shv, hairong, sradia, chansler}@yahoo-inc.com.
[11] Preeti Gupta, Parveen Kumar and Anil Kumar Solanki, "A Comparative Analysis of
    Minimum-Process Coordinated Checkpointing Algorithms for Mobile Distributed Systems",
    International Journal of Computer Engineering & Technology (IJCET), Volume 1, Issue 1,
    2010, pp. 46 - 56, ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375.
[12] Parveen Kumar and Poonam Gahlan, "A Minimum Process Synchronous Checkpointing
    Algorithm for Mobile Distributed System", International Journal of Computer Engineering &
    Technology (IJCET), Volume 1, Issue 1, 2010, pp. 72 - 81, ISSN Print: 0976 – 6367,
    ISSN Online: 0976 – 6375.
[13] Priya Deshpande, Brijesh Khundhawala and Prasanna Joeg, "Dynamic Data Replication and
    Job Scheduling Based on Popularity and Category", International Journal of Computer
    Engineering & Technology (IJCET), Volume 4, Issue 5, 2013, pp. 109 - 114, ISSN Print:
    0976 – 6367, ISSN Online: 0976 – 6375.
