International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print),
ISSN 0976 - 6375(Online), Volume 4, Issue 6, November - December (2013), © IAEME
INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET)
ISSN 0976 – 6367(Print)
ISSN 0976 – 6375(Online)
Volume 4, Issue 6, November - December (2013), pp. 386-393
© IAEME: www.iaeme.com/ijcet.asp
Journal Impact Factor (2013): 6.1302 (Calculated by GISI)
www.jifactor.com
COMMUNICATION BETWEEN DISTRIBUTED SYSTEMS USING GOOGLE
INFRASTRUCTURE
Isak Shabani (1), Amir Kovaçi (2)
(1) Asst. Professor, University of Prishtina “Hasan Prishtina”, Kosova
(2) Master student, University of Prishtina “Hasan Prishtina”, Kosova
ABSTRACT
Distributed systems are software systems in which components installed on networked computers
communicate with each other by passing messages in order to perform interconnected
operations. Programs which run on distributed systems are known as distributed programs and
are designed using distributed programming. Computer networks are everywhere; mobile
networks, enterprise networks and other kinds of networks share the same properties. Distributed
communication at Google relies on the Google File System (GFS), which enables efficient and
reliable access to data. The main purpose of distributed system design is the sharing of those
resources which can be shared across computer networks.
Keywords: GFS, Google AppEngine, Datastore, SDK.
1. INTRODUCTION
Distributed systems are defined as systems in which hardware and software components
communicate through message exchange. Their components can be physically located in different
places, but to communicate efficiently they must meet conditions such as concurrent
communication, synchronization of actions with the other computer nodes, and independent
handling of failures, so that the rest of the system is unaffected when one part fails.
Google File System (GFS) is at the core of data processing and storage for Google as a search
engine. GFS files are divided into chunks, similar to the clusters or sectors of a traditional file
system. Furthermore, the GFS infrastructure contains multiple nodes: one Master node, which is the
main one, and multiple chunkservers. Search and other operations on data are performed in a way
similar to relational database systems. Bigtable enables data manipulation on big data with high
performance. Through specific search strategies and data operation features, Google manages to
handle huge volumes of information, process many queries and produce good results for its
clients. The critical requirements that Google is able to fulfill are those related to reliability
and continuous data availability [3].
Unlike centralized systems, where data processing resources reside in a main data processing
center, in the distributed approach each component is in most cases self-sustaining and the
system has no single point of failure. Many recent distributed systems are designed using
different technologies and programming languages, enriched with many namespaces designed
specifically for distributed systems. It is important to identify the programming languages and
tools which are suitable for designing distributed systems using Google's facilities as a
development framework. This work examines the design of distributed systems using Google
features together with libraries in the Java programming language.
Today's web-based systems are very often complex distributed systems which are themselves
built from different software modules, developed by different teams, programmed in different
languages and spread across different machines.
The evolution of computer systems, the rapid development of LAN and WAN networks and the rapid
growth of data exchange have enabled connection and information exchange between different
machines (PCs, servers, mobile devices). This development has in turn led to the advanced form
of computer networks known as distributed systems.
There are different definitions of distributed systems; what they have in common is that a
distributed system is a group of computers which are independent of each other, while the end
user has the impression of using a single system.
Like centralized systems, distributed systems have hardware and software components. While the
computers are autonomous from each other, from the software point of view the applications
deployed on the servers should function in such a manner that users think they are using a
single server.
2. DISTRIBUTED SYSTEMS COMMUNICATION
In distributed systems, three main communication models are usually used for components to
interact with each other:
• Inter process communication
• Remote Invocation
• Indirect Communication
Interprocess communication is performed through send and receive operations on messages
represented as sequences of bytes. A locking mechanism is used to synchronize communication
between the sender and the receiver.
Sockets provide the endpoints for communication between two processes. The internet address and
the local port enable a message to be received by the intended process.
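
The role of the internet address and the port can be illustrated with a minimal Java sketch. It is an illustration only, under stated assumptions: the class name SocketEchoSketch, the one-line message format and the "ACK: " reply prefix are hypothetical, and sender and receiver run as two threads of one program on the loopback address rather than as separate processes.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class SocketEchoSketch {

    /** Sends one line to a receiver listening on a local port and returns its reply. */
    static String roundTrip(String message) {
        InetAddress address = InetAddress.getLoopbackAddress();
        // Port 0 asks the operating system for any free local port.
        try (ServerSocket listener = new ServerSocket(0, 1, address)) {
            Thread receiver = new Thread(() -> acknowledgeOnce(listener));
            receiver.start();
            // The sender reaches the receiver through the pair (address, port).
            try (Socket socket = new Socket(address, listener.getLocalPort())) {
                writer(socket).println(message);
                String reply = reader(socket).readLine();
                receiver.join();
                return reply;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Receiver side: accept one connection, read one line, reply with an acknowledgement. */
    private static void acknowledgeOnce(ServerSocket listener) {
        try (Socket peer = listener.accept()) {
            String line = reader(peer).readLine();
            writer(peer).println("ACK: " + line);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static BufferedReader reader(Socket s) throws IOException {
        return new BufferedReader(
                new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
    }

    private static PrintWriter writer(Socket s) throws IOException {
        // autoFlush = true so that println() sends the bytes immediately.
        return new PrintWriter(
                new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
    }
}
```

Calling SocketEchoSketch.roundTrip("hello") sends the text as a sequence of bytes to the listening port and returns the acknowledgement line written by the receiver.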
Indirect communication is carried out through an intermediary between the sender and the
receiver; publish-subscribe systems, message queues and shared memory approaches are typical
examples of this communication setup.
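
As an illustration of indirect communication, the following sketch passes messages through an in-memory queue, so that the sender never holds a reference to the receiver. It is a simplified, single-program stand-in for a real message queue service; the class and method names (MessageQueueSketch, relay) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class MessageQueueSketch {

    /** Passes every message through a queue and returns what the receiver collected. */
    static List<String> relay(List<String> messages) {
        // The queue is the intermediary: sender and receiver only know the queue.
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> received = new ArrayList<>();

        Thread receiver = new Thread(() -> {
            try {
                for (int i = 0; i < messages.size(); i++) {
                    received.add(queue.take()); // blocks until a message is available
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        receiver.start();

        for (String message : messages) {
            queue.add(message); // the sender does not wait for the receiver
        }

        try {
            receiver.join(); // wait until the receiver has drained the queue
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return received;
    }
}
```

Because the queue preserves insertion order and there is a single receiver, the messages are collected in the order in which they were sent.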
3. GOOGLE INFRASTRUCTURE
From the distributed perspective, Google is built from a number of distributed services which
provide the basic functionality. In general, these services can be grouped as follows [1]:
• A communication layer, including services for remote procedure calls and indirect
communication, with serialization of remote call requests.
• Coordination and data services, providing data access by means of:
  - GFS, which handles the requests coming from different applications and services.
  - Chubby, which provides coordination services and storage for small volumes of data.
  - Bigtable, a database which provides semi-structured access to data [8].
• Distributed processing services, which perform parallel operations on the physical
infrastructure.
3.1. Google as Cloud Service
Google nowadays plays a significant role in cloud computing, which is defined as a set of
internet-based applications capable of storing and processing most of what users require.
The main services provided by Google are Gmail, Google Docs, Google Talk and Google Calendar,
which aim to replace traditional software. Through its service platform, Google offers APIs for
distributed systems and internet services for hosting web applications. With the inception of
Google App Engine, Google has gone beyond software services and provides the infrastructure for
distributed services as cloud services, where businesses and organizations can run their web
applications on the Google framework [2].
3.2. Google File System (GFS)
Like general-purpose file systems, which provide file and directory operations to different
applications, GFS is a distributed file system with a variety of abstractions, but it provides
more advanced capabilities. Its main aim is to meet the growing requirements of Google as a
search engine, as well as the requests coming from other web applications. There is a set of
requirements which GFS must fulfill:
• The first requirement is that GFS must run reliably on the underlying hardware and software
architecture. The designers of GFS started with the assumption that components will fail (not
just hardware components but also software components) and that the design must be
sufficiently tolerant of such failures to enable application-level services to continue their
operation in the face of any likely combination of failure conditions.
• GFS is optimized for the patterns of usage within Google, both in terms of the types of files
stored and the patterns of access to those files. The number of files stored in GFS is not huge
in comparison with other systems, but the files tend to be massive. The patterns of access are
also atypical of file systems in general. Accesses are dominated by sequential reads through
large files and sequential writes that append data to files, and GFS is very much tailored
towards this style of access. These file patterns are influenced, for example, by the storage of
many web pages sequentially in single files that are scanned by a variety of data analysis
programs. The level of concurrent access is also high in Google, with large numbers of
concurrent appends being particularly prevalent, often accompanied by concurrent reads.
• GFS must meet all the requirements for the Google infrastructure as a whole; that is, it must
scale (particularly in terms of volume of data and number of clients), it must be reliable in
spite of the assumption about failures noted above, it must perform well and it must be open
in that it should support the development of new web applications. In terms of performance
and given the types of data file stored, the system is optimized for high and sustained
throughput in reading data, and this is prioritized over latency. This is not to say that latency
is unimportant, rather, that this particular component (GFS) needs to be optimized for
high-performance reading and appending of large volumes of data for the correct operation of the
system as a whole.
3.3. GFS Architecture
The storage design used in GFS is based on fixed-size chunks, each 64 MB in size. This is
quite large compared to other file system designs. At one level this simply reflects the size of
the files stored in GFS. At another level, this decision is crucial to providing highly efficient
sequential reads and appends of large amounts of data [2]. The job of GFS is to provide a
mapping from files to chunks and then to support standard operations on files, mapping down to
operations on individual chunks. This is achieved with the architecture shown in Figure 1, which
shows an instance of a GFS file system as it maps onto a given physical cluster. Each GFS
cluster has a single master and multiple chunkservers. The role of the master is to manage
metadata about the file system, defining the namespace for files, access control information and
the mapping of each particular file to the associated set of chunks. When clients need to access
data starting from a particular byte offset within a file, the GFS client library first
translates this to a file name and chunk index pair (easily computed given the fixed size of
chunks) [5]. This is then sent to the master in the form of an RPC request (using protocol
buffers). The master replies with the appropriate chunk identifier and the location of the
replicas, and this information is cached in the client and used subsequently to access the data
by direct RPC invocation to one of the replicated chunkservers.
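
The translation from a byte offset to a chunk index, and the caching of the master's reply, can be sketched as follows. This is a simplified illustration and not the GFS client library: the class GfsClientSketch, the masterCalls counter and the function standing in for the master RPC are all hypothetical; only the 64 MB chunk size comes from the text.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

class GfsClientSketch {
    /** Fixed chunk size stated in the text: 64 MB. */
    static final long CHUNK_SIZE_BYTES = 64L * 1024 * 1024;

    // Hypothetical stand-in for the RPC to the master: (file name, chunk index) -> replicas.
    private final BiFunction<String, Long, List<String>> master;
    private final Map<String, List<String>> locationCache = new HashMap<>();
    int masterCalls = 0; // counts how often the master was actually contacted

    GfsClientSketch(BiFunction<String, Long, List<String>> master) {
        this.master = master;
    }

    /** Chunk index of a byte offset: integer division by the fixed chunk size. */
    static long chunkIndex(long byteOffset) {
        if (byteOffset < 0) {
            throw new IllegalArgumentException("offset must be non-negative");
        }
        return byteOffset / CHUNK_SIZE_BYTES;
    }

    /** Returns the replica locations for the chunk holding the offset, asking the master once. */
    List<String> locate(String fileName, long byteOffset) {
        long index = chunkIndex(byteOffset);
        String cacheKey = fileName + "#" + index;
        List<String> replicas = locationCache.get(cacheKey);
        if (replicas == null) {
            masterCalls++;
            replicas = master.apply(fileName, index);
            locationCache.put(cacheKey, replicas); // later reads skip the master
        }
        return replicas;
    }
}
```

For example, byte offset 200,000,000 lies in chunk index 2, because 2 × 67,108,864 = 134,217,728 is at most 200,000,000 while 3 × 67,108,864 = 201,326,592 exceeds it. A second read within the same chunk is answered from the cache without contacting the master.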
[Figure: a client using the GFS client library, a GFS master and several GFS chunkservers
holding data, connected by control flow and data flow]
Figure 1: GFS Architecture
4. GOOGLE APP ENGINE
Google App Engine is a service for hosting web applications, which are usually accessed
through a web browser and can be social networks, games, mobile applications, publications, etc.
The engine can also serve traditional content such as documents, images and videos, but its main
purpose is to run dynamic applications in real time [4].
In particular, the engine is designed to host applications which need to serve many simultaneous
users; it handles this by adapting to changes in the environment (scalability). As the number of
users increases, the engine allocates the necessary resources.
4.1. Execution Environment
The engine is composed of three main parts [5]: application instances, data storage and
services. The engine accepts a request and identifies the application from the domain name of the
address; it then selects one of the servers, which in most cases is the server expected to reply
most quickly. So that resources are used as fully as possible and re-initialization is avoided,
the engine allows the execution environment to live longer than the time needed to handle a
single request. Each instance has local memory which serves to hold the imported code and data
structures. The engine creates and destroys instances according to traffic needs.
Unlike traditional hosting, the application code cannot access the server directly: the
application can read its own files from the file system but cannot write to them, and files
belonging to other applications cannot be read.
Google App Engine supports the Java and Python languages, as well as an environment based on the
Go language.
4.2. Static File Server
Static files (HTML, CSS, images), which do not change during operation, do not need to be
generated and served by the application code. These files are managed by dedicated servers for
static content. The internal architecture and network of these servers are optimized for
handling static resources [7].
4.3. Datastore
In recent decades the most common approach to data storage for web applications has been the
relational database. Tables, rows, columns and queries are used to store and display the
data. Other types are those based on hierarchical organization (file systems, XML databases)
and object databases. The database system of GAE most closely resembles an object database, and
it does not support join operations in queries.
Queries run against the datastore return one or more entities of a given kind; they can also
return only the keys of the entities. Queries can filter the data on different conditions
based on the values of entity properties, and the results can also be ordered.
Unlike relational databases, where queries are planned and executed in real time against
tables stored as designed by the developer, in Google App Engine every query is served by an
index which is itself maintained by the datastore. When the application runs a query, the
datastore finds the corresponding index, scans to the first row which matches the query and
returns the entity for each consecutive index row until it reaches the first row which does not
match. Google App Engine provides indexes for simple queries automatically, while for complex
queries the application must declare the required indexes in its configuration. The engine can
generate a configuration file recording the indexes used by the queries run during the test
phase. With transactions, changes are either applied in full or, in case of any failure, rolled
back to the previous state, so that the data remain in a coherent state under multiple
simultaneous accesses. When an operation is called through the datastore API, the result is
returned to the caller only after the transaction has completed successfully.
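
The index scan described above can be sketched with a sorted in-memory index. This is a simplified model and not the datastore implementation: the class IndexScanSketch, the Row type and the single-property equality query are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class IndexScanSketch {

    /** One index row: a property value followed by the key of the entity holding it. */
    static final class Row implements Comparable<Row> {
        final String value;
        final String entityKey;

        Row(String value, String entityKey) {
            this.value = value;
            this.entityKey = entityKey;
        }

        @Override
        public int compareTo(Row other) {
            // Rows are ordered by property value first, then by entity key.
            int byValue = value.compareTo(other.value);
            return byValue != 0 ? byValue : entityKey.compareTo(other.entityKey);
        }
    }

    private final TreeSet<Row> index = new TreeSet<>();

    /** Keeps the index up to date whenever an entity is saved. */
    void put(String entityKey, String value) {
        index.add(new Row(value, entityKey));
    }

    /** Answers "property == wanted" with one contiguous scan of the index. */
    List<String> keysWhereEquals(String wanted) {
        List<String> keys = new ArrayList<>();
        // Jump to the first row with value >= wanted (the empty key sorts before real keys).
        for (Row row : index.tailSet(new Row(wanted, ""), true)) {
            if (!row.value.equals(wanted)) {
                break; // first non-matching row: the scan stops here
            }
            keys.add(row.entityKey);
        }
        return keys;
    }
}
```

The cost of the query depends on the number of matching rows rather than on the total number of entities, which is the reason every query must have a corresponding index.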
4.4. Bigtable
Bigtable is a storage system for structured data which scales to petabytes (PB) and can be
distributed across thousands of servers. Applications such as Google Earth, Google Finance and
Orkut are typical users of Bigtable. These applications have very demanding requirements:
millions of operations must be performed at high speed through asynchronous processes. Bigtable
stores its data in a multidimensional sorted map indexed by row, column and timestamp. Columns
which are accessed most frequently are grouped into column families. The timestamp mechanism
allows multiple versions of the content to be stored in the same cell, with the latest version
being accessible. Rows are sorted alphabetically and divided into groups; rows with similar keys
are stored on the same machine for easy access.
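
The data model described above (row, column, timestamp) can be sketched with nested sorted maps. This is an in-memory illustration only, not the Bigtable implementation or its client API; the class BigtableModelSketch and its method names are hypothetical.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

class BigtableModelSketch {

    // row key -> column name -> (timestamp -> value); timestamps are kept newest first.
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    /** Stores one version of a cell; older versions of the same cell are kept. */
    void put(String row, String column, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(column,
                    c -> new TreeMap<Long, String>(Comparator.<Long>reverseOrder()))
            .put(timestamp, value);
    }

    /** Returns the most recent version of a cell, or null if the cell does not exist. */
    String latest(String row, String column) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(row);
        if (columns == null) {
            return null;
        }
        NavigableMap<Long, String> versions = columns.get(column);
        return versions == null ? null : versions.firstEntry().getValue();
    }

    /** Rows are kept in sorted key order, so the first key is the smallest one. */
    String firstRowKey() {
        return rows.isEmpty() ? null : rows.firstKey();
    }
}
```

Because rows are held in sorted order, rows with neighbouring keys are adjacent, which is what makes it possible to store related rows together.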
Bigtable helps Google keep the added cost of new services and processing power small.
Bigtable is built on top of GFS, which it uses to store data and log files, and on the system for
task management.
Chubby [2] is the service tasked with providing a single place for storing the location where the
program holding the data runs; it also serves to store schema information.
To implement applications which employ Bigtable, a master server and a number of secondary
servers are needed. Through its engine, Google offers the possibility of installing and running
applications on the Google infrastructure. Applications built on Google's engine are easy to
develop and maintain, and they are scalable. To develop such applications, programming languages
such as Java, JavaScript, Python, Ruby, etc. can be used [10]. The datastore stores entity objects
with their properties and supports different data types; it also offers the means to execute
multiple operations in a single transaction, which is very important for web applications which
require simultaneous access by many users.
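
The all-or-nothing behaviour of a transaction can be sketched as follows. This is a deliberately simplified stand-in that works on a private copy of the data and swaps it in on success; it does not reproduce the concurrency control of the datastore, and the class TransactionSketch and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

class TransactionSketch {

    private Map<String, String> committed = new HashMap<>();

    /**
     * Runs several operations as one unit. Returns true if all of them were applied,
     * false if any of them failed and none were applied.
     */
    synchronized boolean runInTransaction(Consumer<Map<String, String>> work) {
        // The operations act on a private copy, never on the committed state.
        Map<String, String> working = new HashMap<>(committed);
        try {
            work.accept(working);
        } catch (RuntimeException failure) {
            return false; // roll back: the copy is discarded, committed state is untouched
        }
        committed = working; // commit: all changes become visible together
        return true;
    }

    synchronized String get(String key) {
        return committed.get(key);
    }
}
```

A unit of work that throws part-way through leaves the previously committed values in place, so readers never observe a half-applied change.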
Unlike relational database systems, the datastore uses a distributed architecture to manage
large datasets, and it differs considerably in how it describes the relationships between data
objects [6].
Two entities of the same kind can have different properties, and different entities can
have properties with the same name but different types.
Although the datastore has many similarities with relational databases, it also has notable
differences: joins cannot be performed, mainly because the datastore is designed to process
huge amounts of data.
Each data record represents an entity and is interpreted in code as an object; each entity
has a key which uniquely identifies it among the other entities in the datastore. Each entity has
one or more named properties, which are the attributes of the object.
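
The entity model described above (a kind, a unique key and a set of named properties without a fixed schema) can be illustrated with a small in-memory sketch. It is not the App Engine API: the class EntityModelSketch, its nested Entity class and the "kind:id" key format are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class EntityModelSketch {

    /** An entity: a kind, an identifier and schemaless named properties. */
    static final class Entity {
        final String kind;
        final long id;
        final Map<String, Object> properties = new LinkedHashMap<>();

        Entity(String kind, long id) {
            this.kind = kind;
            this.id = id;
        }

        /** The key uniquely identifies the entity among all others in the store. */
        String key() {
            return kind + ":" + id;
        }
    }

    private final Map<String, Entity> store = new HashMap<>();

    /** Creates the entity, or replaces the one stored under the same key. */
    void put(Entity entity) {
        store.put(entity.key(), entity);
    }

    /** Returns the entity stored under the key, or null if there is none. */
    Entity get(String key) {
        return store.get(key);
    }

    void delete(String key) {
        store.remove(key);
    }
}
```

Two entities of kind Student can be stored side by side even if one has a Name property and the other does not, or if both have a Year property holding values of different types.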
The code below shows the case where an entity of kind Student is created; several properties
with different values are set on it, and the new entity is saved in the datastore.
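import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;

// Datastore initialization
DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
// Create a new entity of kind "Student" (req is assumed to be the servlet request)
Entity stu = new Entity("Student");
stu.setProperty("Emri", req.getParameter("emri_input"));       // first name
stu.setProperty("Mbiemri", req.getParameter("mbiemri_input")); // last name
stu.setProperty("Email", req.getParameter("email_input"));
// Save the entity in the datastore
datastore.put(stu);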
Saving, displaying and deleting entities is done using the corresponding methods of the
datastore. The datastore service is obtained by calling the getDatastoreService method of the
DatastoreServiceFactory class. To create or update an entity in the datastore we call the method
put(), supplying the entity as a parameter. Retrieval and deletion of a record are performed
using the get() and delete() methods. The Query class is used to build queries, whereas
PreparedQuery is used to retrieve the entities from the datastore; the code is written in Java
using Eclipse.
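import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.PreparedQuery;
import com.google.appengine.api.datastore.Query;

DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
// Display the data in tabular form
// Build the query through the Query class
Query q = new Query("Student");
// Obtain the results through the PreparedQuery interface
PreparedQuery pq = datastore.prepare(q);
for (Entity result : pq.asIterable())
{
    // Write the data for each row (writer is assumed to be the response writer)
    writer.write("<tr>");
    writer.write("<td>" + result.getProperty("Emri") + "</td>");
    writer.write("<td>" + result.getProperty("Mbiemri") + "</td>");
    writer.write("<td>" + result.getProperty("Email") + "</td>");
    writer.write("</tr>");
}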
Figure 2: Display of data of student entity from datastore viewer
5. CONCLUSION AND FUTURE WORK
In this paper we have presented some of the main features of Google as a distributed system
and the possibilities it offers as a platform for building distributed systems. Designing
distributed systems with Google capabilities such as cloud computing results in distributed
systems which are stable, secure, reliable and easy to maintain.
With Google App Engine and languages such as Java and Python, web-based applications can be built
without investing too much time and money.
A big advantage of distributed applications developed on the Google infrastructure (NoSQL
databases) is high scalability. NoSQL databases are designed specifically with the requirements of
the Internet as the focal point. To provide reliable access to millions of visitors around the world in a
few hundred milliseconds, you need functionality beyond that of relational databases.
Google App Engine offers the datastore as NoSQL storage. It allows you to store entities, each with
a set of key-value pairs. Among other benefits of the App Engine offering is that you need not worry
about system administration, and its APIs easily integrate with the rest of the platform.
REFERENCES
[1] Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike
Burrows, Tushar Chandra, Andrew Fikes and Robert E. Gruber, “Bigtable: A Distributed
Storage System for Structured Data”, Google, Inc, Journal ACM Transactions on Computer
Systems (TOCS), Volume 26 Issue 2, Article No. 4, June 2008
[2] Seth Gilbert and Nancy A. Lynch, “Perspectives on the CAP Theorem”, Volume 45 Issue 2,
February 2012
[3] Jason Baker, Chris Bond, James C. Corbett, JJ Furman, Andrey Khorlin, James Larson,
Jean-Michel Léon, Yawei Li, Alexander Lloyd and Vadim Yushprakh, “Megastore: Providing
Scalable, Highly Available Storage for Interactive Services”, CIDR, 2011
[4] Sergey Brin and Lawrence Page, “The Anatomy of a Large-Scale Hypertextual Web Search
Engine”, Computer Science Department, Stanford University.
http://infolab.stanford.edu/pub/papers/google.pdf
[5] Andrew Fikes, “Storage Architecture and Challenges Google”
http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//
university/relations/facultysummit2010/storage_architecture_and_challenges.pdf
[6] Fay Chang and Jeffrey Dean, RESEARCH PAPER BASED ON BIGTABLE, “A Distributed
Storage System For Structured Data”,
http://resumegrace.appspot.com/pdfs/ResearchPaper_BigTable_Distributed.pdf
[7] Google App Engine: Using Static Files,
https://developers.google.com/appengine/docs/python/gettingstartedpython27/staticfiles
[8] http://bigtable.appspot.com.
[9] Preeti Gupta, Parveen Kumar and Anil Kumar Solanki, “A Comparative Analysis of
Minimum-Process Coordinated Checkpointing Algorithms for Mobile Distributed Systems”,
International Journal of Computer Engineering & Technology (IJCET), Volume 1, Issue 1,
2010, pp. 46 - 56, ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375.
[10] Nathwani Namrata, “Network Attached Storage Different from Traditional File Servers &
Implementation of Windows Based NAS”, International Journal of Computer Engineering &
Technology (IJCET), Volume 4, Issue 3, 2013, pp. 539 - 549, ISSN Print: 0976 – 6367,
ISSN Online: 0976 – 6375.
[11] Parveen Kumar and Poonam Gahlan, “A Minimum Process Synchronous Checkpointing
Algorithm for Mobile Distributed System”, International Journal of Computer Engineering &
Technology (IJCET), Volume 1, Issue 1, 2010, pp. 72 - 81, ISSN Print: 0976 – 6367,
ISSN Online: 0976 – 6375.