This document summarizes a dissertation on an improved load balancing technique for secure data in cloud computing. The dissertation discusses research issues in load balancing and data security in cloud computing. It proposes a load balancing methodology that uses a load balancer, Kerberos authentication, and Nginx load balancing algorithms like round robin and least connections to securely store and balance load of encrypted data across multiple cloud nodes. The methodology is implemented using tools like HP LoadRunner, Amazon Web Services, and Jelastic cloud platform. Performance is analyzed in terms of transaction time. The proposed technique aims to improve resource utilization, access control, data security, and efficiency in cloud environments.
1. Dissertation
on
“Improved Load Balancing technique for Secure data in Cloud”
Submitted By
“Vrushali T. Lanjewar”
(M.E. 2nd Year)
To
Prof. V. M. Thakare (H.O.D)
Prof. R. V. Dharaskar (Guide)
P.G Department of Computer Science & Engineering
Sant Gadge Baba Amravati University, 444604
3. Introduction
• Cloud computing
Cloud computing is an on-demand service in
which shared resources, information, software and
other devices are provided according to the client's
requirements at a specific time.
NIST defines the cloud computing architecture by
describing five essential characteristics, three
cloud service models and four cloud deployment
models (Cloud Security Alliance, 2009, p. 14).
4. Figure 1.1 Cloud Service Models: visual model of the NIST Working Definition of
Cloud Computing (Cloud Security Alliance, 2009, p. 14)
5. Cloud Service Providers
• Jelastic Cloud Platform: PaaS
• Amazon Web Services: IaaS
• Windows Azure: PaaS
• HP Cloud
• OpenShift by Red Hat
• Citrix Cloud Platform
• IBM Bluemix: IaaS, PaaS
• DigitalOcean: IaaS, PaaS
6. Cloud Service Deployment and Consumption Models
1. Public cloud: Users gain the benefit of pay-as-you-go
services, so they pay only for what they use.
2. Private cloud: Comes in two forms,
(i) on-premise private clouds and (ii) externally hosted private clouds.
These closed, restricted cloud environments are protected behind
a firewall.
3. Hybrid cloud: Hybrid clouds are a combination of public and
private clouds.
4. Managed cloud: Managed clouds are provided by a designated
service provider and may offer either a dedicated or shared operating
environment.
7. Load Balancing
• Load balancing is a relatively new technique that
helps networks and resources deliver
maximum throughput with minimum response time.
• By dividing the traffic between servers, data can be
sent and received without major delay.
• Without load balancing, users could experience
delays, timeouts and long system response times.
8. Research Issues
• Unbalancing problem
• Load rebalancing problem
• Load balancing with privacy preservation
• Virtualization and authorization
• Data privacy and access control
• Data security
• Data integrity
9. Problem Statement
• Load balancing is a key issue in cloud
computing.
• A load imbalance problem may arise even
from the failure of a single node.
• Workload control is crucial to improve system
performance and maintain stability.
• The problem addressed here is providing
security for the data stored in the cloud.
10. Aim
The main aims of the proposed scheme are resource
utilization, access control, data confidentiality, traceability
and efficiency in the cloud environment.
• Cost effectiveness: Load balancing helps to provide better
system performance at lower cost.
• Scalability and flexibility: The system for which load
balancing algorithms are implemented may change in
size over time, so the algorithm/model must handle
these types of situations.
11. Objectives
• The main goal of load balancing is to
route requests among the web servers with
minimum response time.
• To provide availability of data by overcoming
existing problems such as denial of service and
data leakage.
• To keep control over continuous data updates and
also provide more flexibility and capability to
meet the new demands of today's complex and
diverse networks.
12. Existing Methodologies
Iterative Load Balancer:
• Given a tree-based ORAM structure and a set of
storage servers, the load balancing problem for
deploying ORAM-based storage in clouds seeks a
data placement such that
the maximum access load among all servers is
minimized.
• To overcome this difficulty, the authors propose a
low-complexity algorithm called ILB (Iterative Load
Balancer) that iteratively places buckets on servers, so
that only a small-scale linear program needs to be
solved in each iteration. [7]
13. Weighted round robin load balancing:
• Weighted round robin (WRR) is a common routing policy offered in cloud load balancers.
However, there is a lack of effective mechanisms to decide the
weights assigned to each server to achieve overall optimal revenue of
the system.
• The authors study the relation between probabilistic routing (PR) and weighted round
robin (WRR) policies and report the results of the algorithms
under different numbers of user classes.
• The advantage of the heuristic algorithm is that it is independent of
the number of requests Nr for each class, and it has been proved to
achieve an optimality ratio of 1+1/(M−1) under heavy load; thus it
gets closer to optimality with an increasing number of VMs. [8]
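As a rough illustration of the WRR policy itself (not of the weight-selection heuristic from [8]), the following Java sketch expands each server into as many schedule slots as its weight and cycles through them. The class name, the server names vm1 and vm2, and the weights are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Minimal weighted round robin sketch. Server names and weights are
// hypothetical; how to choose good weights is the open problem noted above.
final class WeightedRoundRobin {
    private final List<String> schedule = new ArrayList<>();
    private int next = 0;

    WeightedRoundRobin(Map<String, Integer> weights) {
        // Expand each server into as many schedule slots as its weight.
        for (Map.Entry<String, Integer> e : weights.entrySet()) {
            for (int i = 0; i < e.getValue(); i++) {
                schedule.add(e.getKey());
            }
        }
        if (schedule.isEmpty()) {
            throw new IllegalArgumentException("need at least one server with positive weight");
        }
    }

    // Returns the server for the next request, cycling through the schedule.
    synchronized String pick() {
        String server = schedule.get(next);
        next = (next + 1) % schedule.size();
        return server;
    }

    public static void main(String[] args) {
        Map<String, Integer> weights = new LinkedHashMap<>();
        weights.put("vm1", 3); // assumed to be the stronger machine
        weights.put("vm2", 1);
        WeightedRoundRobin wrr = new WeightedRoundRobin(weights);
        for (int i = 0; i < 8; i++) {
            System.out.print(wrr.pick() + " ");
        }
    }
}
```

With these weights, vm1 receives three of every four requests. Production balancers usually interleave the slots more smoothly; the simple expansion here is only meant to show the proportional split.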
14. Kerberos Model:
• To secure sensitive data, Kerberos is used
in a user-process protection method
based on a virtual machine monitor. The
basic setup of the Kerberos protocol is
shown in the figure.
• The Kerberos server consists of an
Authentication Server (AS) and a Ticket
Granting Server (TGS). The AS and
TGS are responsible for creating and
issuing tickets to the clients upon
request. The AS and TGS usually run on
the same computer, and are collectively
known as the Key Distribution Center
(KDC). [25]
Fig. Kerberos protocol
15. Proposed Methodology
In this architecture, a load balancer is used to split the file into
chunks in order to store the data on various nodes, as shown in
the figure.
When the server control performs operations on data such as
deletion or update, a load imbalance problem occurs.
This problem is solved by the load balancer, which
rebalances the load in the cloud after these operations are
performed.
The data to be stored in the cloud is encrypted before storage
for more security. The encryption is done with a key generated
on the client side. The data is then split into chunks and stored on
various nodes (using Kerberos authentication).
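The encrypt-then-chunk flow described above can be sketched in Java as follows. This is an illustration under stated assumptions, not the dissertation's implementation: the slides do not name a cipher, so AES-GCM with a 128-bit key is assumed, the chunk-to-node mapping is a plain modulo placeholder, and the Kerberos-authenticated transfer to each node is left out. All class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Sketch: encrypt on the client, then split the ciphertext into chunks.
final class EncryptThenChunk {
    private static final int IV_BYTES = 12;
    private static final int TAG_BITS = 128;

    // Key generated on the client side (AES-128 is an assumption).
    static SecretKey newClientKey() {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(128);
            return gen.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Output layout: random 12-byte IV followed by ciphertext and GCM tag.
    static byte[] encrypt(SecretKey key, byte[] plain) {
        try {
            byte[] iv = new byte[IV_BYTES];
            new SecureRandom().nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ct = cipher.doFinal(plain);
            byte[] out = Arrays.copyOf(iv, iv.length + ct.length);
            System.arraycopy(ct, 0, out, iv.length, ct.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static byte[] decrypt(SecretKey key, byte[] data) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, data, 0, IV_BYTES));
            return cipher.doFinal(data, IV_BYTES, data.length - IV_BYTES);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Splits the encrypted bytes into chunks of at most chunkSize bytes.
    static List<byte[]> split(byte[] data, int chunkSize) {
        List<byte[]> chunks = new ArrayList<>();
        for (int off = 0; off < data.length; off += chunkSize) {
            chunks.add(Arrays.copyOfRange(data, off, Math.min(data.length, off + chunkSize)));
        }
        return chunks;
    }

    // Reassembles chunks in order before decryption.
    static byte[] join(List<byte[]> chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] c : chunks) {
            out.write(c, 0, c.length);
        }
        return out.toByteArray();
    }

    // Placeholder placement rule: chunk i goes to node i mod nodeCount.
    static int nodeFor(int chunkIndex, int nodeCount) {
        return chunkIndex % nodeCount;
    }
}
```

Because the chunks hold only ciphertext, a node that stores a chunk cannot read the data without the client-side key; rebalancing after a deletion or update would then amount to changing the placement rule, not re-encrypting.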
16. The multi-cloud feature provides the ability to achieve higher
availability through geo-distribution among different data centers or
clouds, easily relocate projects to superior hardware with the
help of environment migration, choose between higher-quality or
more affordable hardware, and host applications with trusted
cloud vendors.
Fig. Proposed architecture of a cluster with multiple nodes for the public cloud
17. Cloudlet
On the Jelastic platform, the resources consumed by a container are measured in
cloudlets, a special measurement unit that comprises 128 MB of RAM and
400 MHz of CPU power together.
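To make the unit concrete, the short Java sketch below estimates how many cloudlets a container would need for a given RAM and CPU demand. It assumes that whichever resource needs more units determines the count; that rule, and the class and method names, are assumptions for illustration and are not stated in the slides.

```java
// Cloudlet arithmetic sketch: one cloudlet = 128 MB RAM + 400 MHz CPU.
final class CloudletEstimate {
    static final int RAM_MB_PER_CLOUDLET = 128;
    static final int CPU_MHZ_PER_CLOUDLET = 400;

    // Assumption: the larger of the two per-resource counts is what matters.
    static int cloudletsFor(int ramMb, int cpuMhz) {
        int byRam = (ramMb + RAM_MB_PER_CLOUDLET - 1) / RAM_MB_PER_CLOUDLET;   // ceiling division
        int byCpu = (cpuMhz + CPU_MHZ_PER_CLOUDLET - 1) / CPU_MHZ_PER_CLOUDLET; // ceiling division
        return Math.max(byRam, byCpu);
    }

    public static void main(String[] args) {
        // 600 MB needs 5 units of RAM, 900 MHz needs 3 units of CPU.
        System.out.println(cloudletsFor(600, 900));
    }
}
```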
18. Cloudlet Types
• Reserved cloudlets are reserved in
advance and are charged irrespective of your
actual resource usage.
• Dynamic cloudlets are added and removed
automatically according to the amount of
resources required by your application at
a particular moment in time.
19. Reasons to Choose a Software Load
Balancer
• Once upon a time, load balancing in most application stacks
was heavily dependent on hardware. More modern
virtualized and cloud-based infrastructures offer increased
agility and scalability at a lower cost, but are frequently
plagued by compromised performance.
• Features and functionality are meaningless if applications
can’t perform. And, for companies still relying on hardware-
based application delivery controllers (ADCs) or load
balancers, application performance and scalability could be
a serious issue.
• Software load balancers like NGINX might be the best
news for applications and the business at large.
20. NGINX Web Server and
Load Balancer
• NGINX is a free, open-source HTTP server characterized
by its small footprint, exceptional performance and
efficient use of resources.
• NGINX is one of the web servers that are already
widely used, and it increasingly competes with
the Apache web server.
• Round robin: the default method; requests are
distributed across the servers in turn.
• Least connected: the next request goes to the server
with the least number of active connections.
• Load balancing with HTTPS enabled, for websites that
enforce encryption on all connections, including those
to the load balancer.
21. Round Robin
• Round Robin works best when the characteristics of the servers and
requests are unlikely to cause some servers to become overloaded
relative to others. Some of the conditions are:
• All the servers have about the same capacity. This requirement is less
important if differences between servers are accurately represented by
server weights.
• All the servers host the same content.
• Requests are pretty similar in the amount of time or processing power
they require. If there’s a wide variation in request weight, a server can
become overloaded because the load balancer happens to send it a lot
of heavyweight requests in quick succession.
• Traffic volume is not heavy enough to push servers to near full
capacity very often. If servers are already heavily loaded, it’s more
likely that Round Robin’s rote distribution of requests will
push some servers “over the edge” into overload, as described in the
previous bullet.
22. • Given the following sample configuration of the backend upstream group,
the load balancer sends the first three connection requests
to web1, web2, and web3 in order, the fourth to web1, the fifth to web2,
and so on.
upstream backend {
    server web1;
    server web2;
    server web3;
}
server {
    server_name www.example.com;
    location / {
        proxy_pass http://backend;
    }
}
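The rotation that Round Robin produces for an upstream group of web1, web2 and web3 can be mimicked in a few lines of Java. This is only a sketch of the selection rule, not NGINX's implementation; the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Plain round robin: request n goes to server (n mod number of servers).
final class RoundRobin {
    private final List<String> servers;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobin(List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("need at least one server");
        }
        this.servers = new ArrayList<>(servers);
    }

    // floorMod keeps the index valid even after the counter overflows.
    String pick() {
        int i = Math.floorMod(counter.getAndIncrement(), servers.size());
        return servers.get(i);
    }

    public static void main(String[] args) {
        RoundRobin rr = new RoundRobin(Arrays.asList("web1", "web2", "web3"));
        for (int i = 0; i < 5; i++) {
            System.out.print(rr.pick() + " ");
        }
    }
}
```

The selection ignores how busy each server is, which is why the conditions listed for Round Robin (similar capacity, similar request cost) matter.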
23. Least Connections
• Least Connections also effectively distributes workload
across servers according to their capacity.
• A more powerful server fulfills requests more quickly,
so at any given moment it’s likely to have a smaller
number of connections still being processed (or even
waiting for processing to start) than a server with less
capacity.
• Least Connections sends each request to the server with
the smallest number of current connections, and so is
more likely to send requests to powerful servers.
24. Least Connections is configured with the least_conn directive:
upstream backend {
    least_conn;
    server web1;
    server web2;
    server web3;
}
server {
    server_name www.example.com;
    location / {
        proxy_pass http://backend;
    }
}
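The Least Connections rule can be sketched in Java as a counter of active connections per server. This illustrates the idea only and is not how NGINX implements least_conn; the class name is hypothetical, and ties are broken here by declaration order, which is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Least connections: send each request to the server with the fewest
// connections currently in progress.
final class LeastConnections {
    private final Map<String, Integer> active = new LinkedHashMap<>();

    LeastConnections(String... servers) {
        if (servers.length == 0) {
            throw new IllegalArgumentException("need at least one server");
        }
        for (String s : servers) {
            active.put(s, 0);
        }
    }

    // Picks the least-loaded server and counts the new connection against it.
    synchronized String acquire() {
        String best = null;
        for (Map.Entry<String, Integer> e : active.entrySet()) {
            if (best == null || e.getValue() < active.get(best)) {
                best = e.getKey();
            }
        }
        active.put(best, active.get(best) + 1);
        return best;
    }

    // Called when a request finishes, freeing one connection slot.
    synchronized void release(String server) {
        active.computeIfPresent(server, (k, v) -> Math.max(0, v - 1));
    }
}
```

A faster server calls release sooner, so its count drops sooner and it is chosen more often, which matches the behaviour described for Least Connections.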
25. Kerberos provides data confidentiality,
authentication and integrity services
• Windows 2000, XP and Windows Server 2003 all
include the Kerberos extensions that can be used
to provide data confidentiality, authentication and
integrity for messages that are sent after the initial
Kerberos exchange.
• These extensions are known as the KRB_PRIV
(providing data confidentiality) and the
KRB_SAFE (providing data authentication and
integrity) Kerberos extensions.
26. Implementation
• The proposed load balancing algorithm transfers the load
to another server in the cloud when the current server is
overloaded. When multiple requests arrive to allocate
resources at a server in the cloud environment, the server
becomes overloaded at some point.
• Step 1: Load Testing Module - The HP LoadRunner tool
acts as the client for websites and web services over single or
multiple protocols (HTTP, FTP, SMTP, etc.), so that we
can test, monitor, analyze and control the load
through real-world transactions of data.
- Vuser Scripts: The actions that a Vuser (virtual user) performs during the
scenario are described in a Vuser script. When a scenario
is executed, each Vuser executes a Vuser script.
27. - Transactions: To measure the performance of
the server, transactions are defined.
Transactions measure the time that it takes for
the server to respond to tasks submitted by
Vusers.
- Controller: LoadRunner Controller is used to
manage and maintain scenarios.
- Hosts: When you execute a scenario, the
LoadRunner Controller distributes each Vuser
in the scenario to a host.
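The two figures derived from transactions, average transaction time and transactions per second (TPS), reduce to simple arithmetic. The Java sketch below shows that arithmetic only; it is not LoadRunner's API, and the class and method names are hypothetical.

```java
// Transaction metrics sketch: timing one task, averaging, and throughput.
final class TransactionStats {
    // Wall-clock time of a single task in milliseconds.
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Mean of the recorded transaction times.
    static double averageMillis(long[] durationsMillis) {
        if (durationsMillis.length == 0) {
            throw new IllegalArgumentException("no transactions recorded");
        }
        long sum = 0;
        for (long d : durationsMillis) {
            sum += d;
        }
        return (double) sum / durationsMillis.length;
    }

    // Completed transactions divided by elapsed run time in seconds.
    static double transactionsPerSecond(int completed, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("elapsed time must be positive");
        }
        return completed * 1000.0 / elapsedMillis;
    }
}
```

For example, three transactions taking 100, 200 and 300 ms average 200 ms, and 50 transactions completed in a 10-second run give 5 TPS.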
28. • Step 2: Authentication Module - Kerberos is enabled in
HP LoadRunner as well as in the browser.
• The first phase, known as data classification, is
done by the client before storing the data.
• Any user who wants to access
the data must be authenticated, to avoid
impersonation and data leakage.
• A third entity is the customer whose data is stored.
Customers who want access must register first,
and before every access to the data their identity is
authenticated for authorization.
29. • Step 3: NGINX Load Balancer in the public
cloud -
• The Least Connection and Round Robin load
balancing algorithms will be developed using
Java. The algorithm combines
the logic of least connections and
fastest response time.
• Monitoring agents are used to observe the
current activity and load.
• Depending on the state of each instance, the
selected instance is called and returns the
results.
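One possible way to combine the two criteria is sketched below in Java. The slides do not say how least connections and fastest response time are combined, so the rule used here (fewest active connections first, lowest average response time as the tie-breaker) is an assumption, as are all names and the idea that monitoring agents supply both figures per node.

```java
import java.util.List;

// Sketch: choose by fewest active connections, then by fastest response.
final class LeastConnFastestResponse {
    // Per-node figures, assumed to be reported by a monitoring agent.
    static final class Node {
        final String name;
        final int activeConnections;
        final double avgResponseMillis;

        Node(String name, int activeConnections, double avgResponseMillis) {
            this.name = name;
            this.activeConnections = activeConnections;
            this.avgResponseMillis = avgResponseMillis;
        }
    }

    // Returns the preferred node, or null if the list is empty.
    static Node choose(List<Node> nodes) {
        Node best = null;
        for (Node n : nodes) {
            boolean fewerConnections = best != null && n.activeConnections < best.activeConnections;
            boolean sameButFaster = best != null
                    && n.activeConnections == best.activeConnections
                    && n.avgResponseMillis < best.avgResponseMillis;
            if (best == null || fewerConnections || sameButFaster) {
                best = n;
            }
        }
        return best;
    }
}
```

Other combinations, such as a weighted score of both figures, would fit the same description; the tie-breaker form is simply the easiest to reason about.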
30. • Step 4: For web application - A web application
for load balancing is created and deployed in the
cloud; it will be used in HP LoadRunner to
check performance parameters.
• For web service - Amazon Web Services will be
used to create EC2 instances, on which
the load balancer developed in
step 2 and the web service from
step 1 will be deployed.
• Step 5: For web application - The Analysis tool of
LoadRunner gives the performance analysis of the
system under test.
• For web service - The LoadRunner tool will be used to
increase the load on the EC2 instances, and will
be used to compare the results from the two load
balancers depending on the requests/responses sent.
44. SLA status at set time intervals over timeline within run.
45. • Results of Load Balancing
Efficient provisioning of resources and scheduling
of resources as well as tasks will ensure:
1. Resources are easily available on demand.
2. Resources are efficiently utilized under
conditions of high/low load.
3. Energy is saved in case of low load (i.e. when
usage of cloud resources is below a certain
threshold).
4. Cost of using resources is reduced.
46. Tools
SOFTWARE REQUIREMENTS:
• Operating System: Windows XP and
higher versions
• Programming Language: Java
• Database: MySQL, MongoDB, etc.
• IDE: NetBeans 8.1
• Load Testing tool: HP LoadRunner 12.53
• Web development tool: WinginX
• Cloud Platform: Jelastic PaaS,
Amazon IaaS
47. Data Management tools -
Redis
• Redis is an open-source, in-memory data structure
store, used as a database, cache and message
broker.
• It supports data structures such as strings, hashes,
lists, sets, sorted sets with range queries, bitmaps,
hyperloglogs and geospatial indexes with radius
queries.
• Redis has built-in replication, and provides high
availability via Redis Sentinel and automatic
partitioning with Redis Cluster.
48. Memcached
• Free & open source, high-performance, distributed
memory object caching system, generic in nature,
but intended for use in speeding up dynamic web
applications by alleviating database load.
• Memcached is an in-memory key-value store for
small chunks of arbitrary data (strings, objects) from
results of database calls, API calls, or page
rendering.
49. Advantages
• High Availability & Health Checks
• Session Persistence & Routing : session
persistence including cookie insertion and
sticky routes.
• Easily manage traffic for optimal performance
without disrupting the user experience.
50. Limitations
• Node capacity details are not
identified.
• Additional payment is required for the
use of additional resources.
• Workload control is sometimes difficult to
handle.
• The network is not stable.
51. Conclusion
• For better load balancing, multiple machines are used,
and the load is distributed among them as it is
generated. The proposed method not only processes
more transactions, it also reduces or does not change
the average time.
• The proposed method not only has better TPS
(transactions per second), it also scales better, as
seen from the performance results.
52. Future Scope
• The proposed load balancing technique uses the concept of
static weights. It can be extended to dynamic weights where
weights are assigned to the server dynamically based on the
server performance.
• A server that is performing better is assigned a higher
weight than a server that is performing worse. This can be
achieved by adding a monitoring agent on each server
to monitor its performance.
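The dynamic-weight idea can be sketched in Java as a function from measured average response times to integer weights. The formula used here (weight inversely proportional to response time, with the slowest server at weight 1) is one hypothetical choice, not something the slides specify; the class and method names are likewise assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: derive server weights from response times reported by agents.
final class DynamicWeights {
    // Faster servers get proportionally higher weights; slowest gets 1.
    static Map<String, Integer> fromResponseTimes(Map<String, Double> avgMillis) {
        double slowest = 0;
        for (double v : avgMillis.values()) {
            if (v <= 0) {
                throw new IllegalArgumentException("response times must be positive");
            }
            slowest = Math.max(slowest, v);
        }
        Map<String, Integer> weights = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : avgMillis.entrySet()) {
            long w = Math.max(1L, Math.round(slowest / e.getValue()));
            weights.put(e.getKey(), (int) w);
        }
        return weights;
    }

    public static void main(String[] args) {
        Map<String, Double> measured = new LinkedHashMap<>();
        measured.put("web1", 50.0);  // hypothetical measurements in ms
        measured.put("web2", 100.0);
        measured.put("web3", 200.0);
        System.out.println(fromResponseTimes(measured));
    }
}
```

Recomputing the weights at a fixed interval and feeding them into a weighted round robin scheduler would turn the static scheme into a dynamic one.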
54. [9] Zhipeng Gao; Dangpeng Liu; Yang Yang; Jingchen Zheng; Yuwen Hao, "A Load Balance Algorithm Based
On Nodes Performance In Hadoop Cluster," in Network Operations and Management Symposium
(APNOMS), 2014 16th Asia-Pacific, pp. 1-4, 17-19 Sept. 2014
[10] Taeho Jung, Xiang-Yang Li, Senior Member, IEEE, Zhiguo Wan, and Meng Wan, "Control Cloud Data
Access Privilege and Anonymity With Fully Anonymous Attribute-Based Encryption", IEEE
TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY,pp.190- 199 vol.10, no.1,
JANUARY 2015
[11] Xinhua Dong; Ruixuan Li; Heng He; Wanwan Zhou; Zhengyuan Xue; Hao Wu, "Secure Sensitive Data
Sharing On a Big Data Platform", Tsinghua Science and Technology published in IEEE,
vol. 20, no. 1, pp. 72-80, Feb. 2015, doi: 10.1109/TST.2015.7040516
[12] W. Teng; G. Yang; Y. Xiang; T. Zhang; D. Wang, "Attribute-based Access Control with Constant-size
Ciphertext in Cloud Computing," in IEEE Transactions on Cloud Computing , vol.PP, no.99, pp.1-1, 02
June 2015,doi: 10.1109/TCC.2015.2440247
[13] J. Li; W. Yao; Y. Zhang; H. Qian; J. Han, "Flexible and Fine-Grained Attribute-Based Data Storage in
Cloud Computing," in IEEE Transactions on Services Computing , vol. PP, no.99, pp.1-1, 22 January
2016, doi:10.1109/TSC.2016.2520932
[14] V. Chang and M. Ramachandran, "Towards Achieving Data Security with the Cloud Computing
Adoption Framework," in IEEE Transactions on Services Computing, vol. 9, no. 1, pp. 138-151, Jan.-Feb.
2016, doi: 10.1109/TSC.2015.2491281
[15] Jia Zhao, Kun Yang, Xiaohui Wei, Yan Ding, Liang Hu, Gaochao Xu, "A Heuristic Clustering-Based
Task Deployment Approach for Load Balancing Using Bayes Theorem in Cloud Environment", IEEE
Transactions on Parallel & Distributed Systems, vol.27, no. 2, pp. 305-316, Feb. 2016,
[16] H.Wang; D. He; S. Tang, "Identity-Based Proxy-Oriented Data Uploading and Remote Data Integrity
Checking in Public Cloud," in IEEE Transactions on Information Forensics and Security ,vol.PP,no.99,pp.1-1,