Big Data
Taufiq Hail Ghilan Al-Madhagy
6/7/2013
1. Introduction
The world is changing, and this is the digital era. Almost everything around us is digitized, and information
flows in huge volumes from a variety of sources: mobile phones, smart devices, surveillance systems, sensors
observing the universe, weather forecasting sensors, medical equipment, customer transactions on the internet,
user behavior on the internet, and so on. This creates huge amounts of data, terabytes to petabytes in size,
generated on a daily or weekly basis. This data is called “Big Data”, and it has provoked new
research in the areas of information analysis, structuring, and visualization. One of the most successful
methods for gaining insight into Big Data is Hadoop, which the pioneers of database management
systems are adopting, with some added tools, to extract valuable information from this data,
understand it better, and consequently take proper actions and decisions. In the
following essay we dig further into the definition of Big Data, its types and benefits, the
challenges surrounding it, and the techniques used so far to address those challenges. Big Data is now
the talk not of the town but of the IT market and of scientists. It needs not a few pages to cover but
a PhD research effort to find better ways of solving a problem that grows day by day as
more digitized devices enter our daily life.
2. Types of big data
Big Data has varying definitions. Some define it as the greater volume of today’s data, the new
types of data and analysis, or the emerging requirements for more real-time information analysis1. Others
argue that “Big” is not a fixed amount of data that can be specified in advance: what is big today may be
small tomorrow. Still, according to many researchers, data sets between terabytes and petabytes in size are
currently considered Big Data, although this threshold may grow over time.
Big Data generates value from the storage and processing of very large quantities of
digital information that cannot be analyzed with traditional computing techniques [1]. Big Data comes in a
variety of types, classified as structured data, unstructured data, text, and multimedia [2]. Its sources
include social media, web and software logs, camera pictures and other log information, information-sensing
mobile devices, aerial sensory technologies, genomics, and medical records [6].
1 IBM Executive Report, reference [2].
3. Benefits of big data
Many companies look at Big Data as a source of better understanding and a means to
predict customer behavior, and thus to improve customer experience. Social media,
transactions from banks and other sources, syndicated data from sources such as loyalty cards,
and other customer-related information give companies valuable input for predicting
customers’ preferences and needs; in other words, for building business and customer service that last for
decades. With this understanding, organizations of all types are finding new ways to connect with
existing and potential customers. This approach applies to small businesses and large enterprises alike, in
sectors such as telecommunications, healthcare, government, and banking, and in business-to-business
interactions among partners and suppliers.
The benefits of Big Data include customer-centric objectives and many functional objectives that are
being addressed through early applications of big data. Operational optimization, risk and financial
management, employee collaboration, and enabling new business models are some of the benefits, for both
the customer and the producer.
A report released in May 2011 by McKinsey says that leading companies are using big data analytics to gain
competitive advantage. It forecasts a 60% margin increase for retail companies that are
able to harness the power of big data [6]2. Those companies perceived the importance of these huge amounts
of data and realized that now is the time to take advantage of them [6].
4. Challenges in big data
Doug Laney was the first to develop the model of Big Data described by the “3 Vs”, namely
volume, velocity, and variety. IBM added a fourth V, veracity. Including
veracity as the fourth big data attribute emphasizes the importance of addressing and managing the
uncertainty inherent within some types of data, according to IBM [2]. I should add here that some researchers
call the three “Vs” the “3 Ss”: source (variety), speed (velocity), and size (volume). In the following
lines we describe these four challenges of Big Data.
1. Volume
The huge amount of data, ranging between terabytes and petabytes, is the main challenge facing
Big Data. Volume refers to the mass quantities of data that organizations are trying to exploit to improve
decision-making across the enterprise. Data volumes continue to increase at an unprecedented rate according
to [2]. Traditional hardware and relational database processing are incapable of handling many of the tasks
required by Big Data. These tasks include modeling the earth’s climate, forecasting the
weather, receiving and analyzing the huge amounts of patient data collected in hospitals, diagnosing diseases,
gathering information from the galaxy, and so on.
2 A document from Oracle.
2. Velocity
The amount of data flowing into an enterprise every day is increasing exponentially, beyond the capacity of
traditional systems for storing and processing. The speed at which data is created, processed, and
analyzed also continues to rise, so the data is always in motion, from its
creation through its processing to its storage and retrieval [2].
Data streaming is also becoming essential to internet activity for almost every user,
even on mobile devices such as phones and tablets. Data is now
continuously generated at a pace that is impossible for traditional systems to capture, store, and analyze.
Online video, GPS location tracking, and augmented reality, among many other applications, depend on large
amounts of fast data streaming [1].
These services are becoming a challenge for many organizations, which need new methods of delivering
them because the conventional methods are not suitable.
For time-sensitive processes such as real-time fraud detection or multi-channel “instant” marketing, certain
types of data must be analyzed in real time to be of value in the business decisions that help the
business improve. Where there is velocity, we should therefore also talk about latency: the time from when
the data is created until it is accessed and analyzed [2].
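The latency idea above can be sketched in a few lines of Python. This is only an illustration: the `Event` type, the event names, the timestamps, and the one-second budget are hypothetical and do not come from the cited report.

```python
from dataclasses import dataclass


@dataclass
class Event:
    name: str
    created_at: float    # seconds: when the data was generated
    analyzed_at: float   # seconds: when the analysis result became available


def latency(event):
    """Time from creation of the data until it has been analyzed."""
    return event.analyzed_at - event.created_at


def within_budget(events, budget_seconds):
    """Keep only the events analyzed fast enough to inform a real-time decision."""
    return [e for e in events if latency(e) <= budget_seconds]


# Hypothetical card transactions checked for fraud (illustrative values only).
events = [Event("card-swipe-1", 0.0, 0.2), Event("card-swipe-2", 1.0, 4.5)]
timely = within_budget(events, budget_seconds=1.0)
print([e.name for e in timely])
```

Under these assumptions, the second transaction is analyzed too late to be useful for an “instant” decision, however accurate the analysis may be.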
3. Variety
Variety simply refers to the different types of data and data sources. The data stored and processed
every day comes in many types. In the past, the data to be processed consisted of personal documents,
financial transactions, stock records, and so on. Today we have audio, video, graphics, 3D models,
location data, and many other complex data that need to be stored, delivered, or processed. This unstructured
Big Data is not easy to categorize with traditional methods of handling large amounts of data. All
of this data is in reality messy and needs cleansing before any analysis can be applied [1].
We can simply say that variety is about managing the complexity of multiple data types, i.e. structured,
semi-structured, and unstructured data. Organizations need to integrate and analyze data from both traditional
and non-traditional information sources, from within and outside the enterprise. With the expanding use of
sensors, smart phones, and social collaboration technologies, data is generated in a variety of forms,
including text, web data, tweets, sensor data, audio, video, click streams, log files, and more, as discussed in
the report from IBM [2].
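To make the three categories concrete, here is a minimal Python sketch that brings one structured input (CSV), one semi-structured input (JSON), and one unstructured input (free-text log lines) into a common record shape. The field names, the sample data, and the “said:” log pattern are assumptions made for illustration, not formats taken from the sources.

```python
import csv
import io
import json
import re


def from_csv(text):
    """Structured input: every row follows the same fixed header."""
    return [{"source": "csv", "user": row["user"], "text": row["comment"]}
            for row in csv.DictReader(io.StringIO(text))]


def from_json(text):
    """Semi-structured input: fields may be missing, so defaults are supplied."""
    return [{"source": "json", "user": item.get("user", "unknown"), "text": item.get("text", "")}
            for item in json.loads(text)]


def from_log(text):
    """Unstructured input: extract what matches a pattern and skip the rest."""
    records = []
    for line in text.splitlines():
        match = re.match(r"(\w+) said: (.*)", line)
        if match:
            records.append({"source": "log", "user": match.group(1), "text": match.group(2)})
    return records


# Hypothetical sample inputs.
csv_text = "user,comment\nalice,great service\n"
json_text = '[{"user": "bob", "text": "slow delivery"}, {"text": "no name given"}]'
log_text = "carol said: will buy again\nmalformed line\n"

records = from_csv(csv_text) + from_json(json_text) + from_log(log_text)
for record in records:
    print(record)
```

The malformed log line is dropped and the JSON item without a user gets a default, which is a small instance of the cleansing that messy data needs before analysis.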
4. Veracity
Veracity refers to data uncertainty and the level of reliability associated with certain types of data. One of
the critical requirements of Big Data is quality; yet for some data, such as weather forecasts, financial
figures, and customers’ buying attitudes, no available tool can remove the inherent unpredictability [2].
Many organizations hold huge piles of data, and in many
cases the managers themselves cannot trust the analysis of this data. Managers need to understand this
uncertainty in order to take proper decisions in a
continually changing environment. Opportunities to use big data technology and analytics to improve
decision-making and performance exist in every industry, and managers should be aware of these capabilities.
Take as an example the uncertainty of Big Data in generating energy from natural resources. The amount of
data generated about the wind is huge, but we still cannot predict the full picture precisely, because we cannot
predict the behavior of the weather, the winds, and the clouds. Despite that, a large amount of this data
can be valuable and useful as a basis for decisions about future power production. So how do you plan if all
these uncertainties are in place? Analysts suggest data fusion, in which multiple less reliable
sources are combined to create a more useful data point. An example would be social comments appended to
geospatial location information. The other way to manage uncertainty is to use advanced mathematics such as
fuzzy logic and robust optimization techniques.
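One simple way to illustrate data fusion is inverse-variance weighting, sketched below in Python. This is one possible fusion technique chosen for illustration, not necessarily the one the analysts have in mind. It assumes the sources are independent and unbiased and that each reports a known variance; the wind-speed readings are hypothetical.

```python
def fuse(estimates):
    """Combine (value, variance) pairs into one estimate by inverse-variance weighting.

    Assumes independent, unbiased sources with known variances.
    """
    weights = [1.0 / variance for _, variance in estimates]
    total_weight = sum(weights)
    fused_value = sum(w * value for w, (value, _) in zip(weights, estimates)) / total_weight
    fused_variance = 1.0 / total_weight
    return fused_value, fused_variance


# Hypothetical wind-speed readings in m/s from three sources: (value, variance).
readings = [(10.0, 4.0), (12.0, 4.0), (11.0, 2.0)]
value, variance = fuse(readings)
print(value, variance)
```

Under these assumptions the fused variance is lower than that of any single reading, which is the sense in which several less reliable sources can yield a more useful data point.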
5. Technique/approach to overcome the challenges
The three “Vs”, namely volume, velocity, and variety, are the main challenges of Big Data, and overcoming
them requires new technology beyond the traditional methods used by the relational database systems
in use today. One approach is
the Hadoop project, an open source Apache project developed with software libraries that
provide reliable, scalable, distributed computing. This technology is able to handle Big
Data processing and analytics. It is worth mentioning that Hadoop is used at large scale by many Big
Data pioneers, such as LinkedIn, which generates over 100 billion personalized recommendations every week as
mentioned in the source [1], and others such as Twitter.
To dig further into the mechanism used by Hadoop, I will give a simple explanation as follows.
A large data set is divided into smaller sets, which are then scattered across a cluster of servers that
do the computation using a simple programming model. The number of servers may range from a few
hundred to around 2,000 or maybe more. What is new in this computation method is that
Hadoop detects and compensates for hardware failures at the application level, whereas the traditional
approach depends on expensive servers. This guarantees continuity of service if any
server in the cluster fails. In this way, the computing work on the
mass of data is distributed among servers in a low-cost and effective way [1].
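The mechanism above can be sketched in plain Python. This is a toy simulation that runs in a single process, not Hadoop code: the function names, the node names, and the rule of retrying on the next worker in the list are assumptions chosen to illustrate application-level recovery.

```python
def split_into_blocks(records, block_size):
    """Divide a large data set into fixed-size blocks."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def run_on_cluster(blocks, workers, failed_workers, task):
    """Run `task` on every block, moving on to another worker when one has failed."""
    results = []
    for index, block in enumerate(blocks):
        for attempt in range(len(workers)):
            worker = workers[(index + attempt) % len(workers)]
            if worker in failed_workers:
                continue  # recovery in software: skip the dead node and try the next one
            results.append((worker, task(block)))
            break
        else:
            raise RuntimeError("no healthy worker available for block %d" % index)
    return results


records = list(range(1, 11))
blocks = split_into_blocks(records, block_size=4)
# "node-b" is simulated as failed; its block is picked up by another node.
results = run_on_cluster(blocks, ["node-a", "node-b", "node-c"], {"node-b"}, sum)
total = sum(partial for _, partial in results)
print(results, total)
```

The total is the same as if no node had failed, because the work of the failed node is reassigned in software instead of relying on more expensive, more reliable hardware.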
The two key elements of Hadoop are the Hadoop Distributed File System (HDFS) and MapReduce.
The first provides the high-bandwidth, cluster-based storage needed for Big Data processing. The second
is the data processing framework. MapReduce is based on Google’s search technology and maps large
data sets across the servers of the cluster. The overall data set is processed in parts: each server
processes its part and produces a summary. All the summaries are then aggregated in the “Reduce”
stage. In this way the data is pre-processed before traditional data analysis tools are applied [3].
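The map and reduce stages described above can be illustrated with the classic word count example. This is a minimal single-process sketch in plain Python, not the Hadoop API; the function names are illustrative, and on a real cluster each chunk would be mapped on a different server.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: aggregate the values of each key into one summary."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks):
    mapped = []
    for chunk in chunks:  # each chunk stands in for the part held by one server
        mapped.extend(map_phase(chunk))
    return reduce_phase(shuffle(mapped))


chunks = ["big data is big", "data is in motion"]
counts = word_count(chunks)
print(counts)
```

Because each chunk is mapped independently of the others, the map step can run in parallel across servers, and only the grouped intermediate pairs need to be brought together for the reduce step.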
Let’s walk through the technical side a little. As the following illustration shows, Hadoop
consists of two layers, namely HDFS and MapReduce. The lower layer contains the name node,
which stores the metadata, that is, the information about the smaller pieces of actual data that are
processed in the data nodes. In the higher layer, the job tracker decides which piece of data will be
processed and where. The final part is the task tracker, which runs the code [4].
Figure 1
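The four roles described above (name node, data node, job tracker, task tracker) can be modeled with a toy Python sketch. It is a simplification for illustration only: the class and function names are hypothetical, block placement is round-robin by assumption, and the replication of blocks that real HDFS performs is omitted.

```python
class DataNode:
    """Stores the actual data blocks."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}


class NameNode:
    """Stores only metadata: which blocks make up a file and where each one lives."""

    def __init__(self):
        self.block_locations = {}  # file name -> list of (block id, data node name)

    def add_file(self, file_name, chunks, data_nodes):
        locations = []
        for i, chunk in enumerate(chunks):
            node = data_nodes[i % len(data_nodes)]  # round-robin placement (assumption)
            block_id = f"{file_name}#{i}"
            node.blocks[block_id] = chunk
            locations.append((block_id, node.name))
        self.block_locations[file_name] = locations


def task_tracker(node, block_id, task):
    """Run the code against a block stored on this node."""
    return task(node.blocks[block_id])


def job_tracker(name_node, file_name, data_nodes, task):
    """Decide where each piece runs: on the node that already holds the block."""
    by_name = {node.name: node for node in data_nodes}
    results = []
    for block_id, node_name in name_node.block_locations[file_name]:
        results.append(task_tracker(by_name[node_name], block_id, task))
    return results


nodes = [DataNode("dn1"), DataNode("dn2")]
name_node = NameNode()
name_node.add_file("log.txt", ["a b", "c d e", "f"], nodes)
word_counts = job_tracker(name_node, "log.txt", nodes, lambda text: len(text.split()))
print(name_node.block_locations["log.txt"], word_counts)
```

The point of the sketch is the separation of concerns: the name node never touches the data itself, and the job tracker sends the code to where the data already is instead of moving the data.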
Let us see the differences between the conventional way of processing data and the way Hadoop’s
MapReduce processes it. The following table shows the differences in terms of access, updates, structure,
integrity, and scaling.
Figure 2
We notice that in MapReduce the data is always moving and dynamic and writing is discouraged, and
that the data can scale to higher volumes.
Microsoft has adopted Hadoop with some modifications to give it an easy, user-friendly interface, and
added some connectors to it to make a Microsoft-like product. Some of the tools that Microsoft uses to deal
with Big Data are Power View, PowerPivot in Excel, and SharePoint [5]. These are some of the tools that are
usually used with BI to gain insight into structured data. In
addition, Microsoft is implementing Hadoop on Windows Azure and Windows Server as well. It created
JavaScript libraries and a framework for Hadoop and entered a partnership with Hortonworks. Moreover,
Microsoft provided ODBC drivers and a Hive add-in for Excel to deal with Big Data. The ODBC drivers
enable third-party applications to integrate with Hadoop on Windows systems [4].
Microsoft is providing, in the following illustration, a solution for Big Data.
Figure 3
As shown in the diagram above, the data may be structured (ERP, CRM, LOB, apps) or
unstructured, from different sources (sensors, devices, bots, crawlers). It is stored in the enterprise data
warehouse if it is structured, or moved to the upper layer to be processed with Hadoop on the Windows
platform, Windows Server or Azure. It is then processed using SQL Server Analysis Services or SQL Server
Reporting Services on the business intelligence platform, to be analyzed and to gain insight into all this mixed
huge data. At the end, the output is visualized by Excel PowerPivot, Power View, predictive analytic tools,
or embedded BI tools, all of which are Microsoft tools that the user is familiar with3.
3 The source is from Microsoft, reference [5].
Oracle is also among the pioneers developing methods to address the Big Data issue. It has
developed Oracle Big Data Connectors, Oracle Loader for Hadoop, and Oracle Data Integrator [6]. In addition,
statistical and analysis capabilities such as the open source project R and Oracle R Enterprise are used
to take advantage of Hadoop capabilities. Oracle looked at traditional data and created tools to
facilitate understanding it and gaining insight. The following figure shows traditional data from Oracle’s perspective.
Figure 4
Oracle added a new mechanism that uses Hadoop technology together with its proprietary analytics and BI tools
to deal with Big Data. See the following figure.
Figure 5
Many Big Data pioneers process old and new data in parallel, that is, they use Hadoop alongside
the traditional approach. It is also expected that Hadoop will replace other data processing methods and become
the dominant solution for Big Data.
Big Data will progress as artificial intelligence advances and as new types of computer processing power
become available, such as quantum computing, which uses quantum mechanical states and is expected
in theory to excel at the parallel processing of unstructured data [3]. Other technologies used in big
data include massively parallel processing (MPP) databases, search-based applications, data-mining grids,
distributed file systems, distributed databases, and cloud-based infrastructure. Almost all of these technologies
are not new, but they have been enhanced for use with Big Data.
Big Data requires high-speed transactions, analysis, and retrieval of data. It therefore needs high-capacity
hard drives such as SATA drives and/or high-speed storage such as solid state disks (SSDs),
which are memory-based drives. These storage systems sit inside the parallel processing nodes used
with Big Data.
6. Conclusion
In the past decade, information has become a dominant factor in our daily life. Everything surrounding us
is digitized, and data keeps growing and moving all the time. The huge amounts of data that are
continuously moving and changing have become unpredictable and hard to understand, because they are not
organized in a way that lets us benefit from them. In the past, the main interest of companies was to capture
whatever information they could about customer behavior, to take as much data as medical equipment could
produce, or to collect as much information from the galaxy as sensors could gather. But then we come to the
question: “what are we going to do with all these piles of data?” We have now entered an era in which companies
need to take advantage of all this data in a way that yields insight and value. Hadoop, an open source project
that builds on the MapReduce algorithm Google published in 2004, is now the major player. The pioneers of
database processing, such as Microsoft, Oracle, and IBM, are now in a fast race to adopt it and integrate it
with their proprietary analytical and BI tools. The race is still continuing, and the core of all these
developments is parallel processing using Hadoop.
We do not yet know what the future holds for dealing with Big Data. Will it be solved using
new processing algorithms, or by adopting new hardware based on the latest technologies? Will it become the
issue at the front of the queue, ahead of cloud computing? Will artificial intelligence play
any role in developing a dynamic algorithm that can cope with dynamic, fast-moving data? The question
is wide open, and the future is expected to bring us more. What is important is that the key information
architecture principles are the same, but the tactics of applying these principles differ from one company to
another. We should look at Big Data as an asset that will bring a better future for us if we gain
insight from it.
7. References
[1] http://www.explainingcomputers.com. (n.d.). Retrieved June 2, 2013, from
http://www.explainingcomputers.com/big_data.html
[2] http://public.dhe.ibm.com. (2013). Retrieved June 4, 2013, from
http://public.dhe.ibm.com/common/ssi/ecm/en/gbe03519usen/GBE03519USEN.PDF
[3] http://en.wikipedia.org. (2013). Retrieved May 29, 2013, from
http://en.wikipedia.org/wiki/Big_data
[4] https://www.youtube.com. (n.d.). Retrieved May 29, 2013, from
https://www.youtube.com/watch?v=HM0YX7mpplk
[5] http://download.microsoft.com/download/F/A/1/FA126D6D-841B-4565-BB26-D2ADD4A28F24/Microsoft_Big_Data_Solution_Brief.pdf
[6] http://www.oracle.com. (n.d.). Retrieved June 4, 2013, from
http://www.oracle.com/technetwork/topics/entarch/articles/oea-big-data-guide-1522052.pdf

More Related Content

What's hot

Applications of Big Data Analytics in Businesses
Applications of Big Data Analytics in BusinessesApplications of Big Data Analytics in Businesses
Applications of Big Data Analytics in BusinessesT.S. Lim
 
Analysis on big data concepts and applications
Analysis on big data concepts and applicationsAnalysis on big data concepts and applications
Analysis on big data concepts and applicationsIJARIIT
 
Simplifying Big Data Analytics for the Business
Simplifying Big Data Analytics for the BusinessSimplifying Big Data Analytics for the Business
Simplifying Big Data Analytics for the BusinessTeradata Aster
 
Big Data Applications | Big Data Application Examples | Big Data Use Cases | ...
Big Data Applications | Big Data Application Examples | Big Data Use Cases | ...Big Data Applications | Big Data Application Examples | Big Data Use Cases | ...
Big Data Applications | Big Data Application Examples | Big Data Use Cases | ...Simplilearn
 
Big Data Analytics: Recent Achievements and New Challenges
Big Data Analytics: Recent Achievements and New ChallengesBig Data Analytics: Recent Achievements and New Challenges
Big Data Analytics: Recent Achievements and New ChallengesEditor IJCATR
 
Practical analytics john enoch white paper
Practical analytics john enoch white paperPractical analytics john enoch white paper
Practical analytics john enoch white paperJohn Enoch
 
Whitepaper: Know Your Big Data – in 10 Minutes! - Happiest Minds
Whitepaper: Know Your Big Data – in 10 Minutes! - Happiest MindsWhitepaper: Know Your Big Data – in 10 Minutes! - Happiest Minds
Whitepaper: Know Your Big Data – in 10 Minutes! - Happiest MindsHappiest Minds Technologies
 
Big Data Applications | Big Data Analytics Use-Cases | Big Data Tutorial for ...
Big Data Applications | Big Data Analytics Use-Cases | Big Data Tutorial for ...Big Data Applications | Big Data Analytics Use-Cases | Big Data Tutorial for ...
Big Data Applications | Big Data Analytics Use-Cases | Big Data Tutorial for ...Edureka!
 
Big data document (basic concepts,3vs,Bigdata vs Smalldata,importance,storage...
Big data document (basic concepts,3vs,Bigdata vs Smalldata,importance,storage...Big data document (basic concepts,3vs,Bigdata vs Smalldata,importance,storage...
Big data document (basic concepts,3vs,Bigdata vs Smalldata,importance,storage...Taniya Fansupkar
 
Data-Ed Webinar: Demystifying Big Data
Data-Ed Webinar: Demystifying Big Data Data-Ed Webinar: Demystifying Big Data
Data-Ed Webinar: Demystifying Big Data DATAVERSITY
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataAkshata Humbe
 
Big data - a review (2013 4)
Big data - a review (2013 4)Big data - a review (2013 4)
Big data - a review (2013 4)Sonu Gupta
 
Quick view Big Data, brought by Oomph!, courtesy of our partner Sonovate
Quick view Big Data, brought by Oomph!, courtesy of our partner Sonovate Quick view Big Data, brought by Oomph!, courtesy of our partner Sonovate
Quick view Big Data, brought by Oomph!, courtesy of our partner Sonovate Oomph! Recruitment
 
BIG Data and Methodology-A review
BIG Data and Methodology-A reviewBIG Data and Methodology-A review
BIG Data and Methodology-A reviewShilpa Soi
 
Nuestar "Big Data Cloud" Major Data Center Technology nuestarmobilemarketing...
Nuestar "Big Data Cloud" Major Data Center Technology  nuestarmobilemarketing...Nuestar "Big Data Cloud" Major Data Center Technology  nuestarmobilemarketing...
Nuestar "Big Data Cloud" Major Data Center Technology nuestarmobilemarketing...IT Support Engineer
 

What's hot (20)

Applications of Big Data Analytics in Businesses
Applications of Big Data Analytics in BusinessesApplications of Big Data Analytics in Businesses
Applications of Big Data Analytics in Businesses
 
Analysis on big data concepts and applications
Analysis on big data concepts and applicationsAnalysis on big data concepts and applications
Analysis on big data concepts and applications
 
1
11
1
 
Simplifying Big Data Analytics for the Business
Simplifying Big Data Analytics for the BusinessSimplifying Big Data Analytics for the Business
Simplifying Big Data Analytics for the Business
 
Big Data Applications | Big Data Application Examples | Big Data Use Cases | ...
Big Data Applications | Big Data Application Examples | Big Data Use Cases | ...Big Data Applications | Big Data Application Examples | Big Data Use Cases | ...
Big Data Applications | Big Data Application Examples | Big Data Use Cases | ...
 
Big Data Analytics: Recent Achievements and New Challenges
Big Data Analytics: Recent Achievements and New ChallengesBig Data Analytics: Recent Achievements and New Challenges
Big Data Analytics: Recent Achievements and New Challenges
 
Bigdata
BigdataBigdata
Bigdata
 
Practical analytics john enoch white paper
Practical analytics john enoch white paperPractical analytics john enoch white paper
Practical analytics john enoch white paper
 
Whitepaper: Know Your Big Data – in 10 Minutes! - Happiest Minds
Whitepaper: Know Your Big Data – in 10 Minutes! - Happiest MindsWhitepaper: Know Your Big Data – in 10 Minutes! - Happiest Minds
Whitepaper: Know Your Big Data – in 10 Minutes! - Happiest Minds
 
Big Data Applications | Big Data Analytics Use-Cases | Big Data Tutorial for ...
Big Data Applications | Big Data Analytics Use-Cases | Big Data Tutorial for ...Big Data Applications | Big Data Analytics Use-Cases | Big Data Tutorial for ...
Big Data Applications | Big Data Analytics Use-Cases | Big Data Tutorial for ...
 
Big data document (basic concepts,3vs,Bigdata vs Smalldata,importance,storage...
Big data document (basic concepts,3vs,Bigdata vs Smalldata,importance,storage...Big data document (basic concepts,3vs,Bigdata vs Smalldata,importance,storage...
Big data document (basic concepts,3vs,Bigdata vs Smalldata,importance,storage...
 
Big data analytics
Big data analyticsBig data analytics
Big data analytics
 
Data-Ed Webinar: Demystifying Big Data
Data-Ed Webinar: Demystifying Big Data Data-Ed Webinar: Demystifying Big Data
Data-Ed Webinar: Demystifying Big Data
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
IT FUTURE- Big data
IT FUTURE- Big dataIT FUTURE- Big data
IT FUTURE- Big data
 
Big data - a review (2013 4)
Big data - a review (2013 4)Big data - a review (2013 4)
Big data - a review (2013 4)
 
Quick view Big Data, brought by Oomph!, courtesy of our partner Sonovate
Quick view Big Data, brought by Oomph!, courtesy of our partner Sonovate Quick view Big Data, brought by Oomph!, courtesy of our partner Sonovate
Quick view Big Data, brought by Oomph!, courtesy of our partner Sonovate
 
BIG Data and Methodology-A review
BIG Data and Methodology-A reviewBIG Data and Methodology-A review
BIG Data and Methodology-A review
 
Nuestar "Big Data Cloud" Major Data Center Technology nuestarmobilemarketing...
Nuestar "Big Data Cloud" Major Data Center Technology  nuestarmobilemarketing...Nuestar "Big Data Cloud" Major Data Center Technology  nuestarmobilemarketing...
Nuestar "Big Data Cloud" Major Data Center Technology nuestarmobilemarketing...
 
The ABCs of Big Data
The ABCs of Big DataThe ABCs of Big Data
The ABCs of Big Data
 

Viewers also liked (7)

ICT policy development in yemen
ICT policy development in yemenICT policy development in yemen
ICT policy development in yemen
 
Sql server2008
Sql server2008Sql server2008
Sql server2008
 
Overview on computer games
Overview on computer games Overview on computer games
Overview on computer games
 
Big data

  • 1. Big Data Taufiq Hail Ghilan Al-Madhagy 6/7/2013
  • 2. 1. Introduction
    The world is changing, and this is the digital era. Almost everything around us is digitized, and information flows in huge volumes from a variety of sources: mobile phones, smart devices, surveillance systems, sensors observing the universe, weather-forecasting sensors, medical equipment, customer transactions and user behavior on the internet, and so on. This creates huge amounts of data, terabytes to petabytes in size, generated by daily or weekly transactions. This data is called “Big Data”, and it has provoked new research in information analysis, structuring, and visualization. One of the most dominant and successful methods for gaining insight into Big Data is Hadoop, which the pioneers of database management systems are adopting, along with some added tools, to extract valuable information from the data, understand it better, and consequently take proper actions and decisions. In this essay we dig further into the definition of Big Data, its types and benefits, the challenges surrounding it, and the techniques used so far to address those challenges. Big Data is becoming the talk not of the town but of the IT market and of scientists. It needs not just a few pages but PhD-level research to find better ways of solving a problem that grows day by day as more digital devices enter our daily lives.
    2. Types of big data
    Big Data has varying definitions. Some define it as the greater volume of today’s data, the new types of data and analysis, or the emerging requirement for more real-time information analysis¹. Others argue that “big” is not a fixed amount of data that can be specified in advance: what is big today may be small tomorrow. Still, according to many researchers, amounts of data between terabytes and petabytes are considered Big Data.
    This threshold may rise over time. Big Data generates value from the storage and processing of very large quantities of digital information that cannot be analyzed with traditional computing techniques [1]. Big Data comes in a variety of types, classified as structured data, unstructured data, text, and multimedia [2]. Its sources include social media, web and software logs, camera pictures and other log information, information-sensing mobile devices, aerial sensory technologies, genomics, and medical records [6].
    ¹ IBM Executive Report, reference [2].
  • 3. 3. Benefits of big data
    Many companies see Big Data as a source of better understanding and a means of predicting customer behavior, and thus of improving the customer experience. Social media, transactions from banks and other sources, syndicated data from sources such as loyalty cards, and other customer-related information give companies valuable material for predicting customers’ preferences and needs; in other words, for building businesses and customer services that last for decades. With this understanding, organizations of all types are finding new ways to connect with existing and potential customers. The approach applies to small businesses and large enterprises alike, in telecommunications, healthcare, government, and banking, and in business-to-business interactions among partners and suppliers. The benefits of Big Data include customer-centric objectives and many functional objectives that are being addressed through early applications of big data. Operational optimization, risk and financial management, employee collaboration, and enabling new business models are some of the benefits for both the customer and the producer. A report released by McKinsey in May 2011 says that leading companies are using big data analytics to gain competitive advantage. It forecasts a 60% margin increase for retail companies that are able to harness the power of big data [6]². These companies perceived the importance of this huge amount of data and realized that now is the time to take advantage of it [6].
    4. Challenges in big data
    Doug Laney was the first to develop the model of Big Data described by the “3 Vs”: volume, velocity, and variety. IBM later added a fourth V, veracity.
    According to IBM, including veracity as the fourth big data attribute emphasizes the importance of addressing and managing the uncertainty inherent in some types of data [2]. Some researchers call the three “Vs” the “3 Ss” instead: source (variety), speed (velocity), and size (volume). The following paragraphs describe these challenges.
    1. Volume
    The huge amount of data, ranging between terabytes and petabytes, is the main challenge facing Big Data. Volume refers to the mass quantities of data that organizations are trying to exploit to improve decision-making across the enterprise.
    ² A document from Oracle.
  • 4. Data volumes continue to increase at an unprecedented rate [2]. Traditional hardware and relational database processing are incapable of handling many of the tasks that Big Data requires. These tasks include modeling the earth’s climate, forecasting the weather, receiving and analyzing the huge amounts of patient data collected in hospitals, diagnosing diseases, gathering information from the galaxy, and so on.
    2. Velocity
    The amount of data flowing into any enterprise every day is increasing exponentially, beyond what traditional systems can store and process. The speed at which data is created, processed, and analyzed also continues to increase, so data is always in motion, from its creation through processing to storage and retrieval [2]. Data streaming has become essential to internet activity for almost every user, even on mobile devices such as phones and tablets. Data is now generated continuously at a pace that traditional systems cannot capture, store, and analyze. Online video, GPS location tracking, and augmented reality, among many other applications, depend on large amounts of fast data streaming [1]. These services are a challenge for many organizations, which need new delivery methods because the conventional ones are not suitable. For time-sensitive processes such as real-time fraud detection or multi-channel “instant” marketing, certain types of data must be analyzed in real time to be of value in business decisions. Velocity therefore goes hand in hand with latency: the time from when data is created until it is accessed and analyzed [2].
    3. Variety
    Variety refers simply to the different types of data and data sources.
    The data stored and processed every day comes in a variety of types. In the past, the data to be processed consisted of personal documents, financial transactions, stock records, and so on. Today we have audio, video, graphics, 3D models, location data, and many other complex kinds of data that need to be stored, delivered, or processed. Such unstructured Big Data is not easy to categorize with traditional methods for handling large amounts of data. All of this data is in reality messy and needs cleansing before any analysis can be applied [1]. Put simply, variety is about managing the complexity of multiple data types: structured, semi-structured, and unstructured. Organizations need to integrate and analyze data from both traditional and non-traditional information sources, from within and outside the enterprise. With the expanding use of sensors, smartphones, and social collaboration technologies, data is generated in a variety of forms, including text, web data, tweets, sensor data, audio, video, click streams, log files, and more, as discussed in the report from IBM [2].
  • 5. 4. Veracity
    Veracity refers to data uncertainty and the level of reliability associated with certain types of data. High data quality is one of the critical requirements of Big Data, yet for some data, such as weather forecasts, financial figures, and customers’ buying attitudes, the available tools cannot remove the inherent unpredictability [2]. Many organizations hold huge piles of data, and in many cases the managers themselves cannot trust the analysis of it. Managers need to understand this uncertainty in order to take proper decisions in a continually changing environment. Opportunities to use big data technology and analytics to improve decision-making and performance exist in every industry, and managers should be aware of these capabilities. Generating energy from natural resources is an example of uncertainty in Big Data. The amount of data generated about the wind is huge, yet we still cannot predict the full picture precisely, because we cannot predict the behavior of the weather, the winds, and the clouds. Even so, a great deal of that data is valuable and useful as a basis for decisions about future power production. So how do you plan with all these uncertainties in place? Analysts suggest data fusion: combining multiple less reliable sources to create a more useful data point. An example is social comments attached to geospatial location information. Another way to manage uncertainty is to use advanced mathematics such as fuzzy logic and robust optimization techniques.
    5. Technique/approach to overcome the challenges
    The three “Vs” (volume, velocity, and variety) are the main challenges for Big Data, and overcoming them requires new technology beyond the traditional methods used by today’s relational database systems. One such approach is the Hadoop project, an open-source project from Apache that provides software libraries for reliable, scalable, distributed computing. This technology is able to handle Big Data processing and analytics. Hadoop is used at large scale by most Big Data pioneers, such as LinkedIn, which generates over 100 billion personalized recommendations every week [1], and Twitter. The mechanism behind Hadoop can be explained simply as follows. A large data set is divided into smaller sets, which are scattered across a cluster of servers that perform the computation using a simple programming model. The number of servers may range from a few hundred to a few thousand or more.
  • 6. The new thing about this computation method is that Hadoop detects and compensates for hardware failures at the application level, whereas the traditional approach depends on expensive servers. This guarantees continuity of service if any server in the cluster fails. In this way the computation over the mass of data is distributed among servers in a low-cost and effective way [1]. The two key elements of Hadoop are the Hadoop Distributed File System (HDFS) and MapReduce. The first provides the high-bandwidth, cluster-based storage that Big Data processing needs. The second is the data processing framework. MapReduce is based on Google’s search technology and maps large data sets across the servers of the cluster. Each server processes its part of the overall data set and produces a summary. All the summaries are then aggregated in the “Reduce” stage. In this way the data is pre-processed before traditional data analysis tools are applied [3]. On the technical side, as the following illustration shows, Hadoop consists of two parts, HDFS and MapReduce. The lower layer contains the name node, which stores the metadata, that is, information about the smaller pieces of actual data that are processed in the data nodes. In the higher layer, the job tracker decides which piece of data will run and where. The final part is the task tracker, which runs the code [4].
    Figure 1
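The split, map, and reduce steps described above for Hadoop can be sketched in a few lines of plain Python. This is only an illustration of the idea under simplifying assumptions, not Hadoop's actual API: the function names (`split_into_chunks`, `map_chunk`, `reduce_summaries`, `word_count`) are hypothetical, the "servers" are simulated by a list of chunks processed one after another, and word counting stands in for whatever summary a real job would compute.

```python
from collections import Counter, defaultdict


def split_into_chunks(records, num_chunks):
    """Divide the data set into smaller sets, one per (simulated) server."""
    return [records[i::num_chunks] for i in range(num_chunks)]


def map_chunk(chunk):
    """Map stage: a server summarizes only its own chunk as word counts."""
    summary = Counter()
    for line in chunk:
        summary.update(line.lower().split())
    return summary


def reduce_summaries(summaries):
    """Reduce stage: aggregate the per-server summaries into one result."""
    total = defaultdict(int)
    for summary in summaries:
        for word, count in summary.items():
            total[word] += count
    return dict(total)


def word_count(records, num_chunks=3):
    """Run the whole pipeline: split, map each chunk, then reduce."""
    chunks = split_into_chunks(records, num_chunks)
    # On a real cluster the map calls would run in parallel on different servers.
    summaries = [map_chunk(chunk) for chunk in chunks]
    return reduce_summaries(summaries)


if __name__ == "__main__":
    logs = [
        "big data needs new tools",
        "hadoop splits big data",
        "servers process data in parallel",
    ]
    print(word_count(logs))
```

Because each chunk is summarized independently, a chunk whose server failed could simply be mapped again on another server without touching the rest. That independence is what makes the kind of failure compensation described above practical.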
  • 7. Let us look at the differences between the conventional way of processing data and the way Hadoop does it with MapReduce. The following table shows the differences in terms of access, updates, structure, integrity, and scaling. We notice that in MapReduce the data is always moving and dynamic, that writing is discouraged, and that the data can scale to higher volumes. Microsoft has adopted Hadoop, with some modifications that give it an easy, user-friendly interface, and has added connectors to make it a Microsoft-like product. Some of the tools that Microsoft uses to deal with Big Data are Power View, PowerPivot in Excel, and SharePoint [5].
    Figure 2
  • 8. These are some of the tools that are usually used with BI to gain insight into structured data. In addition, Microsoft is implementing Hadoop on Windows Azure and Windows Server. It has created JavaScript libraries and a framework for Hadoop and has formed a partnership with Hortonworks. Moreover, Microsoft provides ODBC drivers and a Hive add-in for Excel to deal with Big Data. The ODBC drivers enable third-party applications to integrate with Hadoop on Windows systems [4]. The following illustration shows the solution Microsoft provides for Big Data.
    Figure 3
    As the diagram above shows, the data may be structured (ERP, CRM, LOB, apps) or unstructured and from different sources (sensors, devices, bots, crawlers). If it is structured, it is stored in the enterprise data warehouse; otherwise it is moved to the upper layer to be processed with Hadoop on a Windows platform, either Windows Server or Azure. It is then processed using SQL Server Analysis Services or SQL Server Reporting Services on the Business Intelligence platform, to be analyzed and to gain insight into all this mixed, huge data. At the end, the output is visualized with Excel PowerPivot, Power View, predictive analytic tools, or embedded BI tools, all of which are Microsoft tools that the user is familiar with³.
    ³ The source is from Microsoft, reference [5].
  • 9. Oracle is also among the pioneers developing methods to solve the Big Data issue. It has developed Oracle Big Data Connectors, Oracle Loader for Hadoop, and Oracle Data Integrator [6]. In addition, statistical and analysis capabilities such as the open source project R and Oracle R Enterprise have been developed to take advantage of Hadoop capabilities. Oracle looked at traditional data and created tools that make it easier to understand and to gain insight from. The following figure shows traditional data from Oracle's perspective. Figure 4 Oracle added a new mechanism that uses Hadoop technology together with its proprietary analytics and BI tools to deal with Big Data. See the following figure. Figure 5 Many Big Data pioneers handle old and new data in parallel, that is, they use Hadoop alongside the traditional approach. It is also expected that Hadoop will replace other data processing methods and become the dominant solution for Big Data. Big Data will progress as artificial intelligence advances and as new types of computer processing power become available, such as quantum computing, which uses quantum mechanical states and is expected, in theory, to excel at the parallel processing of unstructured data [3]. Other technologies used with Big Data include massively parallel-processing (MPP) databases, search-based applications, data-mining grids, distributed file systems, distributed databases, and cloud-based infrastructure. Almost none of these technologies is new, but the way each is used has been enhanced for Big Data.
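The essay names Hadoop's parallel processing as the core of these solutions but does not show what that processing looks like. The following is a minimal sketch of the map, shuffle, and reduce steps that Hadoop's MapReduce model is built around, written as a word count. It is an illustration only: it runs inside a single Python process, threads stand in for cluster nodes, it does not use any Hadoop API, and the function names and sample input are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Shuffle step: group all emitted values by key across every chunk."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks, workers=4):
    """Run the three steps; threads stand in for the cluster's worker nodes."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    # Hypothetical input: each string plays the role of one data block on one node.
    sample_chunks = [
        "sensor data sensor logs",
        "logs from devices",
        "sensor data from crawlers",
    ]
    print(word_count(sample_chunks))
```

The point of the split is that each chunk can be mapped independently, so adding nodes adds throughput; only the shuffle step needs to see output from every chunk.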
  • 10. Big Data requires high-speed transactions, analysis, and retrieval of data. It therefore needs high-capacity hard drives such as SATA drives and/or high-speed storage such as solid-state disks (SSDs), which are memory-based disks. These storage systems sit inside the parallel processing nodes used with Big Data.
6. Conclusion
In the past decade, information became a dominant factor in our daily life. Everything surrounding us is digitized, and data keeps growing and moving all the time. The huge amounts of data that are continuously moving and changing became unpredictable and hard to understand, because the data is not organized in a way that we can benefit from. In the past, the main interest of companies was to capture whatever information they could about customer behavior, to record as much data as the medical equipment could produce, or to collect as much information from the galaxy as the sensors could gather. But we come to the question, "What are we going to do with all these piles of data?" We have now reached an era in which companies need to take advantage of all this data in a way that extracts its insight and value. Hadoop, the open-source implementation of the MapReduce approach that Google published in 2004, is now the major player. The pioneers of database processing, such as Microsoft, Oracle, and IBM, are now in a fast race to adopt it and integrate it with their proprietary analytical and BI tools. The race is still continuing, and the core of all these developments is parallel processing using Hadoop. We do not yet know what the future holds for dealing with Big Data. Will it be solved with new processing algorithms, or with new hardware based on the latest technologies? Will it become the issue at the front of the queue, ahead of cloud computing? Will artificial intelligence play any role in developing a dynamic algorithm that can cope with dynamic, fast-moving data?
The question is wide open, and the future is expected to bring us more. What is important is that the key information architecture principles are the same, but the tactics for applying these principles differ from one company to another. We should look at Big Data as an asset that will bring a better future for us if we gain full insight into it.
7. References
[1] http://www.explainingcomputers.com. (n.d.). Retrieved June 2, 2013, from http://www.explainingcomputers.com/big_data.html
[2] http://public.dhe.ibm.com. (2013). Retrieved June 4, 2013, from http://public.dhe.ibm.com/common/ssi/ecm/en/gbe03519usen/GBE03519USEN.PDF
[3] http://en.wikipedia.org. (2013). Retrieved May 29, 2013, from http://en.wikipedia.org/wiki/Big_data
  • 11. [4] https://www.youtube.com. (n.d.). Retrieved May 29, 2013, from https://www.youtube.com/watch?v=HM0YX7mpplk
[5] http://download.microsoft.com/download/F/A/1/FA126D6D-841B-4565-BB26-D2ADD4A28F24/Microsoft_Big_Data_Solution_Brief.pdf
[6] http://www.oracle.com. (n.d.). Retrieved June 4, 2013, from http://www.oracle.com/technetwork/topics/entarch/articles/oea-big-data-guide-1522052.pdf