EXHIBITION WATCH REPORT
Mobile IT & Big Data
23rd to 25th of October, 2012
Paris, Porte de Versailles
Applications for mobile, Business Intelligence
For any information: contact@veillesalon.com
Tel. 08 71 57 21 78 - Fax 01 34 35 04 89
A report made by the VIEDOC company
A website produced and published by VIEDOC Solutions
2 rue de Hélène Boucher, 78280 Guyancourt, FRANCE
8 rue de Malleville, 95880 Enghien les bains
For any further information: contact@veillesalon.com ‐ Tel : +33(0)1 30 43 45 27
Websites : www.veillesalon.com and www.viedoc.fr
TABLE OF CONTENTS
ABSTRACT
RESUME
Part 1. Innovations on mobile IT
1.1 Background on Gamification on mobile
1.1.1 Definition of gamification
1.1.2 Gamification market forecast
1.1.3 Innovations from platform providers
1.2 Nomalys by Nomalys (Mobile application)
1.3 Teopad by Thales
Part 2. Big data
2.1 Background on Big Data
2.1.1 Defining big data
2.1.2 Characteristics of Big Data: The four Vs
2.1.3 The Importance of Big Data
2.1.4 Estimations of IT spending driven by Big Data issues
2.2 Big Data Architecture Capabilities and their primary technologies
2.2.1 Comparison of information architectures
2.2.2 Storage and Management Capability
2.2.3 Database Capability
2.2.4 Processing Capability
2.2.5 Data Integration Capability
2.2.6 Statistical Analysis Capability
2.3 Trends on big data
2.3.1 The Internet of Things already here
2.3.2 Getting to the right business model(s) for data
2.3.3 Adding a Social layer to traditional activities
2.3.4 The New Frontier of Business Intelligence & Semantics at petabyte scale
2.4 Key companies in the Big Data exhibition in Paris
2.4.1 Data Publica
2.4.2 Altic
2.4.3 Talend
Conclusion
About VEILLE SALON
PRESENTATION of VIEDOC SARL
© VIEDOC – For any further information: contact@veillesalon.com
DISCLAIMER
This report was compiled from interviews conducted by us with the exhibitors present at
each event, from gathering and analyzing information in conferences and from the
compilation of information on the web afterwards.
Thus, the data contained in this report are provided for information only. Although the objective is to
disseminate timely and accurate information, VEILLE SALON cannot guarantee the result.
No damage that may result from use of this information can be attributed to this site. The
use or reproduction of all or part of this document is prohibited without the prior written
consent of VEILLE SALON.
For the full terms and conditions of use of this report, please contact us.
ABSTRACT
According to the organizers, the Mobile IT and Big Data exhibitions did not attract many visitors. Big
companies such as Orange, SFR, Bouygues and Free for telecoms, or Intel, Dell and IBM for Big Data, were absent.
The conferences on the evolution of these sectors, however, were very successful. In a gloomy atmosphere where
visitors and exhibitors talked openly about tiny budgets for information technology, some sectors were
nevertheless quite healthy and innovative. This was the case for equipment manufacturers and developers of next
generation telephony, and for web hosting providers. There were some impressive innovations in the field of
smartphones, coming from a large number of young companies specializing in mobile business solutions. The
advent of smartphones and tablets is revolutionizing enterprise mobility. Judicious use of interfaces from the
video games industry makes applications playful, which makes them friendlier for customers to use. This is the
"gamification" phenomenon, which is expected to take off commercially in the short term.
Conferences on Big Data drew quite a crowd and allowed visitors to discover an emerging sector that should
weigh heavily in the development of enterprises. In only 10 years, the amount of data has increased exponentially.
Data storage is a costly problem for businesses, yet these data remain relatively untapped. The idea
of big data is to create added value from very diverse data. People now talk about flows, exchanges and
collaboration rather than storage. Nothing is sorted but everything can be found. Big Data (from 10 TB of data)
is revolutionizing information technology infrastructure. Environments such as Hadoop provide flexibility
in resources and adapt to the workload by adding inexpensive servers in parallel. Big Data generated a
turnover of $17 billion in 2011 and it is estimated that this figure will double by 2016. The great debate with
big data is how to find a balance between data transparency and the privacy of citizens.
Key words: mobile, gamification, smartphone, security, big data, data scientists, hadoop, business intelligence
RESUME
By the organizers' own admission, the Mobile IT and Big Data exhibitions did not attract many visitors, and
major players such as Orange, SFR, Bouygues and Free for telecoms, or Intel, Dell and IBM for Big Data, were
absent. The technical and societal conferences on the evolution of these sectors were nevertheless very
successful. In a gloomy atmosphere where visitors and exhibitors talk openly about falling information
technology budgets, some sectors are in robust health: equipment manufacturers and developers of next
generation telephony, or hosting providers, to name but a few. Innovation is particularly flourishing in the
field of smartphones, with a large number of young companies specializing in mobile business solutions. The
arrival of smartphones and tablets is revolutionizing enterprise mobility. Judicious use of interfaces from the
video game industry gives applications a playful side, which helps users adopt them. This is known as
"gamification", a phenomenon set to explode commercially in the very short term.
The Big Data conferences introduced visitors to an emerging sector that should weigh very heavily in the
development of enterprises. The volume of data has exploded over the past 10 years. Data storage is a costly
issue for companies, yet they make relatively little use of these data. The idea of big data is to create added
value from data of very diverse kinds. The thinking is now in terms of flows, exchange and collaboration rather
than storage. Nothing is filed but everything can be found. Big Data (from 10 TB of data) is revolutionizing
information technology infrastructures. Environments such as Hadoop offer great flexibility in resources and
adapt to the workload by adding inexpensive servers in parallel. Big Data already generated a turnover of
17 billion dollars in 2011 and this figure is expected to double by 2016. As big data is exploited ever more
finely, the great debate will be where to strike the balance between data transparency and respect for
citizens' privacy.
Key words: big data, storage, data, value creation, geolocation, mobility, smartphone, server, Hadoop
PART 1. INNOVATIONS ON MOBILE IT
1.1 BACKGROUND ON GAMIFICATION ON MOBILE
1.1.1 Definition of gamification
Gamification is the use of games or competition to encourage a user to complete an action or set of actions.
Users respond to a range of prompts and are encouraged to return regularly to the application. The prompts
include:
What makes gamification so attractive is the fact that we generally enjoy actively participating and engaging
with others through entertainment. It is in our human nature to interact and be entertained with playful
applications, particularly when there are engaging game design elements employed.
Consumer games and digital entertainment continue to attract attention given the public's interest in
games. Compelling game mechanics and design are at the core of an engaging user experience. Gamification
must therefore work to enhance the user experience in order to better engage, retain, motivate and promote
overall participation.
Gamification takes advantage of game mechanics to deliver engaging applications, and make non‐game
applications more entertaining and appealing. By deploying these dynamics in a co‐ordinated application, a
company can use games to motivate behaviours and drive outcomes for both the customer and the
organisation.
1.1.2 Gamification market forecast
The use of game mechanics in nontraditional industries has grown exponentially in the
past 18 months. This is due in part to the growth of social and mobile games, as well as the increasing
consumer adoption of social media.
M2 Research estimates that market spending on gamification solutions (applying game mechanics and
behavioral analytics in non-traditional applications) will reach $242 million by the end of 2012, more
than double the 2011 figure. The revenue estimates comprise the following components:
1. Platform vendor revenue
2. Agency and production revenue
3. Internal development
1.1.3 Innovations from platform providers
2012 is a milestone year for gamification, which as it grows will evolve into a serious component of consumer
and employee engagement. It will be critical for both platform providers and deploying organizations to
understand that implementing gamification is not a short-term strategy. It is a long-term commitment that
requires diligence in audience research, application design and activation/maintenance to ultimately benefit
from the opportunities that gamification principles offer.
Despite the anticipated growth rates, gamification will remain a market that potential customers of platform
providers evaluate carefully. Mobile IT has taken real advantage of gamification in applications, and
the main innovations displayed at the Mobile IT exhibition in Paris come from platform providers.
1.2 NOMALYS BY NOMALYS (MOBILE APPLICATION)
Address: 46 rue Auguste Blanqui, 94250 Gentilly, France
Tel: 01 46 65 21 58
Fax: 01 79 73 55 89
Contact: Celine BLANC
Email: contact@nomalys.com
Website: http://www.nomalys.com/
NOMALYS gives mobile professionals using a smartphone or tablet (iPhone, iPad, Android,
BlackBerry and Windows Phone 8) access at last to all of their company's strategic data.
Source: Nomalys, 2012
Every company equipped with a structured IT system can connect it to the Nomalys application. The
application's ergonomics, engine and algorithms have been designed to be generic: every IT
system can be browsed from any mobile device with the same ergonomics and colorful user interface.
However, Nomalys is not only a way to make your CRM or ERP mobile. It is also a chance for each company to
use the power and innovative ergonomics of Nomalys to build an application that displays its full
range of products and services. Access is immediate, intuitive, dynamic and secure. Users can be
alerted in real time to any important event occurring in the database.
Source: Nomalys, 2012
The connection is made to existing CRM or ERP software. This gives access to data such as clients, prospects,
stock, invoices, quotations, payroll, human resources, complaints …
Source: Nomalys, 2012
With the solution developed by NOMALYS, your software becomes mobile, dynamic, interactive and fully
showcased.
Nomalys received a Convergence 2012 award at the Mobile IT exhibition. Nomalys has developed close
partnerships with CNRS and Institut Telecom to develop unique algorithms.
1.3 TEOPAD BY THALES
Address: Thales Communications & Security, 45, rue de Villiers, 92200 Neuilly-sur-Seine Cedex
Website: http://www.thalesgroup.com
Contact: Raphaël BINET, Product Marketing Manager
Email: Raphael.binet@thalesgroup.com
Tel: +33 1 46 13 29 52
Mobile: +33 6 08 17 93 91
TEOPAD is a security solution for professional applications on smartphones and tablets, developed by Thales
and intended for companies and public services.
TEOPAD creates a secure professional environment on the terminal that can coexist with an open
personal context. This professional environment takes the form of an application that can be started after a
strong user authentication and by means of a simple icon on the terminal's native desktop. The user can then
access a second desktop, which constitutes his/her professional environment. The latter is completely isolated
from the personal and native part by a patented sandboxing technology.
Source: Thales, 2012
This part is entirely encrypted and controlled, and contains all the applications, data and settings the
user needs for his/her business activity:
• Applications of all types: web browser, e-mail client, viewers, note pads, telephony client, business
  applications, etc.
• Documents, contact database, personal organizer, e-mail archives, etc.
The innovations developed by Thales give TEOPAD significant differentiators over the
other solutions on the market:
• Flexibility in choosing the terminal: for a given OS, the solution may be deployed on most of the
  terminals on the market that use this OS.
• Flexibility in choosing the applications: for a given OS, most of the applications available on the market
  may be hosted and protected in the secure environment. This applies to native applications, as well as
  to third-party applications or applications developed by the company for its own needs.
• Protection of information in all its forms: information remains vulnerable when manipulated,
  transmitted or stored. There is therefore little point in encrypting only e-mail or telephony, as most
  current solutions do. TEOPAD protects information in all the contexts in which it is edited, viewed or
  exchanged.
• Flexibility of the secure perimeter: thanks to the "TEOPAD Market Place", the company can make
  new secure applications available to its employees at any time. For instance, they can be adapted to
  the employees' missions or business trips. This flexibility enables the employee to travel in complete
  safety with a terminal whose content is strictly adapted to his/her needs. He/she can leave with
  a terminal with no professional context, the latter being downloaded securely once he/she has
  reached his/her destination.
• Simplicity of deployment for the user: once he/she has received his/her means of authentication, the
  user downloads the TEOPAD application and his/her customized professional context from the
  "TEOPAD Market Place" available on the Intranet of his/her company.
• User-friendly interface: TEOPAD fully preserves the ergonomics of the native OS and of the
  applications used.
• No additional specific infrastructure: TEOPAD connects very simply to the existing information
  system. There is no need to deploy proprietary servers or gateways, which greatly limits costs.
• High-quality professional services dedicated to the users.
• Flexible operation: it may be partially or completely entrusted to a trusted third party.
Source: Thales, 2012
The Teopad sandboxing technology is a unique, patented technology that creates a duality on the terminal
between two environments, professional and personal, which work simultaneously but independently,
without resorting to proprietary applications.
This technology does not rely on virtualization, which makes it particularly lightweight, with corresponding
benefits in terms of performance and battery life. Android applications are authorized to perform specific
tasks or reach system components depending on the privileges they have been granted.
The TEOPAD SANDBOX system controls these authorizations and then filters the exchanges between:
• professional and personal applications;
• professional applications and the operating system.
This mechanism allows the Information System Department to limit how professional applications can
interact with their environment. The ring-fenced professional environment is then generated and
displayed as a separate desktop on the terminal.
This technology provides an efficient means of fighting intrusions, information leaks and tampering with
professional applications.
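As a rough illustration of the filtering idea described above, the following Python sketch models a policy check between professional applications, personal applications and system components. It is not Thales's implementation: the class names, the `system_grants` table and the rule that personal/system exchanges are left untouched are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Domain(Enum):
    PROFESSIONAL = "professional"
    PERSONAL = "personal"
    SYSTEM = "system"


@dataclass(frozen=True)
class App:
    name: str
    domain: Domain


@dataclass
class SandboxPolicy:
    # Hypothetical table: professional app name -> system components it may reach.
    system_grants: dict = field(default_factory=dict)

    def allows(self, sender: App, receiver: App) -> bool:
        """Return True if an exchange between sender and receiver may proceed."""
        domains = {sender.domain, receiver.domain}
        # Professional and personal applications are kept fully apart.
        if domains == {Domain.PROFESSIONAL, Domain.PERSONAL}:
            return False
        # Professional app <-> operating system: only if explicitly granted.
        if domains == {Domain.PROFESSIONAL, Domain.SYSTEM}:
            if sender.domain is Domain.PROFESSIONAL:
                pro, component = sender, receiver
            else:
                pro, component = receiver, sender
            return component.name in self.system_grants.get(pro.name, set())
        # Everything else is assumed to be outside the sandbox's scope.
        return True
```

In this sketch an administrator would express "the mail client may use the network but not the camera" as `SandboxPolicy({"mail": {"network"}})`; the default for an unlisted professional application is to deny every system component.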
The TEOPAD SANDBOX advantages:
• customized compartmentalization of professional applications and data from the rest of the
  terminal;
• professional desktop that can host any type of application available on the market or developed by
  the company (no mandatory Thales proprietary application);
• simultaneous operation of the professional and personal environments with a single notification
  interface for the user (the native Android bar);
• application content sourced exclusively from the company's Teopad Market Place and entirely under
  the company's control;
• protection of professional data, including data being viewed, when they are no longer encrypted;
• very small footprint on the terminal, which fully preserves its performance;
• user-friendly interface maintained.
The TEOPAD SANDBOX compartmentalization service is offered independently of the local encryption
service on the terminal. These are two complementary services.
Source: Thales, 2012
The TEOPAD solution is composed of the following elements:
• For the user:
  o The TEOPAD application to be installed on the terminal.
  o The TEOPAD Market Place client application.
• For the company:
  o The TEOPAD infrastructure, which is particularly light as it does not require any proprietary element
    to connect the users to the information system.
  o It allows centralized, industrialized deployment and then operation of TEOPAD. In particular, the
    tools make it possible to create generic or customized profiles and to adapt to large fleets or to
    fleets specialized by business activity.
PART 2. BIG DATA
2.1 BACKGROUND ON BIG DATA
2.1.1 Defining big data
Big data typically refers to the following types of data:
• Traditional enterprise data: includes customer information from CRM systems, transactional ERP
  data, web store transactions, general ledger data.
• Machine-generated/sensor data: includes Call Detail Records (“CDR”), weblogs, smart meters,
  manufacturing sensors, equipment logs (often referred to as digital exhaust), trading systems data.
• Social data: includes customer feedback streams, micro-blogging sites like Twitter, social media
  platforms like Facebook.
The McKinsey Global Institute estimates that data volume is growing 40% per year, and will grow 44x between
2009 and 2020. But while it’s often the most visible parameter, volume of data is not the only characteristic
that matters.
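The two growth figures quoted above can be cross-checked with a few lines of arithmetic. This is only a sanity check, assuming steady compound growth over the 11 yearly steps from 2009 to 2020; the variable names are illustrative.

```python
annual_growth = 0.40      # 40% per year, as quoted
years = 2020 - 2009       # 11 yearly compounding steps
quoted_factor = 44        # "44x between 2009 and 2020"

# Total growth factor if volume really grew 40% every year.
factor_from_rate = (1 + annual_growth) ** years

# Constant yearly rate that would produce exactly the quoted 44x.
implied_rate = quoted_factor ** (1 / years) - 1

print(f"40% per year over {years} years gives about {factor_from_rate:.1f}x")
print(f"44x over {years} years implies about {implied_rate:.1%} per year")
```

Under that assumption, 40% per year compounds to roughly 40x and 44x corresponds to about 41% per year, so the two figures are consistent to within rounding.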
Big Data is sized in peta-, exa-, and soon perhaps, zetta-bytes! And it is not just about volume: the approach to
analysis contends with data content and structure that cannot be anticipated or predicted. These analytics and
the science behind them filter low-value or low-density data to reveal high-value or high-density data. As a
result, new and often proprietary analytical techniques are required. Big Data poses a broad array of interesting
architecture challenges.
2.1.2 Characteristics of Big Data: The four Vs
In fact, there are four key characteristics that define big data: Volume, Velocity, Variety and Value. It is often
said that data volume, velocity, and variety define Big Data, but the unique characteristic of Big Data is the
manner in which the value is discovered.
a) Volume.
Machine-generated data is produced in much larger quantities than traditional data. For instance, a single
jet engine can generate 10TB of data in 30 minutes. With more than 25,000 airline flights per day, the daily
volume of just this single data source runs into the Petabytes. Smart meters and heavy industrial equipment
like oil refineries and drilling rigs generate similar data volumes, compounding the problem. People generally speak
of big data once the volume exceeds 10 TB.
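To see how the jet engine figures above reach the petabyte range, here is a small back-of-the-envelope calculation. The number of engines per flight and the flight duration are not given in the text, so they are parameters; the defaults (one engine, 30 minutes) are a deliberately conservative assumption, and decimal units (1 PB = 1000 TB) are assumed.

```python
TB_PER_ENGINE_PER_HALF_HOUR = 10   # quoted: 10 TB in 30 minutes per engine
FLIGHTS_PER_DAY = 25_000           # quoted: more than 25,000 flights per day


def daily_volume_pb(engines_per_flight=1, flight_hours=0.5):
    """Estimated daily data volume in petabytes (1 PB = 1000 TB)."""
    half_hours = flight_hours / 0.5
    terabytes = (FLIGHTS_PER_DAY * engines_per_flight
                 * half_hours * TB_PER_ENGINE_PER_HALF_HOUR)
    return terabytes / 1000


# Conservative lower bound: one engine, half an hour of data per flight.
print(daily_volume_pb())
# A hypothetical twin-engine, two-hour flight profile.
print(daily_volume_pb(engines_per_flight=2, flight_hours=2.0))
```

Even the lower bound gives hundreds of petabytes per day, which is why this single source is described as running into the petabytes.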
b) Velocity.
Social media data streams – while not as massive as machine‐generated data – produce a large influx of
opinions and relationships valuable to customer relationship management. Even at 140 characters per tweet,
the high velocity (or frequency) of Twitter data ensures large volumes (over 8 TB per day).
c) Variety.
Traditional data formats tend to be relatively well described and change slowly. In contrast, non‐traditional
data formats exhibit a dizzying rate of change. As new services are added, new sensors deployed, or new
marketing campaigns executed, new data types are needed to capture the resultant information.
d) Value
The economic value of different data varies significantly. Typically there is good information hidden amongst a
larger body of non‐traditional data; the challenge is identifying what is valuable and then transforming and
extracting that data for analysis.
With Big Data, the value is discovered through an iterative modeling process: make a hypothesis, create
statistical, visual, or semantic models, validate, then make a new hypothesis. It takes either a person
interpreting visualizations or making interactive knowledge-based queries, or the development of ‘machine learning’
adaptive algorithms that can discover meaning. And in the end, the algorithm may be short-lived.
2.1.3 The Importance of Big Data
The growth of big data is a result of the increasing channels and variety of data in today’s world. Some of the
new data sources are user‐generated content through social media, web and software logs, cameras,
information‐sensing mobile devices, aerial sensory technologies, genomics, and medical records.
Source: Cisco, “VNI Service Adoption Forecast, 2011–2016”, May 2012
Companies have realized that there is competitive advantage in this information and that now is the time to
put this data to work. To make the most of big data, enterprises must evolve their IT infrastructures to handle
the rapid rate of delivery of extreme volumes of data, with varying data types, which can then be integrated
with an organization’s other enterprise data to be analyzed.
When big data is distilled and analyzed in combination with traditional enterprise data, enterprises can develop
a more thorough and insightful understanding of their business, which can lead to enhanced productivity, a
stronger competitive position and greater innovation – all of which can have a significant impact on the bottom
line.
For example, in the delivery of healthcare services, management of chronic or long-term conditions is
expensive. Use of in-home monitoring devices to measure vital signs and monitor progress is just one way that
sensor data can be used to improve patient health and reduce both office visits and hospital admissions.
Manufacturing companies deploy sensors in their products to return a stream of telemetry. Sometimes this is
used to deliver services like OnStar, which provides communications, security and navigation services. Perhaps
more importantly, this telemetry also reveals usage patterns, failure rates and other opportunities for product
improvement that can reduce development and assembly costs.
The proliferation of smart phones and other GPS devices offers advertisers an opportunity to target consumers
when they are in close proximity to a store, a coffee shop or a restaurant. This opens up new revenue for
service providers and offers many businesses a chance to target new customers.
Retailers usually know who buys their products. Use of social media and web log files from their ecommerce
sites can help them understand who didn’t buy and why they chose not to, information not available to them
today. This can enable much more effective micro customer segmentation and targeted marketing campaigns,
as well as improve supply chain efficiencies.
Finally, social media sites like Facebook and LinkedIn simply wouldn’t exist without big data. Their business
model requires a personalized experience on the web, which can only be delivered by capturing and using all
the available data about a user or member.
2.1.4 Estimations of IT spending driven by Big Data issues
The huge volumes of data generated by today’s digital businesses, known as “big data”, will drive $28 billion of
worldwide IT spending in 2012 and $34 billion in 2013, according to a forecast from Gartner, the IT research
firm.
At the same time, Gartner predicted that by 2015, 4.4 million IT jobs will be created to support big data,
including 1.9 million in the US, but warned that there will be a scramble for the limited number of IT
professionals qualified to fill these jobs.
A total of $232 billion is projected to be sold across all categories in the forecast from 2011 to 2016. Growth
from $24.4 billion in 2011 to $43.7 billion in 2016 represents a 12.42% CAGR for the total market.
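The compound annual growth rate (CAGR) can be recomputed from the two endpoint figures. The sketch below uses the standard formula, (end / start) ** (1 / years) - 1; the function name is illustrative. With the rounded endpoints quoted above it comes out at about 12.4%, in line with the figure given; the small difference is presumably due to rounding of the endpoints.

```python
def cagr(start, end, years):
    """Compound annual growth rate between two values `years` apart."""
    return (end / start) ** (1 / years) - 1


# Endpoints in billions of dollars, as quoted for 2011 and 2016.
rate = cagr(24.4, 43.7, 2016 - 2011)
print(f"CAGR 2011-2016: {rate:.2%}")
```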
2.2 BIG DATA ARCHITECTURE CAPABILITIES AND THEIR PRIMARY TECHNOLOGIES
2.2.1 Comparison of information architectures
Big data differs from other data realms in many dimensions. In the following table you can compare and
contrast the characteristics of big data alongside the other data realms.
Source: Oracle, 2012
These different characteristics have influenced how you capture, store, process, retrieve, and secure your
information architectures. As you evolve into Big Data, you can minimize your architecture risk by finding
synergies across your investments allowing you to leverage your specialized organizations and their skills,
equipment, standards, and governance processes.
The following diagram gives an example of a data flow architecture in which big data is used for combined analytics.
Source: Oracle, 2012
2.2.2 Storage and Management Capability
a) Hadoop Distributed File System (HDFS)
HDFS has two main layers:
• Namespace
  o Consists of directories, files and blocks.
  o Supports all the namespace-related file system operations such as create, delete, modify
    and list files and directories.
• Block Storage Service, which has two parts:
  o Block Management (performed in the Namenode)
    ▪ Provides datanode cluster membership by handling registrations and periodic
      heartbeats.
    ▪ Processes block reports and maintains the location of blocks.
    ▪ Supports block-related operations such as create, delete, modify and get block
      location.
    ▪ Manages replica placement, replicates under-replicated blocks and
      deletes over-replicated blocks.
  o Storage, which is provided by datanodes: they store blocks on the local file system and allow
    read/write access.
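The block management duties listed above can be pictured with a toy model. The following Python sketch is not Hadoop code: the `BlockManager` class and its methods are hypothetical, and the replication target of 3 is taken from the "three nodes" figure quoted for HDFS in this section. It only shows how block reports from datanodes let a Namenode track block locations and spot under- or over-replicated blocks.

```python
from collections import defaultdict

REPLICATION_TARGET = 3  # assumed target, matching the "three nodes" figure


class BlockManager:
    """Toy model of the block map kept by a Namenode."""

    def __init__(self):
        # block id -> set of datanode ids currently holding a replica
        self.locations = defaultdict(set)

    def process_block_report(self, datanode, block_ids):
        """Replace what is known about `datanode` with its latest report."""
        for holders in self.locations.values():
            holders.discard(datanode)
        for block_id in block_ids:
            self.locations[block_id].add(datanode)

    def under_replicated(self):
        """Blocks with at least one replica but fewer than the target."""
        return sorted(b for b, nodes in self.locations.items()
                      if 0 < len(nodes) < REPLICATION_TARGET)

    def over_replicated(self):
        """Blocks with more replicas than the target."""
        return sorted(b for b, nodes in self.locations.items()
                      if len(nodes) > REPLICATION_TARGET)
```

In a real cluster the Namenode would go on to schedule new copies for the first list and deletions for the second; the sketch stops at detection.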
In order to scale the name service horizontally, federation uses multiple independent
Namenodes/namespaces. The Namenodes are federated, that is, the Namenodes are independent and don’t
require coordination with each other. The datanodes are used as common storage for blocks by all the
Namenodes. Each datanode registers with all the Namenodes in the cluster. Datanodes send periodic
heartbeats and block reports and handle commands from the Namenodes. The key benefits are:
Namespace Scalability - HDFS cluster storage scales horizontally but the namespace does not. Large
deployments or deployments using a lot of small files benefit from scaling the namespace by adding
more Namenodes to the cluster.
Performance ‐ File system operation throughput is limited by a single Namenode in the prior
architecture. Adding more Namenodes to the cluster scales the file system read/write operations
throughput.
Isolation - A single Namenode offers no isolation in a multi-user environment. An experimental
application can overload the Namenode and slow down production-critical applications. With multiple
Namenodes, different categories of applications and users can be isolated to different namespaces.
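The federation idea can be illustrated with a small Python sketch. It is a simplified model, not HDFS code: `ToyNamenode`, `register_datanode`, `route` and the mount table are hypothetical names, and the client-side routing by path prefix is an assumption added to show how applications might be isolated in different namespaces.

```python
class ToyNamenode:
    """One independent namespace; it never talks to other Namenodes."""

    def __init__(self, name):
        self.name = name
        self.namespace = {}      # file path -> list of block ids
        self.datanodes = set()   # datanodes registered with this Namenode

    def create(self, path, block_ids):
        self.namespace[path] = list(block_ids)


def register_datanode(datanode_id, namenodes):
    # Each datanode registers with every Namenode in the cluster.
    for namenode in namenodes:
        namenode.datanodes.add(datanode_id)


def route(mount_table, path):
    # Hypothetical client-side routing: the longest matching mount point wins.
    matches = [
        prefix for prefix in mount_table
        if path == prefix or path.startswith(prefix.rstrip("/") + "/")
    ]
    if not matches:
        raise KeyError(f"no namespace mounted for {path}")
    return mount_table[max(matches, key=len)]


production = ToyNamenode("nn-production")
experiments = ToyNamenode("nn-experiments")
for datanode_id in ("dn1", "dn2", "dn3"):
    register_datanode(datanode_id, [production, experiments])

mount_table = {"/prod": production, "/sandbox": experiments}
route(mount_table, "/sandbox/test.csv").create("/sandbox/test.csv", ["blk_7"])
```

After this runs, the experimental file exists only in the `experiments` namespace, while both Namenodes see the same three datanodes as common block storage.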
By way of conclusion, here are the main characteristics of HDFS as known by developers:
An Apache open source distributed file system, http://hadoop.apache.org
Expected to run on high-performance commodity hardware
Known for highly scalable storage and automatic data replication, by default across three nodes, for
fault tolerance
This automatic replication reduces the need for separate backups
Write once, read many times
b) Cloudera Manager:
Cloudera Manager is the market‐leading management platform for CDH (Cloudera's Distribution, including
Apache Hadoop). As the industry’s first end‐to‐end management application for Apache Hadoop, Cloudera
Manager sets the standard for enterprise deployment by delivering granular visibility into and control over
every part of CDH ‐ empowering operators to improve cluster performance, enhance quality of service,
increase compliance and reduce administrative costs.
Here are the main characteristics of Cloudera Manager:
Cloudera Manager is an end‐to‐end management application for Cloudera’s Distribution of Apache
Hadoop, http://www.cloudera.com
Cloudera Manager gives a cluster‐wide, real‐time view of nodes and services running; provides a
single, central place to enact configuration changes across the cluster; and incorporates a full range of
reporting and diagnostic tools to help optimize cluster performance and utilization.
2.2.3 Database Capability
a) Oracle NoSQL
Oracle NoSQL Database delivers scalable throughput with bounded latency, easy administration, and a simple
programming model. It scales horizontally to hundreds of nodes with high availability and transparent load
balancing. "NoSQL" is a general term meaning that the database isn't an RDBMS which supports SQL as its
primary access language, but there are many types of NoSQL databases: BerkeleyDB is an example of a local
NoSQL database, whereas HBase is very much a distributed database.
Source: Oracle, 2012
Here are the main characteristics of Oracle NoSQL:
Dynamic and flexible schema design. High performance key value pair database. Key value pair is an
alternative to a pre‐defined schema. Used for non‐predictive and dynamic data.
Able to efficiently process data without a row and column structure. Major + Minor key paradigm
allows multiple record reads in a single API call
Highly scalable multi‐node, multiple data center, fault tolerant, ACID operations
Simple programming model, random index reads and writes
Not Only SQL. Simple pattern queries and custom‐developed solutions to access data such as Java
APIs.
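The Major + Minor key paradigm mentioned above can be pictured with a toy key-value store. This sketch is not the Oracle NoSQL API: `ToyKeyValueStore`, `put` and `multi_get` are hypothetical names chosen to show how one call can return every record that shares a major key.

```python
class ToyKeyValueStore:
    """Toy store keyed by (major, minor) paths; illustration only."""

    def __init__(self):
        self._records = {}  # (major tuple, minor tuple) -> value

    def put(self, major, minor, value):
        self._records[(tuple(major), tuple(minor))] = value

    def multi_get(self, major):
        # Return every record stored under the same major key in one call.
        major = tuple(major)
        return {
            minor: value
            for (stored_major, minor), value in self._records.items()
            if stored_major == major
        }


store = ToyKeyValueStore()
store.put(["user", "42"], ["profile", "name"], "Alice")
store.put(["user", "42"], ["profile", "email"], "alice@example.com")
store.put(["user", "43"], ["profile", "name"], "Bob")

print(store.multi_get(["user", "42"]))
```

The last call returns both records of user 42 and nothing belonging to user 43, which is the "multiple record reads in a single API call" behaviour described in the list.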
b) Apache HBase
Apache HBase™ is the Hadoop database, a distributed, scalable, big data store. You can use Apache HBase
when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very
large tables ‐‐ billions of rows X millions of columns ‐‐ atop clusters of commodity hardware. Apache HBase is
an open‐source, distributed, versioned, column‐oriented store modeled after Google's Bigtable: A Distributed
Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage
provided by the Google File System, Apache HBase provides Bigtable‐like capabilities on top of Hadoop and
HDFS.
Here are the main characteristics of Apache HBase:
Allows random, real time read/write access
Strictly consistent reads and writes
Automatic and configurable sharding of tables
Automatic failover support between Region Servers
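To make the "versioned, column-oriented" description more concrete, here is a minimal Python sketch of the data model: each cell is addressed by row, column and timestamp. It is not the HBase client API; `ToyVersionedTable` and the `info:city` column name are assumptions for illustration.

```python
from collections import defaultdict


class ToyVersionedTable:
    """Toy model: row -> column -> timestamp -> value (illustration only)."""

    def __init__(self):
        self._cells = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, timestamp):
        # Every write adds a new version instead of overwriting the cell.
        self._cells[row][column][timestamp] = value

    def get(self, row, column, timestamp=None):
        # Return the newest version, or the newest one at or before `timestamp`.
        versions = self._cells.get(row, {}).get(column, {})
        candidates = [ts for ts in versions if timestamp is None or ts <= timestamp]
        if not candidates:
            return None
        return versions[max(candidates)]


table = ToyVersionedTable()
table.put("row1", "info:city", "Paris", timestamp=1)
table.put("row1", "info:city", "Lyon", timestamp=2)

print(table.get("row1", "info:city"))               # Lyon
print(table.get("row1", "info:city", timestamp=1))  # Paris
```

Random reads and writes address a single cell, and older versions stay readable by timestamp.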
c) Apache Cassandra
The Apache Cassandra database is the right choice when you need scalability and high availability without
compromising performance. Linear scalability and proven fault‐tolerance on commodity hardware or cloud
infrastructure make it the perfect platform for mission‐critical data.
Here are the main characteristics of Apache Cassandra:
Data model offers column indexes with the performance of log‐structured updates, materialized
views, and built‐in caching
Fault tolerance capability is designed for every node, replicating across multiple datacenters
Can choose between synchronous or asynchronous replication for each update
d) Apache Hive
Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad‐hoc queries, and the
analysis of large datasets stored in Hadoop compatible file systems. Hive provides a mechanism to project
structure onto this data and query the data using a SQL‐like language called HiveQL. At the same time this
language also allows traditional map/reduce programmers to plug in their custom mappers and reducers when
it is inconvenient or inefficient to express this logic in HiveQL.
Hive is based on Hadoop, which is a batch processing system. As a result, Hive does not and cannot promise
low latencies on queries. The paradigm here is strictly one of submitting jobs and being notified when the jobs
are completed, as opposed to real-time queries. Systems such as Oracle run analysis on a significantly smaller
amount of data, but the analysis proceeds much more iteratively, with response times between iterations of
less than a few minutes. In contrast, Hive query response times for even the smallest jobs can be on the order
of several minutes, and larger jobs (e.g., jobs processing terabytes of data) may in general run for hours.
In summary, low latency performance is not the top‐priority of Hive's design principles. What Hive values most
are scalability (scale out with more machines added dynamically to the Hadoop cluster), extensibility (with
MapReduce framework and UDF/UDAF/UDTF), fault‐tolerance, and loose‐coupling with its input formats.
Here are the main characteristics of Hive:
Tools to enable easy data extract/transform/load (ETL) from files stored either directly in Apache HDFS
or in other data storage systems such as Apache HBase
Uses a simple SQL‐like query language called HiveQL
Query execution via MapReduce
2.2.4 Processing Capability
a) MapReduce
Source: Oracle, 2012
MapReduce is a programming model and an associated implementation for processing and generating large
data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate
key/value pairs, and a reduce function that merges all intermediate values associated with the same
intermediate key. Many real world tasks are expressible in this model.
Here are the main characteristics of MapReduce:
Defined by Google in 2004
Break problem up into smaller sub‐problems
Able to distribute data workloads across thousands of nodes
Can be exposed via SQL and in SQL‐based BI tools
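The map and reduce functions described above can be sketched with the classic word count. This is a single-process illustration in plain Python, not Hadoop code: `map_words`, `reduce_counts` and `run_mapreduce` are hypothetical names, and the grouping step stands in for the shuffle that a real cluster performs across nodes.

```python
from collections import defaultdict


def map_words(_key, line):
    # Map: turn one input record into intermediate (word, 1) pairs.
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    # Reduce: merge all intermediate values that share the same key.
    return word, sum(counts)


def run_mapreduce(records, map_fn, reduce_fn):
    # Group intermediate values by key (the "shuffle"), then reduce each group.
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    return dict(reduce_fn(key, values) for key, values in groups.items())


lines = [(0, "big data is big"), (1, "data is everywhere")]
print(run_mapreduce(lines, map_words, reduce_counts))
# {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```

Because each map call and each reduce call is independent, a framework can spread them over many nodes, which is what makes the model suitable for very large data sets.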
b) Apache Hadoop
Apache Hadoop is 100% open source, and pioneered a fundamentally new way of storing and processing data.
Instead of relying on expensive, proprietary hardware and different systems to store and process data, Hadoop
enables distributed parallel processing of huge amounts of data across inexpensive, industry‐standard servers
that both store and process the data, and can scale without limits. With Hadoop, no data is too big. And in
today’s hyper‐connected world where more and more data is being created every day, Hadoop’s breakthrough
advantages mean that businesses and organizations can now find value in data that was recently considered
useless.
Here are the main characteristics of Apache Hadoop:
Leading MapReduce implementation
Highly scalable parallel batch processing
Highly customizable infrastructure
Writes multiple copies across cluster for fault tolerance
2.2.5 Data Integration Capability
a) Oracle Big Data Connectors, Oracle Loader for Hadoop, Oracle Data Integrator
Built from the ground up by Oracle, Oracle Big Data Connectors delivers a high‐performance Hadoop to Oracle
Database integration solution and enables optimized analysis using Oracle’s distribution of open source R
analysis directly on Hadoop data. By providing efficient connectivity, Big Data Connectors enables analysis of all
data in the enterprise – both structured and unstructured.
Here are the main characteristics of Big Data Connectors:
Exports MapReduce results to RDBMS, Hadoop, and other targets
Connects Hadoop to relational databases for SQL processing
Includes a graphical user interface integration designer that generates Hive scripts to move and
transform MapReduce results
Optimized processing with parallel data import/export
Can be installed on Oracle Big Data Appliance or on a generic Hadoop cluster
2.2.6 Statistical Analysis Capability
a) Open Source Project R and Oracle R Enterprise:
R is a free software environment for statistical computing and graphics. R provides a wide variety of
statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification,
clustering, ...) and graphical techniques, and is highly extensible. The S language is often the vehicle of
choice for research in statistical methodology, and R provides an Open Source route to participation in
that activity.
One of R's strengths is the ease with which well‐designed publication‐quality plots can be produced, including
mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor
design choices in graphics, but the user retains full control. R is available as Free Software under the terms of
the Free Software Foundation's GNU General Public License in source code form. It compiles and runs on a
wide variety of UNIX platforms and similar systems (including FreeBSD and Linux), Windows and MacOS.
Here are the main characteristics of project R:
Programming language for statistical analysis
Introduced into Oracle Database as a SQL extension to perform high performance in‐database
statistical analysis
Oracle R Enterprise allows reuse of pre‐existing R scripts with no modification
2.3 TRENDS ON BIG DATA
At the Big Data exhibition in Paris, few innovations were actually on display, because the Big Data world
is still evolving in a direction that no one can really predict. Experts therefore mostly exchanged views on
what they are doing and, most importantly, on how they see the future of big data. They all agreed on one
thing: Big Data is going to fuel the 21st century, and it is almost impossible to forecast how large an
economic impact its use will have.
A new kind of job is emerging: the data scientist. However, all the experts pointed out that there will be a
shortage of talent for these jobs. The advance of big data shows no signs of slowing. Data scientists are
difficult and expensive to hire, and given the very competitive market for their services, difficult to retain.
There simply are not a lot of people with their combination of scientific background and computational and
analytical skills.
Among the conferences, it was possible to define some trends in the Big Data world.
2.3.1 The Internet of Things already here
Not so long ago, the “Internet of Things” (a vast collection of small devices seamlessly connected to the
Net) was still just a concept in research papers. Before you know it, it’s here and, like Monsieur Jourdain,
people don’t fully understand it. You may find it debatable to call your smartphone a “Thing”, yet it is a
“Thing” that sends lots and lots of information to many servers worldwide, and you would be amazed at
the number of anonymous devices that are already fully connected.
For example, La Poste has worked with Exalead on connecting the opto‐electronic machines that it uses to filter
and sort our mail to the Net. It then uses all the information gathered to build a full‐fledged business
intelligence tool, used to operationally monitor the system. Another example: did you know that high‐end car
manufacturers have turned their vehicles into “Things” that keep sending monitoring information to central
servers to ensure better service and maintenance? One has to understand that every such “Thing” creates huge
logs of, literally, hundreds of billions of records: that’s more than the number of pages on the entire Web!
2.3.2 Getting to the right business model(s) for data
Data is the new frontier these days: Big Data, Open Data, DaaS (Data as a Service), whatever one calls it. Data is
like software in that it is very scalable: one invests heavily to create data sets, and then sells them by the millions,
with zero or very small marginal costs. At least that is how the theory goes.
But in fairness, it’s hard to say that anybody has cracked the right business model for data. For instance, one
interesting question remains: to be scalable, a data set needs to be reusable by many applications and
developers. But then, the value of such a data set is probably very low, unless it’s absolutely needed to
build everybody’s application and you have exclusivity, which is likely to be a very rare case, especially with
Open Data.
At the other end of the spectrum, using the Big Data artillery to build a very specific data set can yield a very
exclusive “product” that can only be used by one or maybe a handful of non‐competing companies. Such a data
set can be very expensive (to build and to buy), and can also create a lot of value for the company that uses it.
But it’s a business model that is very different from the intrinsically scalable business model
of the software industry (especially SaaS). At least until someone cracks it.
2.3.3 Adding a Social layer to traditional activities
Well, that is also a very interesting trend: using social networks like Twitter to produce real‐time “voice of the
customer” applications. Indeed, Facebook knows what you are doing, Twitter knows what you are saying, and
Google knows what you are thinking.
For instance, Mesagraph is working with broadcasters to build iPad applications connected to TV programs so
that you can comment and interact with other viewers in real‐time, while you’re watching a show. That is truly
revolutionary: finally, a way to connect back to the broadcasters. Consumers can find their interests here, quite
obviously, but at the same time, think of the implications in terms of advertising. Real‐time advertising, even.
Fine‐grained audience segmentation. This is an entirely new field with all sorts of promises and challenges.
Another very interesting application, presented at WWW2012, is the use of tweets to monitor the
Netflix media streaming service by detecting tweets containing phrases like “is out” (come on, guys, you can
do better than that :-)). Even with very simple heuristics, about 90% of outages were correctly detected.
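A phrase-matching heuristic of the kind described above could look like the following Python sketch. It is not the method presented at WWW2012: the function name, the per-minute window, the threshold of three tweets and the sample tweets are all assumptions made for illustration.

```python
from collections import Counter

OUTAGE_PHRASES = ("is out",)  # phrase quoted in the text; extend as needed


def detect_outages(tweets, phrases=OUTAGE_PHRASES, threshold=3):
    """Return the minutes in which at least `threshold` tweets match a phrase."""
    hits = Counter()
    for minute, text in tweets:
        lowered = text.lower()
        if any(phrase in lowered for phrase in phrases):
            hits[minute] += 1
    return sorted(minute for minute, count in hits.items() if count >= threshold)


# Hypothetical sample: (minute of the day, tweet text)
sample_tweets = [
    (10, "Netflix is out again?!"),
    (10, "looks like netflix is out"),
    (10, "Is it just me or Netflix IS OUT"),
    (11, "my cat is out in the garden"),
    (12, "watching a great movie tonight"),
]
print(detect_outages(sample_tweets))  # [10]
```

The single matching tweet at minute 11 is a false match that the threshold filters out, which is why even a crude phrase list needs some form of volume check.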
2.3.4 The New Frontier of Business Intelligence & Semantics at petabyte scale
The Internet of Things is making petabyte scales a reality today (a petabyte is 1,000 terabytes, or 1,000,000
gigabytes). A copy of the entire Web amounts to several petabytes. So Big Data technologies are needed to
handle such a vast amount of data, and one has to perform some form of Business Intelligence to make sense
of it.
There are two major breakthroughs to handle this challenge.
On one side, RAM‐based databases, where data is organized in “columns”, as opposed to “rows”, allow for very
fast processing of large quantities of data (as long as this data fits in RAM, that is). Slicing and dicing couldn’t be
any faster or easier.
On the other side, search engines, which are “columnar” by essence, are evolving to handle many more kinds
of data (semantic, numeric, etc.), are becoming more and more transactional (“ACID”, in barbarian terms) and
can process even larger data sets since they do not require that entire data sets fit in RAM.
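The difference between organizing data in rows and in columns, mentioned above, can be shown with a small in-memory Python sketch. The sample records and the helper names `to_columns` and `total_by` are hypothetical, and the sketch assumes every row has the same fields; real columnar engines add compression and vectorized execution on top of this layout.

```python
def to_columns(rows):
    # Pivot a list of row dictionaries into one list per column.
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_by(columns, group_col, value_col):
    # "Slice and dice": the aggregation reads only the two columns it needs.
    totals = {}
    for group, value in zip(columns[group_col], columns[value_col]):
        totals[group] = totals.get(group, 0.0) + value
    return totals


rows = [
    {"customer": "A", "country": "FR", "amount": 120.0},
    {"customer": "B", "country": "DE", "amount": 80.0},
    {"customer": "C", "country": "FR", "amount": 50.0},
]
columns = to_columns(rows)
print(total_by(columns, "country", "amount"))  # {'FR': 170.0, 'DE': 80.0}
```

The aggregation never touches the `customer` column, which is the main reason a columnar layout speeds up analytical queries.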
You get to choose your favorite. But one thing is clear: semantic treatment of textual data will be a major
requirement for next‐generation Business Intelligence platforms. That is the next frontier for Big Data. And
search engines are uniquely positioned to win this race.
2.4 KEY COMPANIES IN THE BIG DATA EXHIBITION IN PARIS
2.4.1 Data Publica
Address: Data Publica, 8 rue Jouffroy d’Abbans, 75017 Paris, France
Contact: M. François BANCILHON
Mail: francois.bancilhon@data-publica.com
Website: http://www.data-publica.com/
Created in July 2011, Data Publica is one of the leading long-standing open data players in France. The company
has benefited from technological investments made in 2010 as part of an R&D project. It was initially
funded by a group of "angels" and the seed fund IT Translation.
Data Publica is a company working on assembling data sets built from both public data and open data, and then
selling these data sets to companies to help them build innovative applications. Data Publica describes itself as
a “Data Vendor” similar, in the domain of Open Data, to what “Software Vendors” are to the domain of
Software.
2.4.2 Altic
Address :
95 Avenue Victor Hugo, 93360 NEUILLY PLAISANCE
Tel: 09 53 64 63 69
Website: http://www.altic.org/
Contact: Marc SALLIERES (CEO), contact@altic.org
ALTIC is an ALTernative of Information and Communication.
It is an Open Source Software integrator created in June 2004, and a founding member of the ASS2L. ALTIC
assists companies and public administrations in implementing open source management software. It works on the
following domains and open source solutions: Business Solutions (SpagoBI, Talend, JasperReports, BIRT,
LemonOLAP), Management Solutions (Compiere, Vtiger, SQL/Ledger), Communication Solutions (Joomla!,
Tutos, LemonLDAP). Altic also supports the LemonLDAP project, the Open Source Web SSO.
2.4.3 Talend
Address: Talend SA, 9 rue Pagès, 92150 Suresnes, France
Contact: M. Cédric CARBONE
Tel: +33 1 46 25 06 00
Website: http://fr.talend.com/
Mail: sales.fr@talend.com
Talend is one of the largest pure play vendors of open source software, offering a breadth of middleware
solutions that address both data management and application integration needs.
Since the emergence of data integration and data quality tools in the 1990s, and the more recent appearance
of Master Data Management solutions, the data management market has been dominated by a small ‐ and
quickly consolidating ‐ number of traditional vendors offering proprietary, closed solutions, which only the
largest and wealthiest organizations can afford. The situation in the application integration space is quite
similar, with significant consolidation occurring as well. As a result, only a minority of organizations use
commercial solutions to meet their data management and application integration needs. Indeed, these
solutions not only demand a steep initial investment, but they also often require significant resources to
manage implementation and ongoing operation.
Furthermore, companies are faced with exponential growth in the volume and heterogeneity of the data and
applications they need to manage and control. A key challenge that IT departments face today is ensuring the
consistency of their data and processes by using modeling tools, workflow management and storage, the
foundations of data governance in any company today. This challenge is actually faced by organizations of all
sizes ‐ not only the largest corporations.
In just a few years, Talend has become the recognized market leader in open source data management. The
acquisition in 2010 of Sopera, a leader in open source application integration, has reinforced Talend’s market
coverage, creating a global leader in open source middleware. Many large organizations around the globe use
Talend's products and services to optimize the costs of data integration, data quality, Master Data
Management (MDM) and application integration. With an ever growing number of product downloads and
paying customers, Talend offers the most widely used and deployed data management solutions in the world.
CONCLUSION
According to the organizers, the Mobile IT and Big Data exhibition did not attract many visitors. Big
companies such as Orange, SFR, Bouygues and Free for telecoms, or Intel, Dell and IBM for Big Data, were absent.
But the conferences on the evolution of these sectors were very successful. In a gloomy atmosphere where
visitors and exhibitors talked openly about tiny budgets for information technology, some sectors were
nevertheless quite healthy and innovative. This was the case for equipment manufacturers, developers of
next-generation telephony and web providers. There were some impressive innovations in the field of smartphones
coming from a large number of young companies specializing in mobile business solutions. The advent of
smartphones and tablets is revolutionizing enterprise mobility. Judicious use of interfaces from the video
games industry brings playful applications that customers find friendlier to use. This is the
"gamification" phenomenon, which is expected to take off commercially in the short term.
Conferences on Big Data drew quite a crowd and allowed visitors to discover an emerging sector that should
weigh heavily in the development of enterprises. In only 10 years, the amount of data has increased exponentially.
Data storage is a costly problem for businesses, yet these data remain relatively untapped by companies. The idea
of big data is to create added value from very diverse data. People now talk about flows, exchanges and
collaborations rather than storage. Nothing is sorted but everything can be found. Big Data (from 10 TB of data)
is revolutionizing information technology infrastructure. Environments such as Hadoop provide flexibility
in resources and adapt to the workload by adding inexpensive servers in parallel. Big Data generated a
turnover of $17 billion in 2011 and it is estimated that this figure will double by 2016. The great debate with
big data is how to find a balance between data transparency and the privacy of citizens.
Big data is rapidly emerging as a market force, not just a single market unto itself. Big data IT services spending
is expected to reach a 10.20% CAGR from 2011 to 2016. By 2020, big data functionality will be part of the baseline of
enterprise software, with enterprise vendors enhancing the value of their applications with it.
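The growth figures quoted in this conclusion can be put side by side with a short calculation. The sketch below assumes a five-year horizon (2011 to 2016) for both figures and uses the standard compound annual growth rate formula; the function names are illustrative.

```python
def cagr(start_value, end_value, years):
    # Compound annual growth rate that takes start_value to end_value.
    return (end_value / start_value) ** (1.0 / years) - 1.0


def growth_factor(rate, years):
    # Total multiplication factor after compounding `rate` for `years`.
    return (1.0 + rate) ** years


# Turnover doubling from $17 billion (2011) to $34 billion (2016).
doubling_rate = cagr(17.0, 34.0, 5)

# IT services spending growing at a 10.20% CAGR over the same period.
services_factor = growth_factor(0.1020, 5)

print(f"Implied CAGR for a doubling in 5 years: {doubling_rate:.2%}")
print(f"Growth factor at 10.20% per year over 5 years: {services_factor:.2f}")
```

Under these assumptions, a doubling in five years corresponds to roughly 14.9% per year, while a 10.20% CAGR multiplies spending by about 1.6 over the same period.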
ABOUT VEILLE SALON
Officially launched in early 2010 by VIEDOC Consulting, a business, competitive and technological intelligence
company, VeilleSalon.com is the first professional service for watching and reporting on trade show
innovations for companies and is based on one of the largest global directories of trade shows, symposiums and
other international events.
This new professional service is designed for visitors and companies as well as for exhibitors and trade show
organizers.
Through a bilingual directory, VEILLE SALON has already referenced more than 7,500 exhibitions and
international events sorted and searchable according to business areas:
for industrial sector : Aerospace, Agriculture, Agribusiness, Automotive, Materials, Construction,
Consumer goods, Cosmetics, Electronics, Defense, Energy, Optics, Pharmaceuticals,
Telecommunications ...
for tertiary sector: Banking / Insurance, Hospitality, Real Estate, Media / advertising, Human Services,
Tourism ...
for business area : Chemistry, Design / Architecture, Distribution, Packaging, Education / Training,
Health & Environment, Computing, Innovation, Maintenance, Mechanical, Quality, Human Resources.
Besides the powerful features of multi‐criteria searches (dates, places, keywords, sectors, organizers, exhibitors
...), VeilleSalon.com also offers visitors a customized and interactive calendar of forthcoming exhibitions, a
monthly newsletter, a forum and many other services.
For potential exhibitors and event organizers, VeilleSalon.com is a real communication tool: registration of
new events, presentation of your company and of its latest news (product & process innovations, new services),
free or paid conference proceedings, real-time information for the visitor ... VeilleSalon.com is also a forum
where visitors can meet directly with you to best prepare their visit and where they can get information
about your company.
Why offer a professional service dedicated to trade show innovation watching?
Watching trade show innovations is an ideal way to identify and analyze competitors, suppliers, new products,
equipment, and services, to detect technology transfers and innovations, to achieve business development
with potential new customers and to enhance market and trends knowledge.
The VEILLESALON team, made up of experienced consultants and seasoned business intelligence
engineers from VIEDOC Consulting, therefore offers a range of services: reporting on trade show innovations in
France and abroad, supporting individuals at on-site events, conducting on-demand investigations and interviews,
staff training...
So whether you are a company wishing to maximize your trade show innovation watch, a future exhibitor or an
event organizer, we have developed tailored solutions to meet your expectations.
To access our website: http://www.veillesalon.com.
PRESENTATION OF VIEDOC SARL
VIEDOC CONSULTING’s core business is information. VIEDOC is your company’s partner from strategy to
operation.
VIEDOC aims to assist its customers in the first stages of their activities (Business intelligence, knowledge
management, competitive analysis, technological watch, market research, patent monitoring, benchmarking,
technology transfers, state of the art ...) through the collection and analysis of information relevant to their
business. Business Intelligence does not necessarily require permanent in-house skills, but it does require getting
the right information at the right time. VIEDOC has worked for customers over both extended and short periods of
time to assist companies in decision making.
VIEDOC advises companies from all industries (automotive, aerospace and defense, food, cosmetics, health,
materials, optics, packaging, telecommunications ...).
VIEDOC can assist companies that are ambitious and aware of the importance of investing at this level:
From the small innovative company looking forward to having strategic advice in tight milestones, up
to major industrial groups anxious to keep their leadership position.
Methodology:
We have a pragmatic approach built on a rigorous methodology that addresses the collection, processing,
analysis and dissemination of high added value information.
Through its multi‐sector experience, VIEDOC provides its clients with services tailored to their needs by
listening to their concerns and being available to meet their requirements and methods.
To successfully help its customers at different stages of the life of their company (from creation to recovery), of
their products (from design to sale) or of their projects (from the first study to the end of the project), VIEDOC
operates both on process and on product innovation. VIEDOC deals with both technical and economic
information.
You can benefit from our experience as specialists in collecting and analyzing value-added information, and from
our methodologies and analytical capacity to provide qualified information and high quality validation.
As experts in technology transfer identification, we have consistently broadened our multisectoral vision by
bringing our professionalism and expertise to many clients, large industrial groups and SMEs, in a dozen
distinct sectors.
This experience allows us today to offer our customers a meaningful analysis that does not
neglect any technical, economic, legal or human implications and fully complies with the ethical rules that guide
all activities of our company.
www.veillesalon.com
A service made by:
VIEDOC SARL
2 Rue Hélène Boucher
78280 Guyancourt (France)
Tel : +33 (0)1 30 43 45 27
Email : info@viedoc.biz
Website : www.viedoc.fr