Tools for data visualization
01 The data scientist’s toolbox
02 Five data visualization tools
03 Get the benefit from data with four webinars
Data Science stands today as a multidisciplinary profession. The
following is intended to be a basic guide of some useful resources
available for each of the facets performed by these professionals.
The data scientist’s
toolbox
01. TOOLBOX
Data Science stands today as a
multidisciplinary profession, in which
knowledge from various areas overlaps in a
profile more typical of the Renaissance than
of this super-specialized 21st century.
TOOLS AND LANGUAGES
• SQL
• SQLite
• sqlite3
• RSQLite
• Toad
• TOra
• RapidMiner
• KNIME
• Pentaho
• RODBC
• RJDBC
• pyodbc
• mxODBC
• SQLAlchemy
• pandas
• data.table
• XML
• jsonlite
• json
Given the scarcity of formal training in
this field, data scientists are forced to
collect dispersed knowledge and tools
to optimally develop their skills.
The following is intended to
be a basic guide, obviously
not exhaustive, of some useful
resources available for each of
the facets performed by these
professionals.
Data management
Part of the work of the data scientist is to capture,
clean up and store information in a format suitable
for processing and analysis.
The most usual scenario is to access a copy of the
data source for a one-time or periodic capture.
You will need to know SQL to access the data
stored in relational databases. Each database has a
console for running SQL queries, although most
people prefer a graphical environment that shows
information about tables, fields and indexes. Some
of the most popular data management tools are
Toad, proprietary software for Windows,
and TOra, which is open-source and cross-platform.
Once the data is extracted, we can store it in plain
text files that we will load into our working
environment for machine learning, or into a tool
such as SQLite.
SQLite is a lightweight relational database with no
external dependencies that does not need to be
installed on a server. Moving a database is as
easy as copying a single file. In our case, we can
process the information without concurrent or
multi-user access to the source data, which suits
the characteristics of SQLite perfectly.
The languages we use for our algorithms can
connect to SQLite (Python through sqlite3, and
R through RSQLite), so we can choose to import the
data before preprocessing or to do part of the
preprocessing in the database itself, which helps
avoid problems once the number of records grows
large.
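As a minimal sketch of this workflow in Python, the standard-library sqlite3 module can load a plain-text extract into SQLite and do part of the cleaning in the database itself. The table name, columns and sample rows below are invented for illustration and are not part of the original material.

```python
import csv
import io
import sqlite3

# Hypothetical plain-text extract; in practice this would be a file on disk.
RAW_CSV = """customer_id,amount
1,10.5
2,
1,4.5
3,7.0
"""


def load_extract(conn, text):
    """Create the table and bulk-insert the rows of a CSV extract."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS purchases (customer_id INTEGER, amount REAL)"
    )
    rows = [
        (int(r["customer_id"]), float(r["amount"]) if r["amount"] else None)
        for r in csv.DictReader(io.StringIO(text))
    ]
    conn.executemany("INSERT INTO purchases VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


# ":memory:" keeps the sketch self-contained; a path such as "extract.db"
# (an assumed name) would give the single-file database described above.
conn = sqlite3.connect(":memory:")
n_loaded = load_extract(conn, RAW_CSV)

# Part of the preprocessing happens in the database: drop rows with no amount.
conn.execute("DELETE FROM purchases WHERE amount IS NULL")
clean_rows = conn.execute(
    "SELECT customer_id, amount FROM purchases ORDER BY rowid"
).fetchall()
print(n_loaded, clean_rows)
```

With a file path instead of ":memory:", moving the data to another machine amounts to copying that one file.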
Another alternative for bulk data capture is to use a
tool that covers the full ETL cycle (Extract,
Transform, Load), e.g. RapidMiner, KNIME
or Pentaho. With them, we can graphically define
the data acquisition and cleaning cycles using
connectors.
If we have guaranteed access to the data
source during preprocessing, we can use a database
connection (RODBC and RJDBC in R; pyodbc,
mxODBC and SQLAlchemy in Python) and let the
database engine perform the joins (JOIN) and
aggregations (GROUP BY), importing only the
results afterwards.
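The idea of letting the engine do the JOIN and GROUP BY and importing only the summary can be sketched as follows. Here sqlite3 stands in for a real ODBC connection so that the example runs anywhere; the tables, columns and values are hypothetical.

```python
import sqlite3

# sqlite3 is a stand-in for a production connection (an assumption of this sketch).
conn = sqlite3.connect(":memory:")

# executescript is specific to sqlite3; it only builds the toy tables here.
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'north'), (2, 'south'), (3, 'north');
    INSERT INTO orders VALUES (1, 10.0), (1, 5.0), (2, 8.0), (3, 2.5);
""")

# The join and the aggregation run inside the database engine.
QUERY = """
    SELECT c.region, COUNT(*) AS n_orders, SUM(o.amount) AS total
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    GROUP BY c.region
    ORDER BY c.region
"""

cursor = conn.cursor()
cursor.execute(QUERY)

# Only the aggregated rows are imported into the working environment.
columns = [description[0] for description in cursor.description]
summary = [dict(zip(columns, row)) for row in cursor.fetchall()]
print(summary)
```

Only two summary rows cross the connection instead of every order, which is the benefit the paragraph above describes.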
For processing outside the database, pandas (a Python
library) and data.table (an R package) are our first
choice. data.table works around one of R’s
weaknesses, memory management, by performing
vectorized operations and grouping by reference
without temporarily duplicating objects.
A third scenario is to access information that is
generated in real time and transmitted in
formats like XML or JSON.
These are called incremental learning
projects, and they include
recommendation systems, online advertising
and high-frequency trading.
For this we will use tools like XML or jsonlite
(R packages), or xml and json (Python
modules). With them we capture the data
stream, make our predictions, send them back
in the same format, and update our model
once the source system later provides the
results actually observed.
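The capture, predict, reply and update loop described above can be sketched with Python's json module and a tiny online model. The message fields ("id", "features", "observed"), the linear model and the simulated feedback stream are all assumptions made for illustration; a real project would use whatever schema and learner the source system requires.

```python
import json


class OnlineLinearModel:
    """Linear model updated one observation at a time (stochastic gradient descent)."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, features):
        return self.bias + sum(w * x for w, x in zip(self.weights, features))

    def update(self, features, observed):
        # One gradient step on the squared error of a single example.
        error = self.predict(features) - observed
        self.weights = [
            w - self.learning_rate * error * x
            for w, x in zip(self.weights, features)
        ]
        self.bias -= self.learning_rate * error


def handle_request(model, message):
    """Parse a JSON request, predict, and answer in the same format."""
    request = json.loads(message)
    prediction = model.predict(request["features"])
    return json.dumps({"id": request["id"], "prediction": prediction})


def handle_feedback(model, message):
    """Update the model once the source system reports the observed result."""
    feedback = json.loads(message)
    model.update(feedback["features"], feedback["observed"])


model = OnlineLinearModel(n_features=1)
first_reply = json.loads(handle_request(model, '{"id": 1, "features": [2.0]}'))

# Simulated feedback stream in which the true relation is observed = 3 * x.
for _ in range(500):
    for x in (1.0, 2.0, 3.0):
        handle_feedback(model, json.dumps({"features": [x], "observed": 3 * x}))

later_reply = json.loads(handle_request(model, '{"id": 2, "features": [2.0]}'))
print(first_reply, later_reply)
```

The model is never retrained from scratch: each feedback message nudges the weights, so predictions improve as observed results arrive.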
Data analysis
Even though Business Intelligence, Data
Warehousing and Machine Learning are all part
of Data Science, the last of these is the one that
requires the greatest number of specific utilities.
Hence, our toolbox will need to include R and
Python, the programming languages most widely
used in machine learning.
For Python we highlight the scikit-learn suite, which
covers almost all techniques, except perhaps neural
networks. For these we have several interesting
alternatives, such as Caffe and Pylearn2. The latter
is based on Theano, a Python library
that allows symbolic definitions and transparent
use of GPUs.
If we need to modify an R package, we will need C++ and some utilities to rebuild it:
Rtools, an environment for building R packages under Windows, and devtools, which simplifies the whole
package development process.
Some general purpose tools also make our life easier in R:
• data.table: Fast reading of text files; creation,
modification and deletion of columns by
reference; joins by a common key or group, and
data summaries.
• foreach: Execution of parallel processes against
a previously registered backend, such
as doMC or doParallel.
• bigmemory: Manage massive matrices in R and
share information across multiple sessions or
parallel analyses.
• caret: Compare models, control data partitions
(splitting, bootstrapping, subsampling) and
tune parameters (grid search).
• Matrix: Manage sparse matrices and
transform categorical variables into binary ones
(one-hot encoding) using the
sparse.model.matrix function.
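The one-hot encoding mentioned for the Matrix package can be illustrated with a small Python sketch that stores only the positions of the ones, which is the idea behind a sparse matrix. This mirrors the concept only; it is not the API of sparse.model.matrix, and unlike R's default model matrices it keeps one column per level. The function names and sample data are hypothetical.

```python
def one_hot_sparse(values):
    """Return (levels, entries); entries lists the (row, column) cells that hold a 1."""
    levels = sorted(set(values))
    column_of = {level: j for j, level in enumerate(levels)}
    entries = [(i, column_of[value]) for i, value in enumerate(values)]
    return levels, entries


def to_dense(entries, n_rows, n_cols):
    """Expand the sparse coordinates into a full 0/1 matrix (for inspection only)."""
    dense = [[0] * n_cols for _ in range(n_rows)]
    for i, j in entries:
        dense[i][j] = 1
    return dense


colors = ["red", "green", "red", "blue"]
levels, entries = one_hot_sparse(colors)
dense = to_dense(entries, len(colors), len(levels))
print(levels, entries)
```

The sparse form needs one entry per row regardless of how many categories exist, whereas the dense matrix grows with rows times categories.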
Some of the most used packages for R:
• Gradient boosting: gbm and xgboost.
• Random forests for classification and regression:
randomForest and randomForestSRC.
• Support vector machines: e1071, LiblineaR and
kernlab.
• Regularized regression (Ridge, Lasso and
ElasticNet): glmnet.
• Generalized additive models: gam.
• Clustering: cluster.
Distributed environments deserve a special mention. If we have dealt with data from a large institution or
company, we will probably have experience working with the so-called Hadoop ecosystem. Hadoop combines a
distributed file system (HDFS) with a programming model (MapReduce) that makes it possible to process
information in parallel.
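To make the MapReduce idea concrete, here is a single-process Python sketch of its three phases (map, shuffle, reduce) applied to a word count. It does not use Hadoop and runs nothing in parallel; the function names and documents are invented for illustration. In a real cluster, the map and reduce calls would run on different nodes over blocks stored in HDFS.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine the values of one key into a single result."""
    return key, sum(values)


documents = ["big data big tools", "data tools for data science"]

# Each document could be mapped independently, which is what allows parallelism.
mapped = chain.from_iterable(map_phase(d) for d in documents)
counts = dict(reduce_phase(k, v) for k, v in shuffle_phase(mapped).items())
print(counts)
```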
Among the machine learning tools compatible with Hadoop we find:
• Vowpal Wabbit: Online learning methods based
on gradient descent.
• Mahout: A suite of algorithms that includes
recommendation systems, clustering,
logistic regression and random forests.
• h2o: Perhaps the fastest-growing tool,
with a large number of
parallelizable algorithms. It can be run
from a graphical environment or from R or
Python.
The data scientist should also keep abreast of new
trends, such as the generational shift from Hadoop
to Spark. Spark has several advantages over Hadoop
for processing information and running
algorithms. The main one is speed: it can be up to
100 times faster because, unlike Hadoop MapReduce,
it manages data in memory and only writes to disk
when necessary.
Spark can run independently or
coexist as a component of Hadoop,
so migration can be planned in a
non-traumatic way. You can, for example,
use HBase as a database, although
Cassandra is emerging as a
storage solution thanks to its redundancy
and scalability.
Visualization
Finally, a brief reference to the
presentation of results.
The most popular tools for R are
clearly lattice and ggplot2,
and Matplotlib for Python. But if we
need professional presentations
embedded in web environments, the
best choice is certainly D3.js.
Among the integrated Business
Intelligence environments with a clear
focus on presentation we should
highlight the well-known Tableau and,
as alternatives for graphical
exploration of data, Birst and Necto.
We present some of the best data visualization tools that you
can use in your business to take full advantage of the large
amount of information created every day in the digital world.
Five data visualization
tools that you should not miss
02. DATA VISUALIZATION TOOLS
The digital universe is reaching new
thresholds. The amount of data
generated by both private users and
companies is growing at a rapid pace.
According to a study by IDC and
EMC, the digital universe is doubling
in size every two years, and by 2020 it
will have reached 44 zettabytes of
information, that is, 44
trillion gigabytes of structured and
unstructured data.
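The unit conversion behind that figure can be checked with a few lines of Python, assuming decimal (SI) units, where a gigabyte is 10^9 bytes and a zettabyte is 10^21 bytes. The projection function simply applies the "doubling every two years" rule; the 2014 value it yields is an implication of that rule, not a number quoted from the study.

```python
BYTES_PER_GB = 10**9   # decimal gigabyte (assumed SI definition)
BYTES_PER_ZB = 10**21  # decimal zettabyte (assumed SI definition)


def zettabytes_to_gigabytes(zettabytes):
    """Convert a whole number of zettabytes to gigabytes."""
    return zettabytes * BYTES_PER_ZB // BYTES_PER_GB


def project_size(size_2020_zb, year, doubling_years=2):
    """Size in zettabytes implied by steady doubling, anchored at the 2020 figure."""
    return size_2020_zb * 2 ** ((year - 2020) / doubling_years)


gb_2020 = zettabytes_to_gigabytes(44)
size_2014 = project_size(44, 2014)
print(gb_2020, size_2014)
```

One zettabyte is a trillion (10^12) gigabytes, so 44 zettabytes is 44 trillion gigabytes, matching the text.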
Creating and accessing a
website, participating in a blog, gaining
followers, posting comments,
sending a tweet or just surfing the internet
produces a whole range of data that, if
exploited properly, can be of great value
for companies.
VISUALIZATION TOOLS INDEX
• Google Fusion Tables
• CartoDB
• Tableau Public
• iCharts
• Smart Data Report
The big challenge, however, is to make sense of all
that data. That is, to be able to capture, link and
analyze it and extract its true value, so that the
information can be presented in an attractive, clear,
concise and understandable manner that facilitates
decision making in your business. Visually exploring
and analyzing customer data can also lead you
to discover new ways to reach your customers, segment
them better, personalize offers for
products or services, and generate innovative ideas,
among many other possibilities that can
help maintain the engagement between
your brand and your users over time.
Where to start
The first steps in data visualization may be
intimidating. Fortunately, as data grows,
so do the tools that help us get the most
out of it. Here we present the five tools that we
consider the best, based on the capabilities they
provide and the level of experience required.
Google Fusion Tables
It’s an excellent tool for beginners or for those
who don’t know how to program. For more
advanced users there is an API for
producing graphics or maps from data.
One of the advantages of this application is the
diversity of data representations it offers. It also
offers a relatively fast way to create graphics and
maps, including GIS functions to analyze data by
geographic area.
The Guardian frequently uses this tool to
produce detailed maps very quickly.
CartoDB
This is an open source service with a friendly
interface, aimed at any user regardless of
technical level. It lets you create a variety of interactive
maps, choosing from a catalog of options (which
includes Google Maps) or adding your own
customized maps.
The most interesting feature of this tool is that it lets
you access Twitter data to see how users react to
a brand, a particular marketing campaign or an event.
A good example is the tweet-tracking map
that was created last year for the
launch of Beyoncé’s latest album. It clearly shows
the places where the release had the most impact. This
is a great source of visual information for marketing
professionals and businesses.
It should also be highlighted that it has an active
group of developers who provide extensive
documentation and examples. In addition, the open
nature of its API makes it possible to keep creating new
integrations and to extend the capabilities of the
tool with new libraries.
Tableau Public
With Tableau Public you can easily create
interactive maps, bar and pie charts, etc. One of its
advantages is that, as with Google Fusion Tables, you
can import tables from Excel to facilitate your work.
In a matter of minutes you can generate an
interactive graphic, embed it in your website and
share it. For example, the news portal Global
Post used it to create a series of charts about the best
countries in Africa to do business.
The recently released version 8.2 also includes
the new OpenStreetMap tool, which makes it possible to
produce very detailed maps from local data such as
cafes or shops. Tableau Public is a free tool,
although it also has a premium version.
iCharts
You can get started in the world of data
visualization with the service offered by iCharts,
which has a free version (Basic) and two premium
options (Platinum and Enterprise). With this tool you
can create visualizations in just a few steps,
importing Excel and Google Drive documents or
adding data manually.
With this tool you can share your
graphics privately with your collaborators, and
edit and update them with new data
through its cloud computing service. You can even
share them with your clients through emails,
newsletters or social networks.
Among the companies using this service we find
the prestigious consulting firm IDC, which
uses iCharts to visualize the relevant
data included in its reports.
Smart Data Report
Finally, we also recommend Smart Data Report,
which is not as powerful as the previous tools
but has the advantage of being an affordable data
solution for entrepreneurs and small businesses
whose workers don’t have much spare time.
Among other services, this website offers free data
analysis and the option to receive reports by email,
without having to create them yourself. Once the
service has your report ready, it generates
HTML code that you can embed in your
corporate website or in your articles.
Mapping data, visualizing it in geospatial apps and applying
machine learning: we put our knowledge into practice with the
help of these video tutorials.
Get the maximum benefit from data with
these four webinars
Mapping data
03. WEBINARS
CartoDB explains how to convert location data into knowledge for your business. In this tutorial you can learn
how to analyze, visualize and build data apps using the CartoDB tool.
Machine Learning
Now that summer is around the corner, Andrés González, solutions manager for Big Data and Data Prediction at
Clever Task, shows us how to make forecasts from data in a very specific area: the tourism sector.
Geospatial apps
And if you want to learn to create apps with geospatial data, you can't miss this tutorial, also by CartoDB,
explaining how you can make the most of an API (in this case the one opened by BBVA for the
InnovaChallenge competition) to create apps and visualizations.
Good examples of visualization
Finally, to round off this selection, Alberto Cairo, professor of data visualization at the University of Miami,
teaches us good practices in data visualization. It's good to learn from our own mistakes and from the
successes of others.
BBVA is not responsible for the opinions expressed herein.
www.bbvaopen4u.com

Weitere ähnliche Inhalte

Kürzlich hochgeladen

High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home ServiceSapana Sha
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 

Kürzlich hochgeladen (20)

High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 

Empfohlen

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by HubspotMarius Sescu
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTExpeed Software
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsPixeldarts
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthThinkNow
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfmarketingartwork
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024Neil Kimberley
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsKurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summarySpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best PracticesVit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project managementMindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36
 

Empfohlen (20)

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPT
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 

Data visualization tools for users

  • 1. Tools for data visualization. 01: The data scientist’s toolbox. 02: Five data visualization tools. 03: Get the benefit from data with four webinars.
  • 2. Data Science stands today as a multidisciplinary profession. The following is intended to be a basic guide of some useful resources available for each of the facets performed by these professionals. The data scientist’s toolbox
  • 3. 01. TOOLBOX Data Science stands today as a multidisciplinary profession, in which knowledge from various areas overlaps in a profile more typical of the Renaissance than of this super-specialized 21st century. Given the scarcity of formal training in this field, data scientists are forced to collect dispersed knowledge and tools to develop their skills. The following is intended to be a basic guide, obviously not exhaustive, to some useful resources available for each facet of these professionals' work. TOOLS AND LANGUAGES: SQL, SQLite, sqlite3, RSQLite, Toad, Tora, RapidMiner, Knime, Pentaho, RODBC, RJDBC, pyODBC, mxODBC, SQLAlchemy, pandas, data.table, XML, jsonlite, json.
  • 4. Data management Part of the work of the data scientist is to capture, clean up and store information in a format suitable for processing and analysis. The most usual scenario is to access a copy of the data source for a one-time or periodic capture. You will need to know SQL to access the data stored in relational databases. Each database has a console to execute SQL queries, although most people prefer a graphical environment that shows information about tables, fields and indexes. Some of the most popular data management tools are Toad, proprietary software for Microsoft’s platform, and Tora, which is open-source and cross-platform. Once the data is extracted, we can store it in plain text files, which we then load into our working environment for machine learning or for use with a tool such as SQLite.
  • 5. SQLite is a lightweight relational database with no external dependencies that does not need to be installed on a server. Moving a database is as easy as copying a single file. In our case, we can process the information without concurrency or multiple access to the source data, which suits the characteristics of SQLite perfectly. The languages we use for our algorithms can connect to SQLite (Python through sqlite3, and R through RSQLite), so we can choose to import the data before preprocessing or to do part of the preprocessing in the database itself, which helps avoid problems once the number of records grows large. An alternative to bulk data capture is to use a tool that covers the full ETL cycle (Extraction, Transformation and Load), e.g. RapidMiner, Knime or Pentaho. With them, we can graphically define the data acquisition and cleaning cycles using connectors. When we have guaranteed access to the data source during preprocessing, we can use a database connection (RODBC and RJDBC in R, and pyODBC, mxODBC and SQLAlchemy in Python) and benefit from performing joins (JOIN) and groupings (GROUP BY) in the database engine, importing only the results afterwards. For external processing, pandas (a Python library) and data.table (an R package) are our first choice. data.table works around one of R’s weaknesses, memory management, by performing vectorized operations and grouping by reference without temporarily duplicating objects.
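
The idea of doing the JOIN and GROUP BY inside the database and importing only the result can be sketched with Python's standard sqlite3 module. This is a minimal illustration, not the authors' workflow: the `customers` and `orders` tables, their columns and the sample rows are all hypothetical, and an in-memory database stands in for a real SQLite file.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory SQLite database with two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customers(id),
            amount REAL
        );
        """
    )
    conn.executemany(
        "INSERT INTO customers (id, region) VALUES (?, ?)",
        [(1, "north"), (2, "south"), (3, "north")],
    )
    conn.executemany(
        "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
        [(1, 10.0), (2, 5.5), (3, 4.5), (1, 2.0)],
    )
    conn.commit()
    return conn


def summarize_by_region(conn):
    """Let the database engine join and aggregate; fetch only the summary rows."""
    query = """
        SELECT c.region, COUNT(*) AS n_orders, SUM(o.amount) AS total
        FROM orders AS o
        JOIN customers AS c ON c.id = o.customer_id
        GROUP BY c.region
        ORDER BY c.region
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for region, n_orders, total in summarize_by_region(build_demo_db()):
        print(region, n_orders, total)
```

Only one row per region crosses from the database into Python, which is the point of the approach when the raw tables are large.
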
  • 6. A third scenario is to access information that is generated in real time and transmitted in formats such as XML or JSON. These are called incremental learning projects, and they include recommendation systems, online advertising and high-frequency trading. For this we will use tools like XML or jsonlite (R packages), or xml and json (Python modules). With them we capture the stream, make our predictions, send them back in the same format, and update our model once the source system later provides us with the results actually observed.
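
The capture, predict, reply and update loop described above might look roughly like the following sketch, which uses only Python's json module. The message fields (`id`, `features`, `observed`), the `OnlineLinearModel` class and the learning rate are assumptions made for illustration; a real project would use whatever message schema and model the source system requires.

```python
import json


class OnlineLinearModel:
    """Tiny linear model trained one example at a time (stochastic gradient descent)."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, features):
        return self.bias + sum(w * x for w, x in zip(self.weights, features))

    def update(self, features, observed):
        # Gradient step on the squared error of a single observation.
        error = self.predict(features) - observed
        self.bias -= self.learning_rate * error
        self.weights = [
            w - self.learning_rate * error * x
            for w, x in zip(self.weights, features)
        ]


def handle_message(model, raw_message):
    """Answer a JSON request with a prediction, or learn from a JSON outcome."""
    message = json.loads(raw_message)
    if "observed" in message:
        # The source system reports what really happened: update the model.
        model.update(message["features"], message["observed"])
        return json.dumps({"id": message["id"], "status": "updated"})
    prediction = model.predict(message["features"])
    return json.dumps({"id": message["id"], "prediction": prediction})
```

Each incoming message is handled on its own, so the model never needs the full history in memory, which is what makes the approach incremental.
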
  • 7. Data analysis Even though the Business Intelligence, Data Warehousing and Machine Learning fields are all part of Data Science, the last is the one that requires the greatest number of specific utilities. Hence, our toolbox will need to include R and Python, the programming languages most widely used in machine learning. For Python we highlight the scikit-learn suite, which covers almost all techniques, except perhaps neural networks. For these we have several interesting alternatives, such as Caffe and Pylearn2. The latter is based on Theano, a Python library that allows symbolic definitions and transparent use of GPUs.
  • 8. Some of the most used packages for R: • Gradient boosting: gbm and xgboost. • Random forests for classification and regression: randomForest and randomForestSRC. • Support vector machines: e1071, LiblineaR and kernlab. • Regularized regression (Ridge, Lasso and ElasticNet): glmnet. • Generalized additive models: gam. • Clustering: cluster. There are also some general-purpose tools that will make our life easier in R: • data.table: Fast reading of text files; creation, modification and deletion of columns by reference; joins by a common key or group, and summary of data. • foreach: Execution of parallel processes against a previously defined backend with utilities such as doMC or doParallel. • bigmemory: Manage massive matrices in R and share information across multiple sessions or parallel analyses. • caret: Compare models, control data partitions (splitting, bootstrapping, subsampling) and tune parameters (grid search). • Matrix: Manage sparse matrices and transform categorical variables to binary (one-hot encoding) using the sparse.model.matrix function. If we need to change any R package, we will need C++ and some utilities that allow us to rebuild it: Rtools, an environment for creating packages in R under Windows, and devtools, which facilitates all processes related to development.
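
To make the one-hot encoding mentioned for the Matrix package concrete, here is a small pure-Python sketch of the idea. It is not a port of sparse.model.matrix: the function name `one_hot_sparse` and the coordinate-list output are assumptions chosen for illustration. Each row holds exactly one 1, so only the (row, column) positions of the non-zero entries are stored.

```python
def one_hot_sparse(values):
    """One-hot encode a categorical column as a sparse coordinate list.

    Returns the sorted category labels (one per output column) and the
    (row, column) positions of the 1s; every other cell is implicitly 0.
    """
    categories = sorted(set(values))
    column_of = {category: j for j, category in enumerate(categories)}
    entries = [(i, column_of[value]) for i, value in enumerate(values)]
    return categories, entries


def to_dense(n_rows, n_cols, entries):
    """Expand the sparse entries into a full 0/1 matrix, for inspection only."""
    dense = [[0] * n_cols for _ in range(n_rows)]
    for i, j in entries:
        dense[i][j] = 1
    return dense
```

With many categories the dense matrix is almost entirely zeros, which is why a sparse representation saves memory.
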
  • 9. Distributed environments deserve a special mention. If we have dealt with data from a large institution or company, we will probably have experience working with the so-called Hadoop ecosystem. Hadoop combines a distributed file system (HDFS) with a processing model (MapReduce) that allows information to be processed in parallel. Among the machine learning tools compatible with Hadoop we find: • Vowpal Wabbit: Online learning methods based on gradient descent. • Mahout: A suite of algorithms that includes recommendation systems, clustering, logistic regression and random forests. • h2o: Perhaps the tool currently growing fastest, with a large number of parallelizable algorithms. It can be run from a graphical environment or from R or Python. The data scientist should also keep abreast of the generational shift from Hadoop to Spark. Spark has several advantages over Hadoop for processing information and running algorithms. The main one is speed: it can be up to 100 times faster because, unlike Hadoop, it manages data in memory and only writes to disk when necessary.
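
The MapReduce model mentioned above can be illustrated with a single-process word count. This sketch only mimics the three phases (map, shuffle, reduce) in plain Python; it does not use Hadoop or any of its APIs, and the function names are hypothetical. In a real cluster the map and reduce calls would run in parallel on different machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence over a list of documents."""
    pairs = [pair for document in documents for pair in map_phase(document)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

Because each document is mapped independently and each key is reduced independently, both phases can be distributed without the workers having to coordinate.
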
  • 10. Spark can run independently or coexist as a component of Hadoop, which allows the migration to be planned in a non-traumatic way. You can, for example, use HBase as a database, although Cassandra is emerging as a storage solution thanks to its redundancy and scalability.
  • 11. Visualization Finally, a brief reference to the presentation of results. The most popular tools are clearly lattice and ggplot2 for R, and Matplotlib for Python. But if we need professional presentations embedded in web environments, the best choice is certainly D3.js. Among the integrated Business Intelligence environments with a clear focus on presentation we should highlight the well-known Tableau and, as alternatives for graphical exploration of data, Birst and Necto.
  • 12. Five data visualization tools that you should not miss We present some of the best data visualization tools that you can use in your business to take full advantage of the large amount of information created every day in the digital world.
  • 13. 02. DATA VISUALIZATION TOOLS The digital universe is reaching new thresholds. The amount of data generated by both private users and companies is growing at a rapid pace. According to a study by IDC and EMC, the world of digital data is doubling in size every two years, and by 2020 it will have generated 44 zettabytes of information, that is, 44 trillion gigabytes of structured and unstructured data. Creating and visiting a website, participating in a blog, gaining followers, posting comments, sending a tweet or just surfing the internet produces a whole range of data that, if exploited properly, can be of great value for companies. VISUALIZATION TOOLS INDEX: Google Fusion Tables, CartoDB, Tableau Public, iCharts, Smart Data Report.
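
The arithmetic behind the figures on this slide can be checked in a few lines. The sketch below assumes decimal (SI) units, where one zettabyte is 10^21 bytes and one gigabyte is 10^9 bytes, and models "doubling every two years" as simple exponential growth; the function names are made up for illustration.

```python
ZETTABYTE = 10**21  # bytes, SI definition
GIGABYTE = 10**9    # bytes, SI definition


def zettabytes_to_gigabytes(zettabytes):
    """Convert a whole number of zettabytes to gigabytes."""
    return zettabytes * ZETTABYTE // GIGABYTE


def projected_size(size_now, years_ahead, doubling_period_years=2):
    """Size after `years_ahead` years if the volume doubles every period."""
    return size_now * 2 ** (years_ahead / doubling_period_years)


if __name__ == "__main__":
    # 44 ZB expressed in gigabytes: 44 followed by twelve zeros.
    print(zettabytes_to_gigabytes(44))
```

Under these assumptions 44 zettabytes is 44 × 10^12 gigabytes, which matches the "44 trillion gigabytes" quoted from the study.
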
  • 14. The big challenge, however, is to make sense of all that data: to capture, link and analyze it and extract its true value, so that the information can be presented in an attractive, clear, concise and understandable manner that facilitates decision making in your business. Visually exploring and analyzing customer data can also lead you to discover new ways to reach customers, segment them better, personalize offers for products or services, and generate innovative ideas, among many other possibilities that help maintain the engagement between your brand and your users over time. Where to start The first steps in data visualization may be intimidating. Fortunately, just as data keeps growing, so do the tools that help us get the most out of it. Here we present the five tools that we consider the best, based on the capabilities they provide and the level of experience required.
  • 15. Google Fusion Tables It’s an excellent tool for beginners or for those who don’t know how to program. For more advanced users there is an API for producing graphics or maps from data. One of the advantages of this application is the diversity of data representations it offers. It also offers a relatively fast way to create graphics and maps, including GIS functions to analyze data by geographic area. The Guardian uses this tool frequently to produce detailed maps very quickly.
  • 16. CartoDB This is an open source service with a friendly interface, aimed at any user regardless of technical level. It lets you create a variety of interactive maps, choosing from a catalog of options (which includes Google Maps) or adding your own customized maps. The most interesting feature of this tool is that it lets you access Twitter data to see how users react to a brand, a particular marketing campaign or an event. A good example is the map tracking tweets that was created last year for the launch of Beyoncé’s latest album. It clearly shows the places where the release had the most impact. This is a great source of visual information for marketing professionals and businesses. It also has an active group of developers who provide extensive documentation and examples. In addition, the open nature of its API makes it possible to keep creating new integrations and to extend the tool with new libraries.
  • 17. Tableau Public With Tableau Public you can easily create interactive maps, bar and pie charts, etc. One of its advantages is that, as with Google Fusion Tables, you can import tables from Excel to make your work easier. In a matter of minutes you can generate an interactive graphic, embed it in your website and share it. For example, the news portal Global Post used it to create a series of charts about the best countries to do business in Africa. The recently released version 8.2 also includes the new OpenStreetMap tool, which produces very detailed maps from local data such as cafes or shops. Tableau Public is a free tool, although it also has a premium version.
  • 18. iCharts You can get started in the world of data visualization with the service offered by iCharts, which has a free version (Basic) and two premium options (Platinum and Enterprise). With this tool you can create visualizations in just a few steps, importing Excel and Google Drive documents or adding data manually. You can share your graphics privately with your collaborators, and edit and update them with new data through its cloud service. You can even share them with your clients through emails, newsletters or social networks. Among the companies using this service is the prestigious consulting firm IDC, which uses iCharts to visualize relevant data included in its reports.
  • 19. Smart Data Report Finally, we also recommend Smart Data Report, which is not as powerful as the previous tools but has the advantage of being an affordable data solution for entrepreneurs and small businesses whose workers don’t have much spare time. Among other services, this website offers free data analysis and the option to receive reports by email, without having to create them yourself. Once the service has your report ready, it generates HTML code that you can embed in your corporate website or in your articles.
  • 20. Get the maximum benefit from data with these four webinars Mapping data, visualizing it in geospatial apps and applying machine learning: we put our knowledge into practice with the help of these video tutorials.
  • 21. Mapping data 03. WEBINARS CartoDB explains how to convert location data into knowledge for your business. In this tutorial you can learn how to analyze, visualize and build data apps using the CartoDB tool.
  • 22. Machine Learning Now that summer is around the corner, Andrés González, solutions manager for Big Data and Data Prediction at Clever Task, shows us how to make forecasts from data in a very specific area: the tourism sector.
  • 23. Geospatial apps And if you want to learn to create geospatial apps, don't miss this tutorial, also by CartoDB, which explains how to make the most of an API (in this case the one opened by BBVA for the InnovaChallenge competition) to create apps and visualizations.
  • 24. Good examples of visualization Finally, to round off this selection, Alberto Cairo, professor of data visualization at the University of Miami, teaches us good practices in data visualization. It's good to learn from our own mistakes and from the successes of others.
  • 25. THIS MIGHT INTEREST YOU Innovation Edge Big Data: to create business value with data. Emerging Tech: Data visualization beyond the noise. Infographic: the keys of Big Data by DJ Patil. Infographic: Big Data, chronology, present and future. Case study: data visualization with Illustreets and CartoDB.
  • 26. Sign up to keep up to date with the latest trends. Interact with us on: BBVA is not responsible for the opinions expressed herein. www.bbvaopen4u.com