The Download: Community Tech Talks
Episode 12
March 15, 2018
Welcome!
• Please share: Let others know you are here with #HPCCTechTalks
• Ask questions! We will answer as many questions as we can following each speaker.
• Look for polls at the bottom of your screen. Exit full-screen mode or refresh your screen if
you don’t see them.
• We welcome your feedback: please rate us before you leave today, and visit our blog for
information after the event.
• Want to be one of our featured speakers? Let us know! techtalks@hpccsystems.com
The Download: Tech Talks #HPCCTechTalks
Watch for Details
Announced Soon!
Community announcements
Dr. Flavio Villanustre
VP Technology
RELX Distinguished Technologist
LexisNexis® Risk Solutions
Flavio.Villanustre@lexisnexisrisk.com
• HPCC Systems® Platform updates
• 6.4.12 is the latest gold version / Community Changelog
• 7.0.0 Beta planned for early Q2 – among the key features:
• Spark integration
• Indexer
• Record Translation
• Session Management Improvements
• VS Code Beta version
• Roadmap items for 2018 and beyond
• New Case Study
• 3LOQ leverages HPCC Systems in their Habitual AI solution
• Latest Blogs
• Tips and Tricks for ECL – Part 2 - PARSE
• Fly on the wall at our first Hackathon
• Reminder: 2018 Summer Internship Proposal Period Open Through April 6, 2018
• Interested candidates can submit proposals from the Ideas List
• Program runs from late May through mid-August
• Visit the Student Wiki for more details
2018 HPCC Systems
Community Day
Coming soon - 10K Trees Campaign for Earth Day
World Planting Day, March 21
through Earth Day on April 22
• Help us help the environment on behalf of our
community!
• HPCC Systems is dedicated to the environment
and is giving you the opportunity to take
action and be a small part of a big impact.
• HPCC Systems, partnering with the National
Forest Foundation, is growing and promoting
awareness of environmental sustainability with
its 10,000 Trees challenge.
Today’s speakers
Itauma Itauma
PhD Candidate,
Keiser University
amightyo@gmail.com
Itauma Itauma is a doctoral candidate at Keiser University and a computer science
instructor at Wayne State University. His interests lie in learning analytics and in using
HPCC Systems for educational research. He holds an undergraduate degree in Electrical
Engineering from the University of Ilorin and two master's degrees: a Master of Science
in Computer Engineering from Istanbul Technical University, majoring in human-robot
interaction, and a Master of Science in Computer Science from Wayne State University,
where his thesis focused on leveraging HPCC Systems for Big Data analytics.
Featured Community Speaker
Today’s speakers
Ignacio Calvo
Software Engineering Lead
LexisNexis Risk Solutions
Ignacio.Calvo@lexisnexisrisk.com
Ignacio is a Software Engineering Lead with 17 years of experience developing
IT projects for different markets (insurance, finance, telecom, retail).
He has worked at LexisNexis for 5 years, creating Big Data solutions
with geospatial capabilities using HPCC Systems. He is the organizer of the HPCC
Systems meetup group in Dublin and a CoderDojo mentor.
Bob Foreman
Senior Software Engineer
LexisNexis Risk Solutions
Robert.Foreman@lexisnexisrisk.com
Bob Foreman has worked with the HPCC Systems technology platform and
the ECL programming language for over 5 years, and has been a technical
trainer for over 25 years. He is the developer and designer of the HPCC
Systems Online Training Courses, and is the Senior Instructor for all
classroom and Webex/Lync based training.
Conducting exploratory data analysis in
educational research using HPCC Systems®
Itauma Itauma
PhD Candidate
Keiser University
Quick poll:
How strongly do you think identification with math
and confidence in the ability to succeed in math
are correlated?
See poll on bottom of presentation screen
Outline
• What is Exploratory Data Analysis (EDA)?
• Why is EDA Important?
• Techniques, Types, and Steps
• Role in Educational Research
• The HPCC Systems Advantage in Educational Research
• Data Visualization Examples
• Exploring the HSLS:09 Dataset
What is Exploratory Data Analysis (EDA)?
• Broad, open-minded overview of data
• Converts data from its raw form to a form
that makes sense
• Allows the data to speak for itself, with no
assumptions made
• No rigid rules
• An important first step in data analysis
Exploratory Data Analysis
• Consists of:
• Organizing and summarizing raw data
• Looking for important features and patterns in
the data
• Looking for any striking deviations from any
pattern found
• Interpreting findings in the context of the
research question
Why is Exploratory Data Analysis Important?
• Gain new insight
• Explore data structures
• Detect missing data
• Check significant variables
• Examine relationships between
variables
• Select an appropriate model
• Check model assumptions
Importance of Exploratory Data Analysis
• Summarizes data
• Often reveals new ways to think about data.
• Helps in refining research questions and
sometimes reveals new questions.
• After EDA, we are able to ask specific questions
of our data
Techniques of EDA
• Usually graphical
• May be combined with quantitative techniques.
• Visualization helps to discover data patterns.
• raw data plots such as traces, histograms, and
probability plots;
• simple statistics plots such as mean plots, standard
deviation plots, and box plots.
• Not limited to these techniques
• A researcher can develop novel ways to visualize
data
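Box plots, one of the simple-statistics plots listed above, are drawn from a handful of summary numbers. The following is a minimal plain-Python sketch (not HPCC Systems code) of those numbers. It assumes the common 1.5 × IQR whisker convention, which the talk does not specify, to flag striking deviations.

```python
import statistics

def box_plot_summary(values):
    """Return the quartiles, extremes, and 1.5 x IQR outliers used to draw a box plot."""
    data = sorted(values)
    # statistics.quantiles(n=4) returns the three cut points Q1, median, Q3
    q1, median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
        "outliers": outliers,
    }

# Hypothetical measurements with one suspicious value
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(summary)
```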
Types and Steps in Exploratory Data Analysis
• Graphical vs Non-graphical
• Univariate vs Multivariate
• Examine one variable at a time
• Summarize and then examine the distribution of variable(s) of interest.
• What values the variables take
• How often the variables take those values.
• Researchers can come up with different research questions and choose to analyze the
data in different ways.
• Data is awesome, and a tool that makes it easy to analyze
makes the work fun and exciting.
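Examining one variable at a time comes down to two questions: what values it takes, and how often. Both are answered by a frequency table. Here is a small sketch, with a hypothetical list of categorical survey responses as input:

```python
from collections import Counter

def frequency_table(values):
    """Return (value, count, relative frequency) rows, most common value first."""
    counts = Counter(values)
    total = len(values)
    return [(value, count, count / total) for value, count in counts.most_common()]

# Hypothetical responses for a single categorical variable
responses = ["agree", "disagree", "agree", "neutral", "agree", "disagree"]
for value, count, share in frequency_table(responses):
    print(f"{value:<10} {count:>3} {share:6.1%}")
```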
Exploratory Data Analysis
• Statistics: collects data, summarizes data, and interprets data
• Statistics plays a significant role in the social sciences, including the field of
education, by converting data into useful information.
• EDA = Data Visualization + Statistics = Better data-driven decision making
Educational Research
• Systematic and organized inquiry applied to
collecting, analyzing, and reporting
information that addresses educational
problems and questions (McMillan, 2015)
• Describe
• Predict
• Improve
• Explain
• Important for the advancement of knowledge
in the field of education
Machine Learning vs Statistical Learning: The HPCC Systems
Advantage in Educational Research
• Era of big data, learning analytics and
personalized learner experience
• Machine learning needed to build systems that
learn from data
• Learning analytics: The process of quantifying,
analyzing, and reporting learner data to discover
patterns and enhance learning to improve
learner performance (Siemens & Baker, 2012)
• Growing data collection from learning platforms
and devices to create personalized data-driven
learning programs for learner success
Machine Learning vs Statistical Learning: The HPCC Systems
Advantage in Educational Research
• Educational researchers need tools that can
handle big data and will benefit from the
use of HPCC Systems.
• Statistical learning is limited by the
need to develop a hypothesis and make
assumptions about the data before building
a model.
• In machine learning, algorithms are flexible,
learn directly from the data, and output the
requested features, letting the data speak
for itself.
Machine Learning vs Statistical Learning: The HPCC Systems
Advantage in Educational Research
• Common statistical tools such as SPSS, widely used in educational research, are
limited in scalability and in handling big data.
• HPCC Systems is open source, and can handle both data visualization and
statistical analysis, all integrated in the platform.
• The HPCC Systems Visualization Bundle provides visual representations of
data analysis.
• HPCC Systems can also perform simple descriptive & inferential statistics.
https://github.com/hpcc-systems/Visualization
Data Visualization in HPCC Systems
• The Visualization Bundle is an open-source
add-on to the HPCC Systems platform that allows
the creation of visualizations from the
results of queries written in ECL
• Important means of conveying information
from massive datasets
• Pie Charts, Line graphs, Maps, and other
visual graphs
• Simplifies the complex
• In addition, the underlying visualization
framework supports advanced features to
allow the combination of graphs to make
interactive dashboards
• Integration of Tableau in HPCC Systems
(Alternative)
Data Visualization Examples
• In a previous study, HPCC Systems
ML correlation and regression
modules were used to determine
the strength of the correlation
between chocolate consumption,
life expectancy, and happiness.
Exploring the HSLS:09 Dataset
• The US High School Longitudinal
Study of 2009 (HSLS:09) is a national
cohort study that follows over 23,000 ninth
graders from 944 schools, starting in 2009,
through their secondary and
postsecondary years.
• The focus of HSLS:09 includes
students' trajectories from high
school and how students choose
college majors and careers.
• Research Question: Is Math Identity associated
with Math Self-efficacy?
• Math identity: the level of a student's
identification with math, as represented by
agreement with the statements "You see
yourself as a math person" and/or "Others see
me as a math person".
• Self-efficacy: the level of confidence a student
has in the ability to succeed.
• STEM (Science, Technology, Engineering, and
Mathematics)
• Let’s find out! Remember, EDA helps in refining
research questions and sometimes reveals new
questions.
Exploring the HSLS:09 Dataset
• CSV file sprayed into the HPCC Systems
cluster
• Recordset filtering
• Three features projected
• X1MTHID: Math Identity
• X1MTHEFF: Math Self-efficacy
• X2SEX: Gender
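In the talk, filtering and projection are done in ECL on the sprayed file. The plain-Python sketch below shows the same idea on a CSV string: keep only the three named columns and drop rows that carry a missing-data code. It treats -9, -8, and -7 as reserved nonresponse codes. That is an assumption to verify against the HSLS:09 codebook. The identity and self-efficacy scales can legitimately be negative, so the sketch does not discard every negative value.

```python
import csv
import io

# Columns named on the slide. MISSING_CODES is an assumption: check the codebook.
FIELDS = ("X1MTHID", "X1MTHEFF", "X2SEX")
MISSING_CODES = {-9.0, -8.0, -7.0}

def project_and_filter(csv_text):
    """Project the three features and drop rows holding a reserved missing code."""
    kept = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        projected = {name: float(record[name]) for name in FIELDS}
        # Scale scores may be negative, so only the reserved codes count as missing
        if not any(value in MISSING_CODES for value in projected.values()):
            kept.append(projected)
    return kept

# Hypothetical rows; the real file has many more columns and students
SAMPLE = """STU_ID,X1MTHID,X1MTHEFF,X2SEX,X1SES
1,0.52,1.21,1,0.30
2,-9,0.80,2,-0.10
3,-0.41,-0.73,2,1.10
"""
rows = project_and_filter(SAMPLE)
print(rows)
```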
Exploring the HSLS:09 Dataset
• Next, query the dataset
• Descriptive Statistics
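The descriptive statistics for a single numeric column usually include the count, mean, spread, and range. A minimal sketch with hypothetical scores follows. It uses the sample (n − 1) standard deviation; the slide does not say which variant was reported.

```python
import statistics

def describe(values):
    """Count, mean, sample standard deviation, min and max of a numeric column."""
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "stdev": statistics.stdev(values),  # sample (n - 1) standard deviation
        "min": min(values),
        "max": max(values),
    }

# Hypothetical scale scores for one variable
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(describe(scores))
```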
Exploring the HSLS:09 Dataset
• Is Math Identity associated
with Math Self-efficacy?
• Sub-question identified: Is
this association different
between males and females?
Correlation Coefficient
Math Identity vs. Math Self-efficacy (all students): 0.6303
Math Identity vs. Math Self-efficacy (males): 0.6237
Math Identity vs. Math Self-efficacy (females): 0.6381
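The three coefficients come from computing one correlation over all students and one within each gender subset. The slide does not name the coefficient, so this sketch assumes Pearson's r. The rows are hypothetical and use the column names from the projection step.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def correlation_by_group(rows, x_key, y_key, group_key):
    """Overall r plus one r per group value (for example, per gender code)."""
    result = {"all": pearson_r([r[x_key] for r in rows], [r[y_key] for r in rows])}
    for group in sorted({r[group_key] for r in rows}):
        subset = [r for r in rows if r[group_key] == group]
        result[group] = pearson_r([r[x_key] for r in subset], [r[y_key] for r in subset])
    return result

# Hypothetical rows: gender codes 1 and 2 are placeholders
rows = [
    {"X1MTHID": 1, "X1MTHEFF": 2, "X2SEX": 1},
    {"X1MTHID": 2, "X1MTHEFF": 4, "X2SEX": 1},
    {"X1MTHID": 3, "X1MTHEFF": 6, "X2SEX": 1},
    {"X1MTHID": 1, "X1MTHEFF": 3, "X2SEX": 2},
    {"X1MTHID": 2, "X1MTHEFF": 2, "X2SEX": 2},
    {"X1MTHID": 3, "X1MTHEFF": 1, "X2SEX": 2},
]
print(correlation_by_group(rows, "X1MTHID", "X1MTHEFF", "X2SEX"))
```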
Data Visualization (Scatter Plots)
Data Interpretations
• Seeing oneself as a math person, or being seen as one by others, is associated
with increased confidence in math
• The association is slightly stronger for females (0.6381) than for males (0.6237)
• The higher a student's identification with math, the higher the student's
confidence in succeeding in math. This can be a strong factor in students'
decisions to enroll in STEM programs.
• *Correlation does not imply causation*
Quick poll:
Would you consider using HPCC Systems
for exploratory data analysis?
See poll on bottom of presentation screen
Questions?
Itauma Itauma
PhD Candidate
Keiser University
amightyo@gmail.com
https://www.keiseruniversity.edu
Big Data and Geospatial with HPCC Systems®
Ignacio Calvo
Software Engineering Lead
LexisNexis® Risk Solutions
Concepts in Geospatial
How to use them with HPCC Systems
Use cases
#HPCCTechTalks
Definition
An approach to applying statistical
analysis and other analytic techniques
to data that has a geographical or
spatial aspect
Why?
• Insights
• Market segmentation
• IoT
• Satellite images
• Risk analysis
Quick poll:
Do you have any kind of
geospatial information or addresses
in your datasets?
See poll on bottom of presentation screen
Origin of Geospatial
John Snow’s original map (1854), an early
example of spatial analysis used to save lives.
This map was used to determine that
cholera was waterborne
Understanding the data
Need to know:
• Format
• Projection / coordinate system
Formats: Vector vs Raster
[Figure: Vector | Raster]
Quick poll:
What’s the size of
Greenland
compared with
Africa and Australia?
See poll on bottom of
presentation screen
What is a projection?
Projections are used to represent the world in ways
we can process
• The Earth is round and maps are flat
• Physical maps
• Computer maps
What is a projection?
Have I seen projections before?
• Peters vs Mercator vs Winkel tripel
• GPS (latitude/longitude)
• Google Maps
Two different projections representing the same place.
Projections
Lies, damned lies, statistics… and maps!
WGS84
• Latitude and longitude
• Our best approximation of the world
• Not always the best for a specific region
• Not technically a projection
Projections to know about
Mercator
• Many different ones; choose one based on your location
• Reduces the area it covers to a simple Cartesian plane
• Good near the central axis, bad far away from it:
 • Web Mercator covers almost the whole world (up to about 85° north and south); it is good near the equator and gets worse as you travel north or south
 • NAD83 / Georgia East, British National Grid, Irish National Grid…
Very good for that territory, awful anywhere else
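Why Web Mercator is "good near the equator" and worse toward the poles shows up directly in its formulas. This sketch uses the standard spherical Web Mercator (EPSG:3857) equations with the WGS84 semi-major axis as the sphere radius. It also computes the Mercator scale factor 1/cos(latitude), which tells you how much lengths are stretched at a given latitude. Areas are stretched by roughly the square of that factor, which is why Greenland looks so large on web maps.

```python
import math

EARTH_RADIUS_M = 6378137.0      # WGS84 semi-major axis, used as the sphere radius
MAX_LATITUDE = 85.0511287798    # latitude where the projected world becomes a square

def web_mercator(lat_deg, lon_deg):
    """Project WGS84 latitude/longitude in degrees to Web Mercator x/y in metres."""
    if abs(lat_deg) > MAX_LATITUDE:
        raise ValueError("Web Mercator maps are clipped at about +/-85.05 degrees")
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y

def scale_factor(lat_deg):
    """Length distortion of Mercator at a latitude (1.0 at the equator)."""
    return 1.0 / math.cos(math.radians(lat_deg))

# Compare distortion at the equator with a latitude inside Greenland
print(scale_factor(0.0), scale_factor(72.0))
```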
Number one bug in Geospatial
*http://twcc.fr
Mixing up the axis order: latitude is the Y axis, longitude is the X axis
(LatY, LonX)
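A cheap guard against the latitude/longitude mix-up is to check value ranges: latitude must lie in [-90, 90] and longitude in [-180, 180]. This heuristic sketch flags pairs that are only valid when swapped, and it converts to the (x, y) order that formats such as GeoJSON use ([longitude, latitude]). It cannot catch swaps where both numbers fall within ±90, so it complements, rather than replaces, knowing your data provider's axis order.

```python
def looks_swapped(lat, lon):
    """Heuristic: True if (lat, lon) is invalid as given but valid when swapped."""
    valid = -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0
    valid_swapped = -90.0 <= lon <= 90.0 and -180.0 <= lat <= 180.0
    return not valid and valid_swapped

def to_xy(lat, lon):
    """Return (x, y) in the usual GIS order: x = longitude, y = latitude."""
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError(f"invalid coordinate: lat={lat}, lon={lon}")
    return lon, lat

print(looks_swapped(102.78925462, -6.08354321))  # first value cannot be a latitude
print(to_xy(53.78925462, -6.08354321))
```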
Now I understand my data, what’s next?
Data → Ingest → Index → Query
Bringing Geospatial into HPCC Systems
GOAL
Bring our geospatial processes
into the realm of Big Data
STEPS
Extend HPCC Systems and ECL to support the following
main capabilities:
• Spatial filtering of vector geometries
• Spatial operations using vector geometries
• Spatial reference projection and transformation
• Reading of compressed geo-raster files
Integration of open-source libraries
Ingesting vector data
It’s a CSV file.
Id | Name | Geometry | Projection | Value
1 | Alice’s place | POINT (53.78925462 -6.08354321) | 4326* | €5,973,000
2 | Bob’s place | POINT (-34.78925462 7.08354321) | 4326 | €872,000
3 | Celine’s place | POINT (102.78925462 -6.08354321) | 4326 | €9,324,000
* WGS84 (Lat/Lon)
1. Policy data
2. Geocode address
3. Peril tag
Data ready to ingest
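The Geometry column in this table looks like WKT (well-known text) POINT geometries, and the Projection column like an EPSG code with a footnote marker. A minimal ingestion sketch follows. It handles only the simple `POINT (a b)` form and keeps the two numbers in the order given. That matters here: Alice’s row reads as latitude first, while Celine’s first value (102.79) cannot be a latitude, so axis order still has to be validated after parsing.

```python
import csv
import io
import re

# Only the simple "POINT (a b)" form; real WKT has many more geometry types
POINT_RE = re.compile(
    r"^\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$", re.IGNORECASE
)

def parse_point(wkt):
    """Return the two coordinates of a simple WKT POINT, in the order written."""
    match = POINT_RE.match(wkt)
    if match is None:
        raise ValueError(f"not a simple WKT POINT: {wkt!r}")
    return float(match.group(1)), float(match.group(2))

def read_vector_csv(csv_text):
    """Return a list of records with parsed geometry and a numeric EPSG code."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        records.append({
            "id": int(row["Id"]),
            "name": row["Name"],
            "coords": parse_point(row["Geometry"]),  # axis order depends on the source
            "epsg": int(row["Projection"].rstrip("*")),  # drop the footnote marker
            "value": row["Value"],
        })
    return records

SAMPLE_CSV = (
    "Id,Name,Geometry,Projection,Value\n"
    "1,Alice's place,POINT (53.78925462 -6.08354321),4326*,\"€5,973,000\"\n"
    "2,Bob's place,POINT (-34.78925462 7.08354321),4326,\"€872,000\"\n"
    "3,Celine's place,POINT (102.78925462 -6.08354321),4326,\"€9,324,000\"\n"
)
for record in read_vector_csv(SAMPLE_CSV):
    print(record)
```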
Ingesting vector data
It’s a GML / XML file.
1. Shape data
2. Parse XPath
3. Process and index
Data ready to query
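The "Parse XPath" step means locating geometry elements inside the XML by path. This sketch uses Python's ElementTree and its limited XPath support on a tiny hypothetical GML document. The namespace URI `http://www.opengis.net/gml` is the GML 2/3.1 one (GML 3.2 uses a different URI), and the `FeatureCollection`/`Building` element names are made up for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"gml": "http://www.opengis.net/gml"}  # GML 3.2 files use a different URI

def extract_points(gml_text):
    """Return (srsName, (a, b)) for every gml:Point, coordinates in file order."""
    root = ET.fromstring(gml_text)
    points = []
    for point in root.findall(".//gml:Point", NS):
        pos = point.find("gml:pos", NS)
        a, b = (float(v) for v in pos.text.split())
        points.append((point.get("srsName"), (a, b)))
    return points

# Hypothetical shape data with two building footprints reduced to points
SAMPLE_GML = """<FeatureCollection xmlns:gml="http://www.opengis.net/gml">
  <featureMember>
    <Building id="b1">
      <gml:Point srsName="EPSG:4326"><gml:pos>53.35 -6.26</gml:pos></gml:Point>
    </Building>
  </featureMember>
  <featureMember>
    <Building id="b2">
      <gml:Point srsName="EPSG:4326"><gml:pos>53.36 -6.25</gml:pos></gml:Point>
    </Building>
  </featureMember>
</FeatureCollection>"""
print(extract_points(SAMPLE_GML))
```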
Indexing vector data
Find the data structure that suits you best
• R-tree
• Quadkey
• Geohash
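Of the three structures, a geohash is the simplest to show in a few lines. It alternately bisects the longitude and latitude ranges, records one bit per split, and base32-encodes every 5 bits, so nearby points tend to share a prefix that can be indexed as a plain string. This is a minimal encoder following the standard geohash scheme (decoding and neighbour search are omitted).

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

def geohash_encode(lat, lon, precision=11):
    """Encode a latitude/longitude pair as a geohash string of `precision` chars."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, bits, bit_count = [], 0, 0
    use_lon = True  # geohash starts with a longitude bit
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits, lon_lo = (bits << 1) | 1, mid
            else:
                bits, lon_hi = bits << 1, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = (bits << 1) | 1, mid
            else:
                bits, lat_hi = bits << 1, mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)

print(geohash_encode(42.6, -5.6, 5))
```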
Indexing vector data
R-tree
• Outline box: the biggest (outermost) rectangle
• Boxes contain boxes
• The bottom boxes in the tree contain the actual
geometries
• Here, 3 levels pictured
• Boxes can overlap (each entry is in only one box)
Querying vector data
Searching an R-tree: e.g., finding all buildings (points) inside a flood zone (polygon)
1. Does the query polygon overlap our box? If no: return an empty list.
2. If yes: is it a leaf node? If yes: return all nodes for verification.
3. If no: search our box’s children.
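The flowchart above is the "filter" half of a spatial query; the candidates it returns still need an exact geometry test. The sketch below implements both halves on a hand-built two-level tree. It assumes the polygon is approximated by its bounding box during the tree walk and uses a ray-casting point-in-polygon test for verification. Tree construction and node splitting are out of scope.

```python
from dataclasses import dataclass, field

@dataclass
class Box:
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def intersects(self, other):
        return (self.min_x <= other.max_x and other.min_x <= self.max_x
                and self.min_y <= other.max_y and other.min_y <= self.max_y)

@dataclass
class Node:
    box: Box
    children: list = field(default_factory=list)  # child Nodes (internal node)
    entries: list = field(default_factory=list)   # (x, y, payload) tuples (leaf node)

    @property
    def is_leaf(self):
        return not self.children

def search(node, query_box):
    """Filter step: follow the flowchart and collect candidate entries."""
    if not node.box.intersects(query_box):
        return []                      # no overlap: empty list
    if node.is_leaf:
        return list(node.entries)      # leaf: return everything for verification
    candidates = []
    for child in node.children:        # otherwise search the children
        candidates.extend(search(child, query_box))
    return candidates

def point_in_polygon(x, y, polygon):
    """Refine step: ray-casting test against a polygon given as [(x, y), ...]."""
    inside = False
    for i in range(len(polygon)):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % len(polygon)]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def buildings_in_zone(root, polygon):
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    query_box = Box(min(xs), min(ys), max(xs), max(ys))
    return [e for e in search(root, query_box) if point_in_polygon(e[0], e[1], polygon)]

# Hypothetical tree: two leaf boxes under one root box
leaf_a = Node(Box(0, 0, 5, 5), entries=[(1, 1, "b1"), (4, 4, "b2")])
leaf_b = Node(Box(10, 10, 15, 15), entries=[(12, 12, "b3")])
root = Node(Box(0, 0, 15, 15), children=[leaf_a, leaf_b])
flood_zone = [(0, 0), (3, 0), (3, 3), (0, 3)]
print(buildings_in_zone(root, flood_zone))
```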
Ingesting raster data
It’s a raster / TIFF file (bitmap image).
1. Raster data
2. Tile and spray
3. Process and index
Data ready to query
Ingesting raster data
Tiling divides raster images into
small manageable areas of known
dimensions.
These tiles have their own
metadata:
• Bounding box
• Grid position
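Tiling can be sketched on a plain 2-D grid. This example assumes a north-up raster with square pixels and a known top-left origin in map units, similar to a GeoTIFF without rotation. Each tile records the two pieces of metadata listed above: its grid position and its bounding box.

```python
def tile_raster(pixels, tile_size, origin_x, origin_y, pixel_size):
    """Split a north-up pixel grid into tiles with grid position and bounding box.

    (origin_x, origin_y) is the top-left corner; rows increase southward.
    """
    tiles = []
    for row0 in range(0, len(pixels), tile_size):
        for col0 in range(0, len(pixels[0]), tile_size):
            block = [r[col0:col0 + tile_size] for r in pixels[row0:row0 + tile_size]]
            min_x = origin_x + col0 * pixel_size
            max_y = origin_y - row0 * pixel_size
            max_x = min_x + len(block[0]) * pixel_size   # edge tiles may be smaller
            min_y = max_y - len(block) * pixel_size
            tiles.append({
                "grid": (row0 // tile_size, col0 // tile_size),
                "bbox": (min_x, min_y, max_x, max_y),
                "pixels": block,
            })
    return tiles

# Hypothetical 4 x 4 raster, 10-unit pixels, top-left corner at (0, 40)
raster = [[r * 4 + c for c in range(4)] for r in range(4)]
for tile in tile_raster(raster, 2, 0, 40, 10):
    print(tile["grid"], tile["bbox"])
```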
Ingesting raster data
1. Figure out which grid position the
geometry needs
2. Extract the required pixel
3. Interrogate the pixel for its value
4. Interpret its value
5. Return to user
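The five query steps can be sketched end to end for a single point. This assumes the same north-up, square-pixel layout as a simple tiled raster, tiles stored in a dictionary keyed by grid position, and a hypothetical mapping from pixel values to flood classes as the "interpret" step.

```python
def make_tiles(pixels, tile_size):
    """Split a north-up pixel grid into {(tile_row, tile_col): block} tiles."""
    tiles = {}
    for r0 in range(0, len(pixels), tile_size):
        for c0 in range(0, len(pixels[0]), tile_size):
            tiles[(r0 // tile_size, c0 // tile_size)] = [
                row[c0:c0 + tile_size] for row in pixels[r0:r0 + tile_size]
            ]
    return tiles

def lookup_value(tiles, tile_size, origin_x, origin_y, pixel_size, x, y, interpret):
    """Return the interpreted pixel value at map point (x, y), or None if outside."""
    col = int((x - origin_x) // pixel_size)   # global pixel column
    row = int((origin_y - y) // pixel_size)   # global pixel row; rows grow southward
    # 1. Figure out which grid position (tile) the point needs
    tile = tiles.get((row // tile_size, col // tile_size))
    if tile is None:
        return None
    # 2-3. Extract the pixel inside that tile and read its raw value
    r, c = row % tile_size, col % tile_size
    if r >= len(tile) or c >= len(tile[0]):
        return None  # beyond the edge of a partial tile
    raw = tile[r][c]
    # 4-5. Interpret the value and return it to the caller
    return interpret(raw)

FLOOD_CLASSES = {0: "none", 1: "low", 2: "high"}  # hypothetical interpretation
depth_raster = [[0, 0, 1, 1], [0, 1, 1, 2], [1, 1, 2, 2], [1, 2, 2, 2]]
tiles = make_tiles(depth_raster, 2)
print(lookup_value(tiles, 2, 0.0, 40.0, 10.0, 35.0, 5.0, FLOOD_CLASSES.get))
```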
Bringing it all together
*Andrew Farrell, In pursuit of perils: Geo-spatial risk analysis through HPCC Systems
https://hpccsystems.com/resources/blog/afarrell/pursuit-perils-geo-spatial-risk-analysis-through-hpcc-systems
Add even more value
Why Geospatial with HPCC Systems?
• Efficient parallel processing
• Ability to import libraries from different languages
• Good coverage of functions and spatial predicates
• Fast ingestion
• Support for different formats
• Sub-second queries
Questions?
Ignacio Calvo
Software Engineering Lead
LexisNexis Risk Solutions
Ignacio.Calvo@lexisnexisrisk.com
ECL Tip: The Top Ten Common ECL
Compiler/Runtime Errors, and how to correct them
Bob Foreman
Senior Software Engineer
LexisNexis Risk Solutions
Quick poll:
What do you think about the ECL
Compiler messages?
See poll on bottom of presentation screen
Background
• Over many years of ECL training classes, we discovered that many
developers encounter the same errors while learning ECL.
• Many of these errors are easy fixes, but it is important to understand what
the error message is saying and what in turn needs to be corrected.
• Errors fall into two categories, compiler and runtime.
• Compiler errors are related to syntax or improper references to other definitions.
• Runtime (or system) errors are errors that prevent a submitted workunit from
completing, and these are often easily corrected.
• Presenting the Top Ten ECL Compiler/Runtime (System) Errors:
Number 10 – The Workunit Assassin
Text:
Error: System error: 10056: THOR ABORT
Type:
Runtime (System)
Cause:
Somebody killed (aborted) your workunit!
Fix:
Find out who killed you and why, then restart your workunit when all clear
Number 9 – Unfriended Node
Text:
Error: System error: 4: MP link closed (<ip address>:<port>)
Type:
Runtime (System), MP is Message Passing
Cause:
Out of memory (OOM), network issue, hardware fault, or version bug.
Fix:
Review your slave log and syslog, check your configuration, and look for C++ memory
leaks. If the problem persists, open an issue in Jira.
Number 8 – Local Limbo
Text:
Error: Compile/Link failed for <pathL<workunit number>
Type:
Compiler
Cause:
You lost connection with your cluster, and the target has reverted to a
Local target.
Fix:
Restart your ECL IDE and verify cluster connection.
Number 7 – Missing Data Pieces (TIE)
Text:
Error: Need to supply a value for field <fieldname>
Error: Transform does not supply a value for field "SELF.<fieldname>"
Type:
Compiler
Cause:
In TABLE, your field is missing or field requires a default value.
In TRANSFORM, one or more SELF.field definition(s) missing.
Fix:
Add the default value to the field in the TABLE, and make sure every field is
assigned properly in the TRANSFORM.
Number 6 – Divide and Conquer
Text:
System error: 0: Graph graph1[1], dedup[3]: Global DEDUP,ALL is not
supported
Type:
Runtime (System)
Cause:
Some intensive ECL operations require breaking down the job into smaller
pieces to run more efficiently.
Fix:
GROUP your target DEDUP recordset
Number 5 – Dataset Hide and Seek
Text:
System error: 10001: Graph graph1[1], Missing logical file <filename>
Type:
Runtime (System)
Cause:
The filename you entered in the DATASET declaration does not match the
name of the file you sprayed.
Fix:
Find and correct your typo, check for proper use of the tilde (~).
Number 4 – No Dataset to Read!
Text:
Error: file.<fieldname> - no specified row for Table file
Type:
Compiler
Cause:
The code is trying to reference a field value from a single record when the
only thing in scope is the entire dataset, or a field may be out of scope in a
parent/child denormalized dataset.
Fix:
Definition needs to be modified to retrieve a single record in scope.
Number 3 – Data Imposters! (TIE)
Text:
Error: System error: 0: Dataset layout does not match published layout
for file <filename>
Error: System error: 0: Published record size # for file <filename> does
not match coded record size #
Type:
Runtime (System)
Cause:
Your RECORD structure definition does not exactly match the metadata
RECORD structure the DFU has for that dataset.
Fix:
Correct field name, position, or value type.
Number 2 – Action Retraction
Text:
Error: Definition contains actions after the EXPORT has been defined
Type:
Compiler
Cause:
Your ECL code contains an action (explicit or implicit) following an
EXPORTed definition.
Fix:
Remove either the action or the EXPORT.
Number 1 – MODULE Mayhem!
Text:
Warning: (1,0): error C2386: Module <module name> does not EXPORT
an attribute main()
Type:
Runtime (System)
Cause:
Your MODULE has multiple exports. You need to tell the compiler which
one you want to run.
Fix:
Use a Builder window or BWR file to explicitly drill down to the definition
you need. You could also rename one EXPORT definition as “Main” (not
recommended).
Honorable Mention – Warning Worries
Text:
WARNING: Compiler/Server mismatch:
Compiler: 6.4.2 community_6.4.2-1
Server: community_6.4.8-
Cause:
Compiler referenced in ECL IDE does not match the server version.
Fix:
Update your ECL IDE or your cluster version as appropriate.
WARNING: SOAP 1.1 fault: SOAP-ENV:Client[no subcode]
"An HTTP processing error occurred"
Detail: [no detail]
Cause:
Your cluster is not using a shared repository.
Fix:
This warning can be safely ignored if you know you are using a local repository.
Summary – The Top Ten
1. Warning: (1,0): error C2386: Module <module name> does not EXPORT an attribute main() (0, 0), 0,
2. Error: Definition contains actions after the EXPORT has been defined (2, 1), 2325,
3. Error: System error: 0: Dataset layout does not match published layout for file <filename> (0, 0), 0,
3. Error: System error: 0: Published record size 29 for file <filename> does not match coded record size 32 (0, 0), 0,
4. Error: file.<fieldname> - no specified row for Table file (4, 1), 2131, <ECL File and Local Path>
5. Error: System error: 10001: Graph graph1[1], Missing logical file <filename> (0, 0), 10001,
6. Error: System error: 0: Graph graph1[1], dedup[3]: Global DEDUP,ALL is not supported (0, 0), 0,
7. Error: Need to supply a value for field <fieldname> (9, 50), 2170, (tables)
7. Error: Transform does not supply a value for field "SELF.<fieldname>" (15, 1), 2111,
8. Error: Compile/Link failed for <pathL<workunit number>
9. Error: System error: 4: MP link closed (10.194.96.16:6600)
10. Error: System error: 10056: THOR ABORT
Honorable mention:
WARNING: Compiler/Server mismatch:
Compiler: 6.4.2 community_6.4.2-1
Server: community_6.4.8-
WARNING: SOAP 1.1 fault: SOAP-ENV:Client[no subcode]
"An HTTP processing error occurred"
Detail: [no detail]
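Because the same messages recur, a small lookup table can speed up triage when reading workunit logs. The helper below is hypothetical and is not part of ECL IDE or the HPCC Systems platform. The message fragments and fixes are copied from the Top Ten list, and the exact wording may differ between platform versions. It also pulls out the numeric code that follows "System error:" when one is present.

```python
import re

# (message fragment, rank, type, suggested fix), taken from the Top Ten list
KNOWN_ERRORS = [
    ("THOR ABORT", 10, "Runtime", "Find out who aborted the workunit and why; resubmit when clear."),
    ("MP link closed", 9, "Runtime", "Check slave log, syslog and configuration; open a Jira issue if it persists."),
    ("Compile/Link failed", 8, "Compiler", "Restart ECL IDE and verify the cluster connection."),
    ("Need to supply a value for field", 7, "Compiler", "Add a default value in the TABLE."),
    ("Transform does not supply a value", 7, "Compiler", "Assign every SELF field in the TRANSFORM."),
    ("Global DEDUP,ALL is not supported", 6, "Runtime", "GROUP the recordset before DEDUP."),
    ("Missing logical file", 5, "Runtime", "Fix the filename typo; check the tilde (~)."),
    ("no specified row for Table", 4, "Compiler", "Bring a single record into scope."),
    ("does not match published layout", 3, "Runtime", "Match the RECORD to the DFU metadata."),
    ("does not match coded record size", 3, "Runtime", "Match the RECORD to the DFU metadata."),
    ("contains actions after the EXPORT", 2, "Compiler", "Remove either the action or the EXPORT."),
    ("does not EXPORT an attribute main()", 1, "Runtime", "Run the definition from a Builder window or BWR file."),
]

SYSTEM_ERROR_RE = re.compile(r"System error:\s*(\d+):")

def triage(message):
    """Return (rank, type, fix, system_error_code) for a known message, else None."""
    code_match = SYSTEM_ERROR_RE.search(message)
    code = int(code_match.group(1)) if code_match else None
    for fragment, rank, kind, fix in KNOWN_ERRORS:
        if fragment in message:
            return rank, kind, fix, code
    return None

print(triage("Error: System error: 10056: THOR ABORT"))
```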
Summary
• Many compiler errors are common to everyone and can be easily analyzed.
• As time goes on, your exposure to these common errors will point to quick
and easy solutions.
• Knowing what to do and where to go when you can’t decipher a message is
critical for productivity.
Quick poll:
Out of the top ten messages just
presented, how many have you
personally experienced?
See poll on bottom of presentation screen
Questions?
Bob Foreman
Senior Software Engineer
LexisNexis Risk Solutions
Robert.Foreman@lexisnexisrisk.com
• Have a new success story to share?
• Want to pitch a new use case?
• Have a new HPCC Systems application you want to demo?
• Want to share some helpful ECL tips and sample code?
• Have a new suggestion for the roadmap?
• Be a featured speaker for an upcoming episode! Email your idea to
Techtalks@hpccsystems.com
• Visit The Download Tech Talks wiki for more information:
https://wiki.hpccsystems.com/display/hpcc/HPCC+Systems+Tech+Talks
Mark your calendar for the April 19 Tech Talk!
Topics include Developing A Custom, Pluggable HPCC Systems Security Manager
Watch our Events page for details.
Submit a talk for an upcoming episode!
A copy of this presentation will be made available soon on our blog:
hpccsystems.com/blog
Thank You!

Patterns for Successful Data Science Projects (Spark AI Summit)Bill Chambers
 
Road to rockstar system analyst
Road to rockstar system analystRoad to rockstar system analyst
Road to rockstar system analystMizno Kruge
 
Data science training in hyd ppt converted (1)
Data science training in hyd ppt converted (1)Data science training in hyd ppt converted (1)
Data science training in hyd ppt converted (1)SayyedYusufali
 
Data science training in hyd pdf converted (1)
Data science training in hyd pdf converted (1)Data science training in hyd pdf converted (1)
Data science training in hyd pdf converted (1)SayyedYusufali
 
Data science training in hydpdf converted (1)
Data science training in hydpdf  converted (1)Data science training in hydpdf  converted (1)
Data science training in hydpdf converted (1)SayyedYusufali
 
D1: The NMC Methodology
D1: The NMC MethodologyD1: The NMC Methodology
D1: The NMC Methodologylisbk
 
Which institute is best for data science?
Which institute is best for data science?Which institute is best for data science?
Which institute is best for data science?DIGITALSAI1
 
Best Selenium certification course
Best Selenium certification courseBest Selenium certification course
Best Selenium certification courseKumarNaik21
 
Data science training in hyd ppt (1)
Data science training in hyd ppt (1)Data science training in hyd ppt (1)
Data science training in hyd ppt (1)SayyedYusufali
 

Ähnlich wie The Download: Tech Talks by the HPCC Systems Community, Episode 12 (20)

A Hybrid Approach to Data Science Project Management
A Hybrid Approach to Data Science Project ManagementA Hybrid Approach to Data Science Project Management
A Hybrid Approach to Data Science Project Management
 
Data-X-Sparse-v2
Data-X-Sparse-v2Data-X-Sparse-v2
Data-X-Sparse-v2
 
Data-X-v3.1
Data-X-v3.1Data-X-v3.1
Data-X-v3.1
 
Visualizing Healthcare Data with Tableau (Toronto Central LHIN Presentation)
Visualizing Healthcare Data with Tableau (Toronto Central LHIN Presentation)Visualizing Healthcare Data with Tableau (Toronto Central LHIN Presentation)
Visualizing Healthcare Data with Tableau (Toronto Central LHIN Presentation)
 
The Download: Tech Talks by the HPCC Systems Community, Episode 11
The Download: Tech Talks by the HPCC Systems Community, Episode 11The Download: Tech Talks by the HPCC Systems Community, Episode 11
The Download: Tech Talks by the HPCC Systems Community, Episode 11
 
Kendall7e ch01
Kendall7e ch01Kendall7e ch01
Kendall7e ch01
 
Analisis dan Perancangan Sistem - 1 - Kendall7e ch01
Analisis dan Perancangan Sistem - 1 - Kendall7e ch01Analisis dan Perancangan Sistem - 1 - Kendall7e ch01
Analisis dan Perancangan Sistem - 1 - Kendall7e ch01
 
SDLC 21.11.2022.pdf
SDLC 21.11.2022.pdfSDLC 21.11.2022.pdf
SDLC 21.11.2022.pdf
 
Data Science Training and Placement
Data Science Training and PlacementData Science Training and Placement
Data Science Training and Placement
 
Scientific Software Challenges and Community Responses
Scientific Software Challenges and Community ResponsesScientific Software Challenges and Community Responses
Scientific Software Challenges and Community Responses
 
Patterns for Successful Data Science Projects (Spark AI Summit)
Patterns for Successful Data Science Projects (Spark AI Summit)Patterns for Successful Data Science Projects (Spark AI Summit)
Patterns for Successful Data Science Projects (Spark AI Summit)
 
Data Scientists
 Data Scientists Data Scientists
Data Scientists
 
Road to rockstar system analyst
Road to rockstar system analystRoad to rockstar system analyst
Road to rockstar system analyst
 
Data science training in hyd ppt converted (1)
Data science training in hyd ppt converted (1)Data science training in hyd ppt converted (1)
Data science training in hyd ppt converted (1)
 
Data science training in hyd pdf converted (1)
Data science training in hyd pdf converted (1)Data science training in hyd pdf converted (1)
Data science training in hyd pdf converted (1)
 
Data science training in hydpdf converted (1)
Data science training in hydpdf  converted (1)Data science training in hydpdf  converted (1)
Data science training in hydpdf converted (1)
 
D1: The NMC Methodology
D1: The NMC MethodologyD1: The NMC Methodology
D1: The NMC Methodology
 
Which institute is best for data science?
Which institute is best for data science?Which institute is best for data science?
Which institute is best for data science?
 
Best Selenium certification course
Best Selenium certification courseBest Selenium certification course
Best Selenium certification course
 
Data science training in hyd ppt (1)
Data science training in hyd ppt (1)Data science training in hyd ppt (1)
Data science training in hyd ppt (1)
 

Mehr von HPCC Systems

Natural Language to SQL Query conversion using Machine Learning Techniques on...
Natural Language to SQL Query conversion using Machine Learning Techniques on...Natural Language to SQL Query conversion using Machine Learning Techniques on...
Natural Language to SQL Query conversion using Machine Learning Techniques on...HPCC Systems
 
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsImproving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsHPCC Systems
 
Towards Trustable AI for Complex Systems
Towards Trustable AI for Complex SystemsTowards Trustable AI for Complex Systems
Towards Trustable AI for Complex SystemsHPCC Systems
 
Closing / Adjourn
Closing / Adjourn Closing / Adjourn
Closing / Adjourn HPCC Systems
 
Community Website: Virtual Ribbon Cutting
Community Website: Virtual Ribbon CuttingCommunity Website: Virtual Ribbon Cutting
Community Website: Virtual Ribbon CuttingHPCC Systems
 
Release Cycle Changes
Release Cycle ChangesRelease Cycle Changes
Release Cycle ChangesHPCC Systems
 
Geohashing with Uber’s H3 Geospatial Index
Geohashing with Uber’s H3 Geospatial Index Geohashing with Uber’s H3 Geospatial Index
Geohashing with Uber’s H3 Geospatial Index HPCC Systems
 
Advancements in HPCC Systems Machine Learning
Advancements in HPCC Systems Machine LearningAdvancements in HPCC Systems Machine Learning
Advancements in HPCC Systems Machine LearningHPCC Systems
 
Expanding HPCC Systems Deep Neural Network Capabilities
Expanding HPCC Systems Deep Neural Network CapabilitiesExpanding HPCC Systems Deep Neural Network Capabilities
Expanding HPCC Systems Deep Neural Network CapabilitiesHPCC Systems
 
Leveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC SystemsLeveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC SystemsHPCC Systems
 
DataPatterns - Profiling in ECL Watch
DataPatterns - Profiling in ECL Watch DataPatterns - Profiling in ECL Watch
DataPatterns - Profiling in ECL Watch HPCC Systems
 
Leveraging the Spark-HPCC Ecosystem
Leveraging the Spark-HPCC Ecosystem Leveraging the Spark-HPCC Ecosystem
Leveraging the Spark-HPCC Ecosystem HPCC Systems
 
Work Unit Analysis Tool
Work Unit Analysis ToolWork Unit Analysis Tool
Work Unit Analysis ToolHPCC Systems
 
Community Award Ceremony
Community Award Ceremony Community Award Ceremony
Community Award Ceremony HPCC Systems
 
Dapper Tool - A Bundle to Make your ECL Neater
Dapper Tool - A Bundle to Make your ECL NeaterDapper Tool - A Bundle to Make your ECL Neater
Dapper Tool - A Bundle to Make your ECL NeaterHPCC Systems
 
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...HPCC Systems
 
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...HPCC Systems
 

Mehr von HPCC Systems (20)

Natural Language to SQL Query conversion using Machine Learning Techniques on...
Natural Language to SQL Query conversion using Machine Learning Techniques on...Natural Language to SQL Query conversion using Machine Learning Techniques on...
Natural Language to SQL Query conversion using Machine Learning Techniques on...
 
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsImproving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
 
Towards Trustable AI for Complex Systems
Towards Trustable AI for Complex SystemsTowards Trustable AI for Complex Systems
Towards Trustable AI for Complex Systems
 
Welcome
WelcomeWelcome
Welcome
 
Closing / Adjourn
Closing / Adjourn Closing / Adjourn
Closing / Adjourn
 
Community Website: Virtual Ribbon Cutting
Community Website: Virtual Ribbon CuttingCommunity Website: Virtual Ribbon Cutting
Community Website: Virtual Ribbon Cutting
 
Path to 8.0
Path to 8.0 Path to 8.0
Path to 8.0
 
Release Cycle Changes
Release Cycle ChangesRelease Cycle Changes
Release Cycle Changes
 
Geohashing with Uber’s H3 Geospatial Index
Geohashing with Uber’s H3 Geospatial Index Geohashing with Uber’s H3 Geospatial Index
Geohashing with Uber’s H3 Geospatial Index
 
Advancements in HPCC Systems Machine Learning
Advancements in HPCC Systems Machine LearningAdvancements in HPCC Systems Machine Learning
Advancements in HPCC Systems Machine Learning
 
Docker Support
Docker Support Docker Support
Docker Support
 
Expanding HPCC Systems Deep Neural Network Capabilities
Expanding HPCC Systems Deep Neural Network CapabilitiesExpanding HPCC Systems Deep Neural Network Capabilities
Expanding HPCC Systems Deep Neural Network Capabilities
 
Leveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC SystemsLeveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC Systems
 
DataPatterns - Profiling in ECL Watch
DataPatterns - Profiling in ECL Watch DataPatterns - Profiling in ECL Watch
DataPatterns - Profiling in ECL Watch
 
Leveraging the Spark-HPCC Ecosystem
Leveraging the Spark-HPCC Ecosystem Leveraging the Spark-HPCC Ecosystem
Leveraging the Spark-HPCC Ecosystem
 
Work Unit Analysis Tool
Work Unit Analysis ToolWork Unit Analysis Tool
Work Unit Analysis Tool
 
Community Award Ceremony
Community Award Ceremony Community Award Ceremony
Community Award Ceremony
 
Dapper Tool - A Bundle to Make your ECL Neater
Dapper Tool - A Bundle to Make your ECL NeaterDapper Tool - A Bundle to Make your ECL Neater
Dapper Tool - A Bundle to Make your ECL Neater
 
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...
 
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...
 

Kürzlich hochgeladen

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 

Kürzlich hochgeladen (20)

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 

The Download: Tech Talks by the HPCC Systems Community, Episode 12

  • 1. The Download: Community Tech Talks Episode 12 March 15, 2018
  • 2. Welcome! • Please share: Let others know you are here with #HPCCTechTalks • Ask questions! We will answer as many questions as we can following each speaker. • Look for polls at the bottom of your screen. Exit full-screen mode or refresh your screen if you don't see them. • We welcome your feedback: please rate us before you leave today, and visit our blog for information after the event. • Want to be one of our featured speakers? Let us know! techtalks@hpccsystems.com
  • 3. Community announcements: Dr. Flavio Villanustre, VP Technology, RELX Distinguished Technologist, LexisNexis® Risk Solutions, Flavio.Villanustre@lexisnexisrisk.com • HPCC Systems® Platform updates • 6.4.12 is the latest gold version / Community Changelog • 7.0.0 Beta planned for early Q2; among the key features: • Spark integration • Indexer • Record Translation • Session Management Improvements • VS Code Beta version • Roadmap items for 2018 and beyond • New Case Study • 3LOQ leverages HPCC Systems in their Habitual AI solution • Latest Blogs • Tips and Tricks for ECL – Part 2 - PARSE • Fly on the wall at our first Hackathon • Reminder: 2018 Summer Internship Proposal Period Open Through April 6, 2018 • Interested candidates can submit proposals from the Ideas List • Program runs late May through mid-August • Visit the Student Wiki for more details • 2018 HPCC Systems Community Day: watch for details, announced soon!
  • 4. Coming soon: 10K Trees Campaign for Earth Day • World Planting Day, March 21, through Earth Day on April 22 • Help us help the environment on behalf of our community! • HPCC Systems is dedicated to the environment and is giving you the opportunity to take action and be a small part of a big impact. • HPCC Systems, partnering with the National Forest Foundation, is growing and promoting awareness of environmental sustainability with their 10,000 Trees challenge.
  • 5. Today's speakers: Itauma Itauma, PhD Candidate, Keiser University, amightyo@gmail.com (Featured Community Speaker). Itauma Itauma is a doctoral candidate at Keiser University and a computer science instructor at Wayne State University. His interests lie in learning analytics and in using HPCC Systems for educational research. He holds an undergraduate degree in Electrical Engineering from the University of Ilorin and two master's degrees: a Master of Science in Computer Engineering from Istanbul Technical University, majoring in human-robot interaction, and a Master of Science in Computer Science from Wayne State University, where his thesis focused on leveraging HPCC Systems for Big Data analytics.
  • 6. Today's speakers: Ignacio Calvo, Software Engineering Lead, LexisNexis Risk Solutions, Ignacio.Calvo@lexisnexisrisk.com. Ignacio is a Software Engineering Lead with 17 years of experience developing IT projects for different markets (insurance, finance, telecom, retail). He has worked at LexisNexis for 5 years, creating Big Data solutions with geospatial capabilities using HPCC Systems. He organizes the HPCC Systems meetup group in Dublin and is a CoderDojo mentor. • Bob Foreman, Senior Software Engineer, LexisNexis Risk Solutions, Robert.Foreman@lexisnexisrisk.com. Bob Foreman has worked with the HPCC Systems technology platform and the ECL programming language for over 5 years and has been a technical trainer for over 25 years. He is the designer and developer of the HPCC Systems Online Training Courses and is the Senior Instructor for all classroom and Webex/Lync-based training.
  • 7. Conducting exploratory data analysis in educational research using HPCC Systems® Itauma Itauma PhD Candidate Keiser University
  • 8. Quick poll: How strongly do you think identification with math and confidence in one's ability to succeed in math are correlated? See the poll at the bottom of the presentation screen.
  • 9. Outline • What is Exploratory Data Analysis (EDA)? • Why is EDA Important? • Techniques, Types, and Steps • Role in Educational Research • The HPCC Systems Advantage in Educational Research • Data Visualization Examples • Exploring the HSLS:09 Dataset
  • 10. What is Exploratory Data Analysis (EDA)? • A broad, open-minded overview of the data • Converts data from its raw form into a form that makes sense • Lets the data speak for itself, with no assumptions made • No rigid rules • An important first step in data analysis
  • 11. Exploratory Data Analysis • Consists of: • Organizing and summarizing raw data • Looking for important features and patterns in the data • Looking for any striking deviations from any pattern found • Interpreting findings in the context of the research question
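The first and third steps on this slide (organizing and summarizing raw data, then looking for striking deviations) can be illustrated with a minimal Python sketch. This is not HPCC Systems or ECL code; it assumes a plain list of numbers, uses invented scores rather than any real dataset, and picks Tukey's 1.5 × IQR fences as one common way to flag deviations.

```python
import statistics

def summarize(values):
    """Return basic summary statistics for a list of numbers."""
    q1, q2, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "q1": q1,
        "median": q2,
        "q3": q3,
        "max": max(values),
    }

def flag_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical math test scores, invented for illustration (not HSLS:09 data)
scores = [62, 70, 71, 74, 75, 78, 80, 81, 85, 88, 140]
print(summarize(scores))
print(flag_outliers(scores))
```

A flagged value is only a prompt to look closer: in EDA it may be a data-entry error or a genuinely interesting case, and interpreting it in the context of the research question is the final step on the slide.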
  • 12. Why is Exploratory Data Analysis Important? • Gain new insight • Explore data structures • Detect missing data • Check significant variables • Examine relationships between variables • Select an appropriate model • Check model assumptions
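Detecting missing data, one of the reasons listed above, often starts with simply counting gaps per variable. The sketch below assumes records arrive as Python dictionaries and treats `None`, an empty string, or an absent key as missing. The field names are hypothetical placeholders, not actual HSLS:09 variable names.

```python
def missing_counts(records, fields):
    """Count missing entries (None, empty string, or absent key) per field."""
    counts = {f: 0 for f in fields}
    for rec in records:
        for f in fields:
            value = rec.get(f)  # absent keys come back as None
            if value is None or value == "":
                counts[f] += 1
    return counts

# Hypothetical student records; field names are illustrative only
records = [
    {"student_id": 1, "math_identity": 3.2, "math_efficacy": 2.9},
    {"student_id": 2, "math_identity": None, "math_efficacy": 3.5},
    {"student_id": 3, "math_identity": 2.1, "math_efficacy": ""},
    {"student_id": 4, "math_efficacy": 3.0},  # math_identity absent entirely
]
print(missing_counts(records, ["student_id", "math_identity", "math_efficacy"]))
```

Which values count as "missing" is a dataset-specific decision; survey files frequently use reserved codes (for example negative numbers) that would need to be added to the check.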
  • 13. Importance of Exploratory Data Analysis • Summarizes data • Often reveals new ways to think about data • Helps refine research questions and sometimes reveals new ones • After EDA, we are able to ask specific questions of our data
  • 14. Techniques of EDA • Usually graphical • May be combined with quantitative techniques • Visualization helps to discover data patterns: • raw data plots such as traces, histograms, and probability plots; • simple statistics plots such as mean plots, standard deviation plots, and box plots • No limitation to these techniques • A researcher can develop novel ways to visualize data
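A histogram, one of the raw data plots named above, is just a count of values per equal-width bin. This minimal Python sketch prints a text histogram so it runs without a plotting library; it assumes every value is at least `start`, and the scores are invented for illustration.

```python
def histogram(values, start, bin_width):
    """Count values into equal-width bins [start + i*w, start + (i+1)*w).

    Empty bins between the smallest and largest bin are kept so gaps show up.
    Assumes every value is >= start.
    """
    n_bins = int((max(values) - start) // bin_width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[int((v - start) // bin_width)] += 1
    return [(start + i * bin_width, start + (i + 1) * bin_width, c)
            for i, c in enumerate(counts)]

# Hypothetical math test scores, invented for illustration
scores = [62, 70, 71, 74, 75, 78, 80, 81, 85, 88, 140]
for lo, hi, n in histogram(scores, 60, 10):
    print(f"[{lo:>3}, {hi:>3}) {'#' * n}")
```

Keeping the empty bins is a deliberate choice: the run of empty rows before the last bar makes the isolated high score visible at a glance, which is exactly the kind of pattern graphical EDA is meant to surface.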
  • 15. Types and Steps in Exploratory Data Analysis • Graphical vs. non-graphical • Univariate vs. multivariate • Examine one variable at a time • Summarize and then examine the distribution of the variable(s) of interest: • What values the variables take • How often the variables take those values • Researchers can come up with different research questions and choose to analyze the data in different ways. • Data is awesome, and having a tool that makes it easy to analyze makes it fun and exciting.
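For a univariate, non-graphical look at a categorical variable, the two questions on this slide (what values the variable takes, and how often) amount to a frequency table. A minimal Python sketch, assuming the responses are already loaded as a list of strings; the Likert-style answers are invented.

```python
from collections import Counter

def frequency_table(values):
    """Return (value, count, relative frequency) tuples, most common first."""
    counts = Counter(values)
    total = len(values)
    return [(value, n, n / total) for value, n in counts.most_common()]

# Hypothetical survey responses, invented for illustration
responses = ["Agree", "Strongly agree", "Agree", "Disagree",
             "Agree", "Strongly disagree", "Disagree", "Agree"]
for value, n, share in frequency_table(responses):
    print(f"{value:<18} {n:>3} {share:6.1%}")
```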
  • 16. Exploratory Data Analysis • Statistics: collecting, summarizing, and interpreting data • Statistics plays a significant role in the social sciences, which include the field of education, by converting data into useful information. • EDA = Data Visualization + Statistics = Better data-driven decision making
  • 17. Educational Research • Systematic and organized inquiry applied to collecting, analyzing, and reporting information that addresses educational problems and questions (McMillan, 2015) • Describe • Predict • Improve • Explain • Important for the advancement of knowledge in the field of education
  • 18. Machine Learning vs Statistical Learning: The HPCC Systems Advantage in Educational Research • Era of big data, learning analytics, and personalized learner experiences • Machine learning is needed to build systems that learn from data • Learning analytics: the process of quantifying, analyzing, and reporting learner data to discover patterns and enhance learning to improve learner performance (Siemens & Baker, 2012) • Growing data collection from learning platforms and devices to create personalized, data-driven learning programs for learner success
  • 19. Machine Learning vs Statistical Learning: The HPCC Systems Advantage in Educational Research • Educational researchers need tools that can handle big data, and they can benefit from HPCC Systems. • Statistical learning is limited by the need to develop a hypothesis and make assumptions about the data before building a model. • In machine learning, algorithms are flexible, learn the model directly from the data, and output the requested features, letting the data speak for itself.
  • 20. Machine Learning vs Statistical Learning: The HPCC Systems Advantage in Educational Research • Common statistical tools such as SPSS, widely used in educational research, are limited in scalability and in their ability to handle big data. • HPCC Systems is open source and can handle both data visualization and statistical analysis, all integrated in the platform. • The HPCC Systems Visualization Bundle provides visual representations of data analysis. • HPCC Systems can also perform simple descriptive and inferential statistics. • https://github.com/hpcc-systems/Visualization
  • 21. Data Visualization in HPCC Systems • The Visualization Bundle is an open-source add-on to the HPCC Systems platform that allows the creation of visualizations from the results of queries written in ECL • An important means of conveying information from massive datasets • Pie charts, line graphs, maps, and other visual graphs • Simplifies the complex • In addition, the underlying visualization framework supports advanced features that allow graphs to be combined into interactive dashboards • Alternative: integration of Tableau with HPCC Systems
  • 22. Data Visualization Examples • In a previous study, HPCC Systems ML correlation and regression modules were used to determine the strength of the correlation between chocolate consumption, life expectancy, and happiness.
  • 23. Exploring the HSLS:09 Dataset • The US High School Longitudinal Study of 2009 (HSLS:09) is a national cohort study that followed over 23,000 ninth graders from 944 schools, starting in 2009, through their secondary and postsecondary years. • The focus of HSLS:09 includes students’ trajectories from high school and how students choose college majors and careers.
  • 24. Exploring the HSLS:09 Dataset • Research Question: Is Math Identity associated with Math Self-efficacy? • Math identity: the level of a student's identification with math, represented by agreement with the statements "You see yourself as a math person" and/or "Others see me as a math person". • Self-efficacy: the level of confidence a student has in their ability to succeed. • STEM (Science, Technology, Engineering, Mathematics) • Let’s find out! Remember, EDA helps refine research questions and sometimes reveals new ones.
  • 25. Exploring the HSLS:09 Dataset • CSV file sprayed into the HPCC Systems cluster • Recordset filtering • Three features projected: • X1MTHID – Math Identity • X1MTHEFF – Math Self-efficacy • X2SEX – Gender
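The filtering and projection on this slide were done in ECL on the cluster. As a rough local illustration only, the Python sketch below keeps the three projected columns from a CSV export and drops rows that hold missing-data codes. The reserve codes (-9, -8, -7) are an assumption to check against the HSLS:09 codebook. Scale scores such as X1MTHID can be negative (they are typically standardized), so a blanket "drop negatives" filter would discard valid data.

```python
import csv
import io

# The three features projected on the slide.
FIELDS = ("X1MTHID", "X1MTHEFF", "X2SEX")

# ASSUMPTION: reserve codes used for missing/non-applicable answers.
# Verify the exact list against the HSLS:09 codebook before relying on it.
MISSING_CODES = {-9.0, -8.0, -7.0}

def project_and_filter(csv_text):
    """Project FIELDS from CSV text and drop rows with missing values."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        try:
            values = {name: float(record[name]) for name in FIELDS}
        except (KeyError, TypeError, ValueError):
            continue  # column absent, empty, or not numeric
        if any(v in MISSING_CODES for v in values.values()):
            continue  # at least one reserve code present
        rows.append(values)
    return rows
```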
  • 26. Exploring the HSLS:09 Dataset • Next, query the dataset • Descriptive Statistics
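The descriptive statistics on this slide came from an ECL query. To show what such a univariate summary computes, here is a small, hypothetical Python helper. `describe` reports central tendency, spread, and range. `frequencies` answers "how often does each value occur", which suits a categorical field like X2SEX.

```python
import statistics
from collections import Counter

def describe(values):
    """Univariate summary of a numeric variable (needs at least 2 values)."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "median": statistics.median(values),
        "max": max(values),
    }

def frequencies(values):
    """Count of each distinct value, sorted by value."""
    return dict(sorted(Counter(values).items()))
```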
  • 27. Exploring the HSLS:09 Dataset • Is Math Identity associated with Math Self-efficacy? • Sub-question identified: Is this association different between males and females? • Correlation coefficients between Math Identity and Math Self-efficacy: • All students: 0.6303 • Males: 0.6237 • Females: 0.6381
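The slide does not name the method behind these coefficients. Reading them as Pearson correlations is an assumption, though it is the usual interpretation of a single r per pair. Below is a minimal Python sketch of that computation, both overall and per group. `pearson_by_group` is a hypothetical helper. Python 3.10+ also provides `statistics.correlation`, which gives the same overall value.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

def pearson_by_group(rows, x, y, group):
    """r computed separately for each distinct value of `group` (e.g. X2SEX)."""
    groups = {}
    for row in rows:
        groups.setdefault(row[group], []).append(row)
    return {g: pearson([r[x] for r in rs], [r[y] for r in rs])
            for g, rs in groups.items()}
```

For scale, r = 0.6303 gives r² ≈ 0.397. In a linear sense, the two measures share roughly 40% of their variance.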
  • 28. Data Visualization (Scatter Plots)
  • 29. Data Interpretations • Seeing oneself as a math person, or being seen as one by others, is associated with higher confidence in math • The association is slightly stronger for females (r = 0.6381) than for males (r = 0.6237) • The higher a student’s identification with math, the higher the student’s confidence in succeeding in math. This may be an important factor in students’ decisions to enroll in STEM programs. • *Correlation does not imply causation*
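Whether 0.6381 versus 0.6237 is a meaningfully "stronger" association depends on the group sizes, and the slides do not report them. One standard check is Fisher's r-to-z test for two independent correlations; males and females are separate samples, so the test applies. The sketch below takes the sample sizes as inputs. It illustrates the test only and is not a result from the study. Any n you plug in must come from the actual data.

```python
import math

def fisher_z(r):
    """Fisher r-to-z transformation: 0.5 * ln((1 + r) / (1 - r))."""
    return math.atanh(r)

def compare_correlations(r1, n1, r2, n2):
    """Two-sided test of H0: rho1 == rho2 for two independent samples.

    Returns (z statistic, two-sided p-value) using the normal approximation.
    """
    if n1 <= 3 or n2 <= 3:
        raise ValueError("each group needs more than 3 observations")
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (fisher_z(r1) - fisher_z(r2)) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p
```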
  • 30. Quick poll: Would you consider using HPCC Systems for exploratory data analysis? See the poll at the bottom of the presentation screen.
  • 31. Questions? Itauma Itauma, PhD Candidate, Keiser University, amightyo@gmail.com, https://www.keiseruniversity.edu
  • 32. Big Data and Geospatial with HPCC Systems® Ignacio Calvo Software Engineering Lead LexisNexis® Risk Solutions
  • 33. • Concepts in geospatial • How to use them with HPCC Systems • Use cases
  • 34. Definition • An approach to applying statistical analysis and other analytic techniques to data that has a geographical or spatial aspect
  • 35. Why? • Insights • Market segmentation • IoT • Satellite images • Risk analysis
  • 36. Quick poll: Do you have any kind of geospatial information or addresses in your datasets? See the poll at the bottom of the presentation screen.
  • 38. Origin of Geospatial • John Snow’s original map (1854): an early example of spatial analysis used to save lives. The map helped show that cholera was water-borne.
  • 39. Understanding the data • Need to know: • Format • Projection / coordinate system
  • 40. Formats: Vector vs. Raster [side-by-side example images labeled "Vector" and "Raster"]
  • 41. Quick poll: What’s the size of Greenland compared with Africa and Australia? See the poll at the bottom of the presentation screen.
  • 42. What is a projection?
  • 43. What is a projection? • Projections are used to represent the world in ways we can process • The Earth is round and maps are flat • Physical maps • Computer maps • Have I seen projections before? • Peters vs. Mercator vs. Winkel tripel • GPS (latitude/longitude) • Google Maps
  • 44. Projections • Two different projections representing the same place
  • 45. Lies, damned lies, statistics… and maps!
  • 46. Projections to know about • WGS84 • Latitude and longitude • Our best approximation of the world • Not always the best for a specific region • Not technically a projection • Mercator • Many different ones; choose one based on your location • Reduces the area it covers to a simple Cartesian plane • Good near the central axis, bad far away from it: • Web Mercator covers the whole world; it is good near the equator and gets worse as you travel north or south • NAD83 / Georgia East, British National Grid, Irish National Grid… Very good for that territory, awful anywhere else
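To make the Web Mercator trade-off concrete, here is the standard spherical Web Mercator formula in Python. The sphere radius is the WGS84 semi-major axis, 6,378,137 m. The function names are hypothetical. Latitudes are clipped at about ±85.0511°, the usual Web Mercator limit. The local scale factor, 1/cos(latitude), shows why features look larger away from the equator.

```python
import math

EARTH_RADIUS_M = 6378137.0   # WGS84 semi-major axis, treated as a sphere by Web Mercator
MAX_LAT_DEG = 85.05112878    # latitude where y reaches the map edge (square world map)

def wgs84_to_web_mercator(lat_deg, lon_deg):
    """Project WGS84 latitude/longitude (degrees) to Web Mercator (x, y) in metres.

    Input is (lat, lon) but output is (x, y): x comes from longitude and
    y from latitude. Latitudes beyond MAX_LAT_DEG are clipped.
    """
    lat = math.radians(max(-MAX_LAT_DEG, min(MAX_LAT_DEG, lat_deg)))
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + lat / 2))
    return x, y

def mercator_scale_factor(lat_deg):
    """Linear scale distortion of Mercator at a latitude (1.0 at the equator)."""
    return 1.0 / math.cos(math.radians(lat_deg))
```

At 60° latitude the scale factor is 2, so distances appear doubled and areas quadrupled. This is why Greenland looks comparable to Africa on a Mercator map, even though Africa is roughly 14 times larger.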
  • 47. Number one bug in Geospatial (image source: http://twcc.fr)
  • 48. Number one bug in Geospatial: swapping the axes • Latitude is Y and longitude is X (Lat = Y, Lon = X)
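The following Python sketch shows a small guard against this bug, under stated assumptions. It parses a WKT-style `POINT (a b)` and returns (lat, lon) only after the caller declares the source's axis order. Many tools, and GeoJSON, write longitude (x) first, but some sources put latitude first, so the order must come from the data's documentation. The range check catches some swaps, not all. Slide 53's `POINT (102.78925462 -6.08354321)` is only valid read as lon/lat, while `POINT (53.78925462 -6.08354321)` is in range either way. The function names are hypothetical.

```python
import re

_POINT = re.compile(r"^\s*POINT\s*\(\s*(\S+)\s+(\S+)\s*\)\s*$", re.IGNORECASE)

def parse_wkt_point(wkt):
    """Return the two coordinates of 'POINT (a b)' in the order written."""
    match = _POINT.match(wkt)
    if match is None:
        raise ValueError(f"not a 2-D WKT point: {wkt!r}")
    return float(match.group(1)), float(match.group(2))

def to_lat_lon(wkt, axis_order):
    """Interpret a WKT point as (lat, lon) given the source's axis order.

    axis_order must be "lonlat" (longitude written first) or "latlon".
    Raises ValueError if the result falls outside valid WGS84 ranges.
    """
    if axis_order not in ("lonlat", "latlon"):
        raise ValueError(f"unknown axis order: {axis_order!r}")
    first, second = parse_wkt_point(wkt)
    lat, lon = (second, first) if axis_order == "lonlat" else (first, second)
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError(f"lat={lat}, lon={lon} out of range; axes swapped?")
    return lat, lon
```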
  • 49. Now I understand my data, what’s next? • Data: ingest → index → query
  • 50. Bringing Geospatial into HPCC Systems • Goal: bring our geospatial processes into the realm of Big Data
  • 51. Steps toward Big Data • Extend HPCC Systems and ECL to support the following main capabilities: • Spatial filtering of vector geometries • Spatial operations using vector geometries • Spatial reference projection and transformation • Reading of compressed geo-raster files
  • 52. Steps toward Big Data • Integration of open-source libraries
  • 53. Ingesting vector data • It’s a CSV file: • Id | Name | Geometry | Projection | Value • 1 | Alice’s place | POINT (53.78925462 -6.08354321) | 4326* | €5,973,000 • 2 | Bob’s place | POINT (-34.78925462 7.08354321) | 4326 | €872,000 • 3 | Celine’s place | POINT (102.78925462 -6.08354321) | 4326 | €9,324,000 • *WGS84 (Lat/Lon) • 1. Policy data → 2. Geocode address → 3. Peril tag → Data ready to ingest
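Step 3 on this slide, peril tagging, comes down to asking whether each geocoded point lies inside a peril-zone polygon. The slides do not say which algorithm HPCC Systems uses. The classic even-odd ray-casting test is one common choice, sketched here in plain Python with a made-up square zone in planar (x, y) coordinates. `point_in_polygon` and `tag_perils` are hypothetical names. Production code would typically call a geometry library instead. Points exactly on a boundary may be classified either way.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd (ray casting) test: is (x, y) inside `polygon`?

    `polygon` is a list of (x, y) vertices; the ring is closed implicitly.
    Points exactly on an edge may be reported either way.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from (x, y) towards +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def tag_perils(points, zones):
    """Map each point id to the names of the zones that contain it."""
    return {
        pid: [name for name, poly in zones.items() if point_in_polygon(x, y, poly)]
        for pid, (x, y) in points.items()
    }
```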
  • 54. Ingesting vector data • It’s a GML / XML file. • 1. Shape data → 2. Parse XPATH → 3. Process and index → Data ready to query
  • 57. Indexing vector data • Find the data structure that suits you best: • R-tree • Quadkey • Geohash
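Of the three options, geohash is the easiest to show in a few lines. This Python sketch follows the publicly documented geohash scheme: alternating longitude/latitude bisection, with every 5 bits mapped to a base-32 alphabet that omits a, i, l, and o. It is not HPCC Systems code. Nearby points usually share a prefix, so an index keyed on the hash can answer proximity queries with prefix scans. Points on either side of a cell boundary may share little or no prefix, however, so queries typically also check neighboring cells.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

def geohash_encode(lat, lon, precision=9):
    """Encode a WGS84 point as a geohash of `precision` characters.

    Bits alternate longitude, latitude (longitude first). Each bit halves
    the current interval, and every 5 bits become one base-32 character.
    """
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, bits, bit_count = [], 0, 0
    even = True  # even bit positions refine longitude
    while len(chars) < precision:
        if even:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits, lon_lo = (bits << 1) | 1, mid
            else:
                bits, lon_hi = bits << 1, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = (bits << 1) | 1, mid
            else:
                bits, lat_hi = bits << 1, mid
        even = not even
        bit_count += 1
        if bit_count == 5:
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)
```

For example, `geohash_encode(57.64911, 10.40744, 11)` gives `"u4pruydqqvj"`, the commonly cited example point in Jutland, Denmark. Truncating it to `"u4pru"` gives the hash of the larger cell that contains it.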
  • 58. Indexing vector data: R-tree • Outline box: the biggest rectangle • Boxes contain boxes • The bottom boxes in the tree contain the actual geometries • Here, three levels are pictured • Boxes can overlap (but each entry is in only one box)
  • 59. Querying vector data • Searching an R-tree, e.g., finding all buildings (points) inside a flood zone (polygon): • Does the query polygon overlap our box? No: return an empty list. • Yes: is it a leaf node? Yes: return all its entries for verification. No: search our box’s children.
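The search flowchart maps directly onto a short recursive function. The Python sketch below assumes the query polygon has been reduced to its bounding box for pruning. It leaves out tree construction (insertion and node splitting). `Node`, `overlaps`, and `search` are hypothetical names. The leaf entries it returns are only candidates. Each still needs the exact "verification" check, such as a point-in-polygon test against the real flood-zone shape.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)

def overlaps(a: Box, b: Box) -> bool:
    """True if two axis-aligned boxes intersect (touching edges count)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

@dataclass
class Node:
    box: Box                                               # bounds everything below this node
    children: List["Node"] = field(default_factory=list)   # non-empty for internal nodes
    entries: List[tuple] = field(default_factory=list)     # (id, geometry) pairs in leaves

    @property
    def is_leaf(self) -> bool:
        return not self.children

def search(node: Node, query: Box) -> list:
    """Return candidate leaf entries from boxes that overlap the query box."""
    if not overlaps(node.box, query):
        return []                     # "No" branch: prune this whole subtree
    if node.is_leaf:
        return list(node.entries)     # candidates for exact verification
    found = []
    for child in node.children:       # internal node: search the children
        found.extend(search(child, query))
    return found
```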
  • 60. Ingesting raster data • It’s a raster / TIFF file (bitmap image). • 1. Raster data → 2. Tile and spray → 3. Process and index → Data ready to query
  • 61. Ingesting raster data • Tiling divides raster images into small, manageable areas of known dimensions. These tiles have their own metadata: • Bounding box • Grid position
  • 62. Ingesting raster data • 1. Figure out which grid position the geometry needs • 2. Extract the required pixel • 3. Interrogate the pixel for its value • 4. Interpret its value • 5. Return it to the user
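The five query steps can be sketched as follows. The sketch assumes a north-up raster with square pixels, a known top-left origin, and fixed-size square tiles that are already loaded into a dictionary. `RasterGrid`, `query_value`, and the tile layout are hypothetical. Real rasters such as GeoTIFFs carry a geotransform that may also include rotation, which this sketch ignores.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class RasterGrid:
    """Hypothetical metadata for a north-up raster split into square tiles."""
    origin_x: float    # x of the raster's top-left corner
    origin_y: float    # y of the raster's top-left corner
    pixel_size: float  # ground units per pixel (same in x and y)
    tile_size: int     # pixels per tile side
    width: int         # raster width in pixels
    height: int        # raster height in pixels

    def locate(self, x, y):
        """Steps 1-2: find the tile (grid position) and the pixel inside it."""
        col = math.floor((x - self.origin_x) / self.pixel_size)
        row = math.floor((self.origin_y - y) / self.pixel_size)  # rows grow downwards
        if not (0 <= col < self.width and 0 <= row < self.height):
            raise ValueError("point is outside the raster")
        tile = (row // self.tile_size, col // self.tile_size)
        return tile, (row % self.tile_size, col % self.tile_size)

    def tile_bbox(self, tile_row, tile_col):
        """Tile metadata: bounding box (min_x, min_y, max_x, max_y)."""
        span = self.tile_size * self.pixel_size
        min_x = self.origin_x + tile_col * span
        max_y = self.origin_y - tile_row * span
        return (min_x, max_y - span, min_x + span, max_y)

def query_value(grid, tiles, x, y, interpret=lambda v: v):
    """Steps 1-5: locate the tile, read the pixel, interpret it, return it.

    `tiles` maps a grid position (tile_row, tile_col) to a 2-D pixel array.
    """
    tile, (r, c) = grid.locate(x, y)
    return interpret(tiles[tile][r][c])
```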
  • 65. Bringing it all together • *Andrew Farrell, “In pursuit of perils: Geo-spatial risk analysis through HPCC Systems”, https://hpccsystems.com/resources/blog/afarrell/pursuit-perils-geo-spatial-risk-analysis-through-hpcc-systems
  • 66. Add even more value
  • 68. Why Geospatial with HPCC Systems? • Efficient parallel processing • Ability to import libraries from different languages • Good coverage of functions and spatial predicates • Fast ingestion • Support for different formats • Sub-second queries
  • 70. Questions? Ignacio Calvo, Software Engineering Lead, LexisNexis Risk Solutions, Ignacio.Calvo@lexisnexisrisk.com
  • 71. ECL Tip: The Top Ten Common ECL Compiler/Runtime Errors and How to Correct Them • Bob Foreman, Senior Software Engineer, LexisNexis Risk Solutions
  • 72. Quick poll: What do you think about the ECL Compiler messages? See the poll at the bottom of the presentation screen.
  • 73. Background • Over many years of ECL training classes, it became clear that many developers encounter the same errors while learning ECL. • Many of these errors are easy to fix, but it is important to understand what the error message is saying and what, in turn, needs to be corrected. • Errors fall into two categories: compiler and runtime. • Compiler errors are related to syntax or improper references to other definitions. • Runtime (or system) errors prevent a submitted workunit from completing; these are often easily corrected. • Presenting the Top Ten ECL Compiler/Runtime (System) Errors:
  • 74. Number 10 – The Workunit Assassin • Text: Error: System error: 10056: THOR ABORT • Type: Runtime (System) • Cause: Somebody killed (aborted) your workunit! • Fix: Find out who aborted your workunit and why, then restart it when all is clear.
  • 75. Number 9 – Unfriended Node • Text: Error: System error: 4: MP link closed (<ip address>:<port>) • Type: Runtime (System); MP is Message Passing • Cause: Out of memory (OOM), a network issue, a hardware fault, or a version bug • Fix: Review your slave log and syslog, check your configuration, and look for C++ memory leaks. If the problem persists, open an issue in Jira.
  • 76. Number 8 – Local Limbo • Text: Error: Compile/Link failed for <pathL<workunit number> • Type: Compiler • Cause: You lost the connection to your cluster, and the target has reverted to a Local target. • Fix: Restart your ECL IDE and verify the cluster connection.
  • 77. Number 7 – Missing Data Pieces (TIE) • Text: Error: Need to supply a value for field <fieldname>; Error: Transform does not supply a value for field "SELF.<fieldname>" • Type: Compiler • Cause: In a TABLE, a field is missing or requires a default value. In a TRANSFORM, one or more SELF.field definitions are missing. • Fix: Add the default value to the TABLE, and make sure every field is referenced properly in the TRANSFORM.
  • 78. Number 6 – Divide and Conquer • Text: System error: 0: Graph graph1[1], dedup[3]: Global DEDUP,ALL is not supported • Type: Runtime (System) • Cause: Some intensive ECL operations must be broken down into smaller pieces to run efficiently. • Fix: GROUP the recordset you are passing to DEDUP.
  • 79. Number 5 – Dataset Hide and Seek • Text: System error: 10001: Graph graph1[1], Missing logical file <filename> • Type: Runtime (System) • Cause: The filename you entered in the DATASET declaration does not match the name of the file you sprayed. • Fix: Find and correct the typo, and check for proper use of the tilde (~).
  • 80. Number 4 – No Dataset to Read! • Text: Error: file.<fieldname> - no specified row for Table file • Type: Compiler • Cause: The code is trying to reference a field value from a single record when the only thing in scope is the entire dataset, or a field may be out of scope in a parent/child denormalized dataset. • Fix: Modify the definition so that it retrieves a single record in scope.
  • 81. Number 3 – Data Imposters! (TIE) • Text: Error: System error: 0: Dataset layout does not match published layout for file <filename>; Error: System error: 0: Published record size # for file <filename> does not match coded record size # • Type: Runtime (System) • Cause: Your RECORD structure definition does not exactly match the metadata RECORD structure the DFU has for that dataset. • Fix: Correct the field names, positions, or value types.
  • 82. Number 2 – Action Retraction • Text: Error: Definition contains actions after the EXPORT has been defined • Type: Compiler • Cause: Your ECL code contains an action (explicit or implicit) following an EXPORTed definition. • Fix: Remove either the action or the EXPORT.
  • 83. Number 1 – MODULE Mayhem! • Text: Warning: (1,0): error C2386: Module <module name> does not EXPORT an attribute main() • Type: Runtime (System) • Cause: Your MODULE has multiple EXPORTs, and you need to tell the compiler which one you want to run. • Fix: Use a Builder window or BWR file to explicitly drill down to the definition you need. You could also rename one EXPORT definition as “Main” (not recommended).
  • 84. Honorable Mention – Warning Worries • Text: WARNING: Compiler/Server mismatch: Compiler: 6.4.2 community_6.4.2-1 Server: community_6.4.8- • Cause: The compiler referenced in the ECL IDE does not match the server version. • Fix: Update your ECL IDE or your cluster version as appropriate. • Text: WARNING: SOAP 1.1 fault: SOAP-ENV:Client[no subcode] "An HTTP processing error occurred" Detail: [no detail] • Cause: Your cluster is not using a shared repository. • Fix: This warning can be safely ignored if you know you are using a local repository.
  • 85. Summary – The Top Ten • 1. Warning: (1,0): error C2386: Module <module name> does not EXPORT an attribute main() (0, 0), 0 • 2. Error: Definition contains actions after the EXPORT has been defined (2, 1), 2325 • 3. Error: System error: 0: Dataset layout does not match published layout for file <filename> (0, 0), 0 • 3. Error: System error: 0: Published record size 29 for file <filename> does not match coded record size 32 (0, 0), 0 • 4. Error: file.<fieldname> - no specified row for Table file (4, 1), 2131, <ECL File and Local Path> • 5. Error: System error: 10001: Graph graph1[1], Missing logical file <filename> (0, 0), 10001 • 6. Error: System error: 0: Graph graph1[1], dedup[3]: Global DEDUP,ALL is not supported (0, 0), 0 • 7. Error: Need to supply a value for field <fieldname> (9, 50), 2170, (tables) • 7. Error: Transform does not supply a value for field "SELF.<fieldname>" (15, 1), 2111 • 8. Error: Compile/Link failed for <pathL<workunit number> • 9. Error: System error: 4: MP link closed (10.194.96.16:6600) • 10. Error: System error: 10056: THOR ABORT • Honorable mention: • WARNING: Compiler/Server mismatch: Compiler: 6.4.2 community_6.4.2-1 Server: community_6.4.8- • WARNING: SOAP 1.1 fault: SOAP-ENV:Client[no subcode] "An HTTP processing error occurred" Detail: [no detail]
  • 86. Summary • Many compiler errors are common to everyone and can be easily analyzed. • With experience, recognizing these common errors will point you to quick and easy solutions. • Knowing what to do and where to go when you can’t decipher a message is critical for productivity.
  • 87. Quick poll: Out of the top ten messages just presented, how many have you personally experienced? See the poll at the bottom of the presentation screen.
  • 88. Questions? Bob Foreman, Senior Software Engineer, LexisNexis Risk Solutions, Robert.Foreman@lexisnexisrisk.com
  • 89. • Have a new success story to share? • Want to pitch a new use case? • Have a new HPCC Systems application you want to demo? • Want to share some helpful ECL tips and sample code? • Have a new suggestion for the roadmap? • Be a featured speaker for an upcoming episode! Email your idea to Techtalks@hpccsystems.com • Visit The Download Tech Talks wiki for more information: https://wiki.hpccsystems.com/display/hpcc/HPCC+Systems+Tech+Talks • Mark your calendar for the April 19 Tech Talk! Topics include Developing a Custom, Pluggable HPCC Systems Security Manager. Watch our Events page for details. • Submit a talk for an upcoming episode!
  • 90. A copy of this presentation will be made available soon on our blog: hpccsystems.com/blog Thank You!

Editor's Notes

  1. Picture comparing Peters and Mercator: one preserves shapes (coastlines), the other preserves area. Check out the sizes of Greenland and Africa. Lesson: projections distort the data!