1. EconomicGeology
Vol. 64, 1969, pp. 538-550
A SimplifiedStatisticalTreatmentof GeochemicalData
by GraphicalRepresentation
CLAUDE LEPELTIER
Abstract
Inthecourseofamineralexplo•-ationsponsoredbytheUnitedNationsDevelopment
Programmein two selectedzonesof Guatemala,a streamsedimentreconnaissancewas
carriedout,andgraphicalmethodsof interpretationwereattemptedin the searchfor a
simplifiedstatisticaltreatmentof about25,000geochemicalresults. The data were
groupedby drainageandlithologicalunits,andthefrequencydistributionsof theabun-
danceof Cu,Pb,Zn andMo werestudiedin theformof cumulativefrequencycurves.
The fourelementsappearto be approximatelylognormallydistributed.Background,
coefficientsof deviationandthresholdlevelsweregraphicallyestimated.Examplesare
givenof simpleandcomplexpopulations.Mineralassociationswerestudiedby correla-
tion diagrams.
Contents
PAGE
Introduction ................................. 538
Difficultyof the statisticalapproachin the caseof 539
stream sedimentsurvey ..................... 539
Adjustmentto a lognormaldistribution......... 539
Definitions ................................. 539
Constructionof the cumulativefrequencycurve 542
Comparisonwith histograms.................. 543
Informationgivenby cumulativefrequencycurves 544
Background ................................ 544
Deviation .................................. 544
Threshold .................................. 544
Examples .................................. 545
Advantagesof cumulativefrequencycurves .... 546
The coefficients of deviation ................... 546
Correlation diagrams ......................... 548
Conclusion .................................. 550
References ................................... 550
Introduction
Ti•, United Nations Mineral Exploration Pro-
grammein Guatemalareliedheavilyon geochemical
prospecting.During one year (1967) 60 percent
of thetotalProjectareawascoveredsystematically
by a geochemicalreconnaissancecarried out in the
drainagesystems. Nine thousandstreamsediment
samples were collected over about 12,000 km2
(roundedfigures). All the sampleswereanalyzed
for copperand zinc, and the total numberthinned
out to approximately4,000 before being run for
lead and molybder/um.Finally about25,000geo-
chemicalresultswere availablefor compilationand
interpretation.As theyaccumulated,it becameap-
parent that high-contrastanomalieswhich are obvi-
• This article is publishedwith the authorizationof the
United Nations. The opinionsexpressedare not necessarily
endorsedby this Organization.
ous targets for follow-up operationswould not be
encountered but rather more subtle features not so
easyto pinpointand interpret.
The interpretationphaseof the surveywas char-
acterizedby two essentialfeatures:the greatamount
of data to be analyzedand the lack of precisionof
these data.
Samplingand analyticalmethodsmust sacrifice
precisionfor speeddue to the natureof geochemical
prospecting,and the first consequenceof .this fact
is that an isolatedresult has little meaningin geo-
chemistry. It must be part of a populationas
numerousand homogeneousas possible.Indeedin
all kindsof phenomena,individualinaccuraciesshade
off progressivelywhen observationis extendedto
larger and larger populations.
The first phaseof geochemicalinterpretationis to
condenselarge massesof numerical data and ex-
tract from them the essential information. The most
objectiveand reliableway to do it (and sometimes
the only one) is statistically. Large setsof num-
bers,cumbersomeand difficultto interpret,may be
reducedto a usefulform by the use of descriptive
statistics.This is bestdoneby thegraphicalrepre-
sentationof the frequencydistributionof a given
set of data; then the averagevalue, an expression
of the degreeof variation aroundthe average,and
the limit above which the anomalies start are im-
mediatelyand preciselydeterminedas well as the
existenceof oneor severalpopulationsin the sur-
veyed area.
Thistreatmentof thedataalsosimplifiesthecom-
parisonof the geochemicalbehaviorof an element
in variousgeologicalsurroundingsor of several
elementsin the samelithologicalunit.
I amgratefulto Mr. HenryH. Meyer,Project
Manager of the Guatemalaand E1 SalvadorMineral
538
2. SIMPLIFIED STATISTICAL TREATMENT OF GEOCHEMICAL DATA 539
Surveys, and to Mr. Stephen S. Steinhauserfor
technicalcriticismand muchhelpfuldiscussion.
Difficulty of Statistical Approach in Stream
Sediment Surveys
A reliablestatisticalinterpretationrequiresthat
a great quantity of data be treated and that these
databehomogeneous.
In drainagereconnaissancesurveys,the first con-
dition is easily filled but not the second. As a
matterof fact, the importanceof samplingtechnique
is sometimesoverlookedin this type of prospecting.
But even if given the appropriateattention, too
manytypesof riversand too manylithologicalunits
are generallysampledto result in a homogeneous
collectionof samples. The best way to limit the
inconvenienceof the heterogeneityof the samples
(particularlypH, organiccontentandgrain size) is
to splitthesurveyareaintodrainagesandlithological
units, when possible,and to make the statistical
interpretationfor eachof them separately. How-
ever, evenif this is done,the samedegreeof pre-
cision cannot be achieved as in the case of a soil
surveywheregoodhomogeneityis possible.
Adjustment to a Lognormal Distribution
Definitions
When dealingwith a large massof geochemical
data,the first stepis to findwhat sortof distribution
pattern best fits the various sets of observations.
And, thus far, the lognormaldistributionpattern
appearsto be the onemostapplicableto the results
of mostgeochemicalsurveys(Ahrens, 1957).
In geochemicalprospecting,we studythe content
of trace elementsin various natural materials, and
to say that the valuesare lognormallydistributed
meansthat the logarithmsof thesevalues are dis-
tributed followinga normal law (or Gauss'law)
well known as the bell-shapedcurve (Monjallon,
1963).
Many natural or economicphenomenacan be
expressedby a value varying between zero and
infinity, representedby a skeweddistributioncurve.
If, insteadof the actual value of the variable itself,
we plot its logarithm in abscissae,the frequency
curvetakesa symmetrical,bell-shapedform, typical
of the normal distribution. This happenswhen a
phenomenonis subjectto a proportionaleffect,that
isto saywhenindependentinitialcausesof variations
of the studiedvaluetake effectin a multiplicative
way. It is the case,for instance,for the distribution
of trace elementsin rocks, for the area of the dif-
ferent countries of the world, for the income of
individualsin a country,for thegrainsizein samples
of sedimentaryrocks,andothers(Coulomb,1959;
Cousins,1956).
In all theseexamples,the characterstudiedfol-
lows the lognormallaw, which is probablymore
common than the normal one.
It is interestingto note here that the lognormal
law fits very well in the caseof low-gradedeposits
like gold but for high-gradedeposits,iron for in-
stance,the experimentaldistributionsare generally
negativelyskexvedbecauseof the limitationtowards
the high values. G. Matheron gives a thermo-
dynamicinterpretationof the proportionaleffectin
the caseof ore depositsand relatesit to the Mass
Action Law (Matheron, 1962). To the extentin
which geochemicalanomaliesare extrapolationsof
oredepositsthistheoryshouldapplyto geochemical
prospecting.
Constructionof the CumulativeFrequencyCurve
A lognormaldistributioncurveis definedby two
parameters:one dependenton the meanvalue,and
theotherdependenton thecharacterof value-distri-
bution. This latter parameteris a measureof the
rangeof distributionof values,that is whetherthe
distributioncoversa wide or narrow rangeof values.
The two parameterscan be determinedgraphically
as will be explainedon followingpages. For prac-
tical purposes,we work on cumulativefrequency
curves,and their constructionshallbe explainedby
meansof a concreteexample.
The various steps of this constructionare the
following:
(a) Selectionof a preciseset of data ("popula-
tion") as large and homogeneousas possible.
(b) Groupingof thevaluesintoan adequatenum-
ber of classes.
(c) Calculatingthe frequencyiof occurrencein
each classand plotting it againstthe classlimits;
thisgivesa diagramcalledthe "histogram."
(d) Smoothingthe histogram to get the fre-
quency curve.
(e) Plotting the cumulatedfrequenciesas ordi-
natesgives the cumulativefrequencycurve, which
is the integralof the frequencycurve.
(f) By replacingthe arithmeticordinate scale
with a probabilityscale the cumulativefrequency
curve is representedby .oneor more straightlines.
Examplesof lognormalfrequencycurvesare shown
in Figure 1.
Somebriefcommentsonthedifferentstepsfollow:
(a) The largerthepopulationto be analyzed,the
morepreciseandreliablethe results. If necessary,
asfewas50 valuesmaybetreatedstatisticallybut
3. 54O CL/I UDE LEPELTIER
Figure 1. Lognorm.i distributioncurves
Frequency,%
Value
Frequency,%
Arithmeticscale
Cumulatedfrequency• Q
0• Value
Logarithmicscale
Logarithmicscale
Cumulatedfrequency•.
99.99
,,
Lol•arithmicscale
Value
' ' , , •Value
the confidence limits must be calculated to see if
the analysisis meaningful.
(b) A correctgroupingof thevaluesismandatory
if someprecisionis to be achievedin the statistical
interpretation;too few classeswill resultin shading
out importantfeaturesof the curve; too many in
losing significantdetails amidst a cloud of erratic
ones. The results are distributed in classes,the
modulusof whichshouldbeproportionalto thepre-
cisionof theanalyses:the moreprecisethe analyses,
the smallerthe modulus. The logarithmicinterval
must be adaptedto the variation amplitudeof the
valuesand to the precisionof the analyticalmethods
(Miesh, 1967).
In statistics,workingwith 15 to 25 intervals(or
classes)is recommended.As a rule, the width of
a class,expressedlogarithmically,mustbekeptequal
to or smallerthanhalf of standarddeviation(Shaw,
1964).
For geochemicalpurposes,it is convenientto work
with 10 to 20 pointson the cumulativefrequency
line, that is to saywith 9 to 19 intervalsor classes.
There are three variables to consider: the number
of points(n) necessaryto constructa correctline;
the range of distributionof the values (R), ex-
pressedas the ratio of the highestto the lowest
valueof the population;andthe width of the classes
expressedlogarithmically(log. int.) which has to
be selectedin functionof the two first parameters.
Thesethreevariablesarelinkedby the relation:
log. int. -
log R
In mostof thecasesR variesfrom6 to 300 (experi-
mental averagevalues), then, with (n) varying
from 10 to 20, log R from 0.78 to 2.48, the extreme
valuesfor the logarithmicinterval will be:
0.78
log. int. - - 0.039
20
log. int. -
2.48
10
- O.25
The 0.10 wasselectedas the bestsuitedlogarithmic
interval for the classes because it suits most distri-
4. SIMPLIFIED STATISTICAL TREATMENT OF GEOCHEMICAL DATA 541
bution,giving reasonablenumberof classesand a
gooddefinitionof thecurve. In caseof veryreduced
dispersionof the values around the mean, it may
be necessaryto use 0.05, and if the dispersionis
speciallylarge, 0.2 will be chosen. When the
logarithmicintervalis selected,it is easyto calculate
a table giving the classlimits in ppm. The only
precautionis to avoid startingwith a round value
sothat no analyticalresultswill fall on the limit of
two classes. The most useful and commonlyem-
ployedin geochemicalwork is the 0.1 log. int. classs
table,a part of whichis givenbelow:
classlimit (log) .. 0.07, 0.17, 0.27, 0.37, 0.47, 0.57
classlimit (ppm) . 1.17,1.48,1.86,2.34,2.95,3.72
It can be extended in both directions as far as
necessary.
(c-d) After selectingthe classtable, the values
are groupedand the frequencycalculatedfor each
class (in percentage);then the frequenciesare
plotted againstthe class limits (the latter being
logarithmicallycalculated,ordinaryarithmetic-arith-
metic paper must be used), giving a histogram
whichis smoothedto a frequencycurve. But histo-
gramsare oftenmisleading,beingstronglyaffected
by slight changesin classintervals, and frequency
curves are difficult to draw and handle: for instance,
it is necessaryto determinethe inflexionpointsof
the curve in order to evaluate the standard deviation.
Practically,the histogram-frequencycurve step is
skippedand the cumulativefrequencydirectlycon-
structed. However, note here an advantageof the
histogram: it clearly illustrates the effect of the
sensitivityof the analyticalmethodand more pre-
ciselythe biasbroughtto the low valuesby the use
of colorimetric scales of standards. As a matter of
fact, experienceshows that there is an inevitable
concentrationof the readings,whoeverthe analyst,
onthevaluesactuallyrepresentedin thecolorimetric
scale. For instance,in the caseof copper,the lower
part of the standardcolorimetricscalereads0,2,4,7
ß . . ppm. Usually this resultsin an excessof 2,
4 and 7 values,and a conspicuouslack of 1, 3, 5
ppm values. This is of importancefor a correct
constructionof the frequencycurve, and the raw
valuesmustoftenbe correctedby extrapolatingthe
generalshapeof the curve.
(e-f) By plotting the cumulatedfrequenciesas
ordinatesinsteadof the frequencies,one obtainsthe
integralcurveof the preceding. It hasthe form of
a straightlinewhenusingtheappropriategraphpaper
(probability-log),and it is the one used in geo-
chemicalpresentationand interpretationof the re-
sults. Then two questionshave to be answered:
where to start accumulatingthe frequencies,and
where to plot the cumulatedfrequencies?
As for the first point, the normalprocedurefol-
lowedby many authorsis to start cumulatingthe
Ftgure 2 - CumulativelYequeu•yDtltributton for Zn and Cu
5o
5
2.5
0,5
O.X
0.3. 0.2 0.5 X 2
5. 542 CL•IUDE LEPELTIER
Figure3. Confidencelimits(Pl, P2}=t0.05probobllitylevel-
Cumulative
Frequency
inO/o Numberof
samples
150
4OO
5OO
2 000
5OOO
10 000
Source:A. t ie•zou,Initiatioupratiquei laslatistiwe,
GauthiefVillars,Parrs,1961.
frequenciesfrom the lowestvaluestoward the high-
est (Fig. 1) (Hubaux, 1961; Termantand White,
1959). However,one has to considera property
of the probabilityscaleusedas ordinates:the values
zero and 100% are rejectedat the infinite; it does
not matter for zero becausezero% never occurs,
but in each case the last cumulatedfrequencyis
100%, and this value is impossibleto plot, lost
for the curve. Then consideringthe lack of pre-
cisionin the low valuesand the importanceof the
high ones for the determination of the threshold
level, I considerit muchbetter to cumu,late the fre-
quenciesfrom the hi#hestto the lowestvalues;thus,
the 100% will correspondto the lowestclassand be
eliminated.
As for the secondpoint, the curve beingan in-
tegral one, the ordinates must be plotted at class
limits and not at class center; then, since one
cumulatesthe frequenciesfrom the highest values
to thelowest,cumulatedfrequenciesareto beplotted
againstthelowerclasslimits. Usingtheclasscenter
will entailan errorof excesson thecentraltendency
parameters(backgroundand threshold)but not on
the dispersionparameter(coefficientof deviation).
This error, or difference,varieswith the type of
classesused and is easily calculated(6% for the
0.05logarithmicclassinterval,12% for the0.1 log.
int. and 26% for the 0.2 log. int.). If the class
limit is used,curvesconstructedfrom differentlog.
int. classescan be directly comparedwithout cor-
rection.
Let us take a concreteexample:the distribution
of Zn in the quaternaryalluvialdepositsof BlockI
(Fig. 2). There are 989 resultsrangingfrom 10
to230ppm.
230
population:N-- 989 range:R- - 23
10
The best classinterval is selectedas explained
above'
log.int. logR 1.36=0.097n 14
A 0.1 log. intervalwill give 14 intervals,which is
acceptable.Usually, the histogram-frequencycurve
step is skippedand the cumulativefrequencydia-
gramdirectlyconstructed.
In Figure 2, the points fit fairly well along a
straightline, suggestinga lognormaldistributionof
zinc in the alluvial deposits. Actually, the points
never fit the line exactly,but this doesnot matter
providedthey stay in a channeldelimitedby the
confidencelimits usually taken at the 5% prob-
ability level. This confidenceinterval has been
drawnon Figure2 by usinga graph (Fig. 3), which
avoidsfastidiouscalculationand givesa fairly good
precisionfor the cumulativefrequencyvaluesbe-
tween5% and 95%. The width of the confidence
channelis inverselyproportionalto the importance
of the populationconsidered:the biggerthe popula-
tion, the narrower the confidence interval. To
check that a distribution fits a lognormalpattern,
oneshouldusethe Pearson'stest (Rodionov,1965;
Vistelius, 1960), but this longer operationis gen-
erally not warranted in this type of interpretation
and, for practical purposes,the graphical control
describedaboveis satisfactory.
Comparison with Histograms
For comparisonpurposesthe cumulativefrequency
curve for Cu in the Motagua drainage (Fig. 2)
was also constructed,then, in Figure 4, the cor-
respondinghistogramsand frequencycurvesfor Cu
and Zn. Figures2 and 4 presentthe samedata in
two differentways. Beforeenumeratingand com-
6. A SIMPLIFIED STATISTICAL TREATMENT OF GEOCHEMICALDATA 543
Figure4. Histogrum.nd frequencycurvefor Znmd Cu
I 1.5 2 3 4 5 6 7 8 OI0 15 • 30 40 50 GO70-80 100 150 200 300
x-' I I I T-'I Illll I Z.
_ '•.
ß
.- I I tz.
- I
.
• •øC.t ': C.• q/ I
:'"'•'• • I•. • I•. • I
.......,,
.....
19l 1.• 1,• 1,t l,• l,• ],1l 4.1 i• LII 9.l 11.1 14,818.f l•,4 19,• •1.l 1.81 1.9 14,1•,•
mentingon the advantagesof the former presenta-
tion over the latter, an interesting feature of the
histogram should be mentioned: in the case of
colorimetricdeterminationsmade in the lower range
of sensitivityof the analyticalmethod,the histogram
showsclearlythe biasintroducedin the readingsby
the humanfactorandby the accuracyandsensitivity
limits of the method. This effect is illustrated for
copperin Figure 4, where the classesincludinga
colorimetric standard are shaded and the value of
the standarditself is givenas a larger figure (1, 2,
4... ppm); thecumulationof thefrequencyreduces
this effect,particularlyif it is startedfrom the high
values,but it may be necessaryto bring somecoro
rectionsto the low value frequenciesin order to
constructa precisedistributioncurve.
ComparingFigures2 and4, oneseesimmediately
ihatit is easierto comparetwostraightlinesthan
two overlappingbell-shapedcurves; many more
populationscan be presentedon the samediagram
by usingcumulativefrequencycurvesthan by using
histograms. Cumulative frequency curves are of
easierconstructionand more precisethan ordinary
frequencycurves;it is simplerto draw a line that
fits a setof pointsthan to draw a bell-shapedcurve
with inflexionpoints.
Information Givenby Cumulative
Frequency Curves
The mainpurposein constructingthe cumulative
frequencycurvefor a givenpopulationis to check
if it fits a lognormaldistribution,and if it does,to
estimategraphicallyitsbasicparameters:background
(b), coefficientsof deviation(s, s',s") andthreshold
level (t).
(b) givesan idea of the averageconcentration
levelof the elementsin a givensurrounding.
(s) expressesthe scatterof the valuesaround
(b): it correspondsto thespreadof thevaluesand
their range,from the lowestto the highest.
(t) is a complexnotionwhichmightbe termed
"conditional":statisticallyit dependson the prob-
abilitylevelchosen;geologically,and for practical
purposes,it is supposedto betheupperlimit of the
fluctuationsof (b): it dependson (b) and (s).
Thevaluesequaltoor higherthan(t) areconsidered
anomalous.
Adjustmentto thelognormallaw is generallythe
casewhensoilsamplesare considered:in thedrain-
age reconnaissancesurveyin Guatemala,we found
that trace element contents in stream sediments
appearalsoto be lognormallydistributed.
7. 544 CLAUDE LEPELTtER
Background
A straightline denotesa singlepopulationlog-
normallydistributed. In this simplecase,the back-
ground value (b) is given by the intersectionof
the line with the 50% ordinate. In the examples
given in Figure 2, we have:
backgroundvaluefor copper.. b (Cu) --9.2 ppm
backgroundvaluefor zinc .... b (Zn) = 48 ppm
Of course, these values must be rounded off; it
will be illusoryto implya precisionfar out of reach
of theanalyticalmethods. In the illustratedexample,
10 and 50 ppm are taken as reasonablygoodap-
proximations of the backgroundlevels.
In the case of a perfect frequencydistribution
curve, the backgroundthus calculatedcorresponds
to the mode (mostfrequent)and median(50% of
the valuesabove,50% belowit) values,and is the
geometricmeanof theresults. Thisgeometricmean
is a more significantvalue that the arithmeticmean.
It is also a more stable statistic, less subject to
changewith theadditionof newdataandlessaffected
byhighvalues.
Deviation
Before explaininghow to determinegraphically
the deviationcoefficient,an essentialpropertyof the
normal distribution(i.e., fitting the "bell-shaped"
curve) mustberecalledhere:
(b) beingthemedianvalueand (s) the standard
deviation then:
68.26% of the populationfalls betxveenb-s
and b + s
95.44% of the populationfalls bet•veenb- 2s
and b + 2s
99.74% of the populationfalls bet•veenb- 3s
and b + 3s
This holdstrue in the caseof the lognormaldis-
tribution since the logarithms of the values are
normallydistributed.Then,roundingoff theabove-
mentionedpercentagesand taking (b) as the back-
ground,we cansaythat 68% of the populationfalls
betweenb-s and b-I-s or that 32% is outside
theselimits. The distributioncurvebeingsymetrical
aroundan axis of abscissa(b) (Fig. 4), 16% of
thevalueswill fall aboveb -I-s and16% belowb -- s.
In Figure 2, the valuesb+s and b-s will be
obtainedby projectingthe intersectionof the dis-
tribution line with the ordinates 16 and 84% on
the abscissaaxis. Working with logarithms,one
has to consider the ratios and not the absolute values
thus established.Taking the sameexampleof Cu
(Fig. 2), onedeterminesthepointsP (at the16%
ordinate)and.4. 0.4 isthegeometricalexpression
of the deviation:it is inverselyproportionalto the
slopeof the line. We call it the geometricdeviation
(s'); it has no dimension:it is a factor obtained
by dividingthe value read in .4 by the value read
inO:
21
s' - - 2.28
9.2
Then multiplyingor dividingthebackgroundvalue
by the geometricdeviationwill give the upper and
lower limits of a rangeincluding68% of the popula-
tion (from b-s to b+s, or A'A on the figure).
Multiplying or dividing by the squareof the geo-
metricdeviationgivesa rangeincludingabout95%
of thevalues(b -- 2sto b + 2s).
Becauseall the reasoningis made on logarithms,
it is also necessaryto expressthe deviationby a
logarithm: the coefficientof deviantion(s) is the
logarithm(base10) of thegeometricdeviation(s').
s' = 2.28
s = logs• = 0.36
It will be seenlater that it might be interesting
to consider a third deviation index: the relative
deviation(s") sometimescalledcoefficientof vari-
ation. It is expressedas a percentage:
$
s"= 100•
0.36
s" = 100 - 3.9%
9.2
Threshold
After the backgroundand the coefficientof devi-
ation,thethird importantparameteris thethreshold
level (t), whichis a functionof thetwo former. It
has been seenthat in the caseof symmetricaldis-
tribution(eithernormalor lognormal)95% of the
individual values fall between b + 2s and b- 2s,
that is to say that only 2.5% of the population
exceedsthe upperlimit b + 2s. This upper limit
is conventionallytaken as the thresholdlevel (t)
above which the values are considered as anomalies:
logt = (logb) + 2s
or to avoidusinglogarithms:
t = b Xs '2
t = 9.2 X 5.2 = 47.8 ppm
Practically,(t) as well as (b), is read directly
on the graphas the abscissaof the intersectionof
the distributionline with the 2.5% ordinate. In
thisexampleonereads47 ppm,andthe slightdif-
ference is due to the rounding off of the exact
8. A SIMPLIFIED STATISTICAL TREATMENT OF GEOCHEMICAL DATA 545
ordinate2.28%to2.5%. Thisshowstheimportance
of the deviationin the estimationof the threshold;
two populationsmayhavethe samebackgroundbut,
nevertheless,different thresholdsif their coefficients
of deviationare different. In Figure2, the threshold
is five timesthe backgroundfor Cu and only 2.7
times for Zn.
In all the foregoing, I have consideredthe sim-
plestcase:a singlelognormalpopulation,the dia-
grammaticexpressionof which is a straight line.
However,when constructingcumulativefrequency
curves,a brokenline is frequentlyobtainedsug-
gestingthat the set of data consideredconsistsof a
complexpopulationor of differentones. Whenever
possiblein practice,the interpretationis madeon
sets of data selected so as not to include more than
two different distributions; for instance, a litho-
logicalunitmayincludetwo typesof mineralization
showingup in soilor sedimentsamples;onerepre-
sentativeof the normal or backgroundcontentof
thematerialsampled,andtheother,a superimposed
mineralization related to ore.
Examples
The three main casesof non-homogeneousdis-
tributionthat are the mostlikely to occurare, in
decreasingfrequencyorder:
a. an excess of high values in the considered
population;
b. a mixture of two populationsin a given set of
data; and
c. an excessof low valuesin the consideredpopu-
lation.
Thesethree casesare representedgraphicallyin
Figures 5. They correspondto real distributions
encounetredin the Guatemalandrainagesurveyand
appearas solid lineswith slopebreakson the dia-
gram. Some indicationsare given below showing
how to interpretsuchlines.
CopperDistribution (in a lithologicalunit). The
cumulativefrequencyline (Fig. 5) showsa break
to a flatter slope at the 30% level. This is the
casewhen there is an excessof high valuesin the
population;thehistogramwill givea frequencycurve
skewedto the right, in the direction of the high
values(positiveskewhess). If the populationwas
lognormallydistributed,the main branchOatshould
extendas a straightline in Oz whereas,in this case,
Ox isliftedto Oy whichmeansthatinsteadof having
2.5% ofthevalues30ppmor greater,thereare17%
of them. The abscissaof the breakingpoint, O,
(in this case 18 ppm) indicatesthe limit above
whichthereis a departurefromthenorm (i.e., from
thelognormaldistribution),an excessof highvalues.
In this case,backgroundand coefficientsof deviation
are calculated with the main branch Oat. The
abscissaof the breakingpointmay be conveniently
9. 546 CLAUDELEPELTIER
taken as threshold value if the break occurs above
the normalthresholdlevelof 2.5%. If, however,
the breakoccursbelow2.5% level (at pointp for
instance) the thresholdshouldbe taken as usual
(abscissaof pointP). Positivelybrokendistribu-
tionlinesare themoreinterestingbecausetheyin-
dicatean excessoverthebackgroundmineralization.
MolybdenumDistribution(in a lithologicalunit).
The cumulative distribution line shows two breaks:
first a positive,thena negativeone. Sucha graph
is the expressionof a dual distribution,suggesting
the existenceof two distinctpopulationsin the set
of dataconsidered.It givesa double-peakedhisto-
gram. We shall considerhere only the most fre-
quentcaseof a main"background"populationmixed
with a smalleroneof higheraveragevalue,the two
of thembeinglognormallydistributed. On the dia-
gram (Fig. 5), branchA correspondsto the main
or normal population,branch B to the anomalous
populationand the central branchA q- B to a mix-
ture of the two. By splitting the data at a value
takenaroundthe middleof A q-B (at 4 ppmfor
instance),it is possibleto separatethetotalpopula-
tion into two elementaryonesappearingas a and
b on the diagram. The generalbackgroundwill be
taken with branch A and the threshold as the abscissa
of themiddleof branchA q-B, thoughthe threshold
of populationa mayalsobe considered,but we have
not enoughexamplesof suchcomplexdistributions
to make definite recommendations,and we lacked
computingfacilitiesto calculatetheoreticaldistribu-
tions. The coefficients of deviation must be cal-
culatedseparatelyfor distributionsa and b.
Zinc Distribution (in a drainageunit). The
negativelybrokenline on Figure 5 is the expression
of anexcessof lowvaluesin anessentiallylognormal
distribution;in this case,the histogramis skewed
to the left, towardtheselow values(negativeskew-
ness). Providedtheir proportionis not too high
(20% or lessor instance),they do not interfere
in the interpretation,which is done on the main
branchof the distributionline in the usualway.
This excessof low valuesmaybe dueto the inclu-
sionin thepopulationof a low-backgroundlithologi-
cal unit or, moreoften,to poor sampling(for in-
stance,collectingan importantset of sedimentsam-
plesthataretoocoarse).
When the resultsdo not fit a lognormaldistribu-
tion, an explanationmaygenerallybe foundamong
these three factors: (1) lack of homogeneityin
sampling,(2) complexgeology(imprecisionin the
lithologicalboundaries),and (3) analyticalerrors.
It shouldalsobekeptin mindthat someelements
in somesurroundingsmaynot be lognormallydis-
tributed.
Advantages of Cumulative Frequency Curves
Plottingthedistributionof an elementin a selected
unit as cumulativefrequencycurve on probability
graphpaperis the easiestand mostpreciseway to
presenta great amountof data (for instance,pre-
sentingFigure 5 ashistogramsandfrequencycurves
will resultin an overloadedand illegiblediagram).
All the characteristicparametersof the distribution
can be estimated without cumbersome calculations.
Comparisonbetweenvariouspopulationsare easy
andcomplexdistributionsareclearlyidentified.Fur-
thermore,the adjustmentto a lognormaldistribution
canbe checkedgraphically.
Comparingthegeochemicalfeaturesof the various
unitsof a surveyareais importantin assessingtheir
mineral potential. This is convenientlydone by
plottingthe correspondingdistributionson the same
diagram for instance Cu distribution in three or
four different drainagesin the case of a stream
sedimentreconnaissance.Distribution heterogenei-
ties will be spotted and the correspondingunits
selectedfor further investigations. On a broader
scale,the geochemicalbehaviorof trace elementsin
a given geologicalenvironmentfrom different coun-
tries or metallogenicprovincescan be readilycom-
pared. This isanapproachto a betterunderstanding
of the distributionlawsof traceelementsin naturally
occurringmaterials.
The Coefficients of Deviation
A lognormaldistributionis completelydetermined
by two parameters:the geometricmean (b) and
the coefficientof deviation (s). It has been seen
that the absolutedeviationcan be expressedas a
geometricfactors' or, morecommonly,asa logarith-
mic coefficients. The term "deviation"is preferred
to "dispersion"which might be more expressive,
becausethereis nogeneticimplicationin the concept
of statisticaldispersionwhereasthere is one in the
notion of geochemicaldispersion;however, many
peopleusethe term "dispersion"in statisticalinter-
pretationof geochemicaldata.
The coefficientof deviationis a dispersionindex
specificfor the distributionof a given elementin a
given environment and expressesthe degree of
homogeneityof this distribution. When rocks are
considered,a similarityin thecoefficientof deviation,
togetherwith similar averagevalues,may indicate
similar geochemicalprocessesin their formation.
It is possiblethat a givenvalueof s corresponds
to eachtype of mineralizationin a lithologicalunit.
Confirmingthis assumptionwould requirevery ex-
tensivegeological-statisticalstudiesencompassingall
metallogeniccases.
There is also a relationshipbetweenthe back-
ground (b) and the coefficientof deviations(s)
10. SIMPLIFIED STATISTICAL TREATMENT OF GEOCHEMICAL DATA 547
whichis the expressionof thegeochemicallaw which
statesthat the dispersionof an elementis inversely
proportionalto its abundance. This is expressed
very clearly by the relativddispersions" (or rela-
tive deviation),a percentagerelatedto b and s as
follows:
averageabsolutedeviations (graphicallyestimated
in Fig. 6) alsodecreaseswhen the abundanceof the
element increases.
The weightedmeanvaluesof b, s and s" for each
elementhave beencalculatedseparatelyfor Blocks
I and II:
s
s"= 100•
The higher the background,the lower the relative
deviation. This is bestshownon a log/log correla-
tion diagram by plotting s" as abscissaand b as
ordinate. Figure 6, for instance,showsthe variation
of s"in functionof b in the differentlithologicalunits
of Blocks I and II, for Cu, Zn, Pb and Mo. The
diagram has been constructedby taking, for each
element,the extreme valuesfor b and s" thus deter-
mining parallelogramsincludingall the individual
values. One seesimmediatelythat there is an in-
verse linear relationshipbetweenb and s" (which
is evident from the definitionof s") and that the
Block I b s s" Block II b s s"
Zn 55. 0.23 0.42
Cu 8. 0.34 4.2
Pb 6.8 0.32 4.7
Mo 0.38 0.37 97.5
Zn 70. 0.17 0.24
Cu 8. 0.30 3.8
Pb 5.8 0.30 5.2
Mo 0.35 0.40 125.
The factthat the absolutedeviationfor Pb is equal
to or slightlylower than that for copperis due to
two factors:(1) thesensitivitylimit of theanalytical
methodfor lead,whichentaileda numberof assump-
tions and extrapolationsin the interpretation--de-
terminationof b and s, and (2) the existenceof
somePb mineralizedzonesin the surveyareawhere
bwashighandslow.
$
Figure6. Correlationdiagramb/s"for blocksI andII
Alllilholo•icllunits
6.2 0.3 0.4 O.i 0.8 ! 2 ,3 4 $ $ 78910
,=o.lsJ2o.2'1
11. 548 CLAUDE LEPELTIER
Figure7. CorrelationdiagramCu/Zn
cu(pp,.) •
III • ßIII
ß IllIll / ' x .,
Ill , / X , Ill
Ill I " • Ill
III , ,,,
Cu :
ß. - Il
/' • I II
Ill .... :-_ . tll•III
'.A......• • • Ill,,i
III Jill
III ...............', IIII II
III ; •11
Ill , III
III ill
5670910 2 3 4 5 67891• 2 3 4 5 67091ffi0
N1=nI +n3=168
In Figure 6, it is also interestingto note the
variationsof the dispersionof the sameelementin
different lithological units which is particularly
noticeablefor copper;the width of eachparallelo-
gram indicatesthe rangeof variationof s for each
element.
The coefficientof deviation is a very important
character of the distribution of an element in a
givensurrounding;it is probablyrelatedto the type
of geochemicaldispersion,mechanicalor chemical,
and consequentlymight give an indicationof the
type of anomalyencountered:syngeneticor epi-
genetic. It appearsthata highercoefficientof devi-
ationindicatesa preponderantlymechanicaldisper-
sion,but this hasnot beenproved. Much remains
to be done in this field.
Correlation Diagrams
In the caseof a polymetallicmineralization,with
two or moreelementslognormallydistributed,there
isgenerallya positivecorrelationbetweenthem;for
instancebetweenleadandzinc,a samplehighin Pb
iscommonlyalsohighin Zn. Thisgeologicconcept
of a relationshipbetweentwo-typesof mineralization
(onlyqualitativeandrathervague)maybe substi-
tutedby a precisefactor,the coefficientof correla-
tionp, whichgivesa rigorousmeasureof their de-
gree of dependency.In the caseof geochemical
prospecting,p measuresthedegreeof dependencyof
two lognormalvariablesnamelythe tenorsof two
elementsin a samplepopulation(Matheron,1962).
The coefficientpalwaysfallsbetween-1 and+ 1.
p--o meansa completeindependencebetweenthe
twoelements,p-- --+-1indicatesa functionalrelation-
ship,director inverse,betweenthem(it is a linear
relationshipbetweenthe logarithmsof the tenors).
SimplifiedCalculationof p.--Thereis a graphical
way to estimatep, slightlylessprecisebut much
fasterthan the completestatisticalcalculation:con-
structinga correlationcloudin full log.coordinates
(Fig. 7, 8). Eachsampleof thepopulationunder
studyis plottedfollowingits t•vo coordinates:its
tenor in element •/ and its tenor in element B and
the totalpopulationappearsas a cloudof points.
13. 550 CL,ZIUDE LEPELTIER
may be usedto assessmineralassociationsof ele-
mentsin naturalsamples. The correlationdiagram
showswhethertwo elementsare spatiallyassociated
andif onemaybeusedasa pathfinderfor theother.
Let us considertwo examples'the relationshipof
Cu/Zn in the drainageof the SuchiateRiver (Fig.
7) andtherelationshipof Pb/Zn in theRio Grande
drainage(Fig. 8).
Thefirstexample,in Figure7, isintendedonlyto
illustrate the lack of relationshipbetweentwo types
ofmineralization.The cloudofpointshasnodefinite
shape,but it canbe dividedinto threezones'one
aroundthe intersectionpointof the axes,including
the majorityof the pointswhichare spreadmore
orlessequallyamongthefourquadrants;anelliptical
one,markedCu, in therangehigh-Cu/background-
Zn values;and a third one, includingonly a fe•v
high-Zn/background-Cupoints.Thisshowsthat,in
the Suchiatedrainage,thereis norelationshipwhat-
soever between the Cu and Zn mineralization, that
theCu anomalyis moreimportantthanthatfor Zn
and that the two anomaliesare well separated
spatially.All thisisexpressedbythecoefficientof
correlation'
p = -0.11
Its low absolutevalue indicatesa nearlycomplete
independenceof the two mineralizations,with a
tendency'to inverserelationship(negativevalue).
On thecontrary,Figure8 showsan exampleof
directrelationshipbetweentwotypesof mineraliza-
tion. In the Rio Grande drainage,Pb and Zn are
associated'the correlation cloud is an elongated
ellipsewhosemainaxishasa 45ø slopeandthe
correlationcoefficientt•---+0.87. In thisdrainage,
lead and zinc anomalies will have the same pattern
andwill bespatiallyrelated.In similargeological
conditions,oneelementmaybeusedasa pathfinder
for the other.
Conclusion
In theGuatemalangeochemicalreconnaissance,the
statisticalanalysisof thedata,althoughelementary,
was usefulin outliningsubduedanomalouspatterns
in a complexgeochemicalsurrounding,but much
moreinformationcancertainlybe extractedfrom the
analytical results by a more thorough, computer-
oriented, treatment.
The graphicalmethodsdescribedabovehave the
great advantageof beingquick,cheapand easyto
use in the field without any specialmathematical
knowledge. It is a convenientand syntheticway
to presenta greatamountof geochemicaldata,and
I think it mightbe usefulto any geologistinvolved
in geochemicalprospecting.
UNITED NATIONS MINERAL SURVEY,
GUATEMALACITY, GUATEMALA,
January20; March 28,1969
REFERENCES
Ahrens,L. H., 1957,The lognormaldistributionof the
elements--a fundamental law of geochemistry: Geochim.
et Cosmochim. Acta, v. 11, no. 4.
Coulomb,R., 1959,Contribution3,la C•ochimiede l'uranium
dansles granitesintrusifs:RapportC.E.A. 1173,Centre
d'EtudesNucleires de Saclay, France.
Cousins,C. A., 1956,The value distributionof economic
minerals with special referenceto the Witwatersrand
Gold Reefs: Geol. Soc. South Africa Trans. v. LIX.
Hubaux,A., 1961,Representationgraphiquedesdistributions
d'oligo-•l•ments:Ann.Soc.G•ol.Belgique,T. LXXXIV--
Mars 1961.
Termant,C. B., andWhite, M. L., 1959,Studyof the dis-
tribution of somegeochemicaldata: ECON.GEOL.,V. 54,
p. 1281--1290.
Matheron,G., 1962,Trait• de g•ostatistiqueappliqu•e,tome
1: M•moire no. 14 du Bureau de RecherchesG•ologiques
et MiniSres, Paris.
Miesh,A. T., 1967,Methodsof computationfor estimating
geochemicalabundance--U. S. GeologicalSurvey Pro-
fessionalPaper 574-B.
Monjallon,A., 1963,Introduction3. la m•thodestatistique:
Vuibert, Paris.
Rodionov,D. A., 1965,Distributionfunctionsof the elements
and mineral contentsof igneousrocks: ConsultantBureau,
New York.
Shaw,D. M., 1964,Interpretationg•ochimiquedes•l•ments
en trace dans les roches cristallines: Masson et Cie,
Paris.
Vistelius,A. B., 1960,The skewfrequencydistributionsand
fundamentallaw of the geochemicalprocesses:Journal of
Geol. Jan. 1960.