Occupations are a crucial resource for historical research in a wide variety of fields. This presentation indicates the size of the error made when combining data coded in the two major classification schemes, OCCHISCO and HISCO. It then shows how Linked Data can circumvent this and similar issues.
Advancing the comparability of occupational data through Linked Open Data
1. Richard Zijdeman [richard.zijdeman at iisg.nl]
Kathrin Dentler
Rinke Hoekstra
Albert Meroño-Peñuela
HISCO workshop
Historical Population Database of Transylvania
Cluj, Romania
June 18, 2016
2. ... it is market position, and especially position in the occupational
division of labour, which is fundamental to the generation of
structured inequalities. The life chances of individuals and
families are largely determined by their position in the market and
occupation is taken to be its central indicator ... .
(Rose and Harrison, 2010)
3. Occupations are important as dependent variables (occupational
attainment studies) and independent variables (occupation
stratification studies) in educational (and occupational) status
attainment, health, voting, consumption, marriage etc.
(Ganzeboom, 2008)
4. Occupations are one of the few indicators of social position that
are available in:
• large quantities
• different time periods
• various societies
• at the individual level (smallest level of detail)
5. Lack of comparability
• Many different occupational classifications
• Differences in mobility studies could result
from different classification methods
(Kaelble 1985)
Charles Booth (1886-1903)
6. HISCO
• Historical International Standard Classification of Occupations
• Put together by a large number of institutes
• Based on ILO’s ISCO ’68
• Occupations retrieved from registers
• 1675 occupational codes
7. Current solution: 2-step procedure
Code into the concept, first:
• Classify into the concept (HISCO)
• Link the measure of stratification to the concept (e.g. SOCPO,
HISCAM)
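The two-step procedure can be sketched in Python as two successive lookups. This is only a minimal illustration, not the presenters' code: the table names, the function name, the codes and the scores below are all hypothetical placeholders and are not taken from the HISCO or HISCAM tables.

```python
# Step 1: occupational title -> concept (a HISCO code).
# Hypothetical placeholder entries, not taken from the HISCO tables.
TITLE_TO_HISCO = {
    "miner": "H-001",
    "farmer": "H-002",
}

# Step 2: concept -> measure of stratification (e.g. a HISCAM score).
# Hypothetical placeholder scores.
HISCO_TO_MEASURE = {
    "H-001": 45.0,
    "H-002": 50.0,
}


def stratification_score(title):
    """Return the score for a title, or None if either step has no entry."""
    code = TITLE_TO_HISCO.get(title.strip().lower())
    if code is None:
        return None
    return HISCO_TO_MEASURE.get(code)


print(stratification_score("Miner"))   # 45.0
print(stratification_score("weaver"))  # None
```

Keeping the two tables separate means a different measure can be attached to the same concept without recoding the titles.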
8. New problems
1. What concept?
• Historical International Standard Classification of Occupations (HISCO)
• OCCHISCO
• PST
2. Not all measures link to all concepts
• E.g. no link between OCCHISCO and HISCAM
3. Adaptability of concepts (new versions)
9. Is this a substantive problem?
Illustrative example:
• Subset of SAME occupational titles from NAPP and HISCO
• Link these occupations to HISCAM
• For HISCO: provided directly by the HISCAM team
• For OCCHISCO: indirectly, through a mapping
13. So yes, this is problematic
• ‘Lost’ 41% of explained variance
• Cf. regression models: explained variance usually not above 30%
• HISCAM often both as dependent and independent variable
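The slides with the underlying figures are not part of this transcript, so how the 41% was obtained is not shown. As a rough sketch of one plausible way to quantify such a loss, the Python below compares scores obtained by the direct route with scores obtained through a mapping and reports the share of variance the two do not have in common (1 minus the squared Pearson correlation). The function names and all scores are hypothetical and made up for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation of two equally long sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant sequences")
    return cov / sqrt(var_x * var_y)


def lost_variance_share(direct, indirect):
    """Share of variance in the direct scores not shared with the indirect ones."""
    return 1.0 - pearson_r(direct, indirect) ** 2


# Hypothetical scores for the same five occupational titles.
direct_scores = [40.0, 50.0, 60.0, 70.0, 80.0]
indirect_scores = [45.0, 45.0, 65.0, 60.0, 85.0]

print(round(lost_variance_share(direct_scores, indirect_scores), 3))
```

A value of 0 would mean both routes rank and space the titles identically; the closer the value is to 1, the less the mapped scores tell us about the directly assigned ones.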
14. New problems
1. What concept?
• Historical International Standard Classification of Occupations (HISCO)
• OCCHISCO
• PST
2. Not all measures link to all concepts
• E.g. no link between OCCHISCO and HISCAM
3. Adaptability of concepts (new versions)
15. Towards a solution
• Linked Data (Berners-Lee, 2006)
• Define Resources (books, respondents, etc.) with a URI
• Present URIs as URLs
• Describe Resources using so-called ‘triples’
16. An example of a triple
Resource: Margaret
Property: works as
Value: Miner
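A triple of this kind can be written down in a few lines of Python. The sketch assumes the figure on this slide reads Margaret (Resource), works as (Property), Miner (Value); the `http://example.org/` namespace, the class name and the function name are hypothetical and serve only as illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    resource: str  # the thing being described, identified by a URI
    prop: str      # the property, also identified by a URI
    value: str     # here another URI; could also be a plain literal


# Hypothetical namespace, used only for this illustration.
EX = "http://example.org/"

triple = Triple(
    resource=EX + "person/Margaret",
    prop=EX + "vocab/worksAs",
    value=EX + "occupation/Miner",
)


def to_ntriples(t):
    """Serialise a triple of three URIs as one N-Triples line."""
    return f"<{t.resource}> <{t.prop}> <{t.value}> ."


print(to_ntriples(triple))
```

Because each part is a URI rather than a free-text string, another dataset can refer to exactly the same person, property or occupation.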
23. Case study: DBpedia
• Structured data behind Wikipedia
• Information on all kinds of topics, also occupations
• Add HISCO codes to DBpedia occupations
• Let’s try and do this live: http://yasgui.org/short/VJfZvnx6x
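The idea of adding HISCO codes to DBpedia occupations can be sketched offline with a small in-memory set of triples. This does not reproduce the live query behind the link above. The DBpedia resource URIs follow DBpedia's usual `http://dbpedia.org/resource/` pattern, but the HISCO namespace, the linking property, the function names and the codes are assumptions made for this illustration and are not verified against the HISCO tables.

```python
DBR = "http://dbpedia.org/resource/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
HISCO = "http://example.org/hisco/"               # assumption, not an official namespace
HAS_HISCO = "http://example.org/vocab/hiscoCode"  # assumption, hypothetical property

# A tiny in-memory "graph": a set of (resource, property, value) triples.
graph = {
    (DBR + "Miner", RDFS_LABEL, "Miner"),
    (DBR + "Farmer", RDFS_LABEL, "Farmer"),
}

# Placeholder codes, not verified against the HISCO tables.
codes = {"Miner": "CODE-A", "Farmer": "CODE-B"}


def add_hisco_codes(graph, codes):
    """Return a new graph with one HISCO triple per occupation in `codes`."""
    extra = {(DBR + name, HAS_HISCO, HISCO + code) for name, code in codes.items()}
    return graph | extra


def values_of(graph, resource, prop):
    """All values v such that (resource, prop, v) is in the graph, sorted."""
    return sorted(v for r, p, v in graph if r == resource and p == prop)


enriched = add_hisco_codes(graph, codes)
print(values_of(enriched, DBR + "Miner", HAS_HISCO))
```

The original triples are left untouched; the codes are simply additional statements about the same resources, which is what makes this kind of enrichment easy to share and to revise.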
24. Caveats
• We did not check the technique on a really big scale
(e.g. NAPP data)
• Sharing code remains a collective action problem
(but less of a coordination problem)
26. Outlook
• Linkage to texts (occupations in newspapers)
• Linkage to public resources: Wikipedia
• Combine Machine Learning and Linked Data for automated
occupational coding