Presentation by Geoffrey Bilder at Crossref London LIVE, 26th September 2017. New initiatives at Crossref including organisational and grant identifiers.
15. Test Data Preparation
● Normalise each dataset to a standard JSON format
– Identifier, name, country (& code)
● Collect sample affiliation data
– Use affiliation data from Crossref & ORCID
– Manually match affiliations against candidate datasets (benchmark dataset)
● Benchmark dataset contains 100 affiliations
– Removed erroneous data, some repeats
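The normalisation step above can be sketched roughly as follows. This is only an illustration, not the code used for the tests: the output keys (`identifier`, `name`, `country`, `country_code`), the `field_map` argument, and the source field names in the example record are all assumptions made for this sketch.

```python
import json

def normalise_record(raw: dict, field_map: dict) -> dict:
    """Map one source record onto a common shape.

    field_map maps each (hypothetical) output key to the key used by
    the source dataset. Missing or empty values become empty strings.
    """
    record = {
        target: str(raw.get(source) or "").strip()
        for target, source in field_map.items()
    }
    # Country codes are upper-cased so datasets compare consistently.
    record["country_code"] = record.get("country_code", "").upper()
    return record

# Hypothetical source record and mapping, for illustration only.
raw = {
    "id": "example-001",
    "label": "  Example University ",
    "country_name": "United Kingdom",
    "iso": "gb",
}
field_map = {
    "identifier": "id",
    "name": "label",
    "country": "country_name",
    "country_code": "iso",
}

normalised = normalise_record(raw, field_map)
print(json.dumps(normalised, indent=2))
```

Keeping one mapping per candidate dataset means the matching step only ever has to deal with a single record shape.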
17. Matching
● Take benchmark dataset of 100 affiliations
● Match each affiliation against each dataset
– Search against a simple Elasticsearch index
– Use 2 approaches: “basic” and “institution”
● Generate reports for each dataset
– Details of best match for each affiliation
● Generate overall summary across datasets
● Finally, produce weighted score
– % coverage × % successful matches
– Normalises for current dataset coverage
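The weighted score described above can be sketched as a simple product of two percentages. The slide does not say how each percentage was measured, so this sketch takes both as inputs; the function name `weighted_score` and the example figures are hypothetical and are not results from the tests.

```python
def weighted_score(coverage_pct: float, success_pct: float) -> float:
    """Return % coverage x % successful matches, as a percentage.

    Both inputs are expected in the range 0-100.
    """
    for value in (coverage_pct, success_pct):
        if not 0 <= value <= 100:
            raise ValueError("percentages must be between 0 and 100")
    # Multiply first, then divide, so the result stays on a 0-100 scale.
    return coverage_pct * success_pct / 100

# Made-up figures: a dataset covering 80% of the benchmark,
# with 90% of its matches judged successful.
print(weighted_score(80, 90))  # 72.0
```

Multiplying the two figures means a dataset scores well only if it both contains the benchmark organisations and returns them as the best match.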
47. Credits
• “Payment Icons” - Stuart Colville, https://github.com/muffinresearch/payment-icons
• DataCite, THOR, and ORCID logos via their respective organisations.
• All other images via a paid subscription to The Noun Project
How hard can it be? The rose-tinted view is that there is one canonical name, but we know that the name varies depending on the glasses you are wearing.