Small Data, Big Benefits: Mining for End User Relationships
In today’s environment, publishers need more user interest and engagement to keep institutional subscriptions and submissions strong and growing.
1. Small Data, Big Benefits:
You’re Sitting on a Gold Mine
SSP – Concurrent 2A
June 2 2016
Christine Orr
2. Your data is (potentially) your most valuable asset
Knowledge
Information
Data
3. Ringgold: Trusted data partner & part of the
scholarly comms infrastructure
• Ringgold is a small, for-profit, database publisher
• 2003: Identify Database & the Ringgold ID are born out of a project to
disambiguate the institutional subscribers to OUP journals
• This was not a unique problem
We’ve been helping publishers turn
their data into an asset for a decade
with the Ringgold Identifier & data
quality expertise
4. 420,000 Institutions in scholarly communications
Ringgold ID
• UCLA: Univ Calif Los Angeles (8783)
• UCLA: Universidad Centroccidental Lisandro Alvarado (33177)
• UCLA: Universidad Contemporanea de las Americas (376277)
Hierarchies
• Univ Calif System
• UCLA: Med School, Coll of L&S
• UCSD: Div of Physical Sci
Metadata
• Size, Tier, Subject, Sector, Location + More
5. Small Data – what do we mean?
• Metadata elements
• Persistent IDs
• Individual user records
• Transactional data: usage,
payments, etc.
The big picture depends on every
detail: accurate & interconnected
data
6. What is preventing us from using our data to
maximum advantage?
• Entity management: Jens-Peter
Mueller or J-P. Müller? Uni
Hannover or Hanover College?
• Data quality: Incomplete, outdated,
duplicates, inaccurate.
• Interoperability: Systems,
languages, data silos based on
functions.
7. Who are our end users?
• Readers
• Authors
• Reviewers
• Editors
• Members
• Student members
• Meeting attendees
• Volunteers
• Faculty
8. Who are our Institutions?
Affiliations of
Individuals:
• Authors
• Editors
• Reviewers
• Volunteers
• Funder of research
• Payer of APC
• Member
institutions
• Licensees
• Subscribers
• Consortia
9. Big Benefits: When individuals & affiliations are joined
together, publishers gain knowledge and capabilities
• Market intelligence about authors and institutions
• Facility to create targeted marketing & outreach
• Author and subscriber information mapped together
• Knowledge of where research funding is concentrated
• Reduction in time taken calculating open access charges (APCs)
• Ability to resolve conflicts of interest (author & reviewer from same inst)
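As a rough illustration of the conflict-of-interest benefit on this slide: once authors and reviewers both carry institution IDs, flagging a shared affiliation can be as simple as a set intersection. The sketch below is a minimal Python example under stated assumptions. The function name, the reviewer labels, and the record layout are all hypothetical, and the IDs are reused from the UCLA slide purely as sample values.

```python
def same_institution_conflicts(author_institution_ids, reviewer_pool):
    """Return reviewers who share at least one institution ID with the authors.

    reviewer_pool maps a reviewer label to a list of institution IDs.
    Both the labels and the layout are hypothetical, for illustration only.
    """
    author_ids = set(author_institution_ids)
    conflicts = {}
    for reviewer, institution_ids in reviewer_pool.items():
        overlap = author_ids & set(institution_ids)
        if overlap:
            conflicts[reviewer] = overlap
    return conflicts


# Sample data: reviewer labels are made up; IDs come from the UCLA slide.
reviewers = {
    "reviewer-a": [8783],
    "reviewer-b": [33177],
    "reviewer-c": [376277, 8783],
}
flagged = same_institution_conflicts([8783], reviewers)
```

This only works when every record holds an ID rather than a free-text name; matching "UCLA" as text would not tell the three institutions apart.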
10. What solutions exist?
Entity Management: Adopt persistent
IDs as appropriate
• People: ORCID
• Places: Ringgold ID, ISNI, Funder ID
• Things: DOIs
Data quality
• Retrospective cleanup
• Improve data capture process &
eliminate free text
• Disambiguate & merge duplicate
records
Interoperability
• Tools & standard data for integration
• Systems-dependent
11. ORCID + Ringgold IDs: Joining people + places
• Joining People + Places
12. Thank you
Christine Orr
Sales Director
Christine.Orr@ringgold.com
www.ringgold.com
Ringgold’s Mission
To provide identifiers and structured data to
power the efficient exchange of information
throughout the scholarly research community
...and beyond.
Editor's Notes
We all have a wealth of data about our end users: but are we making the most of it?
To realize big benefits, you have to improve the state of your source data, the individual records, so that it can be transformed into knowledge and actionable information.
This is something that Ringgold has been involved in since our founding – helping publishers improve and transform their source data so that it can be used to maximum advantage. Very often my clients are surprised at the results of a data cleansing project Ringgold does for them: they’ll say, “I had no idea we had so many corporate accounts!” Or they are surprised by the amount of existing business they have among a set of consortia members. Improving their source data often changes their strategy, their priorities, and how they handle individual customers. Those small bits of data can now be aggregated into a body of trusted knowledge, and a valuable business asset.
First, I want to briefly introduce Ringgold and where we fit in:
We have experience helping publishers turn their raw data into an asset which can be used for myriad business purposes, and we do this via the application of persistent identifiers & structured metadata. We are a small for-profit database publisher, founded to solve a publisher’s problem of having poor source data related to subscribers. They wanted to improve operations and gain greater insight, and they realized they could only do this by getting down to the individual record level, disambiguating each unique institution, and improving the quality of the data held about each one.
In our founders’ work with OUP, it became evident that this was not a unique problem, and that the whole community could benefit from a persistent ID for institutions. So, we began building an authority file of institutions, each one uniquely identified via the Ringgold ID.
We have become part of the infrastructure that allows scholarly information to be more easily exchanged, and a trusted data partner to support publishers and others in their quest for enhanced data quality.
So, the Identify Database was born out of this need, and here’s a brief overview of what the database is:
Unique identifiers, like passports, are universally understood & recognized, and offer smooth passage from one system to another. They are agnostic of language and territory.
A few identifiers which are broadly adopted include ORCIDs for individual researchers, ISSNs for journals, DOIs for digital content, and the Ringgold Identifier for institutions.
Our Ringgold IDs are part of the Identify Database, which covers all manner of institutions playing any sort of role in scholarly comms, such as universities, funders, publishers, commercial entities, hospitals, etc. In addition to applying a unique ID to each organization, we go further and describe them in great detail, and join all related records together into family trees or hierarchies. We provide this connective tissue between universities and their subject departments, or companies and their subsidiaries, so that our users can make sense of the complex networks.
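To make the family-tree idea concrete, here is a minimal Python sketch of rolling a department up to its parent institution and system. It is an illustration only, not how the Identify Database is stored or queried. The keys are made-up placeholder labels (not Ringgold IDs), and the parent/child links are my reading of the hierarchy shown on the slide.

```python
# Hypothetical child -> parent links, using placeholder labels rather than real IDs.
PARENT = {
    "ucla": "univ-calif-system",
    "ucsd": "univ-calif-system",
    "ucla-med-school": "ucla",
    "ucla-coll-of-ls": "ucla",
    "ucsd-div-of-physical-sci": "ucsd",
}


def ancestors(node, parent=PARENT):
    """Return the chain of parents from the nearest one up to the top of the tree."""
    chain = []
    seen = {node}
    while node in parent:
        node = parent[node]
        if node in seen:
            # Guard against bad data that would otherwise loop forever.
            raise ValueError("cycle detected in hierarchy")
        seen.add(node)
        chain.append(node)
    return chain


def top_level(node, parent=PARENT):
    """Return the top-most institution for a record (itself if it has no parent)."""
    chain = ancestors(node, parent)
    return chain[-1] if chain else node
```

With links like these, activity recorded against a medical school or a division can be aggregated at the campus or system level, which is the kind of sense-making the hierarchies are meant to support.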
Identify is an institutional authority file, understood by more than 70 publishers and intermediaries in the scholarly space.
And the more accurate your “small data” is, and the more interconnected it is, the more you can do with it, the more benefits you can realize.
Unfortunately, there are impediments to us having an accurate & trustworthy big picture:
At Ringgold, we see a lot of publisher source data, and the issues generally fall into three categories:
Entity management: Not knowing which person, place, or thing the data refers to.
Poor data quality: Missing data elements, duplicate records, flat-out incorrect data (city & country not matching up).
Interconnectedness: Throughout the publications lifecycle, data about our constituents passes through many different systems. For example, an end user might interact with a peer review system, subscription fulfillment, and TOC alerts for readers. Also, most of us deal with a global user base, libraries around the world, and an array of publishing services systems. How can we help those systems connect so that we can join up our data?
Solve these three problems, and you are on your way to transforming your data into a business asset.
It’s often the same person! ORCIDs can help disambiguate here, where the same individual might show up again & again in various data silos according to their role. Understanding the full impact, all the touchpoints you have with an individual or an institution, all adds to the big picture which is so helpful for marketing and outreach.
And it’s often the same with institutions: one organization will interact with a publisher in an array of functions.
Story: years ago I was working to understand the full financial impact of institutions that had subscribed to my publisher’s journals, and had also paid APCs for our OA journal. Because we had no unique ID for those institutions, this manual reconciliation took weeks instead of an hour.
The fact that our institutions & people often play multiple roles makes disambiguation (entity management) and systems interoperability key to the solution, so that we can make proper decisions about our key organizations.
There are both operational and strategic benefits if we can join these two entity types together accurately:
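To show why a shared ID turns that kind of reconciliation into a quick job, here is a minimal Python sketch that totals subscription and APC income per institution. It assumes both systems already store an institution ID on every record. The field names and amounts are invented for illustration, and the IDs are borrowed from the UCLA slide only as sample keys.

```python
from collections import defaultdict


def revenue_by_institution(subscriptions, apc_payments):
    """Total subscription and APC income per institution ID.

    Each input row is a dict with hypothetical keys "institution_id" and "amount".
    """
    totals = defaultdict(lambda: {"subscriptions": 0.0, "apcs": 0.0})
    for row in subscriptions:
        totals[row["institution_id"]]["subscriptions"] += row["amount"]
    for row in apc_payments:
        totals[row["institution_id"]]["apcs"] += row["amount"]
    return dict(totals)


# Made-up sample rows from two separate systems.
subscriptions = [
    {"institution_id": 8783, "amount": 12000.0},
    {"institution_id": 33177, "amount": 3000.0},
]
apc_payments = [
    {"institution_id": 8783, "amount": 1500.0},
    {"institution_id": 8783, "amount": 1500.0},
    {"institution_id": 376277, "amount": 2000.0},
]
totals = revenue_by_institution(subscriptions, apc_payments)
```

Without the ID, the same join has to be done on institution names, and every spelling variant or duplicate record becomes a manual decision.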
We are talking today about marketing & outreach benefits, but there are many more things we can do with properly joined-up data.
We gain the ability to leverage that information not only to our own benefit, creating knowledge, but also to our end users’ benefit: creating a more seamless and user-centric experience when they deal with us as publishers.
How can we help to make meaningful, actionable connections?
Adopt & embed PIDs into your workflows.
Data quality: Depending on your requirements, you may opt either to perform a cleanup on several years of old records, or simply to start with a clean slate & improve them going forward. One way to do that is to improve your data capture process, so the record is clean from the get-go. Example: we now have clients using Identify to capture the authoritative affiliation for users at the moment the record is created in a peer-review or membership system. Example: we have many clients now who are using Identify to allow individuals to select their affiliation (rather than enter it as free text – which is the death of good data). These include ScholarOne & Aries in peer review systems, and CCC for APC calculation.
3. There are as many options for this as we have clients.
I’ve already mentioned ORCID, so I wanted to highlight how they have applied Identify in order to join up researchers with their affiliations: Rather than having users enter their employment and educational affiliations using free text – which would just need to get cleaned up at some point in the future – ORCID has embedded Identify’s institutions in the back-end of their registration system. Now, researchers can instantly create a clean link in their record to the right affiliation. It’s a researcher-friendly process, appearing to them as a simple drop-down menu, but powerfully embedding authoritative institutional metadata in the ORCID record. We’ve made a frictionless and unambiguous connection between a person and a place.
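The pick-list idea can be sketched in a few lines of Python. This is not ORCID’s or Ringgold’s actual implementation or API; it is a minimal illustration in which the `Institution` class, the `candidates` and `capture_affiliation` functions, and the record fields are all hypothetical. The three sample entries are the institutions from the UCLA slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Institution:
    ringgold_id: int
    name: str
    acronym: str


# Tiny stand-in for an authority file, using the three entries from the slide.
AUTHORITY = [
    Institution(8783, "Univ Calif Los Angeles", "UCLA"),
    Institution(33177, "Universidad Centroccidental Lisandro Alvarado", "UCLA"),
    Institution(376277, "Universidad Contemporanea de las Americas", "UCLA"),
]


def candidates(query, authority=AUTHORITY):
    """Return the institutions to offer in the drop-down for what the user typed."""
    q = query.strip().casefold()
    if not q:
        return []
    return [
        inst
        for inst in authority
        if q == inst.acronym.casefold() or q in inst.name.casefold()
    ]


def capture_affiliation(user_record, chosen):
    """Store the chosen institution's ID and name on a copy of the user record."""
    return {
        **user_record,
        "institution_id": chosen.ringgold_id,
        "institution_name": chosen.name,
    }


# Typing "ucla" offers all three; the user's choice is saved as an ID.
options = candidates("ucla")
record = capture_affiliation({"user": "example-user"}, options[0])
```

The point of the design is that the ambiguity is resolved by the person who knows the answer, at the moment of entry, and what gets stored is an identifier rather than a string that has to be cleaned up later.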
Thank you, and with that I will turn the mic over to Jenni Rankin of Annual Reviews.