SSP Presentation V2.0

1. Essential Engineering Intelligence
Adventures in Constructing an Engineering Domain Model
What the IET learned when we decided to ‘semantically enrich’ our own and others’ data and prototype some concept products…
1 May 2014
Alison Haggar, Product Manager
SSP Annual Meeting Session [Preparing for Tomorrow’s Stakeholders Today: Wear it, Map it, Augment it!]

2. What we did
Sector coverage: Renewable energy
Market coverage: Worldwide for research trends app, UK only for other apps
4 x use cases: Academic Researcher, Product Development Engineer, Product Manager, Consultant
4 x work flows: One per use case
4 x applications: One per use case
IET data feeds: Inspec
3rd party data feeds: Emerald Group Publishing Limited, RenewableUK, Ofgem, CambridgeIP
Data model: Based on Inspec thesaurus (controlled and uncontrolled terms)

3. Our Demo

4. A Domain Model for Engineering
Entities: People, Organisations, Inspec, Wind Turbines
Attributes shown: Name (twice), Email, Job title, Address, Website
Relationships: Develop / are developed by; Employ / work for; Specialise in / are developed by

5. Here’s some of the things we built in the prototype…
- Research Trends
- Renewables Directory
- Generate your own Power

6. Geolocation

7. Profiles of People

8. Profiles of Organisations

9. Visualisation: Trend analysis
Q. Has research into ‘solar absorber-convertors’ peaked or is it still growing?
Q. Has research into areas closely related to ‘solar absorber-convertors’ peaked or is it still growing?

10. The Renewables Directory
- Companies categorised as developers, manufacturers, installers and/or suppliers
- Filters appropriate to the technology
- Social media

11. Generate your own Power App
- Search by postcode for average windspeeds
- Hire a wind gauge
- Request a windspeed survey

12. Not only products… but Services
- ‘Self service’ enrichment modules
- A hosted enrichment service
- Client data analytics
- Engineering Market analytics
- Knowledge Hubs

13. Publisher Applications
- Get statistical support for your content acquisition strategy by:
  - Identifying influential and emerging research trends
  - Visualising journal strengths and weaknesses over time
  - Comparing one journal to another
  - Comparing your publications to other publishers’ content to identify USPs
- Find and contact potential peer reviewers:
  - By specialism, geography and influence + continue to monitor suitability
- Keep your authors up-to-date:
  - Real-time notification of new citations + annual citation statements
- Increase your citation statistics by:
  - Suggesting possible citations to researchers

14. Final thoughts
At the IET we believe that
  1. The future of indexing is Expert Curated Domain Models
  “The future of data curation is a competition between information graphs”
  Sayeed Choudhury, Associate Dean for Research Data Management, Johns Hopkins University
  2. The IET Engineering Domain Model is a massively powerful tool
  “Your data must out-perform the sum of its parts! And produce solutions – not just more questions”
  D R Worlock, Digital Strategy Advisor and Consultant

15. Be Part of our Journey
Product Manager: Alison Haggar
ahaggar@theiet.org
+44 1438 765611
“If you have engineering content then we can help you to discover its hidden potential”

Editor’s notes

Good morning. As it says on the slide, my name is Alison Haggar and I’m here today to tell you about an adventure we’ve been on at the IET for some time now. I joined the IET’s Knowledge division towards the end of 2012 with a brief to look at new ways of presenting knowledge to both our current customers and potential new customers. The team had already done quite a bit of research, so we knew that users wanted detailed answers to specific questions rather than links to lists of documents that might or might not answer those questions. And they wanted an interface as close to Google’s simplicity as possible. We also had quite a lot of content: more than 14 million records in Inspec alone. The next question was how to marry the two without creating ‘just another website’, so I very quickly focussed on natural language processing and semantic enrichment technologies.
Now the ultimate goal was to be able to answer questions across the whole field of engineering and technology, but I felt that this breadth of scope was partly the reason for a lack of progress to date. We needed to focus on a small area of engineering, build something, play with some data, answer some specific questions and then work out how to scale this up to a wider audience. So we reduced the scope to renewable energy and focussed on content providers who were geographically close and therefore easy to work with face-to-face. We also selected four representative user types: academic researchers and product development engineers, who fall within our current customer base, and product managers and consultants, who don’t. We then carried out a series of in-depth interviews with representatives of these users to find out what information they needed to support them in their roles and which bits they currently find difficult to source. The research was used to spec up a prototype with some very specific goals:
- First, to test natural language processing and semantic enrichment techniques against manually curated data to see how well the reality lives up to the hype.
- Secondly, to investigate data storage, the building of an API and a re-usable front end to help us assess production costs for a commercial implementation.
- Thirdly, to investigate what mix of content we would need to answer the questions our research had thrown up, how easy that content would be to source and what it would cost to source.
- And last but certainly not least, to work out how best to make the products and services uncovered during the prototyping process available to end users.
And this is a screenshot of the prototype we built. Specifically, it shows a UI that allows people to select the market vertical they work in and then their job role. Once selected, these criteria are used both to interpret search results and to provide answers to the following questions:
- Who is carrying out research into technology x (and related fields)?
- Is research into technology x (and related fields) increasing or decreasing?
- What companies specialise in technology x near location y?
- What companies near location y operate/develop/manufacture/install technology x?
- Is location y suitable for installing renewable technology x?
- Can you show me information about operating renewable technology x installations near to location y?
In addition to answering these questions, we also created profiles for all the person and organisation entities we extracted during the enrichment process: about 1 million people and 100k organisations. I’ll show you some screenshots of the prototype a bit later in my presentation, but first I’d like to return to some of the underlying work we had to do in order to be able to answer the questions I just listed.
The very first thing was to look at our data. Did we have the information we needed within Inspec to answer these questions, and was it held in the right format to make this possible? The answer in both cases was no. In addition to Inspec data, we partnered with Emerald Group Publishing Limited, RenewableUK (a UK-based renewable energy trade association specialising in wind and wave), Ofgem (the UK independent National Regulatory Authority for electricity and gas) and CambridgeIP (a UK-based IP consultancy), who all provided data to us free of charge for prototyping purposes. We also needed to model this data in the way shown on the slide. We did this in conjunction with our development partners, Ontotext AD and 67 Bricks. We had a bit of a head start because we already own an engineering and technology taxonomy, the Inspec Thesaurus, which contains around 20,000 terms, manually compiled by subject matter experts over the last 40 years. But we also knew that to do what we wanted to do with the prototype, we needed to model a much broader set of concepts and relationships including people, organisations, places, activities, products and publications. And each of these entities or sets of things would need to be subdivided into subsets, and subsets of subsets, until we reached a level of granularity where every entity within a set had the same attributes and relationships. Just to give you an idea of how powerful this way of arranging your data is, the simple structure on this slide would allow a system to answer quite complex questions like: “Can you give me the names and email addresses of people working for organisations who develop wind turbines?” An example closer to home requires another set of entities, namely publications. This would then allow us to answer questions like: “Who were the top contributors in the field of wind turbine motor research between 2012 and 2013?” or “Is wind turbine motor research a growth area?”
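To make the first of these questions concrete, here is a minimal Python sketch of how a model with people, organisations and "work for" / "develop" relationships can answer it. This is an illustration only, not the prototype's implementation (the talk does not describe that); the class names, field names and all sample records are hypothetical.

```python
from typing import NamedTuple


class Person(NamedTuple):
    name: str
    email: str
    employer: str  # "work for" relationship: name of the employing organisation


class Organisation(NamedTuple):
    name: str
    develops: frozenset  # "develop" relationship: technology terms


def contacts_for_technology(people, organisations, term):
    """Return sorted (name, email) pairs for people employed by
    organisations that develop the given technology term."""
    developers = {org.name for org in organisations if term in org.develops}
    return sorted((p.name, p.email) for p in people if p.employer in developers)


# Hypothetical sample data, invented purely for illustration.
organisations = [
    Organisation("Example Turbines Ltd", frozenset({"wind turbines"})),
    Organisation("Example Solar Ltd", frozenset({"solar cells"})),
]
people = [
    Person("A. Example", "a.example@example.org", "Example Turbines Ltd"),
    Person("B. Sample", "b.sample@example.org", "Example Solar Ltd"),
    Person("C. Placeholder", "c.placeholder@example.org", "Example Turbines Ltd"),
]

result = contacts_for_technology(people, organisations, "wind turbines")
print(result)
```

The point of the sketch is that the question is answered by following two relationships in turn (technology to organisation, organisation to person), which only works once every entity in a set carries the same attributes and relationships.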
If we then add patent data into the mix, we can start to answer even more complex questions like “Wind turbine research was identified as a research growth area in 2010; how many patents have been filed since then?” In other words, we can use our engineering domain model to assess how quickly academic research leads to product innovation. So let’s go back to the prototype and see some of the questions and answers in context.
This slide shows the answer to the first question in the list: Who is carrying out research into technology x (and related fields)? Because of the entities we chose to model and the relationships we made between them, we were able to rank authors both by number of publications over time and by number of citations. In addition, we used the GeoNames API to plot the location of authors based on the address of the organisation they were most recently affiliated with. We also pulled in tweets based on the search term and created profiles for the people and organisations we identified.
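The two rankings mentioned here can be sketched in a few lines of Python. This is a simplified illustration under assumed inputs, not the prototype's code: the record layout (author, year, citation count) and the sample values are hypothetical.

```python
from collections import Counter

# Hypothetical records: (author, publication year, citation count).
publications = [
    ("Author A", 2012, 10),
    ("Author B", 2012, 3),
    ("Author A", 2013, 1),
    ("Author C", 2013, 25),
    ("Author B", 2013, 4),
    ("Author B", 2013, 0),
]


def rank_authors(records, first_year, last_year):
    """Rank authors within a year range, once by number of publications
    and once by total citations. Ties are broken alphabetically."""
    pubs = Counter()
    cites = Counter()
    for author, year, citations in records:
        if first_year <= year <= last_year:
            pubs[author] += 1
            cites[author] += citations
    by_publications = sorted(pubs, key=lambda a: (-pubs[a], a))
    by_citations = sorted(cites, key=lambda a: (-cites[a], a))
    return by_publications, by_citations


by_publications, by_citations = rank_authors(publications, 2012, 2013)
print(by_publications)
print(by_citations)
```

The two orderings can differ considerably, as they do in this sample data, which is one reason for offering both views.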
And here’s an example of a profile. We have provided this gentleman’s email address so he can be contacted. We have identified the areas he specialises in. And we have provided links to individuals associated with the organisation he is most recently linked to, as well as a list of organisations he himself has been associated with over time. On the second tab we have provided abstracts for all his publications as well as an individual tag cloud. And on the third tab we have linked to patent abstracts where he is identified as one of the authors. Because of the way the data is stored, every reference to an author or an organisation is also a link to their profile. We are still exploring how to extend person profiles to include social media and news, but disambiguating people, and especially linking people to their Twitter feeds, is a hard problem and requires manual input and QA.
Profiles of organisations were a little easier to disambiguate and so have more detail in them. For example, we were able to provide additional contact information, company logos and descriptions, as well as links to social media and news information scraped from the companies’ websites. We also investigated the possibility of listing similar organisations based on groups of tags. Results were mixed and this area of functionality still needs further work! Finding lists of products was also hard, but we built a scraping tool to help us and did some manual QA and editing on the results. Finally, we included some third party data sets by linking to the OpenCorporates API for company and financial data and to a number of sources for patent data, including CambridgeIP, American open source patent data and the European Patent Office API. The main learning point for us was that while it is possible to create profiles based on Inspec data alone, these are very limited; to make them really useful you need to combine this data with many other data sources. You also need to complete profiles manually and to review and update them regularly.
Moving on, the second question we wanted to answer was: Is research into technology x (and related fields) increasing or decreasing? As you can see on the slide we provided this for individual technical concepts and then also for those concepts related to the original by the various relationships modelled in the Inspec Thesaurus. Once again, we produced a list of publication abstracts with links to the full text and to the profiles of individuals and organisations identified in the abstracts.
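As a rough illustration of this kind of trend analysis, the Python sketch below counts records per year for a concept and, separately, for its related concepts, then applies a deliberately crude "peaked or growing" rule. Everything here is an assumption for illustration: the related terms are invented rather than taken from the Inspec Thesaurus, the sample records are made up, and the prototype's actual method is not described in the talk.

```python
from collections import Counter

# Hypothetical thesaurus fragment: concept -> related concepts.
RELATED = {"solar absorber-convertors": ["solar cells", "solar heating"]}

# Hypothetical indexed records: (publication year, set of index terms).
records = [
    (2010, {"solar absorber-convertors"}),
    (2011, {"solar absorber-convertors", "solar cells"}),
    (2011, {"solar absorber-convertors"}),
    (2012, {"solar absorber-convertors"}),
    (2012, {"solar cells"}),
    (2012, {"solar heating"}),
    (2012, {"wind turbines"}),
]


def yearly_counts(records, terms):
    """Count records per year indexed with at least one of the given terms."""
    wanted = set(terms)
    counts = Counter(year for year, index_terms in records if wanted & index_terms)
    return dict(sorted(counts.items()))


def trend(counts):
    """Crude rule: 'growing' if the latest year is no lower than any
    earlier year, otherwise 'peaked'."""
    if not counts:
        return "no data"
    latest = counts[max(counts)]
    return "growing" if latest == max(counts.values()) else "peaked"


term = "solar absorber-convertors"
own = yearly_counts(records, [term])
related = yearly_counts(records, RELATED[term])
print(own, trend(own))
print(related, trend(related))
```

A real analysis would need more years of data and a less naive rule, but the structure is the same: the thesaurus relationships decide which records count towards the "related fields" series.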
Because we had decided to focus on renewable energy, our next application is more obviously focussed on this area. It was built to answer two questions: What companies specialise in technology x near location y? And what companies near location y operate, develop, manufacture or install technology x? The IET already has a directory product, E&T Marketplace, which is compiled in a more traditional manner. We were keen to explore what could be done with a combination of web scraping and semantic enrichment techniques to reduce production costs and add functionality. As you might expect, the results weren’t clear cut. We found that identifying relevant companies was easy but extracting product information was hard. We were able to pull out interesting and relevant filters, identify relevant social media and news feeds and incorporate 3rd party data sets, but the information we extracted again needed some manual curation and would obviously require regular updating to remain relevant.
And this is the third application, which we built to answer the last two questions on my original list: Is location y suitable for installing renewable technology x? And can you show me information about operating renewable technology x installations near to location y? This was the one that we developed in conjunction with RenewableUK, a trade association specialising in supporting wind energy technologies. The app is an interactive version of a PDF brochure they currently publish on their website. Once again, we incorporated a number of third party data sets, including open data published by the UK government on average wind speeds per square kilometre for the entire UK. The idea here was to allow users to see if there was any point in considering installing a wind turbine at their address. If it was a possibility, the app would help them hire a low cost wind gauge to check the reality themselves and then, if the results over three months were promising, put them in touch with a local installer. We also used more open data to create a searchable database of all UK wind turbine installations, including output statistics and information on the operators and owners. Contact details are provided so that individuals or organisations considering installing wind turbines can get in touch with existing owners for advice on planning (which can be fraught), tariffs and so on.
Once we’d completed the prototype we held workshops with all the organisations that had provided us with content. The aim of these workshops was to review the prototype and to gauge their interest in creating a production version. What was really interesting was that, while they were interested in doing this, they were more interested in the technology that had produced these applications and how it might help them to understand their own content and the markets they operate in. This has led us to consider providing a number of services in advance of producing full blown knowledge hubs:
- A pre-trained enrichment module based on the engineering domain model we are developing from our class-leading Inspec Thesaurus
- A hosted version of this service, so you don’t even need to do the QA
- A set of content analytics applied to your data
- And a further set of engineering market analytics enabling you to compare your content to your competitors’ content
In the latter two cases you don’t have to understand anything about semantic enrichment in order to benefit from it, and you don’t need to re-organise the way you store your data. We would take care of that for you.
We believe that these services, and the combination of automated enrichment with manual QA and editorial services, would allow you as publishers to improve your content acquisition strategies, to develop your peer review networks, to provide real-time notifications to authors and even to increase the number of citations of your publications by proactively suggesting citation opportunities to researchers in relevant communities.
A few final thoughts then. Here at the IET we are really excited about what is possible when you mix semantic enrichment software with a domain model for engineering and an existing indexing and QA service. The work we have done on our prototype suggests that this framework applied to a breadth of content types and sources really can create truly linked data which will make a real difference to end users be they academic researchers, product managers, practising engineers or consultants or indeed, anyone else working in engineering who has a question that needs answering.
My final message to you all, therefore, is this: if you have engineering data and would like to discover more about what it could tell you, either in isolation or in conjunction with the wider Inspec engineering universe, then PLEASE get in touch with me, David Smith or Daniel Smith. We would love you to be part of our journey.
