AI-SDV 2020: Special Hypertext Information Treatment in is Special Hypertext Information Treatment out Willem-Geert Lagemaat (Lighthouse IP Group, The Netherlands)
With all the new technologies and intelligence one may think that all information issues will be solved in the (near) future. However, one of the most fundamental issues at hand is that without reliable, high-quality information there is no usable output to work with in the first place. This presentation looks at the global challenges relating to content that we are still faced with today and that will keep us from truly intelligent discovery in the future if nothing is done.
2. Why the title of this presentation…?
Special Hypertext Information Treatment in
is
Special Hypertext Information Treatment out
3. What are we going to talk about:
(Focusing on patents)
- Technology development and our habits of use
- New generation of users and expectations
- AI development
- Training models
- The essence of data
- To ignore or not to ignore
- Data completeness
- Data issues
- Road ahead
- Conclusion
5. Technology development and our habits of use
- Many new developments
- Adoption of new tools and their usage
- Our acceptance is changing
- We accept all sorts of data usage; we even give data away
- Free data is never free data (the coffee example)
- We rely on apps and technology and trust data almost without question
- With new technology our emotions win over our reason
7. New generation of users and expectations
The generation of users that comes next is even more grounded in the assumption that
things are there and always have been there.
This year and last year many senior people from the IP industry have gone into
retirement, and with that many years of experience, but more importantly of
knowledge, have walked out of the professional arena.
The challenge now is that the knowledge of the problems that exist in the data (and
so also in the tools that use that data) is no longer there. And there is a new
generation of users who think that data just is and always has been, without
understanding the issues that still exist. With that, the economic power and the
willingness to invest in fixing those issues disappear.
9. AI development
So there is this great buzz around AI. AI is the next world wonder, and it will solve all
our problems. It is so great it does things we can't understand.
First of all, AI is invented by people, not by machines. The machines do what they
are trained to do, and then they come up with things that are logically explainable.
So if we train AI on datasets that are incomplete, it is not going to produce reliable
results.
11. Training models
A little background on AI and tech:
- All tech is trained using existing data
- All tech is trained using relationships built on existing data
- Existing data by nature is the definition of limitation ➔ No data, no tech
- Training models form the backbone of new developments
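The "no data, no tech" point can be sketched in a few lines (the publication numbers and titles below are invented for illustration, and the "model" is deliberately trivial): whatever is trained only on an incomplete set of records has literally nothing to say about what was never ingested.

```python
# Minimal illustration: a "model" is only as complete as its training data.
# The records below are invented; real patent data would be far richer.

def train(records):
    """Build a trivial lookup model from (publication_number, title) pairs."""
    return {number: title for number, title in records}

# Incomplete training set: one known publication is simply absent.
incomplete_corpus = [
    ("EP1000001", "Coffee brewing apparatus"),
    ("EP1000002", "Hypertext retrieval system"),
]

model = train(incomplete_corpus)

# The model answers confidently for data it has seen...
print(model.get("EP1000001"))             # -> Coffee brewing apparatus
# ...and cannot answer at all for what was never in the data.
print(model.get("EP1000003", "NO DATA"))  # -> NO DATA
```

A real training pipeline fails less visibly than this lookup: instead of returning "NO DATA" it produces confident but unreliable output, which is exactly the concern of the presentation.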
13. The essence of data
Knowing that all tech information and knowledge is defined by the data used to train
it, the essence of any new tech is defined by the data used.
So data defines the reliability, durability, maintainability and stability of any and all
new tech. Therefore the essence of tech is data. So the essence is data.
And I am not trying to do the Telegraaf reasoning: I fit in my coat, my coat fits in my
bag therefore I fit in my bag….
But the essence is data. All great things aside, the inspiring interfaces, the great
representations and insights, they are all very impressive. But the essence is data.
15. To ignore or not to ignore
So the essence is data, and there is so much data: there are over 130 million
patents available digitally, so why worry?
With so much data one would assume it must be complete. But it is not complete,
even though the gaps only cover a few percent. So that leads to the question: is this
important enough to invest serious time in?
Well, 1% of 130 million is 1.3 million....
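The scale of "a few percent" is worth making concrete. A quick back-of-envelope calculation (the gap percentages beyond 1% are illustrative, not measured figures):

```python
# Even a "few percent" of a 130-million-record corpus is a very large
# absolute number of potentially missing or defective records.
TOTAL_PATENTS = 130_000_000  # roughly the figure cited above

for pct in (1, 2, 5):  # illustrative gap percentages, not measured values
    missing = TOTAL_PATENTS * pct // 100
    print(f"{pct}% gap -> {missing:,} records")
# 1% gap -> 1,300,000 records, matching the 1.3 million on the slide
```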
17. Data completeness
Data completeness covers two major components.
The first one is: do we have all publications?
The second one is: is every publication complete?
Both of these topics are important, but the first is the more important of the two.
We are at a point where everyone feels that all that should be there is there. But
that is not the case.
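The two completeness questions above can be sketched as two distinct checks (the publication numbers, the gazette listing and the document parts below are all invented for illustration):

```python
# Sketch of the two completeness questions, on invented data.
# 1) Collection completeness: do we hold every publication the office announced?
# 2) Record completeness: does each held record contain all expected parts?

EXPECTED_PARTS = {"bibliographic", "abstract", "claims", "description", "drawings"}

announced = {"EP1", "EP2", "EP3", "EP4"}  # hypothetical gazette listing
holdings = {
    "EP1": {"bibliographic", "abstract", "claims", "description", "drawings"},
    "EP2": {"bibliographic", "abstract", "claims"},  # truncated record
    "EP4": {"bibliographic", "abstract", "claims", "description", "drawings"},
}

# Question 1: publications announced but never received.
missing_publications = announced - holdings.keys()

# Question 2: received publications that lack expected parts.
incomplete_records = {
    pub: EXPECTED_PARTS - parts
    for pub, parts in holdings.items()
    if parts != EXPECTED_PARTS
}

print(sorted(missing_publications))  # -> ['EP3']
print(sorted(incomplete_records))    # -> ['EP2']
```

Note that the second check can only run on records that passed the first: a publication that was never received cannot even be inspected for completeness, which is why the first question is the more important one.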
18. Data issues
Focusing particularly on patents for this presentation, here are some examples of
issues that still occur today. The example below was provided by the developers of a
new AI-based tool, for whom these issues are a major blocker.
This data is provided by the relevant patenting authority, and the copy of the
original that is available is the only version that can be accessed.
20. Road ahead
Years ago I had a clash with a patent office on patent archives. I wrote an article
called "The Silent Threat". The issue was that I was concerned about the deletion of
paper archives without the certainty that high-quality digital versions were
available. The problem is that today that problem is as big as it was 15 years ago.
With today's expectation of the availability of global content the issue is actually
bigger than it was before. And the example on the previous page came from one of
the most reputable and digitally focused patent offices.
So:
- Data is still incomplete
- Priority has to be given to making patent archives in particular reliable and complete
- Investments will have to be made, but the challenge is who will pay the bill
- Users need to be aware of the gaps and issues
- Users need to understand that data requires investment and that they will have to
be part of the solution
- If we do not take these issues seriously, there will only be more crappy data
resulting in unreliable results, but they will look great!
22. Conclusion
The essence of any technology development….
is embedded in the quality of the data it is using.
Simply put….
However great the technology is….this ➔
Can never turn into this ➔
23. Conclusion – ctd -
There is a need for an agency to take up the task at hand:
to analyse the archives and to ensure that missing,
bad-quality and incomplete patent specifications are
retrieved, if needed from the patent applicant, who
would be the last resort that should have a good version
if the office does not. This is a mammoth task, but it is
essential, as the number of cases covers hundreds of
thousands of records* and with that covers significant
state of the art. This should cover at least the PCT
minimum documentation standard, to ensure that the
PCT minimum documentation actually represents all
relevant data.
* Based on random sampling of 5,000 records over the past 15 years