May 31st, 2013 First SICSA MMI Information Retrieval Workshop
Looking beyond plain text for document representation in the enterprise
Arjen P. de Vries
arjen@acm.org
Centrum Wiskunde & Informatica
Delft University of Technology
Spinque B.V.
Outline
• Motivation
• Mixed structured and unstructured sources
• Search by strategy
• Equip
• Open ends
Enterprise Information Needs
Hang Li et al. A new approach to intranet search based on information extraction. CIKM’05
Strategic and business
development needs
• What funding schemes are the primary source of income?
  - E.g., can we shift to European funding when Dutch funding dries up?
• Who has active relations with partner X?
  - “Valorisation”; new national funding requirements
• What industry sectors do we depend upon?
  - E.g., how many projects in smart cities? Green energy? Cloud computing? Etc.
• How are strategic decisions implemented?
  - E.g., has the objective “move from Telecom toward ICT” been achieved, and how does it develop over time?
A week in the life
Date: Wed, 15 May 2013 15:14:49 +0200
From: Theme Coordinator “INFORMATION”
To: Group Leaders Information Theme
Subject: List of company relations for internal CWI
distribution
Dear Information Theme Group Leaders,
The theme coordinators have been asked whether they "could make
a list of the company contacts, indicating what kind of contacts
they are".
Could you send me the names of Dutch companies you are currently
working with or have worked with in the recent past by the end
of Friday 17th May?
The Theme Coordinator
Date: Fri, 24 May 2013 11:33:04 +0200
From: Theme Coordinator Life Sciences
To: Group Leaders Life Sciences Team
Subject: Life Sciences: contacts with NL companies?
Dear all,
The CWI themes are currently collecting all contacts we have
with Dutch industry and companies (but also hospitals and TNO
etc.) in order to get an overview. I am doing this for
the theme "Life Sciences".
Can you please send me a list of your contacts with short
description?
Life Sciences Theme Coordinator
From: Project Leader Project X
Date: Sun, 26 May 2013 17:34:15 +0200
To: Project X
Subject: [Project X: 33] @WP-leiders
X-BeenThere: Project X @ Y.org
Dear WP leaders,
I received the following request from the Programme Management:
> May I ask you to send me a list of the EU research and other
international research running at the partners that is related
to Project X (international embedding)?
This is my most urgent item. Could you please send me, as soon as
possible, a list covering the following points:
- list of ongoing EU projects in which people from your WP are
involved; please indicate who the partners are, the funding
source, whether it is a STREP (or NoE or ...), and whether your
WP provides a participant or the coordinator;
- list of EU projects applied for, with the same extras;
- list of any other international collaborations that are not
covered by a formal project.
Please send me the lists as soon as possible, but no later than
Tuesday 18:00. Thanks for your help. The Project Leader
Surely, academia is not like…
The High Cost of Not Finding Info
• If you employ 1000 knowledge workers:
  - 50% of content unindexed → $2.5 million/year
  - 6.25% of effort is spent reproducing information that already exists → $5 million/year
• Knowledge workers spend 15-25% of their time on non-productive information-related activities
Feldman and Sherman.
IDC Technical Report #29127, 2003
Butler Group Report: Enterprise Search and Retrieval. Oct-2006
“many organisations are frittering away up to 10% of their staff
costs on wasted effort because employees simply can’t find
the right information to do their jobs.”
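The $5 million figure can be checked against the slide's own numbers. The sketch below derives the annual cost per knowledge worker that the figure implies; that per-worker cost is a derived value, not a number quoted from the IDC report.

```python
# Back-of-the-envelope check of the "reproducing information" figure on the slide.
# The per-worker cost is *derived* from the slide's numbers, not taken from the IDC report.
WORKERS = 1000
REPRODUCTION_SHARE = 0.0625   # 6.25% of effort spent re-creating existing information
ANNUAL_COST = 5_000_000       # $5 million/year, as quoted on the slide

wasted_fte = WORKERS * REPRODUCTION_SHARE           # full-time equivalents lost
implied_cost_per_worker = ANNUAL_COST / wasted_fte  # dollars per worker per year

print(f"{wasted_fte} FTE wasted; implied cost per worker: ${implied_cost_per_worker:,.0f}")
# prints: 62.5 FTE wasted; implied cost per worker: $80,000
```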
So… “the real world”
• “Real” companies (as opposed to academic institutions) attempt to address these information needs a priori, by setting up a Customer Relationship Management (CRM) system
Shan L. Pan and Jae-Nam Lee, "Using e-CRM for a unified view of
the customer", Communications of the ACM 46(4) (2003): 95-99
However…
• So-called “professionals” are well known to focus on their own expertise
• They do not have (or take) the time to maintain adequate descriptions of their network, skills, projects, etc., nor to deal with most other types of “management overhead”
We only need to organize ourselves!!
Funding Proposals
• Submitted proposals are (supposed to be) routed through the faculty’s (TUD) “contract managers” or the institute’s (CWI) “project bureau”
  - E.g., checks for liability, IPR and a valid budget
• The proposal and (partial) metadata are added to a content management system (CMS)
  - The CMS used at my faculty at TUD is DECOS; a few other faculties plan to use Microsoft SharePoint; CWI deploys BSCW
Step 1
• Index all submitted proposals with your favourite IR system
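As a minimal sketch of what Step 1 amounts to, assuming the proposal texts are already extracted from the CMS, here is a tiny in-memory inverted index with TF-IDF ranking. The class, the document identifiers and the texts are hypothetical; any real IR system would replace this.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it on non-word characters (no stemming)."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


class InvertedIndex:
    """Minimal in-memory inverted index with TF-IDF ranking (illustrative only)."""

    def __init__(self) -> None:
        self.postings: dict[str, dict[str, int]] = defaultdict(dict)  # term -> {doc_id: tf}
        self.doc_count = 0

    def add(self, doc_id: str, text: str) -> None:
        self.doc_count += 1
        for term, tf in Counter(tokenize(text)).items():
            self.postings[term][doc_id] = tf

    def search(self, query: str, k: int = 10) -> list[tuple[str, float]]:
        scores: dict[str, float] = defaultdict(float)
        for term in set(tokenize(query)):
            docs = self.postings.get(term, {})
            if not docs:
                continue
            idf = math.log(self.doc_count / len(docs)) + 1.0  # +1 keeps ubiquitous terms non-zero
            for doc_id, tf in docs.items():
                scores[doc_id] += (1 + math.log(tf)) * idf
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Hypothetical proposal texts standing in for the CMS documents.
index = InvertedIndex()
index.add("P-001", "Smart cities: sensor networks for green energy")
index.add("P-002", "A cloud computing platform for big data analytics")
index.add("P-003", "Clouds and aerosols in climate models")
results = index.search("cloud computing")
print(results)
```

P-003 is not returned because, without stemming, “clouds” does not match “cloud”; adding a stemmer would retrieve it and reintroduce the cloud computing vs. climate clouds ambiguity discussed in the following slides.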
Incompleteness
• The DECOS metadata entered is usually incomplete from the start
  - For many projects, for example, only the coordinator is entered as a partner
• Also, a proposal’s metadata does not reflect subsequent changes, e.g., as in PuppyIR:
  - People hired after funding was secured
  - Partner changed when a key person moved jobs
  - Teams evolved
  - Priorities shifted
  - New tasks introduced and tasks (re-)assigned
  - …
Incompleteness
• In general:
  - A project’s proposal, or even its contract, seldom represents the project’s exact future
Inaccuracy
• Key information necessary for strategy & business development scenarios is missing
• Adding it is error-prone:
  - Infer the domain (big data, green energy, cloud computing, …) from keywords or content
  - Extract names automatically
  - Copy amounts manually; inconsistencies between tables in the proposal text are not uncommon
Incomplete & inaccurate data
• Ambiguity
  - When describing the domain, e.g., cloud computing vs. clouds in environmental models
  - Names of people and companies involved
• Typos & OCR mistakes
  - Entity resolution
• Amounts of funding per partner, own contribution
  - Funding requested may not equal funding received
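Entity resolution for company names can start from very simple normalization plus a string-similarity threshold. The sketch below is one such approach, not the method used in the project; the legal-form list and the 0.85 threshold are assumptions chosen for illustration.

```python
import re
import unicodedata
from difflib import SequenceMatcher

# Hypothetical set of legal-form suffixes that should not affect matching.
LEGAL_FORMS = {"bv", "nv", "ltd", "gmbh", "inc"}


def normalize(name: str) -> str:
    """Strip accents, dots, punctuation and legal-form suffixes; lower-case the rest."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    tokens = re.split(r"[^a-z0-9]+", ascii_name.lower().replace(".", ""))
    return " ".join(t for t in tokens if t and t not in LEGAL_FORMS)


def same_entity(a: str, b: str, threshold: float = 0.85) -> bool:
    """Treat two names as one entity if their normalized forms are similar enough."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold


print(normalize("Spinque B.V."))                    # spinque
print(same_entity("Spinque B.V.", "Spinque BV"))    # True
print(same_entity("Spinque B.V.", "Philips N.V."))  # False
```

A fixed threshold trades precision for recall; in practice candidate matches near the threshold would still need the human review the slides describe.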
The real world to the rescue (1)
• Not much work gets done without payments…
ERP
• All large organisations deploy Enterprise Resource Planning (ERP) systems
  - Typical modules include accounting, human resources, manufacturing, and logistics
  - ERP integrates the modules, data storing/retrieving processes, and management and analysis functionalities
  - Baan, Oracle, PeopleSoft, SAP, …
More complete and more
accurate data from ERP
• Financial details of each project as executed
  - Project leader
  - People who are reimbursed from the project
  - Exact duration of project activities
  - ...
Step 2
• Index all the ERP data with your favourite DB + IR system
• Link the ERP project identifiers to the CMS proposal identifiers
  - Surprisingly, an n:m relationship…
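Because the ERP-to-CMS mapping is n:m, it needs its own link table rather than a foreign key on either side. A minimal sketch in SQLite follows; the table names, identifiers and data are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE erp_project  (erp_id TEXT PRIMARY KEY, title TEXT);
CREATE TABLE cms_proposal (cms_id TEXT PRIMARY KEY, title TEXT);
-- Link table: a proposal may be run as several ERP projects, and one ERP
-- project may cover several proposals (n:m).
CREATE TABLE erp_cms_link (
    erp_id TEXT REFERENCES erp_project(erp_id),
    cms_id TEXT REFERENCES cms_proposal(cms_id),
    PRIMARY KEY (erp_id, cms_id)
);
""")
con.executemany("INSERT INTO erp_project VALUES (?, ?)",
                [("E-10", "WP1 budget"), ("E-11", "WP2 budget"), ("E-12", "Merged follow-up")])
con.executemany("INSERT INTO cms_proposal VALUES (?, ?)",
                [("D-1", "Proposal A"), ("D-2", "Proposal B"), ("D-3", "Proposal C")])
con.executemany("INSERT INTO erp_cms_link VALUES (?, ?)",
                [("E-10", "D-1"), ("E-11", "D-1"), ("E-12", "D-2"), ("E-12", "D-3")])

# ERP projects per proposal, and proposals per ERP project.
per_proposal = con.execute(
    "SELECT cms_id, COUNT(*) FROM erp_cms_link GROUP BY cms_id ORDER BY cms_id").fetchall()
per_project = con.execute(
    "SELECT erp_id, COUNT(*) FROM erp_cms_link GROUP BY erp_id ORDER BY erp_id").fetchall()
print(per_proposal)  # [('D-1', 2), ('D-2', 1), ('D-3', 1)]
print(per_project)   # [('E-10', 1), ('E-11', 1), ('E-12', 2)]
```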
The real world to the rescue (2)
Institutional Repository
• Publication metadata helps validate (and may even extend) the existing management info required:
  - Authors
  - Author affiliations
  - Projects and funding schemes (from acknowledgements)?
• Again incomplete data, though…
  - My faculty in particular is notoriously bad at maintaining its part of the institutional repository
Step 3
• Crawl the Institutional Repository using the Open Archives Initiative (OAI) harvesting protocol
• Index all the publication data with your favourite DB + IR system
• Relate projects to publications by author name, similar title, etc.
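Harvesting with OAI-PMH means issuing `ListRecords` requests and following `resumptionToken`s until none is returned. The sketch below builds the request URL and parses a response in the `oai_dc` (Dublin Core) format; the repository URL and sample record are hypothetical, and the network call itself is left out so the example runs offline.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request; resumptionToken is an exclusive argument."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return f"{base_url}?{urlencode(params)}"


def parse_records(xml_text: str) -> tuple[list[dict], Optional[str]]:
    """Extract Dublin Core titles and creators, plus the resumptionToken if present."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        dc = rec.find(".//oai_dc:dc", NS)
        if dc is None:  # deleted records carry only a header
            continue
        records.append({
            "title": dc.findtext("dc:title", default="", namespaces=NS),
            "creators": [c.text for c in dc.findall("dc:creator", NS)],
        })
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, (token or None)


# Hypothetical response from a hypothetical repository.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Search interfaces for children</dc:title>
          <dc:creator>Doe, J.</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>abc123</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_records(SAMPLE)
next_url = list_records_url("https://repository.example.org/oai", token)
print(records, next_url)
```

The extracted creators and titles are what the slide's “relate projects to publications by author name, similar title” step would then match against the ERP and CMS data.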
Result: Unified Access
• Proposals
  - from an XML dump of the CMS
• Actual project administration
  - from CSVs extracted from ERP
• Publications
  - crawled using OAI, from the institutional repository (IRP)
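Unified access starts by mapping each export format onto a common record shape that keeps track of its source. The sketch assumes a simple XML layout for the CMS dump and simple column names for the ERP CSV; both layouts are hypothetical, since the slides do not describe them.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical extracts; the real CMS dump and ERP CSV layouts are not given on the slides.
CMS_XML = """<proposals>
  <proposal id="D-1"><title>Smart grid analytics</title></proposal>
</proposals>"""
ERP_CSV = "project_id,project_leader,start_date\nE-10,J. Doe,2012-01-01\n"


def load_cms(xml_text: str):
    """Yield one record per <proposal> element in the CMS dump."""
    root = ET.fromstring(xml_text)
    for p in root.iter("proposal"):
        yield {"source": "cms", "type": "proposal", "id": p.get("id"),
               "title": p.findtext("title", default="")}


def load_erp(csv_text: str):
    """Yield one record per row of the ERP project export."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        yield {"source": "erp", "type": "project", "id": row["project_id"],
               "leader": row["project_leader"], "start": row["start_date"]}


records = list(load_cms(CMS_XML)) + list(load_erp(ERP_CSV))
for record in records:
    print(record)
```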
Schema
Heterogeneous content!
• BAAN-project (ERP)
• Decos-project (CMS)
• Decos-document (CMS attachments)
• Publication (Institutional Repository)
• Publication-document (Institutional Repository PDFs)
• Person (address lists, ERP + CMS mentions)
• Company (CMS + ERP + document mentions)
• Subsidy (CMS)
• Department (address lists, CMS)
• Web addresses (extracted from documents)
• Topic (assigned to publications)
• Research programme (dependent on funding scheme)
Schema V2
How to search that graph???!
• Rank (un-/semi-)structured data to deal with incompleteness & inaccuracies
• Structured data representation for attributes including project revenue, people’s names, starting dates, etc.
• Use cases varying from “expert search” to “data cleaning” and “visual analytics”
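One simple way to rank entities in such a graph, for example for expert search, is to propagate text-retrieval scores from documents to the people linked to them. The sketch below shows that generic score propagation; it is not presented as the ranking used in the system, and the edges, weights and scores are made up.

```python
from collections import defaultdict

# Hypothetical typed edges from the integrated graph: (document, person, association weight).
AUTHORED: list[tuple[str, str, float]] = [
    ("pub:1", "person:alice", 1.0),
    ("pub:2", "person:alice", 0.5),  # e.g. a weaker association, such as second author
    ("pub:2", "person:bob", 1.0),
    ("prop:7", "person:bob", 1.0),   # a proposal from the CMS
]


def propagate(doc_scores: dict[str, float],
              edges: list[tuple[str, str, float]]) -> list[tuple[str, float]]:
    """Score each person by summing the text scores of linked documents, weighted per edge."""
    person_scores: dict[str, float] = defaultdict(float)
    for doc, person, weight in edges:
        person_scores[person] += doc_scores.get(doc, 0.0) * weight
    return sorted(person_scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up text retrieval scores for a query such as "cloud computing".
ranking = propagate({"pub:1": 0.2, "pub:2": 0.8, "prop:7": 0.1}, AUTHORED)
print(ranking)
```

Because scores flow along whatever edges exist, a person missing from one source (say, the CMS) can still be found through another (say, publications), which is one way of coping with the incompleteness described earlier.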
Search by Strategy
• First, visually construct search strategies by connecting “building blocks”
• Next, generate the search engine specified by that search strategy
Strategies: DB+IR query plans
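• Database
Spinque: RDBMS (MonetDB)
[Diagram: building block BB1(in1, in2, in3, u1, u2) with inputs in1, in2, in3 and one output, connected to building block BB2(in1) with input in1 and one output]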
• Data flow
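The building-block idea can be sketched as plain functions over probabilistic relations, with a strategy being the data flow that wires them together. The block names below (`select_source`, `weight`, `merge_or`) are hypothetical and do not reflect Spinque's actual block library; the sketch only shows the composition principle.

```python
# A probabilistic relation maps a tuple of attribute values to a probability.
Relation = dict[tuple, float]


def select_source(rel: Relation, source: str) -> Relation:
    """Keep (item, source) tuples from one source and project them onto the item."""
    return {(item,): p for (item, src), p in rel.items() if src == source}


def weight(rel: Relation, factor: float) -> Relation:
    """Scale every probability by a constant, e.g. to trust one source less."""
    return {t: p * factor for t, p in rel.items()}


def merge_or(*inputs: Relation) -> Relation:
    """Union of inputs; a tuple found in several inputs gets 1 - prod(1 - p) (noisy-or)."""
    out: Relation = {}
    for rel in inputs:
        for t, p in rel.items():
            out[t] = 1 - (1 - out.get(t, 0.0)) * (1 - p)
    return out


def strategy(candidates: Relation) -> Relation:
    """Hypothetical two-branch strategy: trust ERP evidence fully, CMS evidence half."""
    erp = select_source(candidates, "erp")
    cms = weight(select_source(candidates, "cms"), 0.5)
    return merge_or(erp, cms)


ranked = strategy({("P1", "erp"): 0.9, ("P1", "cms"): 0.6, ("P2", "cms"): 0.8})
print(ranked)  # P1 ≈ 0.93, P2 ≈ 0.4
```

Each block has typed inputs and one output, so a visual editor only needs to connect outputs to inputs; generating the search engine then amounts to evaluating this data flow.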
Spinque: strategy
• Query: strategy made operational
Spinque: PRA
[Figure: a strategy is expressed in Probabilistic Relational Algebra and translated into a series of views on the relational DB:
  CREATE VIEW a AS SELECT ..
  CREATE VIEW b AS SELECT ..
  CREATE VIEW c AS SELECT ..]
• SQL, with explicit probabilities:
    CREATE VIEW x AS
    SELECT a1, a3,
           1 - prod(1 - prob) AS prob
    FROM y
    GROUP BY a1, a3;
• PRA: probabilistic relational algebra (Fuhr and Roelleke, TOIS 2001):
    x = Project DISTINCT[$1,$3](y);
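The SQL view above merges duplicate (a1, a3) tuples with a noisy-or, 1 - ∏(1 - p), which assumes the merged events are independent. The slide's query relies on a `prod()` aggregate; SQLite has none, so this sketch registers one before running the same view. The table contents are made up.

```python
import sqlite3


class Prod:
    """User-defined PROD aggregate (SQLite has no built-in one)."""

    def __init__(self) -> None:
        self.result = 1.0

    def step(self, value: float) -> None:
        self.result *= value

    def finalize(self) -> float:
        return self.result


con = sqlite3.connect(":memory:")
con.create_aggregate("prod", 1, Prod)
con.executescript("""
CREATE TABLE y (a1 TEXT, a2 TEXT, a3 TEXT, prob REAL);
INSERT INTO y VALUES
  ('alice', 'pub:1', 'cloud', 0.5),
  ('alice', 'pub:2', 'cloud', 0.5),
  ('bob',   'pub:3', 'cloud', 0.2);
-- Project DISTINCT[$1,$3](y): duplicates on (a1, a3) are merged with noisy-or.
CREATE VIEW x AS
SELECT a1, a3, 1 - prod(1 - prob) AS prob
FROM y
GROUP BY a1, a3;
""")
rows = con.execute("SELECT a1, a3, prob FROM x ORDER BY a1").fetchall()
print(rows)  # alice: 1 - 0.5 * 0.5 = 0.75; bob: 1 - 0.8 ≈ 0.2
```

This is the point of the PRA layer: the one-line `Project DISTINCT` hides the probability bookkeeping that the SQL has to spell out.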
Rank by Text
Expert Finding
Search User Interface
Search results
Result List Interactions
• Zoom in on an item using “+”:
  - Opens the item in the left pane
  - Shows the results of using the item as a query, with a result-type-specific search strategy
  - Goal: provide the contextually most related nodes from the underlying graph
• Mark any item red/yellow/green for later use
Browse by facet
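Facet browsing rests on counting, for each facet value, how many items in the current result list carry it. The sketch below handles single- and multi-valued facets; the facet names and values are hypothetical.

```python
from collections import Counter


def facet_counts(results: list[dict], facet: str) -> list[tuple[str, int]]:
    """Count how many result items carry each value of a facet (multi-valued facets allowed)."""
    counts: Counter = Counter()
    for item in results:
        values = item.get(facet, [])
        if isinstance(values, str):
            values = [values]
        counts.update(set(values))  # count each item at most once per value
    return counts.most_common()


# Hypothetical result list from the integrated search.
results = [
    {"type": "publication", "department": ["dept:A", "dept:B"]},
    {"type": "publication", "department": ["dept:A"]},
    {"type": "erp-project", "department": "dept:A"},
]
print(facet_counts(results, "type"))        # [('publication', 2), ('erp-project', 1)]
print(facet_counts(results, "department"))  # [('dept:A', 3), ('dept:B', 1)]
```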
Strategic and business
development needs
• What are our industry relations?
• Which of these partners collaborate with more than one group?
• What funding schemes support these collaborations?
Note: relations between partners and departments, edge strength represents revenue
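Once relations are in one table, the questions on this slide become grouping queries. A minimal sketch, assuming a flattened `relation` table with one row per partner, department and funding scheme (a hypothetical layout with made-up data):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE relation (partner TEXT, department TEXT, scheme TEXT, revenue REAL);
INSERT INTO relation VALUES
  ('Company A', 'Dept 1', 'EU',                120000),
  ('Company A', 'Dept 2', 'EU',                 80000),
  ('Company B', 'Dept 1', 'National',           50000),
  ('Company C', 'Dept 2', 'National',           30000),
  ('Company C', 'Dept 3', 'Contract research',  20000);
""")

# Partners that collaborate with more than one department, with total revenue (edge strength).
multi = con.execute("""
    SELECT partner, COUNT(DISTINCT department) AS n_departments, SUM(revenue) AS total
    FROM relation
    GROUP BY partner
    HAVING COUNT(DISTINCT department) >= 2
    ORDER BY total DESC
""").fetchall()

# Funding schemes that support those multi-department collaborations.
schemes = con.execute("""
    SELECT DISTINCT scheme FROM relation
    WHERE partner IN (SELECT partner FROM relation
                      GROUP BY partner HAVING COUNT(DISTINCT department) >= 2)
    ORDER BY scheme
""").fetchall()
print(multi)
print(schemes)
```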
Multi-party relations
Grouping of external relations: Foreign Univ., NL Univ., Funding agency, Public NL, Public foreign, Private sector
Note: External relations with at least two departments; node size w.r.t. number of relations
Initial Findings
• The integrated search helps improve recall, reducing the effort involved and leading to higher-quality analyses
• Many things that could be done even more automatically (albeit not perfectly) seem less important than expected
  - We use very simple rules to extract URIs and companies; no information extraction yet
  - Information professionals will always look into the results in detail
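“Very simple rules” for URIs and companies can be as small as two regular expressions. The patterns below are hypothetical illustrations, not the rules actually used: one matches `http(s)://` and `www.` addresses, the other matches up to four capitalised words followed by a legal form such as B.V. or Ltd.

```python
import re

URL_RE = re.compile(r"\bhttps?://[^\s<>\"')]+|\bwww\.[^\s<>\"')]+", re.IGNORECASE)
# Hypothetical rule: one to four capitalised words followed by a legal-form suffix.
COMPANY_RE = re.compile(r"\b(?:[A-Z][\w&-]*\s){1,4}(?:B\.V\.|N\.V\.|Ltd\.?|GmbH)")


def extract(text: str) -> dict[str, list[str]]:
    """Apply both rules; trailing sentence punctuation is stripped from URLs."""
    urls = [u.rstrip(".,;:") for u in URL_RE.findall(text)]
    companies = [m.group(0) for m in COMPANY_RE.finditer(text)]
    return {"urls": urls, "companies": companies}


found = extract("We work with Spinque B.V. and Acme Data Ltd; see www.spinque.com or https://www.cwi.nl/.")
print(found)
```

Rules like these are cheap but crude (a capitalised sentence-initial word directly before the company name would be swallowed into the match), which is acceptable when, as noted above, an information professional inspects the results anyway.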
Open issues
• Integrate visualization
  - Idea: select result list and facet
• Too many facets
  - Idea: group facets
• Result explanations
  - Idea: describe the path through the graph
• Entity support ++
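The “describe the path through the graph” idea could start from a breadth-first search for the shortest path between a result and the query node, rendered as a chain of labelled edges. The graph fragment and edge labels below are hypothetical.

```python
from collections import deque
from typing import Optional

# Hypothetical fragment of the integrated graph: node -> [(neighbour, edge label)].
GRAPH: dict[str, list[tuple[str, str]]] = {
    "person:alice": [("pub:1", "authored")],
    "pub:1": [("person:alice", "authored by"), ("project:P1", "acknowledges")],
    "project:P1": [("pub:1", "acknowledged in"), ("company:Acme", "partner")],
    "company:Acme": [("project:P1", "partner in")],
}


def explain(graph: dict[str, list[tuple[str, str]]], start: str, goal: str) -> Optional[str]:
    """Describe the shortest path from start to goal as a readable chain of edges."""
    previous: dict[str, Optional[tuple[str, str]]] = {start: None}  # node -> (predecessor, label)
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for neighbour, label in graph.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = (node, label)
                queue.append(neighbour)
    if goal not in previous:
        return None  # no connection: nothing to explain
    steps = []
    node = goal
    while previous[node] is not None:
        pred, label = previous[node]
        steps.append(f"{pred} --{label}--> {node}")
        node = pred
    return "; ".join(reversed(steps))


print(explain(GRAPH, "company:Acme", "person:alice"))
```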
Open issues
• What strategy is good? Why?
  - Idea: test using past usage data
• What are the right user roles?
  - Who should do the searches?
  - Who should write strategies? (~ who writes the SQL queries in a traditional DB?)
• Human in the loop for retrieval, but not yet for indexing…
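Testing strategies on past usage data could, for instance, replay logged queries and score each strategy by mean reciprocal rank of the item the user eventually opened. Treating that click as the relevant item is an assumption, MRR is only one possible metric, and the log and rankings below are hypothetical.

```python
from typing import Callable

Strategy = Callable[[str], list[str]]


def mean_reciprocal_rank(strategy: Strategy, log: list[tuple[str, str]]) -> float:
    """Average 1/rank of the item the user clicked; contributes 0 when the strategy misses it."""
    if not log:
        return 0.0
    total = 0.0
    for query, clicked in log:
        ranking = strategy(query)
        if clicked in ranking:
            total += 1.0 / (ranking.index(clicked) + 1)
    return total / len(log)


def from_table(table: dict[str, list[str]]) -> Strategy:
    """Wrap precomputed rankings as a strategy (stand-in for a generated search engine)."""
    return lambda query: table.get(query, [])


# Hypothetical usage log: (query, item the user eventually opened).
LOG = [("cloud computing", "P2"), ("green energy", "P1")]
strategy_a = from_table({"cloud computing": ["P2", "P3"], "green energy": ["P3", "P1"]})
strategy_b = from_table({"cloud computing": ["P3", "P4", "P2"]})

mrr_a = mean_reciprocal_rank(strategy_a, LOG)  # (1/1 + 1/2) / 2 = 0.75
mrr_b = mean_reciprocal_rank(strategy_b, LOG)  # (1/3 + 0) / 2
print(mrr_a, mrr_b)
```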
Questions?

Data Scientist Enablement roadmap 1.0
 
Course structure 108 computers in management
Course structure   108 computers in managementCourse structure   108 computers in management
Course structure 108 computers in management
 

Mehr von Arjen de Vries

Masterclass Big Data (leerlingen)
Masterclass Big Data (leerlingen) Masterclass Big Data (leerlingen)
Masterclass Big Data (leerlingen) Arjen de Vries
 
Beverwedstrijd Big Data (klas 3/4/5/6)
Beverwedstrijd Big Data (klas 3/4/5/6) Beverwedstrijd Big Data (klas 3/4/5/6)
Beverwedstrijd Big Data (klas 3/4/5/6) Arjen de Vries
 
Beverwedstrijd Big Data (groep 5/6 en klas 1/2)
Beverwedstrijd Big Data (groep 5/6 en klas 1/2)Beverwedstrijd Big Data (groep 5/6 en klas 1/2)
Beverwedstrijd Big Data (groep 5/6 en klas 1/2)Arjen de Vries
 
Web Archives and the dream of the Personal Search Engine
Web Archives and the dream of the Personal Search EngineWeb Archives and the dream of the Personal Search Engine
Web Archives and the dream of the Personal Search EngineArjen de Vries
 
Information Retrieval intro TMM
Information Retrieval intro TMMInformation Retrieval intro TMM
Information Retrieval intro TMMArjen de Vries
 
ACM SIGIR 2017 - Opening - PC Chairs
ACM SIGIR 2017 - Opening - PC ChairsACM SIGIR 2017 - Opening - PC Chairs
ACM SIGIR 2017 - Opening - PC ChairsArjen de Vries
 
Data Science Master Specialisation
Data Science Master SpecialisationData Science Master Specialisation
Data Science Master SpecialisationArjen de Vries
 
PUC Masterclass Big Data
PUC Masterclass Big DataPUC Masterclass Big Data
PUC Masterclass Big DataArjen de Vries
 
Bigdata processing with Spark - part II
Bigdata processing with Spark - part IIBigdata processing with Spark - part II
Bigdata processing with Spark - part IIArjen de Vries
 
Bigdata processing with Spark
Bigdata processing with SparkBigdata processing with Spark
Bigdata processing with SparkArjen de Vries
 
TREC 2016: Looking Forward Panel
TREC 2016: Looking Forward PanelTREC 2016: Looking Forward Panel
TREC 2016: Looking Forward PanelArjen de Vries
 
The personal search engine
The personal search engineThe personal search engine
The personal search engineArjen de Vries
 
Better Contextual Suggestions by Applying Domain Knowledge
Better Contextual Suggestions by Applying Domain KnowledgeBetter Contextual Suggestions by Applying Domain Knowledge
Better Contextual Suggestions by Applying Domain KnowledgeArjen de Vries
 
Similarity & Recommendation - CWI Scientific Meeting - Sep 27th, 2013
Similarity & Recommendation - CWI Scientific Meeting - Sep 27th, 2013Similarity & Recommendation - CWI Scientific Meeting - Sep 27th, 2013
Similarity & Recommendation - CWI Scientific Meeting - Sep 27th, 2013Arjen de Vries
 
Recommendation and Information Retrieval: Two Sides of the Same Coin?
Recommendation and Information Retrieval: Two Sides of the Same Coin?Recommendation and Information Retrieval: Two Sides of the Same Coin?
Recommendation and Information Retrieval: Two Sides of the Same Coin?Arjen de Vries
 
Twente ir-course 20-10-2010
Twente ir-course 20-10-2010Twente ir-course 20-10-2010
Twente ir-course 20-10-2010Arjen de Vries
 
Context Adaptation in Image Search
Context Adaptation in Image SearchContext Adaptation in Image Search
Context Adaptation in Image SearchArjen de Vries
 

Mehr von Arjen de Vries (19)

Doing a PhD @ DOSSIER
Doing a PhD @ DOSSIERDoing a PhD @ DOSSIER
Doing a PhD @ DOSSIER
 
Masterclass Big Data (leerlingen)
Masterclass Big Data (leerlingen) Masterclass Big Data (leerlingen)
Masterclass Big Data (leerlingen)
 
Beverwedstrijd Big Data (klas 3/4/5/6)
Beverwedstrijd Big Data (klas 3/4/5/6) Beverwedstrijd Big Data (klas 3/4/5/6)
Beverwedstrijd Big Data (klas 3/4/5/6)
 
Beverwedstrijd Big Data (groep 5/6 en klas 1/2)
Beverwedstrijd Big Data (groep 5/6 en klas 1/2)Beverwedstrijd Big Data (groep 5/6 en klas 1/2)
Beverwedstrijd Big Data (groep 5/6 en klas 1/2)
 
Web Archives and the dream of the Personal Search Engine
Web Archives and the dream of the Personal Search EngineWeb Archives and the dream of the Personal Search Engine
Web Archives and the dream of the Personal Search Engine
 
Information Retrieval intro TMM
Information Retrieval intro TMMInformation Retrieval intro TMM
Information Retrieval intro TMM
 
ACM SIGIR 2017 - Opening - PC Chairs
ACM SIGIR 2017 - Opening - PC ChairsACM SIGIR 2017 - Opening - PC Chairs
ACM SIGIR 2017 - Opening - PC Chairs
 
Data Science Master Specialisation
Data Science Master SpecialisationData Science Master Specialisation
Data Science Master Specialisation
 
PUC Masterclass Big Data
PUC Masterclass Big DataPUC Masterclass Big Data
PUC Masterclass Big Data
 
Bigdata processing with Spark - part II
Bigdata processing with Spark - part IIBigdata processing with Spark - part II
Bigdata processing with Spark - part II
 
Bigdata processing with Spark
Bigdata processing with SparkBigdata processing with Spark
Bigdata processing with Spark
 
TREC 2016: Looking Forward Panel
TREC 2016: Looking Forward PanelTREC 2016: Looking Forward Panel
TREC 2016: Looking Forward Panel
 
The personal search engine
The personal search engineThe personal search engine
The personal search engine
 
Better Contextual Suggestions by Applying Domain Knowledge
Better Contextual Suggestions by Applying Domain KnowledgeBetter Contextual Suggestions by Applying Domain Knowledge
Better Contextual Suggestions by Applying Domain Knowledge
 
Similarity & Recommendation - CWI Scientific Meeting - Sep 27th, 2013
Similarity & Recommendation - CWI Scientific Meeting - Sep 27th, 2013Similarity & Recommendation - CWI Scientific Meeting - Sep 27th, 2013
Similarity & Recommendation - CWI Scientific Meeting - Sep 27th, 2013
 
Recommendation and Information Retrieval: Two Sides of the Same Coin?
Recommendation and Information Retrieval: Two Sides of the Same Coin?Recommendation and Information Retrieval: Two Sides of the Same Coin?
Recommendation and Information Retrieval: Two Sides of the Same Coin?
 
Twente ir-course 20-10-2010
Twente ir-course 20-10-2010Twente ir-course 20-10-2010
Twente ir-course 20-10-2010
 
Context Adaptation in Image Search
Context Adaptation in Image SearchContext Adaptation in Image Search
Context Adaptation in Image Search
 
Diversity (in Media)
Diversity (in Media)Diversity (in Media)
Diversity (in Media)
 

Kürzlich hochgeladen

The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 

Kürzlich hochgeladen (20)

The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 

Beyond Plain Text: Unified Enterprise Info Access

  • 1. May 31st, 2013 First SICSA MMI Information Retrieval Workshop Looking beyond plain text for document representation in the enterprise Arjen P. de Vries arjen@acm.org Centrum Wiskunde & Informatica Delft University of Technology Spinque B.V.
  • 2. Outline ▪ Motivation ▪ Mixed structured and unstructured sources ▪ Search by strategy ▪ Equip ▪ Open ends
  • 3. Enterprise Information Needs Hang Li et al. A new approach to intranet search based on information extraction. CIKM’05
  • 4. Strategic and business development needs ▪ What funding schemes are the primary source of income? ▪ E.g., can we move to Europe when Dutch funding dries up? ▪ Who has active relations with partner X? ▪ “Valorisation”; new national funding requirements ▪ What industry sectors do we depend upon? ▪ E.g., how many projects in smart cities? Green energy? Cloud computing? Etc. ▪ How are strategic decisions implemented? ▪ E.g., has the objective “move from Telecom toward ICT” been achieved, and how does it develop over time?
  • 5. A week in the life
  • 6. Date: Wed, 15 May 2013 15:14:49 +0200 From: Theme Coordinator “INFORMATION” To: Group Leaders Information Theme Subject: List of company relations for internal CWI distribution Dear Information Theme Group Leaders, The theme coordinators have been asked whether they could "make a list of the company contacts and indicate what kind of contacts they are". Could you send me the names of Dutch companies you are currently working with or have worked with in the recent past by the end of Friday 17th May. The Theme Coordinator
  • 7. Date: Fri, 24 May 2013 11:33:04 +0200 From: Theme Coordinator Life Sciences To: Group Leaders Life Sciences Team Subject: Life Sciences: contacts with NL companies? Dear all, The CWI themes are currently collecting all contacts we have with Dutch industry and companies (but also hospitals and TNO etc.) in order to get an overview. I am doing this for the theme "Life Sciences". Can you please send me a list of your contacts with short description? Life Sciences Theme Coordinator
  • 8. From: Project Leader Project X Date: Sun, 26 May 2013 17:34:15 +0200 To: Project X Subject: [Project X: 33] @WP leaders X-BeenThere: Project X @ Y.org Dear WP leaders, I received the following request from the Programme Management: > May I ask you to send me a list of the EU research and the international research being carried out at the partners in relation to Project X (international embedding)? This is my most urgent item. Could you send me, as soon as possible, a list covering the following points: - a list of ongoing EU projects in which people from your WP are involved; please indicate who the partners are, the funding source, whether it is a STREP (or NoE or ...), and whether your WP supplies a participant or the coordinator; - a list of EU projects applied for, with the same extras; - a list of any other international collaborations not covered by a formal project. Please send me the lists as soon as possible, but no later than Tuesday 18:00. Thanks for your help. The Project Leader
  • 9. Surely, academia is not like…
  • 10. The High Cost of Not Finding Info ▪ If you employ 1000 knowledge workers: ▪ 50% of content unindexed: $2.5 million/year ▪ 6.25% of effort spent reproducing information that already exists: $5 million/year ▪ Knowledge workers spend 15-25% of their time on non-productive information-related activities (Feldman and Sherman, IDC Technical Report #29127, 2003) ▪ Butler Group Report: Enterprise Search and Retrieval, Oct. 2006: “many organisations are frittering away up to 10% of their staff costs on wasted effort because employees simply can’t find the right information to do their jobs.”
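The slide does not say how the dollar figures were derived. As a hedged sanity check only: if one assumes a fully loaded cost of $80,000 per knowledge worker per year (a hypothetical figure, back-solved so that 6.25% of the effort of 1000 workers comes to the quoted $5 million), the same arithmetic can be applied to the 15-25% range. The $2.5 million figure for unindexed content cannot be reconstructed this way without further assumptions.

```python
# Back-of-the-envelope arithmetic for the IDC figures on this slide.
# ASSUMPTION: COST_PER_WORKER is hypothetical; it is back-solved from the
# slide's "6.25% of effort = $5 million/year" for 1000 workers and is not
# stated in the source.
WORKERS = 1000
COST_PER_WORKER = 80_000  # USD per worker per year (hypothetical)


def wasted_cost(fraction_of_effort: float,
                workers: int = WORKERS,
                cost_per_worker: float = COST_PER_WORKER) -> float:
    """Annual cost of spending `fraction_of_effort` of all staff time on waste."""
    return fraction_of_effort * workers * cost_per_worker


reproduction = wasted_cost(0.0625)
low, high = wasted_cost(0.15), wasted_cost(0.25)
print(f"Reproducing existing information: ${reproduction:,.0f}/year")
print(f"Non-productive information work: ${low:,.0f} to ${high:,.0f}/year")
```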
  • 11. So… “the real world” ▪ “Real” companies (as opposed to academic institutions) attempt to address these information needs a priori, by setting up a Customer Relationship Management (CRM) system (Shan L. Pan and Jae-Nam Lee, "Using e-CRM for a unified view of the customer", Communications of the ACM 46(4) (2003): 95-99)
  • 12.
  • 13. However… ▪ So-called “professionals” are well known to focus on their own expertise ▪ They do not have (or take) the time to maintain adequate descriptions of their network, skills, projects, etc., nor to deal with most other kinds of “management overhead”
  • 14. We only need to organize ourselves!!
  • 15. Funding Proposals ▪ Submitted proposals (are supposed to) pass through the faculty’s (TUD) “contract managers” or the institute’s (CWI) “project bureau” ▪ E.g., checks for liability, IPR and a valid budget ▪ The proposal and (partial) metadata are added to a content management system (CMS) ▪ The CMS used at my faculty at TUD is DECOS; a few other faculties plan to use Microsoft SharePoint; CWI deploys BSCW
  • 16.
  • 17. Step 1 ▪ Index all the proposals submitted with your favourite IR system
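A minimal sketch of Step 1, assuming the proposals are available as plain text keyed by a proposal identifier. It uses Okapi BM25 with the common defaults k1 = 1.2 and b = 0.75 as a stand-in for "your favourite IR system"; the proposal IDs and texts are made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25Index:
    """Minimal in-memory Okapi BM25 index over a dict of id -> text."""

    def __init__(self, docs: dict[str, str], k1: float = 1.2, b: float = 0.75):
        self.k1, self.b = k1, b
        self.tf = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
        self.length = {doc_id: sum(tf.values()) for doc_id, tf in self.tf.items()}
        self.avgdl = sum(self.length.values()) / len(self.length)
        # Iterating a Counter yields each distinct term once -> document frequency.
        self.df = Counter(term for tf in self.tf.values() for term in tf)
        self.n = len(docs)

    def idf(self, term: str) -> float:
        df = self.df.get(term, 0)
        return math.log(1 + (self.n - df + 0.5) / (df + 0.5))

    def search(self, query: str, k: int = 10) -> list[tuple[str, float]]:
        scores: dict[str, float] = defaultdict(float)
        for term in tokenize(query):
            idf = self.idf(term)
            for doc_id, tf in self.tf.items():
                f = tf.get(term, 0)
                if f:
                    norm = self.k1 * (1 - self.b + self.b * self.length[doc_id] / self.avgdl)
                    scores[doc_id] += idf * f * (self.k1 + 1) / (f + norm)
        return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


proposals = {  # hypothetical proposal texts
    "P1": "Big data analytics for smart cities",
    "P2": "Cloud computing infrastructure for green energy",
    "P3": "A search engine for children",
}
index = BM25Index(proposals)
results = index.search("cloud computing")
print(results)
```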
  • 18. Incompleteness ▪ The DECOS metadata entered is usually incomplete from the start ▪ For many projects, for example, only the coordinator is entered as a partner ▪ Also, a proposal’s metadata does not reflect subsequent changes, e.g., as in PuppyIR: ▪ People hired after funding was secured ▪ Partners changed when a key person moved jobs ▪ Teams evolved ▪ Priorities shifted ▪ New tasks introduced and tasks (re-)assigned ▪ …
  • 19. Incompleteness ▪ In general: ▪ A project’s proposal, or even its contract, seldom represents the project’s exact future
  • 20. Inaccuracy ▪ Key information necessary for strategy & business development scenarios is missing ▪ Adding it is error-prone: ▪ Infer the domain (big data, green energy, cloud computing, …) from keywords or content ▪ Extract names automatically ▪ Copy amounts manually; inconsistencies in tables in the proposal text are not uncommon
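One hedged way to "infer the domain from keywords" while avoiding ambiguities such as cloud computing vs. clouds in environmental models is to match multi-word phrases rather than single words. The lexicon below is hypothetical and deliberately tiny; it only illustrates the rule-based approach.

```python
import re

# Hypothetical lexicon: domain -> regex patterns for multi-word phrases.
# Requiring phrases (not the bare word "cloud") keeps "clouds in
# environmental models" from being tagged as cloud computing.
DOMAIN_PHRASES = {
    "cloud computing": [r"\bcloud computing\b", r"\bcloud services?\b"],
    "big data": [r"\bbig data\b"],
    "green energy": [r"\bgreen energy\b", r"\brenewable energy\b"],
}


def infer_domains(text: str) -> set[str]:
    """Return every domain whose phrases occur in `text` (case-insensitive)."""
    lowered = text.lower()
    return {
        domain
        for domain, patterns in DOMAIN_PHRASES.items()
        if any(re.search(pattern, lowered) for pattern in patterns)
    }


print(infer_domains("Elastic cloud computing for genomics"))
print(infer_domains("Modelling clouds in environmental models"))
```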
  • 21. Incomplete & Inaccurate Data ▪ Ambiguity ▪ When describing the domain, e.g., cloud computing vs. clouds in environmental models ▪ Names of the people and companies involved ▪ Typos & OCR mistakes ▪ Entity resolution ▪ Amounts of funding per partner, own contribution ▪ Funding requested may not equal funding received
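For the entity-resolution problem (typos, OCR mistakes, name variants), a simple sketch is to normalise names and cluster them by string similarity. It uses Python's standard `difflib.SequenceMatcher`; the legal-suffix list, the 0.85 threshold and the greedy clustering are illustrative assumptions, not the method used in the talk.

```python
import difflib
import re

# Illustrative list of legal-form suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"bv", "nv", "ltd", "inc", "gmbh"}


def normalize(name: str) -> str:
    """Lower-case, drop dots, and strip legal-form suffixes such as 'B.V.'."""
    tokens = [t.replace(".", "") for t in re.findall(r"[\w.]+", name.lower())]
    return " ".join(t for t in tokens if t and t not in LEGAL_SUFFIXES)


def resolve(mentions: list[str], threshold: float = 0.85) -> dict[str, list[str]]:
    """Greedily assign each mention to the first cluster whose canonical
    (normalized) name is at least `threshold` similar; else start a new one."""
    clusters: dict[str, list[str]] = {}
    for mention in mentions:
        key = normalize(mention)
        for canonical, members in clusters.items():
            if difflib.SequenceMatcher(None, key, canonical).ratio() >= threshold:
                members.append(mention)
                break
        else:
            clusters[key] = [mention]
    return clusters


print(resolve(["Spinque B.V.", "Spinque BV", "Spinqe", "Acme Ltd."]))
```

Greedy clustering depends on input order; a real pipeline would also compare against all cluster members or use a blocking key, but that is beyond this sketch.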
  • 22. The real world to the rescue (1) ▪ Not much work gets done without payments…
  • 23. ERP ▪ All large organisations deploy Enterprise Resource Planning (ERP) systems ▪ Typical modules include accounting, human resources, manufacturing, and logistics ▪ ERP integrates the modules, data storing/retrieving processes, and management and analysis functionalities ▪ Baan, Oracle, PeopleSoft, SAP, …
  • 24. More complete and more accurate data from ERP ▪ Financial details of each project as executed ▪ Project leader ▪ People who are reimbursed from the project ▪ Exact duration of project activities ▪ ...
  • 25. Step 2 ▪ Index all the ERP data with your favourite DB + IR system ▪ Link the ERP project identifiers to the CMS proposal identifiers ▪ Surprisingly, an n:m relationship…
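Because one proposal can be executed as several ERP projects and one ERP project can cover several proposals, the link needs its own table rather than a foreign key on either side. A minimal sketch with Python's built-in `sqlite3`; the table names, columns and identifiers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE erp_project  (erp_id TEXT PRIMARY KEY, leader TEXT);
CREATE TABLE cms_proposal (cms_id TEXT PRIMARY KEY, title TEXT);
-- n:m link table: each (erp_id, cms_id) pair appears once,
-- but either identifier may occur in many rows.
CREATE TABLE erp_cms_link (
    erp_id TEXT REFERENCES erp_project(erp_id),
    cms_id TEXT REFERENCES cms_proposal(cms_id),
    PRIMARY KEY (erp_id, cms_id)
);
""")
conn.executemany("INSERT INTO erp_project VALUES (?, ?)",
                 [("ERP-101", "Alice"), ("ERP-102", "Alice"), ("ERP-103", "Bob")])
conn.executemany("INSERT INTO cms_proposal VALUES (?, ?)",
                 [("CMS-7", "Proposal A"), ("CMS-8", "Proposal B"), ("CMS-9", "Proposal C")])
conn.executemany("INSERT INTO erp_cms_link VALUES (?, ?)", [
    ("ERP-101", "CMS-7"), ("ERP-102", "CMS-7"),  # one proposal, two ERP projects
    ("ERP-103", "CMS-8"), ("ERP-103", "CMS-9"),  # one ERP project, two proposals
])


def proposals_for(erp_id: str) -> list[str]:
    rows = conn.execute(
        "SELECT cms_id FROM erp_cms_link WHERE erp_id = ? ORDER BY cms_id", (erp_id,))
    return [cms_id for (cms_id,) in rows]


def projects_for(cms_id: str) -> list[str]:
    rows = conn.execute(
        "SELECT erp_id FROM erp_cms_link WHERE cms_id = ? ORDER BY erp_id", (cms_id,))
    return [erp_id for (erp_id,) in rows]


def leaders_for(cms_id: str) -> list[str]:
    """Project leaders (from ERP) of everything executed under one proposal."""
    rows = conn.execute("""
        SELECT DISTINCT p.leader
        FROM erp_cms_link AS l JOIN erp_project AS p ON p.erp_id = l.erp_id
        WHERE l.cms_id = ? ORDER BY p.leader""", (cms_id,))
    return [leader for (leader,) in rows]
```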
  • 26. The real world to the rescue (2)
  • 27. Institutional Repository ▪ Publication metadata helps validate the existing management information (and may even extend it): ▪ Authors ▪ Author affiliations ▪ Projects and funding schemes (from acknowledgements)? ▪ Again, incomplete data though… ▪ My faculty, especially, is notoriously bad at maintaining its part of the institutional repository
  • 28. Step 3 ▪ Crawl the Institutional Repository using the Open Archives Initiative (OAI) harvesting protocol ▪ Index all the publication data with your favourite DB + IR system ▪ Relate projects to publications by author name, similar title, etc.
  • 29. Result: Unified Access ▪ Proposals: from an XML dump of the CMS ▪ Actual project administration: from CSVs extracted from ERP ▪ Publications: crawled using OAI, from the IRP
  • 31. Heterogeneous content! ▪ BAAN-project (ERP) ▪ Decos-project (CMS) ▪ Decos-document (CMS attachments) ▪ Publication (Institutional Repository) ▪ Publication-document (Institutional Repository PDFs) ▪ Person (address lists, ERP + CMS mentions) ▪ Company (CMS + ERP + document mentions) ▪ Subsidy (CMS) ▪ Department (address lists, CMS) ▪ Web addresses (extracted from documents) ▪ Topic (assigned to publications) ▪ Research programme (dependent on funding scheme)
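One way to hold such heterogeneous content is a typed graph: every node carries its type (ERP project, CMS project, person, company, …) and edges are labelled. A minimal sketch; the type and label strings are illustrative, not the schema used in the talk.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    kind: str  # e.g. "baan-project", "decos-project", "person", "company"
    key: str   # identifier within that kind


class Graph:
    """Undirected graph with labelled edges between typed nodes."""

    def __init__(self) -> None:
        self.edges: defaultdict[Node, set[tuple[str, Node]]] = defaultdict(set)

    def link(self, a: Node, b: Node, label: str) -> None:
        self.edges[a].add((label, b))
        self.edges[b].add((label, a))

    def neighbours(self, node: Node, kind: str | None = None) -> set[Node]:
        """Adjacent nodes, optionally restricted to one node type."""
        return {n for _, n in self.edges.get(node, ()) if kind is None or n.kind == kind}


# Hypothetical example data.
person = Node("person", "A. Author")
project = Node("baan-project", "ERP-101")
company = Node("company", "Acme")
g = Graph()
g.link(person, project, "works-on")
g.link(project, company, "partner")
print(g.neighbours(project, kind="company"))
```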
  • 33. How to search that graph?! ▪ Rank (un-/semi-)structured data to deal with incompleteness & inaccuracies ▪ Structured data representation for attributes including project revenue, people’s names, starting dates, etc. ▪ Use cases varying from “expert search” to “data cleaning” and “visual analytics”
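As an illustration of ranking over the graph rather than over documents alone, the sketch below scores people for an "expert search" by summing the retrieval scores of the documents they are linked to. This document-centric aggregation is one common, simple choice; it is an assumption here, not necessarily the ranking used in the talk. The document scores could come from any text ranker.

```python
from collections import defaultdict


def rank_experts(doc_scores: dict[str, float],
                 links: list[tuple[str, str]]) -> list[tuple[str, float]]:
    """Score each person by the summed retrieval scores of linked documents.

    doc_scores: document id -> score for the current query
    links: (person, document id) pairs, e.g. authorship or project membership
    """
    scores: dict[str, float] = defaultdict(float)
    for person, doc_id in links:
        scores[person] += doc_scores.get(doc_id, 0.0)
    ranked = [(person, s) for person, s in scores.items() if s > 0]
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


doc_scores = {"d1": 0.9, "d2": 0.4, "d3": 0.0}  # hypothetical query scores
links = [("alice", "d1"), ("bob", "d2"), ("bob", "d1"), ("carol", "d3")]
print(rank_experts(doc_scores, links))
```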
  • 34. Search by Strategy ▪ First, visually construct search strategies by connecting “building blocks”
  • 35. Search by Strategy ▪ First, visually construct search strategies by connecting “building blocks” ▪ Next, generate the search engine specified by that search strategy
  • 36. Strategies: DB+IR query plans [diagram: Strategy → Relational DB] ▪ Data flow (Spinque: strategy): building blocks such as BB1(in1, in2, in3, u1, u2) and BB2(in1), each producing an output (out) ▪ Query: the strategy made operational (Spinque: PRA) ▪ Database (Spinque: RDBMS, MonetDB): CREATE VIEW a AS SELECT .. CREATE VIEW b AS SELECT .. CREATE VIEW c AS SELECT ..
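To make the idea of a strategy as a DB+IR query plan concrete, here is a toy sketch in which each building block is a SQL template over named inputs, and a strategy (a list of blocks in dependency order) is compiled into `CREATE VIEW` statements. The `Block` class, the templates and the `doc` table are hypothetical and much simpler than Spinque's actual blocks; SQLite stands in for MonetDB.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical building block: a SQL template over numbered inputs."""
    name: str
    template: str                  # refers to inputs as {in1}, {in2}, ...
    inputs: tuple[str, ...] = ()   # names of tables/views feeding this block

    def to_sql(self) -> str:
        sources = {f"in{i + 1}": src for i, src in enumerate(self.inputs)}
        return f"CREATE VIEW {self.name} AS " + self.template.format(**sources)


def compile_strategy(blocks: list[Block]) -> list[str]:
    """Blocks must already be in dependency order (inputs defined first)."""
    return [block.to_sql() for block in blocks]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE doc (id TEXT, body TEXT, prob REAL)")
conn.executemany("INSERT INTO doc VALUES (?, ?, ?)", [
    ("d1", "cloud computing platform", 0.8),
    ("d2", "clouds in climate models", 0.3),
    ("d3", "big data analytics", 0.9),
])
strategy = [
    Block("a", "SELECT id, prob FROM {in1} WHERE body LIKE '%cloud%'", ("doc",)),
    Block("b", "SELECT id, prob FROM {in1} WHERE prob > 0.5", ("a",)),
]
for statement in compile_strategy(strategy):
    conn.execute(statement)
rows = conn.execute("SELECT id FROM b ORDER BY id").fetchall()
print(rows)
```

Compiling to views keeps the whole plan inside the database, so the optimiser sees one composed query rather than intermediate results shipped back to the application.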
  • 37. Probabilistic Relational Algebra [diagram: Strategy → Relational DB] ▪ SQL with explicit probabilities: CREATE VIEW x AS SELECT a1, a3, 1-prod(1-prob) AS prob FROM y GROUP BY a1, a3; ▪ The same view in PRA, the probabilistic relational algebra (Fuhr and Roelleke, TOIS 2001): x = Project DISTINCT [$1,$3](y);
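The slide pairs `Project DISTINCT` with the SQL aggregate `1-prod(1-prob)`, i.e. a noisy-or that treats duplicate tuples as independent events. A minimal Python rendering of that single operator, using 1-based column positions to mirror `$1,$3`; the relation contents are made up.

```python
from collections import defaultdict
from math import prod


def project_distinct(relation, positions):
    """Project a probabilistic relation onto 1-based column `positions`.

    relation: list of (tuple_of_values, probability)
    Duplicates in the projected columns are merged with a noisy-or,
    1 - prod(1 - p_i), mirroring the SQL on this slide (independence assumed).
    """
    groups = defaultdict(list)
    for values, p in relation:
        groups[tuple(values[i - 1] for i in positions)].append(p)
    return {key: 1 - prod(1 - p for p in probs) for key, probs in groups.items()}


y = [  # hypothetical tuples (a1, a2, a3) with probabilities
    (("alice", "doc1", "cwi"), 0.5),
    (("alice", "doc2", "cwi"), 0.5),
    (("bob", "doc3", "tud"), 0.8),
]
x = project_distinct(y, (1, 3))  # x = Project DISTINCT [$1,$3](y)
print(x)
```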
  • 40.
  • 43. Result List Interactions ▪ Zoom in on an item using “+”: ▪ Opens the item in the left pane ▪ Shows the results of using the item as a query, with a result-type-specific search strategy ▪ Goal: provide the contextually most related nodes from the underlying graph ▪ Mark any item red/yellow/green for later use
  • 44.
  • 45.
  • 47.
  • 48. Strategic and business development needs ▪ What are our industry relations? ▪ Which of these partners collaborate with more than one group? ▪ What funding schemes support these collaborations?
  • 49. Note: relations between partners and departments, edge strength represents revenue
  • 50. Note: relations between partners and departments, edge strength represents revenue
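A hedged sketch of how the edge weights in such a partner-department network could be computed from ERP rows: sum the revenue per (partner, department) pair and scale it to a drawing width. The row layout and the linear 1-10 width scale are assumptions, and the partner and department names are hypothetical.

```python
from collections import defaultdict


def relation_edges(rows):
    """rows: (partner, department, revenue) tuples; returns revenue per edge."""
    edges = defaultdict(float)
    for partner, department, revenue in rows:
        edges[(partner, department)] += revenue
    return dict(edges)


def edge_widths(edges, min_width=1.0, max_width=10.0):
    """Scale revenue linearly so that the largest edge gets `max_width`."""
    top = max(edges.values()) or 1.0  # avoid dividing by zero if all revenue is 0
    return {e: min_width + (max_width - min_width) * r / top for e, r in edges.items()}


rows = [("Acme", "DB", 100_000.0), ("Acme", "DB", 50_000.0), ("Acme", "IR", 20_000.0)]
edges = relation_edges(rows)
print(edges)
print(edge_widths(edges))
```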
  • 51. Multi-party relations: grouping of external relations (Foreign Univ., NL Univ., Funding agency, Public NL, Public foreign, Private sector). Note: external relations with at least two departments; node size w.r.t. number of relations
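The filter behind this figure (external relations with at least two departments, grouped by type of partner) can be sketched as follows. Counting distinct departments as the "number of relations" that sizes a node is an assumption, as are the example partners; the group names are taken from the slide.

```python
from collections import defaultdict


def multi_party_partners(relations, min_departments=2):
    """relations: (partner, department) pairs.
    Returns partner -> number of distinct departments, keeping only partners
    linked to at least `min_departments` (used here as the node size)."""
    departments = defaultdict(set)
    for partner, department in relations:
        departments[partner].add(department)
    return {p: len(d) for p, d in departments.items() if len(d) >= min_departments}


def group_by_type(partners, partner_type):
    """Group partner names by category, e.g. 'NL Univ.' or 'Private sector'."""
    groups = defaultdict(list)
    for partner in sorted(partners):
        groups[partner_type.get(partner, "Unknown")].append(partner)
    return dict(groups)


relations = [("Acme", "DB"), ("Acme", "IR"), ("TNO", "IR"),
             ("Univ X", "DB"), ("Univ X", "IR"), ("Univ X", "SEN")]
multi = multi_party_partners(relations)
groups = group_by_type(multi, {"Acme": "Private sector", "Univ X": "Foreign Univ."})
print(multi)
print(groups)
```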
  • 52. Initial Findings ▪ The integrated search helps improve recall, reducing the effort involved and leading to higher-quality analyses ▪ Many things that could be automated further (albeit not perfectly) seem less important than expected ▪ We use very simple rules to extract URIs and companies; no information extraction yet ▪ Information professionals will always inspect the results in detail
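The talk does not list its extraction rules. As a hedged illustration of what "very simple rules" can look like, the sketch below pulls out URLs and company names ending in a legal-form suffix using two regular expressions. Both patterns are assumptions and will miss or mangle some cases (for example, a capitalised sentence-initial word can be swallowed into a company name).

```python
import re

# Illustrative patterns, not the rules used in the talk.
URL_RE = re.compile(r"\b(?:https?://|www\.)[^\s<>()\"']+", re.IGNORECASE)
COMPANY_RE = re.compile(
    r"\b((?:[A-Z][\w&-]*\s){0,3}[A-Z][\w&-]*)\s+"
    r"(?:B\.V\.|N\.V\.|BV|NV|Ltd\.?|Inc\.?|GmbH)(?!\w)"
)


def extract_urls(text: str) -> list[str]:
    # Strip sentence punctuation that the greedy character class may pick up.
    return [m.rstrip(".,;:!?") for m in URL_RE.findall(text)]


def extract_companies(text: str) -> list[str]:
    # findall returns the single capturing group: the name without its suffix.
    return COMPANY_RE.findall(text)


text = "We work with Spinque B.V. and Acme Ltd. (see https://www.spinque.com/ or www.cwi.nl)."
print(extract_urls(text))
print(extract_companies(text))
```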
  • 53. Open issues ▪ Integrate visualization ▪ Idea: select result list and facet ▪ Too many facets ▪ Idea: group facets ▪ Result explanations ▪ Idea: describe path through graph ▪ Entity support ++
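For the "describe path through graph" idea, a breadth-first search over a labelled graph returns the shortest chain of relations linking a query node to a result, which can then be rendered as an explanation. The adjacency format and node names are hypothetical.

```python
from collections import deque


def explain(graph, source, target):
    """Shortest labelled path from source to target, or None if unreachable.

    graph: node -> list of (edge_label, neighbour) pairs (directed).
    Returns a list of (from_node, edge_label, to_node) steps.
    """
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            steps = []
            while parents[node] is not None:
                prev, label = parents[node]
                steps.append((prev, label, node))
                node = prev
            return steps[::-1]
        for label, neighbour in graph.get(node, []):
            if neighbour not in parents:
                parents[neighbour] = (node, label)
                queue.append(neighbour)
    return None


def render(steps):
    return "; ".join(f"{a} --{label}--> {b}" for a, label, b in steps)


graph = {
    "query: cloud": [("matches", "proposal P2")],
    "proposal P2": [("executed as", "ERP-102")],
    "ERP-102": [("partner", "Acme")],
}
print(render(explain(graph, "query: cloud", "Acme")))
```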
  • 54. Open issues ▪ What strategy is good? Why? ▪ Idea: test using past usage data ▪ What are the right user roles? ▪ Who should do the searches? ▪ Who should write strategies? (~ who writes the SQL queries in a traditional DB?) ▪ Human in the loop for retrieval, but not yet for indexing…