This paper reports findings on desirable interface features for different
search tasks in the biomedical domain. We conducted a user study where
we asked bioscientists to evaluate the usefulness of autocomplete, query
expansions, faceted refinement, related searches and results preview
implementations in new pilot interfaces and publicly available systems
while using baseline and their own queries. Our evaluation reveals that
there is a preference for certain features depending on the search task.
In addition, we touch on the current pain point of faceted search: the
acquisition of faceted subject metadata for unstructured documents.
We found a strong preference for prototypes displaying just a few facets
generated based on either the query or the matching documents.
This document discusses using a genetic algorithm to improve search visibility by expanding user queries. It explains that genetic algorithms can be applied to information retrieval by representing candidate solutions as chromosomes, evaluating their fitness, and evolving new generations through selection, crossover and mutation. The paper presents previous work applying genetic algorithms for query expansion and relevance feedback. It then describes the experiment conducted to implement a genetic algorithm over 500 generations to select optimal keywords for expanding queries and evaluate the approach on sample query results.
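The chromosome/fitness/selection loop described above can be sketched in a few lines. The vocabulary, fitness function, and population parameters below are invented for illustration (only the 500-generation count comes from the abstract); a real system would derive fitness from retrieval quality against relevance judgments.

```python
import random

random.seed(42)

# Hypothetical candidate expansion terms for a query such as "gene expression"
CANDIDATES = ["protein", "transcription", "microarray", "regulation",
              "pathway", "cooking", "football", "weather"]
# Toy fitness signal: assume the first five candidates are actually relevant
RELEVANT = set(CANDIDATES[:5])

def fitness(chromosome):
    # Reward selected relevant terms, penalize selected irrelevant ones
    chosen = {t for t, bit in zip(CANDIDATES, chromosome) if bit}
    return len(chosen & RELEVANT) - len(chosen - RELEVANT)

def crossover(a, b):
    # Single-point crossover between two parent bit strings
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(chromosome, rate=0.1):
    # Flip each bit independently with a small probability
    return [bit ^ (random.random() < rate) for bit in chromosome]

def evolve(pop_size=20, generations=500):
    pop = [[random.randint(0, 1) for _ in CANDIDATES] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # truncation selection (elitist)
        children = [mutate(crossover(random.choice(parents),
                                     random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    best = max(pop, key=fitness)
    return [t for t, bit in zip(CANDIDATES, best) if bit]

expanded = evolve()   # terms chosen to expand the original query
```

Because the top half of each generation survives unchanged, the best fitness never decreases, and over 500 generations the population converges on the relevant terms.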
Open domain question answering system using semantic role labeling - eSAT Publishing House
1. The document describes a proposed open domain question answering system that uses semantic role labeling to extract answers from documents retrieved from the web.
2. The system consists of three modules: question processing, document retrieval, and answer extraction. Semantic role labeling is used in the answer extraction module to identify answers based on the question type.
3. An evaluation of the proposed system showed it achieved higher accuracy compared to a baseline system using only pattern matching for answer extraction.
Performance Evaluation of Query Processing Techniques in Information Retrieval - idescitation
The first element of the search process is the query.
Because the average user query is restricted to two or three
keywords, it is often ambiguous to the search engine.
Given the user query, the goal of an Information Retrieval
(IR) system is to retrieve information that might be useful
or relevant to the user's information need. Hence, query
processing plays an important role in an IR system.
Query processing can be divided into four categories:
query expansion, query optimization, query classification and
query parsing. In this paper an attempt is made to evaluate the
performance of query processing algorithms in each category.
The evaluation was based on the dataset specified by the
Forum for Information Retrieval Evaluation [FIRE15]. The criteria used
for evaluation are precision and relative recall. The analysis is
based on the importance of each step in query processing. The
experimental results show the significance of each step
in query processing, as well as the relevance of web semantics
and spelling correction in the user query.
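The two evaluation criteria named in this abstract are easy to make concrete. The sketch below uses the standard definitions (precision over the retrieved set; relative recall against the pool of relevant documents found by all compared systems); the document IDs and relevance judgments are invented for illustration.

```python
# precision = |relevant ∩ retrieved| / |retrieved|
# relative recall = |relevant ∩ retrieved| / |pooled relevant across systems|
def precision(retrieved, relevant):
    return len(set(retrieved) & set(relevant)) / len(retrieved)

def relative_recall(retrieved, relevant, pooled_relevant):
    return len(set(retrieved) & set(relevant)) / len(pooled_relevant)

system_a = ["d1", "d2", "d3", "d4"]   # documents returned by system A
system_b = ["d2", "d5", "d6"]         # documents returned by system B
relevant = {"d1", "d2", "d5", "d7"}   # toy relevance judgments

# Pool the relevant documents found by either system: {d1, d2, d5}
pooled = (set(system_a) | set(system_b)) & relevant

p_a = precision(system_a, relevant)                 # 2 of 4 retrieved -> 0.5
rr_a = relative_recall(system_a, relevant, pooled)  # 2 of 3 pooled -> 2/3
```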
Query Recommendation by using Collaborative Filtering Approach - IRJET Journal
This document proposes a system called QDMiner to mine query facets from the top search results for a query. It uses collaborative filtering techniques to recommend the top-k results that are most relevant to a user's interests.
QDMiner first retrieves the top search results from a search engine. It then mines frequent lists from the HTML tags and free text within the results to identify query facets. It groups common lists and ranks the facets and items based on their appearances. QDMiner represents the search results in two models: the Unique Website Model and Context Similarity Model, to order the query facets.
To recommend results, QDMiner uses collaborative filtering techniques, including item-based and user-based approaches.
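The item-based variant mentioned above can be sketched as follows. The user/result click matrix and the scoring scheme (cosine similarity between item columns, weighted by the user's own interactions) are illustrative assumptions, not QDMiner's actual implementation.

```python
from math import sqrt

# Toy user-result interaction matrix (hypothetical click counts)
ratings = {
    "alice": {"r1": 3, "r2": 1, "r3": 0, "r4": 2},
    "bob":   {"r1": 2, "r2": 0, "r3": 0, "r4": 3},
    "carol": {"r1": 0, "r2": 4, "r3": 5, "r4": 0},
}

def item_vector(item):
    # Column of the matrix: every user's interaction with one result
    return [ratings[u][item] for u in sorted(ratings)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def recommend(user, k=1):
    # Item-based CF: score each unseen result by its similarity to the
    # results the user already interacted with, weighted by interaction
    seen = [i for i, v in ratings[user].items() if v > 0]
    unseen = [i for i, v in ratings[user].items() if v == 0]
    scores = {i: sum(cosine(item_vector(i), item_vector(j)) * ratings[user][j]
                     for j in seen)
              for i in unseen}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

For instance, `recommend("bob")` favors `r2` over `r3`, because `r2` co-occurs with results bob already clicked while `r3` was clicked only by a dissimilar user.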
A Survey on Automatically Mining Facets for Queries from their Search Results - IRJET Journal
This document summarizes research on automatically mining query facets from search results. Query facets provide useful summaries of a query by grouping related terms and phrases. The document reviews existing methods for query recommendation and facet extraction. It also proposes an unsupervised technique to mine query facets from top search results without additional domain knowledge. The technique aims to help users better understand queries and explore information through faceted search.
ONTOLOGY-DRIVEN INFORMATION RETRIEVAL FOR HEALTHCARE INFORMATION SYSTEM : ... - IJNSA Journal
In health research, one of the major tasks is to retrieve and analyze heterogeneous databases containing
a single patient's information gathered from a large volume of data over a long period of time. The
main objective of this paper is to present our ontology-based information retrieval approach for a
clinical information system. We performed a case study in a real-life hospital setting. The results
obtained illustrate the feasibility of the proposed approach, which significantly improved the information
retrieval process on a large volume of data collected over a long period, from August 2011 until January
2012.
Scaling Down Dimensions and Feature Extraction in Document Repository Classif... - ijdmtaiir
In this study a comprehensive evaluation of two
supervised feature selection methods for dimensionality
reduction is performed: Latent Semantic Indexing (LSI) and
Principal Component Analysis (PCA). These are gauged against
unsupervised techniques such as fuzzy feature clustering using
hard fuzzy C-means (FCM). The main objective of the study is
to estimate the relative efficiency of the two supervised techniques
against the unsupervised fuzzy technique while reducing the
feature space. It is found that clustering using FCM leads to
better accuracy in classifying documents than matrix-factorization
techniques like LSI and PCA. Results show that
clustering of features improves the accuracy of document
classification.
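The FCM step favored by this study alternates soft membership updates with weighted center updates. The following minimal sketch runs it on invented 2-D feature points; in the paper's setting the inputs would be term-feature vectors from the corpus, and the cluster count, fuzzifier m, and iteration budget are all assumed values.

```python
# Toy 2-D feature vectors forming two well-separated groups
points = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25),
          (0.9, 0.8), (1.0, 0.9), (0.85, 0.95)]

def dist2(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def fcm(points, centers, m=2.0, iters=50):
    # Fuzzy C-means: compute soft memberships u[k][i] of point k in
    # cluster i, then recompute centers as u^m-weighted means
    c = len(centers)
    for _ in range(iters):
        u = []
        for p in points:
            d = [max(dist2(p, ctr), 1e-12) for ctr in centers]
            u.append([1.0 / sum((d[i] / d[j]) ** (1.0 / (m - 1))
                                for j in range(c))
                      for i in range(c)])
        centers = []
        for i in range(c):
            w = [u[k][i] ** m for k in range(len(points))]
            s = sum(w)
            centers.append((sum(wk * p[0] for wk, p in zip(w, points)) / s,
                            sum(wk * p[1] for wk, p in zip(w, points)) / s))
    return centers, u

# Deterministic init: seed one center in each apparent group
centers, memberships = fcm(points, [points[0], points[-1]])
labels = [max(range(2), key=lambda row_i: row[row_i]) for row in memberships]
```

The hard labels taken from the soft memberships recover the two groups; the memberships themselves stay fractional, which is what distinguishes FCM from hard k-means.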
The document summarizes three usability studies conducted to evaluate the BioText search engine interface. Study 1 was a pilot study that found users wanted caption search capabilities. Study 2 explored displaying related gene/protein terms and found users preferred categories over checkboxes. Study 3 confirmed hypotheses that users preferred full text searching and figure display. Overall, the studies provided insights into improving biomedical search interfaces based on user needs.
TWO LEVEL SELF-SUPERVISED RELATION EXTRACTION FROM MEDLINE USING UMLS - IJDKP
The biomedical research literature is one of many domains that hide precious knowledge, and
the biomedical community makes extensive use of this scientific literature to discover facts about
biomedical entities such as diseases, drugs, etc. MEDLINE is a huge database of biomedical research
papers that remains a significantly underutilized source of biological information. Discovering useful
knowledge from such a huge corpus raises various problems related to the type of information, such as the
concepts related to the domain of the texts and the semantic relationships associated with them. In this paper,
we propose a two-level model for self-supervised relation extraction from MEDLINE using the Unified
Medical Language System (UMLS) knowledge base. The model uses a self-supervised approach for
Relation Extraction (RE), constructing enhanced training examples using information from UMLS. The
model shows better results in comparison with current state-of-the-art and naïve approaches.
ReVeaLD: A user-driven domain-specific interactive search platform for biomed... - Maulik Kamdar
1) ReVeaLD is an interactive search platform that allows biomedical researchers to query linked open data sources using natural language queries or a visual interface.
2) It addresses challenges of accessing heterogeneous biomedical data sources by providing a domain-specific language and query templates to form SPARQL queries for multiple data sources.
3) The system was evaluated on tasks involving formulating queries using the domain-specific language concepts and linked open data catalog, and it was found that familiarity with the domain-specific language concepts and a smaller set of concepts improved query formulation times.
Context Driven Technique for Document Classification - IDES Editor
In this paper we present an innovative hybrid Text
Classification (TC) system that bridges the gap between
statistical and context-based techniques. Our algorithm
harnesses contextual information at two stages. First, it extracts
a cohesive set of keywords for each category by using lexical
references, implicit context as derived from LSA, and
word-vicinity-driven semantics. Second, each document is
represented by a set of context-rich features whose values are
derived by considering both lexical cohesion and the extent
of coverage of salient concepts via lexical chaining. After
keywords are extracted, a subset of the input documents is
apportioned as a training set. Its members are assigned categories
based on their keyword representation. These labeled
documents are used to train binary SVM classifiers, one per
category. The remaining documents are supplied to the
trained classifiers in the form of their context-enhanced feature
vectors, and each document is finally ascribed its appropriate
category by an SVM classifier.
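The bootstrap step in that pipeline, labeling a training subset from its keyword representation before any SVM is trained, can be sketched simply. The category keyword sets and documents below are invented, and overlap counting stands in for the paper's richer lexical-cohesion scoring.

```python
# Hypothetical category keyword sets (output of the keyword-extraction stage)
keywords = {
    "sports": {"match", "goal", "team", "league"},
    "finance": {"stock", "market", "shares", "profit"},
}

def bootstrap_label(document):
    # Assign the category whose keyword set overlaps the document most;
    # these labels would then seed the binary SVM training step
    tokens = set(document.lower().split())
    scores = {cat: len(tokens & kw) for cat, kw in keywords.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

label = bootstrap_label("The team scored a late goal to win the match")
```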
This document summarizes a presentation on using FAIR principles to build a Cancer Research Data Commons. It discusses three main themes: experiences implementing FAIR, capturing the diversity of cancer science data, and allowing more people to comply with FAIR. It also describes plans to build a Cancer Data Aggregator that would provide unified access to multiple cancer data repositories via an API. Proposals to build this aggregator are due by August 15, 2019.
A Semantic Retrieval System for Extracting Relationships from Biological Corpus - ijcsit
The World Wide Web holds a vast amount of heterogeneous information, and users searching it do not always find the kind of information they expect. In information extraction, extracting semantic relationships between terms in documents is a challenge. This
paper proposes a system that retrieves documents based on query expansion and tackles the extraction of semantic relationships from biological documents. The system retrieves documents relevant to the input terms and then detects whether a relationship exists. It uses the Boolean
model together with pattern recognition, which helps determine the relevant documents and locate the relationship within the biological document. The system constructs a term-relation table that accelerates the relation-extraction step. The proposed method also lets
researchers use the system to figure out the relationship between two biological terms from the information available in biological documents. For the retrieved documents, the system also measures precision and recall.
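The Boolean-model-plus-pattern pipeline described above can be sketched with an inverted index and a single relation pattern. The mini-corpus, gene names, and verb pattern are invented; the paper's actual term-relation table would cover many more relation types.

```python
import re

# Hypothetical mini-corpus of biological sentences
docs = {
    1: "BRCA1 interacts with RAD51 during DNA repair.",
    2: "TP53 regulates apoptosis in response to DNA damage.",
    3: "RAD51 binds single-stranded DNA after resection.",
}

# Boolean model: inverted index from lowercased terms to document IDs
index = {}
for doc_id, text in docs.items():
    for term in re.findall(r"\w+", text.lower()):
        index.setdefault(term, set()).add(doc_id)

def boolean_and(*terms):
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()

# Pattern recognition: scan retrieved documents for "<TERM> <verb> <TERM>"
RELATION = re.compile(r"(\w+) (interacts with|regulates|binds) (\w+)")

def extract_relations(doc_ids):
    # Build term-relation rows: (term1, relation, term2, doc)
    table = []
    for doc_id in sorted(doc_ids):
        for m in RELATION.finditer(docs[doc_id]):
            table.append((m.group(1), m.group(2), m.group(3), doc_id))
    return table

hits = boolean_and("BRCA1", "RAD51")   # documents containing both terms
relations = extract_relations(hits)
```

Querying two biological terms thus first narrows the corpus with the Boolean AND, then the pattern scan pinpoints where in the document the relationship occurs.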
https://www.youtube.com/watch?v=Y_-o-4rKxUk
Machine learning powered metabolomic network analysis
Dmitry Grapov PhD,
Director of Data Science and Bioinformatics,
CDS- Creative Data Solutions
www.createdatasol.com
Metabolomic network analysis can be used to interpret experimental results within a variety of contexts including: biochemical relationships, structural and spectral similarity and empirical correlation. Machine learning is useful for modeling relationships in the context of pattern recognition, clustering, classification and regression based predictive modeling. The combination of developed metabolomic networks and machine learning based predictive models offer a unique method to visualize empirical relationships while testing key experimental hypotheses. The following presentation focuses on data analysis, visualization, machine learning and network mapping approaches used to create richly mapped metabolomic networks. Learn more at www.createdatasol.com
IJERD (www.ijerd.com) International Journal of Engineering Research and Develop... - IJERD Editor
The document describes the BioNav system, which categorizes large numbers of biomedical literature search results from PubMed using the MeSH concept hierarchy. BioNav constructs an initial navigation tree by attaching PubMed citations to relevant MeSH concepts. It then reduces this tree by removing empty nodes. Unlike static navigation interfaces, BioNav dynamically selects a small subset of concept nodes to display at each step based on estimated user navigation cost. This allows users to efficiently explore concepts of interest and find relevant citations from large result sets.
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel... - IJERD Editor
The document describes the BioNav system, which provides a dynamic navigation interface for querying biomedical databases like PubMed. BioNav categorizes query results using the MeSH concept hierarchy and constructs a navigation tree. It then reveals only a subset of concept nodes at each step to minimize expected navigation cost for the user. The system architecture includes a web interface, middle layer, navigation system, and database. BioNav was found to significantly reduce average navigation costs compared to traditional static interfaces through experimental evaluation.
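The tree-reduction step that both BioNav summaries mention, attaching citations to concepts and removing empty nodes, can be sketched as a recursive prune. The concept hierarchy and citation IDs below are an invented slice, not real MeSH data, and the full system's navigation-cost estimation is omitted.

```python
# Hypothetical slice of a MeSH-like concept hierarchy (child lists)
tree = {
    "Diseases": ["Neoplasms", "Infections"],
    "Neoplasms": ["Breast Neoplasms", "Lung Neoplasms"],
    "Infections": [],
    "Breast Neoplasms": [],
    "Lung Neoplasms": [],
}
# Toy citation IDs attached to concepts by a query's results
citations = {"Breast Neoplasms": {101, 102}, "Infections": {103}}

def prune(node):
    # Keep a node if it has citations itself or any descendant does;
    # this mirrors BioNav's removal of empty navigation-tree nodes
    kept = [sub for sub in (prune(child) for child in tree[node]) if sub]
    if not citations.get(node) and not kept:
        return None
    return (node, kept)

pruned = prune("Diseases")
```

Here "Lung Neoplasms" vanishes because neither it nor any descendant carries a citation, while the rest of the path to the attached citations survives.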
Acs collaborative computational technologies for biomedical research an enabl... - Sean Ekins
This document discusses enabling more open and collaborative approaches to drug discovery through computational technologies. It argues that pre-competitive data sharing could help integrate historical knowledge and deliver high value. Open drug discovery may be a better approach than the traditional closed model. Tools and open interfaces could facilitate more open collaboration between different sectors involved in biomedical research. Mobile apps may help scientists access and share data more easily. Crowdsourcing approaches could engage more contributors to knowledge bases.
Using data mining methods knowledge discovery for text mining - eSAT Journals
Abstract: Many data mining techniques have been proposed for mining useful patterns in text documents. However, how to effectively use and update discovered patterns is still an open research issue, especially in the domain of text mining. Since most existing text mining methods adopt term-based approaches, they all suffer from the problems of polysemy and synonymy. Over the years, people have often held the hypothesis that pattern (or phrase)-based approaches should perform better than term-based ones, but many experiments do not support this hypothesis. The proposed work presents an innovative and effective pattern discovery technique, which includes the processes of pattern deploying and pattern evolving, to improve the effectiveness of using and updating discovered patterns for finding relevant and interesting information. Keywords: text mining, text classification, pattern mining, pattern evolving, information filtering.
The document discusses different theories used in information retrieval systems. It describes cognitive or user-centered theories that model human information behavior and structural or system-centered theories like the vector space model. The vector space model represents documents and queries as vectors of term weights and compares similarities between queries and documents. It was first used in the SMART information retrieval system and involves assigning term vectors and weights to documents based on relevance.
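The vector space model summarized above reduces to term-frequency vectors and cosine similarity. A minimal sketch, with an invented three-document corpus and plain term counts rather than the weighted schemes used in SMART:

```python
from collections import Counter
from math import sqrt

docs = ["information retrieval system",
        "vector space model for retrieval",
        "cooking recipes and kitchen tips"]
query = "retrieval model"

def tf_vector(text):
    # Term-frequency vector as a sparse Counter over whitespace tokens
    return Counter(text.split())

def cosine(a, b):
    shared = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in shared)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

q = tf_vector(query)
ranked = sorted(range(len(docs)),
                key=lambda i: cosine(q, tf_vector(docs[i])),
                reverse=True)
```

The second document ranks first since it shares both query terms, the first shares one, and the off-topic document scores zero, which is exactly the similarity ordering the model is meant to produce.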
Clinical Information Models (CIMs) expressed as archetypes play an essential role in the design and development of current Electronic Health Record (EHR) information structures. Although there exist many experiences about using archetypes in the literature, a comprehensive and formal methodology for archetype modeling does not exist. Having a modeling methodology is essential to develop quality archetypes, in order to guide the development of EHR systems and to allow the semantic interoperability of health data. In this work, an archetype modeling methodology is proposed. This paper describes its phases, the inputs and outputs of each phase, and the involved participants and tools. It also includes the description of the possible strategies to organize the modeling process. The proposed methodology is inspired by existing best practices of CIMs, software and ontology development. The methodology has been applied and evaluated in regional and national EHR projects. The application of the methodology provided useful feedback and improvements, and confirmed its advantages. The conclusion of this work is that having a formal methodology for archetype development facilitates the definition and adoption of interoperable archetypes, improves their quality, and facilitates their reuse among different information systems and EHR projects. Moreover, the proposed methodology can be also a reference for CIMs development using any other formalism.
This document summarizes a presentation about ensuring data quality in the PHIS+ consortium, which integrates clinical and administrative data across multiple children's hospitals for comparative effectiveness research. It describes the process of developing common data models, semantically mapping local data elements to standards, collecting data using a toolkit with validation, processing the data through a platform to standardize terminology and storage, and conducting various automated and manual checks for data quality issues. These included checks for missing or invalid data, relationships between test results and specimens/cultures, and study-specific assessments through chart review. The final database contained over 4.5 million records across various domains with standardized coding to support health services research.
IJCER (www.ijceronline.com) International Journal of computational Engineerin... - ijceronline
This document summarizes a research paper on developing user profiles from search engine queries to enable personalized search results. It discusses how current search engines generally return the same results regardless of individual user interests. The paper proposes methods to construct user profiles capturing both positive and negative preferences from search histories and click-through data. Experimental results showed profiles including both preferences performed best by improving query clustering and separating similar vs. dissimilar queries. Future work aims to use profiles for collaborative filtering and predicting new query intents.
Domain ontology development for communicable diseases - csandit
This document discusses the development of a domain ontology for communicable diseases. The researchers developed an ontology with concepts like diseases, symptoms, and causes arranged in a taxonomy. They created over 600 concepts with properties and relations. The ontology development process included specification, conceptualization, creation of instances, and evaluation using a description logic reasoner to verify the concepts and relations were correctly represented. The ontology will be expanded to include more diseases and connections to related web content to provide information retrieval.
DOMAIN ONTOLOGY DEVELOPMENT FOR COMMUNICABLE DISEASES - cscpconf
The Web has become the first resource to search for any kind of information. With the emergence of the semantic web, our search queries have started generating more informed results. Ontologies are at the core of any semantic web application. They help in the rapid development of
distributed systems by providing information on the fly. This key feature of distributing and
sharing information has made ontologies a new knowledge representation mechanism, one
strongly backed by a sound inference system. In this paper, we discuss the development, verification and validation of an ontology in the health domain.
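The subsumption checking that such a reasoner performs over the disease taxonomy can be illustrated with a toy is-a hierarchy. The concept names below are an invented slice standing in for the 600-concept ontology described above, and following is-a links upward is only the simplest fragment of what a description-logic reasoner verifies.

```python
# Tiny communicable-disease taxonomy: child -> parent is-a edges
is_a = {
    "tuberculosis": "bacterial_infection",
    "cholera": "bacterial_infection",
    "influenza": "viral_infection",
    "bacterial_infection": "communicable_disease",
    "viral_infection": "communicable_disease",
}

def subsumed_by(concept, ancestor):
    # Walk the is-a chain upward to test whether ancestor subsumes concept
    while concept in is_a:
        concept = is_a[concept]
        if concept == ancestor:
            return True
    return False

ok = subsumed_by("cholera", "communicable_disease")
```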
The document discusses the query formulation process in information retrieval systems. It defines a query and explains that query formulation involves refining the original query entered by the user, such as through tokenization, normalization, and stemming of terms. This refinement stage is followed by a structural alteration stage where the query is segmented and expanded with related concepts. Effective query formulation improves search quality by better representing the user's intent.
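The refinement stages named in that summary (tokenization, normalization, stemming, then expansion) chain together naturally. In the sketch below the synonym table is a hypothetical stand-in for a thesaurus or co-occurrence statistics, and the suffix-stripping stemmer is a crude placeholder for a real one such as Porter's.

```python
import re

# Hypothetical synonym table used by the expansion stage
SYNONYMS = {"tumor": ["neoplasm"]}

def tokenize(query):
    # Tokenization + normalization: lowercase, keep alphanumeric runs
    return re.findall(r"[a-z0-9]+", query.lower())

def stem(token):
    # Crude suffix stripping as a stand-in for a real stemmer
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def formulate(query):
    # Refinement stage, then structural alteration: expand each stem
    # with its related concepts
    tokens = [stem(t) for t in tokenize(query)]
    expanded = []
    for t in tokens:
        expanded.append(t)
        expanded.extend(SYNONYMS.get(t, []))
    return expanded

terms = formulate("Tumor Suppressing Genes")
```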
The document summarizes three usability studies conducted to evaluate the BioText search engine interface. Study 1 was a pilot study that found users wanted caption search capabilities. Study 2 explored displaying related gene/protein terms and found users preferred categories over checkboxes. Study 3 confirmed hypotheses that users preferred full text searching and figure display. Overall, the studies provided insights into improving biomedical search interfaces based on user needs.
TWO LEVEL SELF-SUPERVISED RELATION EXTRACTION FROM MEDLINE USING UMLSIJDKP
The biomedical research literature is one among many other domains that hides a precious knowledge, and
the biomedical community made an extensive use of this scientific literature to discover the facts of
biomedical entities, such as disease, drugs,etc.MEDLINE is a huge database of biomedical research
papers which remain a significantly underutilized source of biological information. Discovering the useful
knowledge from such huge corpus leads to various problems related to the type of information such as the
concepts related to the domain of texts and the semantic relationship associated with them. In this paper,
we propose a Two-level model for Self-supervised relation extraction from MEDLINE using Unified
Medical Language System (UMLS) Knowledge base. The model uses a Self-supervised Approach for
Relation Extraction (RE) by constructing enhanced training examples using information from UMLS. The
model shows a better result in comparison with current state of the art and naïve approaches
ReVeaLD: A user-driven domain-specific interactive search platform for biomed...Maulik Kamdar
1) ReVeaLD is an interactive search platform that allows biomedical researchers to query linked open data sources using natural language queries or a visual interface.
2) It addresses challenges of accessing heterogeneous biomedical data sources by providing a domain-specific language and query templates to form SPARQL queries for multiple data sources.
3) The system was evaluated on tasks involving formulating queries using the domain-specific language concepts and linked open data catalog, and it was found that familiarity with the domain-specific language concepts and a smaller set of concepts improved query formulation times.
Context Driven Technique for Document ClassificationIDES Editor
In this paper we present an innovative hybrid Text
Classification (TC) system that bridges the gap between
statistical and context based techniques. Our algorithm
harnesses contextual information at two stages. First it extracts
a cohesive set of keywords for each category by using lexical
references, implicit context as derived from LSA and wordvicinity
driven semantics. And secondly, each document is
represented by a set of context rich features whose values are
derived by considering both lexical cohesion as well as the extent
of coverage of salient concepts via lexical chaining. After
keywords are extracted, a subset of the input documents is
apportioned as training set. Its members are assigned categories
based on their keyword representation. These labeled
documents are used to train binary SVM classifiers, one for
each category. The remaining documents are supplied to the
trained classifiers in the form of their context-enhanced feature
vectors. Each document is finally ascribed its appropriate
category by an SVM classifier.
This document summarizes a presentation on using FAIR principles to build a Cancer Research Data Commons. It discusses three main themes: experiences implementing FAIR, capturing the diversity of cancer science data, and allowing more people to comply with FAIR. It also describes plans to build a Cancer Data Aggregator that would provide unified access to multiple cancer data repositories via an API. Proposals to build this aggregator are due by August 15, 2019.
A Semantic Retrieval System for Extracting Relationships from Biological Corpusijcsit
The World Wide Web holds a vast amount of heterogeneous information, and users searching it do not always obtain the kind of information they expect. Within information extraction, identifying semantic relationships between terms in documents remains a challenge. This
paper proposes a system that retrieves documents through query expansion and extracts semantic relationships from biological documents. The system retrieves documents relevant to the input terms and then detects whether a relationship exists. It combines a Boolean
model with pattern recognition, which helps in determining the relevant documents and locating the relationship within the biological document. The system constructs a term-relation table that accelerates the relation extraction step. The proposed method offers a further use of the system:
researchers can use it to discover the relationship between two biological terms from the information available in biological documents. For the retrieved documents, the system also reports precision and recall.
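The retrieval-then-extraction idea can be illustrated with a minimal pattern-based sketch. The relation patterns, sentences, and term names below are hypothetical stand-ins; the paper's actual pattern set and term-relation table schema are not specified here.

```python
import re

# Hypothetical surface patterns; a real system's pattern set would be richer
PATTERNS = [
    (re.compile(r"(\w+) inhibits (\w+)"), "inhibits"),
    (re.compile(r"(\w+) binds to (\w+)"), "binds"),
    (re.compile(r"(\w+) activates (\w+)"), "activates"),
]

def build_term_relation_table(sentences):
    """Record (term1, relation, term2, sentence index) so later lookups of a
    term pair can jump straight to the right place in the document."""
    table = []
    for idx, sent in enumerate(sentences):
        for pat, rel in PATTERNS:
            for m in pat.finditer(sent):
                table.append((m.group(1), rel, m.group(2), idx))
    return table

def relation_between(table, a, b):
    """Answer 'what relation holds between terms a and b?' from the table."""
    return [rel for t1, rel, t2, _ in table if {t1, t2} == {a, b}]

docs = ["aspirin inhibits COX1 in platelets",
        "TNF activates NFkB in the nucleus"]
table = build_term_relation_table(docs)
```

Precomputing the table is what "accelerates the relation extracting part": a query over two terms becomes a table scan instead of a re-parse of every document.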
https://www.youtube.com/watch?v=Y_-o-4rKxUk
Machine learning powered metabolomic network analysis
Dmitry Grapov PhD,
Director of Data Science and Bioinformatics,
CDS- Creative Data Solutions
www.createdatasol.com
Metabolomic network analysis can be used to interpret experimental results within a variety of contexts including: biochemical relationships, structural and spectral similarity and empirical correlation. Machine learning is useful for modeling relationships in the context of pattern recognition, clustering, classification and regression based predictive modeling. The combination of developed metabolomic networks and machine learning based predictive models offer a unique method to visualize empirical relationships while testing key experimental hypotheses. The following presentation focuses on data analysis, visualization, machine learning and network mapping approaches used to create richly mapped metabolomic networks. Learn more at www.createdatasol.com
IJERD (www.ijerd.com) International Journal of Engineering Research and Develop... (IJERD Editor)
The document describes the BioNav system, which categorizes large numbers of biomedical literature search results from PubMed using the MeSH concept hierarchy. BioNav constructs an initial navigation tree by attaching PubMed citations to relevant MeSH concepts. It then reduces this tree by removing empty nodes. Unlike static navigation interfaces, BioNav dynamically selects a small subset of concept nodes to display at each step based on estimated user navigation cost. This allows users to efficiently explore concepts of interest and find relevant citations from large result sets.
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel... (IJERD Editor)
The document describes the BioNav system, which provides a dynamic navigation interface for querying biomedical databases like PubMed. BioNav categorizes query results using the MeSH concept hierarchy and constructs a navigation tree. It then reveals only a subset of concept nodes at each step to minimize expected navigation cost for the user. The system architecture includes a web interface, middle layer, navigation system, and database. BioNav was found to significantly reduce average navigation costs compared to traditional static interfaces through experimental evaluation.
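BioNav's tree-reduction step (removing concept nodes with no attached citations) can be sketched as a recursive prune over a toy MeSH-like tree. The node names and citation IDs below are invented; BioNav's cost-based node selection is a separate, more involved step not shown here.

```python
def prune(node):
    """Return a copy of the concept tree with empty nodes removed: a node
    survives only if it has citations or at least one surviving child."""
    kept = [c for c in (prune(ch) for ch in node["children"]) if c is not None]
    if not node["citations"] and not kept:
        return None
    return {"name": node["name"], "citations": node["citations"], "children": kept}

# Toy MeSH-like fragment with invented citation IDs
tree = {"name": "Neoplasms", "citations": [], "children": [
    {"name": "Breast Neoplasms", "citations": ["PMID1", "PMID2"], "children": []},
    {"name": "Eye Neoplasms", "citations": [], "children": []},
]}
reduced = prune(tree)
```

An inner node with no citations of its own still survives if any descendant carries citations, which is why the prune must run bottom-up.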
Acs collaborative computational technologies for biomedical research an enabl... (Sean Ekins)
This document discusses enabling more open and collaborative approaches to drug discovery through computational technologies. It argues that pre-competitive data sharing could help integrate historical knowledge and deliver high value. Open drug discovery may be a better approach than the traditional closed model. Tools and open interfaces could facilitate more open collaboration between different sectors involved in biomedical research. Mobile apps may help scientists access and share data more easily. Crowdsourcing approaches could engage more contributors to knowledge bases.
Using data mining methods knowledge discovery for text mining (eSAT Journals)
Abstract: Many data mining techniques have been proposed for mining useful patterns in text documents. However, how to effectively use and update discovered patterns is still an open research issue, especially in the domain of text mining. Since most existing text mining methods adopt term-based approaches, they all suffer from the problems of polysemy and synonymy. Over the years, people have often held the hypothesis that pattern (or phrase)-based approaches should perform better than term-based ones, but many experiments do not support this hypothesis. The proposed work presents an innovative and effective pattern discovery technique, which includes the processes of pattern deploying and pattern evolving, to improve the effectiveness of using and updating discovered patterns for finding relevant and interesting information. Keywords: text mining, text classification, pattern mining, pattern evolving, information filtering.
The document discusses different theories used in information retrieval systems. It describes cognitive or user-centered theories that model human information behavior and structural or system-centered theories like the vector space model. The vector space model represents documents and queries as vectors of term weights and compares similarities between queries and documents. It was first used in the SMART information retrieval system and involves assigning term vectors and weights to documents based on relevance.
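The vector space model described above can be sketched in a few lines: build TF-IDF term-weight vectors for each document, then rank documents by cosine similarity against the query vector. The toy documents and query are invented for illustration.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """TF-IDF weight vector per tokenised document (rarer terms weigh more)."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))   # document frequency
    vocab = sorted(df)
    weigh = lambda toks: [Counter(toks)[t] * math.log(n / df[t]) for t in vocab]
    return vocab, weigh, [weigh(doc) for doc in docs]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu, nv = math.sqrt(sum(a * a for a in u)), math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

docs = [["gene", "expression", "cancer"], ["protein", "folding"], ["gene", "therapy"]]
vocab, weigh, vecs = tf_idf_vectors(docs)
query = weigh(["gene", "cancer"])
scores = [cosine(query, v) for v in vecs]
```

Here the query is weighted with the same IDF statistics as the documents, so a document sharing a rare query term ("cancer") outranks one sharing only a common term ("gene").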
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Clinical Information Models (CIMs) expressed as archetypes play an essential role in the design and development of current Electronic Health Record (EHR) information structures. Although the literature reports many experiences of using archetypes, a comprehensive and formal methodology for archetype modeling does not exist. Having a modeling methodology is essential for developing quality archetypes, in order to guide the development of EHR systems and to allow the semantic interoperability of health data. In this work, an archetype modeling methodology is proposed. This paper describes its phases, the inputs and outputs of each phase, and the involved participants and tools. It also describes possible strategies for organizing the modeling process. The proposed methodology is inspired by existing best practices for CIMs and by software and ontology development. The methodology has been applied and evaluated in regional and national EHR projects. Applying the methodology provided useful feedback and improvements, and confirmed its advantages. The conclusion of this work is that having a formal methodology for archetype development facilitates the definition and adoption of interoperable archetypes, improves their quality, and facilitates their reuse among different information systems and EHR projects. Moreover, the proposed methodology can also be a reference for CIM development using any other formalism.
This document summarizes a presentation about ensuring data quality in the PHIS+ consortium, which integrates clinical and administrative data across multiple children's hospitals for comparative effectiveness research. It describes the process of developing common data models, semantically mapping local data elements to standards, collecting data using a toolkit with validation, processing the data through a platform to standardize terminology and storage, and conducting various automated and manual checks for data quality issues. These included checks for missing or invalid data, relationships between test results and specimens/cultures, and study-specific assessments through chart review. The final database contained over 4.5 million records across various domains with standardized coding to support health services research.
IJCER (www.ijceronline.com) International Journal of computational Engineerin... (ijceronline)
This document summarizes a research paper on developing user profiles from search engine queries to enable personalized search results. It discusses how current search engines generally return the same results regardless of individual user interests. The paper proposes methods to construct user profiles capturing both positive and negative preferences from search histories and click-through data. Experimental results showed profiles including both preferences performed best by improving query clustering and separating similar vs. dissimilar queries. Future work aims to use profiles for collaborative filtering and predicting new query intents.
Domain ontology development for communicable diseases (csandit)
This document discusses the development of a domain ontology for communicable diseases. The researchers developed an ontology with concepts like diseases, symptoms, and causes arranged in a taxonomy. They created over 600 concepts with properties and relations. The ontology development process included specification, conceptualization, creation of instances, and evaluation using a description logic reasoner to verify the concepts and relations were correctly represented. The ontology will be expanded to include more diseases and connections to related web content to provide information retrieval.
DOMAIN ONTOLOGY DEVELOPMENT FOR COMMUNICABLE DISEASES (cscpconf)
The Web has become the first resource to search for any kind of information. With the emergence of the semantic web, our search queries have started generating more informed results. Ontologies are at the core of any semantic web application. They help in the rapid development of distributed systems by providing information on the fly. This key feature of distributing and sharing information has made ontologies a new knowledge representation mechanism, one strongly backed by a sound inference system. In this paper, we discuss the development, verification and validation of an ontology in the health domain.
The document discusses the query formulation process in information retrieval systems. It defines a query and explains that query formulation involves refining the original query entered by the user, such as through tokenization, normalization, and stemming of terms. This refinement stage is followed by a structural alteration stage where the query is segmented and expanded with related concepts. Effective query formulation improves search quality by better representing the user's intent.
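The two stages described here (refinement, then structural alteration) can be sketched as a small pipeline. The suffix list and synonym table below are toy stand-ins: a real system would use a proper stemmer (e.g. Porter) and a curated thesaurus for expansion.

```python
import re

SUFFIXES = ("ing", "ed", "s")                             # toy stemmer, not Porter
SYNONYMS = {"tumor": ["neoplasm"], "gene": ["genetic"]}   # hypothetical expansion table

def formulate(query):
    """Refinement (tokenise, normalise, stem) followed by structural
    alteration (expand stems with related concepts)."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())      # tokenise + normalise
    stems = []
    for t in tokens:
        for suf in SUFFIXES:
            if t.endswith(suf) and len(t) > len(suf) + 2:
                t = t[:-len(suf)]                         # strip one toy suffix
                break
        stems.append(t)
    expanded = list(stems)
    for t in stems:
        expanded.extend(SYNONYMS.get(t, []))              # expansion stage
    return expanded
```

The refinement stage makes surface variants of a term match one index entry; the expansion stage then broadens recall by adding related concepts the user did not type.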
Personalized web search using browsing history and domain knowledge (Rishikesh Pathak)
This document proposes a framework for improving personalized web search by constructing an enhanced user profile using both the user's browsing history and domain knowledge. The enhanced user profile is used to better suggest relevant web pages to the user based on their search query. An experiment found that suggestions made using the enhanced user profile performed better than using a standard user profile alone. The framework involves modeling the user, re-ranking search results, and displaying personalized results based on the enhanced user profile.
This document discusses techniques for personalizing search engine results using concept-based user profiles. It proposes six methods for creating user profiles that capture both positive and negative user preferences and interests based on concepts extracted from search queries and results. The methods use machine learning algorithms to learn weighted concept vectors representing user profiles. An evaluation found that profiles capturing both positive and negative preferences performed best. The goal is to resolve query ambiguity and increase result relevance by understanding each user's unique interests and preferences.
An Improved Mining Of Biomedical Data From Web Documents Using Clustering (Kelly Lipiec)
This document summarizes a research paper that proposes an improved method for mining biomedical data from web documents using clustering. Specifically, it develops an optimized k-means clustering algorithm to group similar biomedical documents together based on identifying relevant terms using the Unified Medical Language System (UMLS). The approach aims to more efficiently retrieve relevant biomedical documents for users. It compares the proposed method to the original k-means algorithm and finds it achieves an average F-measure of 99.06%, indicating more accurate clustering of biomedical web documents.
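The clustering step can be sketched with a stdlib k-means over toy document vectors. In the paper these would be UMLS-term feature vectors and the algorithm an optimized variant; the plain version and the numbers below are illustrative assumptions.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means: assign each point to its nearest centroid, recompute
    centroids as cluster means, repeat."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[j].append(p)
        # keep the old centroid if a cluster ends up empty
        centroids = [[sum(col) / len(cl) for col in zip(*cl)] if cl else centroids[j]
                     for j, cl in enumerate(clusters)]
    return clusters

# Two well-separated toy "document" vectors per topic
points = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
clusters = kmeans(points, 2)
```

Quality measures such as the F-measure quoted in the summary compare the resulting clusters against reference labels; that evaluation step is not shown here.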
Co-Extracting Opinions from Online Reviews (Editor IJCATR)
Extracting opinion targets and opinion words from online reviews is an important and challenging task in opinion mining. Opinion
mining uses natural language processing, text analysis and computational methods to identify and recover subjective
information in source materials. This paper proposes a supervised word-alignment model that identifies opinion relations. The paper
further focuses on topical relations, extracting the relevant information and features only from a particular set of online reviews.
A feature extraction algorithm identifies the potential features, and finally the items are ranked by the frequency of
positive and negative reviews. Compared to previous methods, the model captures opinion relations and extracts features more precisely.
One of its main advantages is that it obtains better precision because of the supervised alignment model. In addition, an opinion
relation graph is used to represent the relationships between opinion targets and opinion words.
How to conduct_a_systematic_or_evidence_review (Eaglefly Fly)
This document provides guidance on conducting a systematic or evidence-based literature review. It discusses defining search terms, identifying relevant articles through database searches and other methods, applying inclusion/exclusion filters to evaluate articles, synthesizing results, and summarizing the evidence found to determine the best intervention. The goal is to reduce bias and provide a comprehensive review of a topic through an explicit and transparent process.
This document provides an overview of information retrieval models, including vector space models, TF-IDF, Doc2Vec, and latent semantic analysis. It begins with basic concepts in information retrieval like document indexing and relevance scoring. Then it discusses vector space models and how documents and queries are represented as vectors. TF-IDF weighting is explained as assigning higher weight to rare terms. Doc2Vec is introduced as an extension of word2vec to learn document embeddings. Latent semantic analysis uses singular value decomposition to project documents to a latent semantic space. Implementation details and examples are provided for several models.
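The LSA step (an SVD projection of documents into a latent semantic space) can be sketched without a linear-algebra library by using power iteration on A^T A to recover the leading singular direction. The term-document counts below are invented, and only the first latent dimension is computed; a full LSA would keep several.

```python
import math

def top_singular_direction(A, iters=100):
    """Power iteration on A^T A to recover the leading right singular
    vector of the document-term matrix A (rows = documents)."""
    n = len(A[0])
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        Av = [sum(row[j] * v[j] for j in range(n)) for row in A]              # A v
        w = [sum(A[i][j] * Av[i] for i in range(len(A))) for j in range(n)]   # A^T (A v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def project(doc_row, v):
    """Coordinate of a document along the latent direction."""
    return sum(a * b for a, b in zip(doc_row, v))

# Invented term-document counts over the vocabulary [gene, dna, protein, fold]
A = [[3, 2, 0, 0], [2, 3, 0, 0], [0, 0, 2, 1], [0, 0, 1, 2]]
latent = top_singular_direction(A)
coords = [project(row, latent) for row in A]
```

Documents about the dominant topic ("gene"/"dna") land far along the latent axis while the others project to roughly zero, which is the sense in which LSA groups documents by latent topic rather than by exact term overlap.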
Custom-Made Ranking in Databases Establishing and Utilizing an Appropriate Wo... (ijsrd.com)
The Custom Rating System lets users search for and download the best articles in the database. An article can be any text content describing a product, a book, an institution, an application, a company or anything else. The system has two kinds of users: normal users and the administrator. Users must register and log in before using the system. They can write articles and upload relevant files, post related URLs to each article for other users' reference, search and read articles posted by other users, and rate those articles. Articles written by any user are sent to the administrator for approval; once approved, they are available for users to search and download. Any registered user can search for an article based on the description provided in it, view the article, download an attached file if available, and rate the article from 1 star to 5 stars. The list of articles returned by a search can be sorted by rating, popularity (number of clicks on the article), or relevance (number of matching keywords provided), or restricted to all articles uploaded by a specific user.
Quest Trail: An Effective Approach for Construction of Personalized Search En... (Editor IJCATR)
This document discusses developing a personalized search engine for software development organizations. It proposes using semantic analysis and genetic algorithms to personalize search results. Semantic analysis resolves ambiguity in queries by understanding their meaning, while genetic algorithms use machine learning to better understand user preferences over time. Quest analysis is also used to identify the goal or task behind a user's search by analyzing search logs at the quest level rather than query or session levels. Together these approaches aim to increase search relevance for users in software organizations by creating group profiles based on domain or project rather than individual user profiles.
IJRET-V1I1P5 - A User Friendly Mobile Search Engine for fast Accessing the Da... (ISAR Publications)
A mobile search engine is a meta search engine that captures users' preferences in the form of concepts by mining their click-through data. Search queries on mobile devices, however, are limited to a few short words, unlike those entered when interacting with search engines on computers. Mobile search has become popular because of the huge number of applications, and smartphones carry large amounts of personal information, such as the user's personal details, contacts, messages, emails and credit card information. The system supports user-type-specific search and, finally, ontology-based search. Moreover, opinion mining is conducted on the feedback and valuable suggestions given by mobile users. Because content concepts and location concepts have different characteristics, different techniques are used for their concept extraction and ontology formulation. Individual users can run the search engine, which works on the Android platform, and give feedback and suggestions about the search results. Based on that feedback, other users can obtain valuable information about the services available in or near their location.
Information filtering is the process of monitoring large amounts of dynamically generated information and identifying the subset of information likely to be of interest to a user based on their information needs. It represents the user's interests and identifies only pieces of information they would find interesting. There are three main categories of information filtering: collaborative filtering which uses recommendations from other users; content-based filtering which uses a comparison between item content and user profiles; and hybrid filtering which combines aspects of collaborative and content-based filtering. Feedback techniques can also be used to continually update and improve filtering.
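The content-based branch described here can be sketched as matching item feature vectors against a user-profile vector, with a feedback update that keeps the profile current. The feature dimensions, item names, and thresholds below are invented for illustration.

```python
import math

def cos(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu, nv = math.sqrt(sum(a * a for a in u)), math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def content_filter(items, profile, threshold=0.5):
    """Pass through only items whose content similarity to the user
    profile clears the threshold (content-based filtering)."""
    return [name for name, vec in items if cos(vec, profile) >= threshold]

def feedback(profile, item_vec, liked, rate=0.2):
    """Relevance-feedback update: nudge the profile toward liked items
    and away from disliked ones."""
    sign = 1.0 if liked else -1.0
    return [p + sign * rate * x for p, x in zip(profile, item_vec)]

# Invented feature dimensions: [genomics, proteomics, ecology]
profile = [1.0, 0.2, 0.0]
stream = [("gene paper", [0.9, 0.1, 0.0]),
          ("field study", [0.0, 0.1, 0.9])]
passed = content_filter(stream, profile)
```

A collaborative or hybrid filter would add scores derived from other users' ratings; only the content-based comparison is shown here.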
A presentation that I gave at the Query Log Analysis: From Research to Best Practice Workshop, 27-28 May 2009, in London, UK. http://ir.shef.ac.uk/cloughie/qlaw2009/index.html
Ontological and clustering approach for content based recommendation systems (vikramadityajakkula)
This document proposes a novel content-based recommendation system that uses ontological graphs and dynamic weighted ranking. It builds an adaptive ranking mechanism based on user selections and preferences to improve recommendation accuracy over time. The system segments data into ontological groups and identifies relationships between entities. It then calculates similarity between entities using feature vectors and ranks entities based on weights assigned to their connections in the ontological graph. These weights are updated dynamically based on user feedback to personalize recommendations for each user. The paper describes testing this approach in a recipe recommendation tool called RecipeMiner, which produced coherent recommendations that adapted to user preferences.
PERFORMANCE EVALUATION OF STRUCTURED AND SEMI-STRUCTURED BIOINFORMATICS TOOLS... (ijseajournal)
This document evaluates the performance of structured and semi-structured tools for accessing bioinformatics databases. It compares the Sequence Retrieval System (SRS) and Entrez search tools for structured data retrieval to Perl and BioPerl programs for semi-structured data retrieval. The study retrieves gene information from the European Bioinformatics Institute and National Centre for Biotechnology Information databases using each method. It finds that semi-structured tools provide an alternative to structured tools, though each approach has advantages and disadvantages for certain types of queries.
The document describes a proposed framework called UPS for privacy-preserving personalized web search. The UPS framework aims to generalize user profiles for each query according to user-specified privacy requirements, while balancing privacy risk and personalization utility. Two key contributions of the proposed system are: 1) Supporting runtime profiling to dynamically generalize user profiles on a per-query basis; and 2) Allowing for customization of privacy requirements by users to designate sensitive topics in their profiles. Algorithms are proposed to generalize profiles to optimize these metrics during the personalization process.
Similar to Search Interface Feature Evaluation in Biosciences (20)
Pingar DiscoveryOne controls your unstructured content cost by identifying documents that are candidates for disposal. DiscoveryOne provides a Content Inventory to identify low-hanging fruit for immediate archival and disposal. A second pass allows you to identify content that could be migrated to an Enterprise Content Management System.
Read more about Pingar here: http://pingar.com/
It is best practice to enrich existing content with correct and sufficient metadata before documents are migrated to a new Enterprise Content Management System; doing so decreases cost and effort. Furthermore, DiscoveryOne Content Enrichment auto-categorizes your documents according to their true content, instead of relying on previously created insufficient or incorrect metadata.
Learn more: http://pingar.com/content-enrichment/
Avoid expensive electronic dumping grounds by auto-tagging content (Zanda Mark)
Automatically categorizing and tagging content in intranets and enterprise content management systems allows information to be retrieved more quickly and efficiently. This avoids expensive electronic dumping grounds where important information is difficult to find. By categorizing content, users can rapidly identify documents by type or topic. Pingar software automatically categorizes and tags content, allowing for faster searches and information retrieval across knowledge bases, records, and throughout organizations.
To learn more visit: http://pingar.com
The data volume in enterprises is going to grow 50x year-over-year between now and 2020.
59% of middle managers at large companies miss important information almost every day because they cannot find it.
To learn more visit: http://pingar.com
The storage requirements of organizations are growing between 45% and 60% each year. Much of this content is emails and documents rather than databases. This means that your IT storage budget is probably growing at least 5% annually just to keep up.
How Text Analytics Increases Search Relevance (Zanda Mark)
To learn more visit: http://pingar.com/discoveryone/
Findability is the ease with which someone can locate the information they want. It is often confused with search, but search is just one method of achieving findability. Search allows people to enter words that they hope are contained in the content they want to retrieve. Findability includes any method of locating this content, including but not limited to searching. Pingar DiscoveryOne improves findability.
To learn more visit: http://pingar.com/discoveryone/
DiscoveryOne Content Enrichment is the easiest way to improve search, enable defensible deletion and identify document security risks. By reading, categorizing and tagging documents, DiscoveryOne automatically creates metadata. This metadata can be used in systems such as enterprise search, document management, email and CRM.
To learn more visit: http://pingar.com/discoveryone/
Pingar DiscoveryOne lets enterprises identify opportunities and risks hidden in the text of corporate and web data. It reduces the cost of document storage, security and compliance. DiscoveryOne’s powerful text analytics engine reads millions of pages in hours identifying what content matters most to analysts and information management professionals.
Pingar DiscoveryOne will point you to the trends, topics and issues exposed in those documents, posts, articles and emails. Within minutes, you can drill down to the right content out of millions of documents so you can be a step ahead.
Will the improvement in Sharepoint 2016 search increase user adaption? (Zanda Mark)
The document discusses a study comparing search functionality between SharePoint 2013 and SharePoint 2016. While some minor differences were observed, such as occasional variations in total result counts, the quality of search results was found to be effectively identical. This suggests the improvements in SharePoint 2016 would not enhance users' ability to locate documents in large libraries relative to SharePoint 2013. However, the study did not examine SharePoint 2016's potential for increased adoption. Overall, the results indicate search quality is unchanged for the scenario studied, but increased adoption was not ruled out as a potential future benefit.
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"sameer shah
Embark on a captivating financial journey with "Financial Odyssey," our hackathon project. Delve deep into the past performance of two companies as we employ an array of financial statement analysis techniques. From ratio analysis to trend analysis, uncover insights crucial for informed decision-making in the dynamic world of finance.
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag... (sameer shah)
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
The Ipsos - AI - Monitor 2024 Report.pdf (Social Samosa)
According to Ipsos AI Monitor's 2024 report, 65% of Indians said that products and services using AI have profoundly changed their daily life in the past 3-5 years.
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data (Kiwi Creative)
Harness the power of AI-backed reports, benchmarking and data analysis to predict trends and detect anomalies in your marketing efforts.
Peter Caputa, CEO at Databox, reveals how you can discover the strategies and tools to increase your growth rate (and margins!).
From metrics to track to data habits to pick up, enhance your reporting for powerful insights to improve your B2B tech company's marketing.
- - -
This is the webinar recording from the June 2024 HubSpot User Group (HUG) for B2B Technology USA.
Watch the video recording at https://youtu.be/5vjwGfPN9lw
Sign up for future HUG events at https://events.hubspot.com/b2b-technology-usa/
Open Source Contributions to Postgres: The Basics POSETTE 2024 (ElizabethGarrettChri)
Postgres is the most advanced open-source database in the world and it's supported by a community, not a single company. So how does this work? How does code actually get into Postgres? I recently had a patch submitted and committed and I want to share what I learned in that process. I’ll give you an overview of Postgres versions and how the underlying project codebase functions. I’ll also show you the process for submitting a patch and getting that tested and committed.
Learn SQL from basic queries to Advance queries (manishkhaire30)
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You... (Aggregage)
This webinar will explore cutting-edge, less familiar but powerful experimentation methodologies which address well-known limitations of standard A/B Testing. Designed for data and product leaders, this session aims to inspire the embrace of innovative approaches and provide insights into the frontiers of experimentation!
2. ABSTRACT
This paper reports findings on desirable interface features for different
search tasks in the biomedical domain. We conducted a user study where
we asked bioscientists to evaluate the usefulness of autocomplete, query
expansions, faceted refinement, related searches and results preview
implementations in new pilot interfaces and publicly available systems
while using baseline and their own queries. Our evaluation reveals that
there is a preference for certain features depending on the search task.
In addition, we touch on the current pain point of faceted search: the
acquisition of faceted subject metadata for unstructured documents.
We found a strong preference for prototypes displaying just a few facets
generated based on either the query or the matching documents.
Topics:
Design Tools and Techniques, Measurement, User Interfaces, Search User
Interfaces, UI Design, Human Factors, Qualitative User Study
1 INTRODUCTION
Interface features are elements of search user interfaces, which facilitate
the search process. Examples of such features are autocomplete and query
expansion suggestions, faceted navigation, and document surrogates in
search results previews. Vast research exists on the usefulness of interface
features on the web [6], although less so in the biomedical domain.
We identified and addressed two open questions in studies of search user
interfaces: Queries and search tasks can be classified into categories [7],
but how should the interface differ depending on the task? Faceted navigation
has been demonstrated to be useful for search in structured data [11], but which
approach to generating facet categories for unstructured documents works
best?
We conducted a qualitative user study to systematically evaluate
techniques to compute and present individual search features. In
extended interviews, bioscientists rated the usefulness of features in
common interfaces on baseline and their own queries, which we classified
as browsing, gathering information, and search for facts. Side-by-side
comparison allowed us to identify clear preferences in interface features
depending on these tasks.
After an overview of the state of the art in computing these features and
of related user-study outcomes, we discuss search tasks in the biomedical
domain. We then turn to the experiment and the participating systems.
Finally, we describe how the study was conducted and discuss its findings.
2 RELATED WORK

Related work can be grouped based on the studied features.
Autocomplete (autosuggest) provides dynamic search suggestions
as the user types the query. Suggestions commonly originate from existing
query logs, but can also be computed using biomedical terminology
resources [9]. Users interact intuitively with autocomplete, increasingly so
over time [1]. It is recommended to display results before suggesting new
queries and to compute suggestions starting from all existing terms [6].
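As an illustration only, the core of autocomplete can be sketched as prefix matching against a sorted term list; the vocabulary below is invented, whereas real systems draw on query logs or terminology resources such as MeSH:

```python
from bisect import bisect_left

# Toy vocabulary standing in for a terminology resource; the actual
# systems in this study use query logs or MeSH/GO term lists.
VOCAB = sorted(["connexin", "connexin 26", "connexin 43",
                "conjunctivitis", "gene expression", "gene ontology"])

def autocomplete(prefix, limit=5):
    """Return up to `limit` vocabulary terms starting with `prefix`."""
    i = bisect_left(VOCAB, prefix)          # first candidate in sorted order
    out = []
    while i < len(VOCAB) and VOCAB[i].startswith(prefix):
        out.append(VOCAB[i])
        i += 1
        if len(out) == limit:
            break
    return out
```

Production systems additionally rank the matches, e.g. by query-log frequency, rather than returning them in alphabetical order.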
Query expansion suggests alternative query terms when users’ guesses
result in only a few or incorrect results. Such terms can be computed from
query logs or thesauri. Users react positively to search expansions as long
as their number is limited [6].
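A minimal sketch of such expansion, assuming a toy synonym dictionary (real systems derive alternatives from query logs or thesauri), ORs each term with its known variants:

```python
# Toy thesaurus; entries are illustrative, not from any real resource.
SYNONYMS = {"cancer": ["neoplasm", "tumour"], "heart": ["cardiac"]}

def expand_query(query):
    """OR each query term with its synonyms, producing a Boolean query."""
    parts = []
    for term in query.split():
        alts = [term] + SYNONYMS.get(term, [])
        # Only bracket terms that actually have alternatives.
        parts.append("(" + " OR ".join(alts) + ")" if len(alts) > 1 else term)
    return " AND ".join(parts)
```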
Faceted refinement helps to narrow down results based on a
dimension (facet) of the searched item. Computing facets for products,
accommodation or any structured data is straightforward; however,
search in unstructured documents limits facets to pre-existing metadata
such as authors, subject headings or social bookmarking tags, which may
not exist. Various approaches address this shortcoming. Named entity
extraction can generate facets such as people and organization names [12].
Hierarchical clustering of search results allows using cluster labels as facet
categories (e.g. clusty.com). In biomedicine, existing controlled vocabularies
and ontologies are used to derive facets [5, 9] (see section 4.1.1). Faceted
refinement is welcomed by users in all studies, but good execution is the
key [11].
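When structured metadata does exist, populating facets reduces to counting field values across the matching documents; a minimal sketch over invented document records:

```python
from collections import Counter

def build_facets(results, fields=("journal", "year")):
    """Count facet values per metadata field across the result set."""
    facets = {}
    for field in fields:
        facets[field] = Counter(doc[field] for doc in results if field in doc)
    return facets

# Hypothetical result metadata for illustration.
docs = [{"journal": "Nature", "year": 2010},
        {"journal": "Nature", "year": 2011},
        {"journal": "Cell", "year": 2010}]
```

The counts give both the facet values and the result counts shown in brackets next to them in interfaces such as PubMed's filters.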
Related searches are query suggestions that lead to new searches
by either changing the query focus or refining it. Such suggestions
can be derived dynamically from the top search results [2]. A quarter
of search sessions made use of these suggestions, but their
effectiveness is questionable. A study of related searches in the
biomedical domain reports a strong desire for gene and organism
names, which can be derived from bioscience ontologies [4].
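Deriving suggestions from the top results, as in [2], can be sketched (in a much simplified form) by appending the most frequent non-query terms found in those results to the original query:

```python
from collections import Counter
import re

STOPWORDS = {"the", "of", "in", "and", "a", "is", "to"}

def related_searches(query, top_docs, k=3):
    """Suggest query refinements from frequent terms in the top results.

    Simplified sketch of result-based suggestion; real systems also use
    query logs and phrase-level analysis."""
    q_terms = set(query.lower().split())
    counts = Counter()
    for text in top_docs:
        for w in re.findall(r"[a-z0-9]+", text.lower()):
            if w not in q_terms and w not in STOPWORDS and len(w) > 2:
                counts[w] += 1
    return [f"{query} {w}" for w, _ in counts.most_common(k)]
```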
Preview of results helps users judge the relevance of their searches
by listing a surrogate for each document containing its title, URL,
preview and sometimes keywords. Document previews can consist of
the most relevant sentences derived via query-based summarization
[10], or of snippets that combine multiple sentences while replacing
their irrelevant parts with ellipses [3]. Studies indicate that users
prefer non-truncated sentences and that previews, including
document summaries, should put all query terms in context [6].
The above studies provide insights into the usefulness and
effectiveness of interface features, but do not tell whether certain
features are preferred depending on the search task. Another gap
is the comparison of different techniques for implementing faceted
navigation over unstructured documents.
3 SEARCH TASKS IN BIOSCIENCE

Kellar et al. classify information seeking tasks into four major
categories and analyze how often people conduct and repeat these
tasks [7]. Nearly 50% of search queries relate to Transactions
(email, banking, shopping), all of which are frequently repeated.
The other queries are roughly equally split between Browsing
(blogs, news), Information Gathering (e.g. graduate schools to
apply to) and Fact Finding (e.g. weather forecasts). The latter three
are conducted by bioscientists in their daily work when they
browse for new publications, gather information on particular
genes, proteins or diseases, or search for facts, for instance in
biomedical databases. Our study takes this differentiation into
account when analyzing scientists' rankings of interface features.

[Figure: example bioscience search tasks, e.g. "I need to collect publications by others on connexins & how they relate to our studies" (Browsing, Fact Finding)]
4 EXPERIMENT DESCRIPTION

The aim of this study is to identify which search interface features
are useful for searching the biomedical literature. Additionally, we
strove to understand which approaches to faceted navigation for
this domain work best. Based on the current knowledge of user
preferences, we hypothesized that users prefer different interface
features depending on the search task.
4.1 Interface Features & Evaluated Systems
To test our hypothesis, we implemented two prototype systems. To
test the usefulness of the features more broadly, we included the
systems primarily used in the biomedical domain (PubMed) and in
general search (Google), as well as additional publicly available
systems that handle these features in different ways in bioscience
(GoPubMed, Semedico, NextBio) and general search (Bing).
4.1.1 Overview of the studied systems
PubMed (ncbi.nlm.nih.gov/pubmed) is the primary search engine used by
bioscientists for their research, as it comprises over 20 million citations
for biomedical literature from MEDLINE, life science journals, and books.
Articles are indexed with Medical Subject Headings (MeSH) [8]. GoPubMed
(gopubmed.com) is a semantic search engine for the biomedical domain. It
provides refinement of PubMed search results using the original hierarchy
of structured vocabularies: Gene Ontology (GO) and MeSH [5]. Semedico
(semedico.org) [9] is a faceted biomedical search system with a ranked
list interface. The facets are populated from a semantic index generated
by disambiguating words in articles to corresponding concepts in MeSH
and UniProt. Its hierarchy of top 20 categories was defined by biologists.
NextBio (nextbio.com) is a commercial ontology-based semantic framework
based on gene, tissue, disease and compound ontologies. It combines
literature with data such as clinical trials. Google (google.com) is the second
most common search engine used by scientists for their work. According
to our findings, it is often preferred to PubMed for searching methodology
and techniques (laboratory protocols). Bing (bing.com) handles queries,
ranking and some of the features we study in this paper somewhat
differently from Google.
Our two prototypes were built to test additional ways of implementing and
representing features of a search user interface. We used the Pingar API
(pingar.com) for semantic analysis of queries and documents and Apache
Solr (lucene.apache.org/solr) for full-text indexing and searching. We
indexed the 85,000 articles in the Open Access PubMed dataset for this
purpose (ncbi.nlm.nih.gov/pmc/tools/openftlist). The prototypes allowed
us to test Pingar’s tools for generating query expansions, related searches,
keywords, summaries and taxonomy mapping, as well as Solr’s built-in
faceted search and snippet extraction features.
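Solr's faceting and highlighting are driven by request parameters; a sketch of what the prototypes' facet queries might have looked like follows. The field names (`journal_title`, `year`, `keywords`) are illustrative assumptions, since the prototypes' actual schema is not documented in this paper:

```python
def solr_facet_params(query, facet_fields=("journal_title", "year", "keywords")):
    """Build request parameters for a faceted Solr /select query.

    Field names are hypothetical; facet.limit and hl.snippets mirror the
    'top 5 values per facet' and 'top 3 snippets' choices described here."""
    return {
        "q": query,
        "wt": "json",
        "facet": "true",
        "facet.field": list(facet_fields),
        "facet.limit": 5,     # top 5 values shown per facet
        "facet.mincount": 1,  # hide empty facet values
        "hl": "true",         # enable snippet highlighting
        "hl.snippets": 3,     # up to 3 snippets per document
    }
```

These parameters would be sent to Solr's `/select` handler, e.g. via an HTTP GET request.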
4.1.2 Implementation of tested interface features
Only certain systems were selected for testing each feature. Here, we
list how each feature is supported by the systems we tested. This information
was gathered from the systems' websites and publications.
Autocomplete: PubMed employs Automatic Term Mapping that compares
and maps user’s search terms to lists of pre-indexed terms. GoPubMed
matches typed terms to MeSH and GO terms. Semedico places the
suggestions in a taxonomy tree allowing users to select a broader term as
their query. Synonyms are listed in brackets. NextBio lists matching genes,
compounds, SNPs, diseases, tissues, biogroups and authors. Google predicts
suggestions based on other users’ search activities – for certain queries
it analyzes just the last two words. Bing also computes suggestions using
users' queries and boosts trending queries.
Query expansion: PubMed displays the "search details" that combine (sub)
headings, fields and Boolean operators. Users can edit them and re-submit.
Semedico displays the terms identified in the query, and users may remove
one from the search. Pingar suggests misspellings, grammatically similar
terms and synonyms as checkboxes to add to the query using OR.
Faceted search: PubMed allows filtering by "free full text" or
"reviews" and shows the number of matching results in brackets.
Suggestions are displayed as links. GoPubMed categorizes
filtering suggestions into "Top Terms" (more specific) and
"Knowledge Base" (more generic), ordered by relevance.
Suggestions are displayed as checkboxes allowing multiple
selections. Semedico displays 9 top-level MeSH terms as facet
categories, each in a differently colored box. Per category, the top 3
most frequent terms are shown (numbers in brackets). Expanding
leads to more terms or their child terms. Solr was chosen to
evaluate faceted refinement based on indexed metadata: journal
year and title, and keywords generated by Pingar. Single and
multiple selections were tested (links vs. checkboxes). Pingar
dynamically generates facet categories by first mapping the top 10
search results to terms in multiple biomedical taxonomies and
then walking up the taxonomy tree to find common broader terms.
A different set of the top 3 most relevant facets is displayed for
each query. Each facet lists the top 5 most frequent terms and can be
expanded to see more. Some screenshots showed terms computed
by analyzing only the text surrounding the query terms (QB), others
the entire content of the document (DB). The intention was to
evaluate whether the search query should serve as context when
computing facet values. We also tested the preference for choosing
one or multiple terms per facet category (links vs. checkboxes).
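The "walk up the taxonomy tree" step can be sketched as follows. The taxonomy below is a toy stand-in: the real prototypes map result terms onto multiple biomedical taxonomies via the Pingar API, and this sketch counts each broader term's coverage to rank facet categories:

```python
from collections import Counter

# Toy taxonomy: term -> broader term (invented for illustration).
BROADER = {"connexin 26": "connexin", "connexin 43": "connexin",
           "connexin": "gap junction protein",
           "deafness": "hearing disorder", "hearing disorder": "disease"}

def dynamic_facets(result_terms, k=3):
    """Walk up from terms found in the top results and rank the most
    common broader terms as candidate facet categories."""
    counts = Counter()
    for term in result_terms:
        parent = BROADER.get(term)
        while parent is not None:     # climb to the taxonomy root
            counts[parent] += 1
            parent = BROADER.get(parent)
    return [cat for cat, _ in counts.most_common(k)]
```

Broader terms shared by many results float to the top, which is what makes the displayed facets query-dependent.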
Related searches: PubMed suggests variations of the query in the “also
try” area, but such terms are not always available. Google offers two kinds
of related searches in different parts of the interface: Searches for things
of similar kind (e.g. “aquaporin” for “connexin”) and more specific searches
(e.g. “connexin 26” for “connexin”). Bing’s two areas designated to related
searches show the same suggestions formatted as one or two columns. The
suggestions are variations of the original query with added or modified
parts. Pingar also computes related searches, but instead of query logs, top
search results are analyzed for suggestions.
Results preview: We limited the evaluation of this interface feature to
the prototype systems only and tested the following features: (1) Keywords are
usually defined by authors or extracted automatically to represent the key
topics in an article. We used the Pingar API to compare two cases: extracting
keywords from the text surrounding the query terms and from the entire
document. The intention was to evaluate whether the search query should
serve as context when computing keywords. (2) Document preview is
commonly implemented using sentence snippets containing the query
terms. One prototype used Solr to extract the top 3 such snippets per
document. The other used Pingar's query-based summary extraction tool
to display the top-scoring sentence in the document, and the top-scoring
paragraph on mouse-over.
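Query-based sentence selection, in the spirit of [10], can be sketched in miniature by scoring each sentence on how many distinct query terms it contains and returning the best one (real summarizers use richer scoring, e.g. term weighting and sentence position):

```python
import re

def top_sentence(document, query):
    """Return the sentence containing the most distinct query terms."""
    q_terms = set(query.lower().split())
    # Naive sentence split on end-of-sentence punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())

    def score(sentence):
        words = set(re.findall(r"[a-z0-9]+", sentence.lower()))
        return len(q_terms & words)

    return max(sentences, key=score)
```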
4.2 The Study
We ran a short exploratory study with 6 bioscientists (2 faculty,
2 postdocs, 2 PhD students), in which we explained the 3 types
of search and asked them for examples of such searches they use
in their work: queries and resources/systems. The study was
conducted in person with each participant (10-15 min sessions).
We recruited 10 bioscientists to participate in the main study. All
of them are researchers in academia in various biological areas:
Developmental, Molecular, Cell, Evolutionary, Transcriptional
and Systems Biology, as well as Biochemistry, Immunology,
Genetics, Population Genetics, and Neuroscience. 2 of them are
faculty, 7 are postdoctoral researchers (having received their PhDs
between 2005 and 2010) and 1 is a final-year PhD student.
We selected the interface features we wanted to study for the 3
different types of search and identified systems that handle these
features in different ways. We asked the participants, via email
prior to the sessions, to supply us with 4 different queries they use.
We also asked for frequency of use and resources, to elaborate on what
information they are looking for, and to fill out an informed
consent form. We selected two baseline queries based on our
exploratory study. We also selected one personal query from each
participant. The selection aimed at covering all 3 search types.
For each query we took screenshots in the different systems
available for that query, and isolated the part of the system
that shows the interface feature in question (logos were removed).
During the in-person sessions (each lasting 1-2 hours), we showed
participants, in a PowerPoint presentation, screenshots for one baseline
query (of their choice) and for one of their own queries (Table 1). For each
feature, we asked them to rate overall usefulness and aesthetics on a 5-
point Likert scale. Then, for each system demonstrating a possible handling
of the feature, we asked them to rate usefulness and aesthetics, again on
the Likert scale. Finally, for each feature, we asked them to rank the
systems in order of preference. Throughout the sessions, we applied the
think-aloud protocol and encouraged comments and suggestions.
Table 1. Queries and search type classification
5 RESULTS

Below we discuss participants' reactions to the content and overall usefulness
of the interface features. Given the space limitations, we plan to present
our aesthetics preference findings in a different venue. Overall, participants
told us that aesthetics are important (and need to be "good enough" to
use a system) but what really matters is the content. We would also like to
emphasize that all scores presented in this paper are for specific features
supported by each system and are not reflective of the systems as a whole.
5.1 Interface Features vs. Search Tasks
Figure 1 summarizes how participants judged the usefulness of interface
features for their queries. The length of each bar equals the number of
ratings. All 4 browsing participants liked autocomplete, whereas 5 out of
6 participants with info gathering queries rated it as neutral and 1 out of 6
as not useful. This shows a clear difference in the usefulness of this feature.
For query expansions, participants expressed positive or neutral opinions
for browsing and mixed opinions for the other search types. Faceted
refinement was rated mostly useful for all search types, with equal neutral
scores for information gathering. Related searches got predominantly
neutral or negative reactions, except for browsing, for which they are
equally spread. Not surprisingly, it is useful to see document previews
for all search types. Analysis of the comments shows that snippets are better
for browsing, whereas summaries with full sentences suit finding specific
or specialized information. The results are intuitive and participants'
comments confirm them. We encourage interface designers to give
priority to "green" elements of the search display and not to bother much
about related searches for biologists. Most of them told us that their
searches are usually specific, and even correctly suggested related searches
are not of interest. Access to query expansions is important to experts, but
should not be a default feature.
5.2 System Rankings
For each interface feature participants ranked the systems in order of
preference. Table 2 shows the systems ranked at the top and at the bottom
(top/bottom two were considered if 5 or more systems were compared,
top/bottom one if 4 or less were compared).
5.3 Comparisons with Previous Findings
Our results agree with those reported by Schneider et al. that facets are well
received by bioscientists, but autocomplete is less important [9]. Schneider
et al. also argue that in biosciences a large number of facets is needed
per query, which they grouped into collapsible tabs in Semedico. While
our participants were positive about collapsible tabs (judging by comments,
not explicit testing), 9 out of 10 did not want to see a large number of
facet groups. The large number of choices and the inevitable redundancy
overwhelmed them. They commented that they would not spend time
inspecting the facets despite speculating that some might be useful. Divoli
et al. found that bioscientists like to refine searches by organism names [4],
and two of our participants also commented that they really liked Semedico's
facet "Organisms" (again, not explicitly tested). Our results also confirm the
findings in [4] that users prefer selecting multiple suggestions using checkboxes.
Figure 1. Usefulness ratings for interface features & search tasks:
browsing (br), fact finding (ff) & information gathering (ig)
5.4 More Findings on Interface Features
During the course of the study, participants provided us with interesting
suggestions that are not currently implemented by the systems. Below we
categorize them by feature.
Autocomplete was preferred when the major part of the query had been
typed; users feel pigeonholed if suggestions come up with the first characters.
Specific suggestions work best here. Query expansions do not need to
include misspellings and close grammatical forms; these should be included
in the search automatically. Overall, biologists mostly refine and focus
searches rather than expand them. Faceted refinement is always desired, with
checkboxes that enable multiple selections. Too much information and too
many categories scare users away, as does redundancy of terms (across
categories and within each category). Simpler designs are better (e.g., not
too many colours) - this is why Pingar's top 3 ranked facets with a few
values each scored highly. Users expect facet categories to reflect query
types, e.g. if the query mentions a disease, conditions should be shown, but
not other diseases. Many liked the ability to refine a search by a specific
keyword related to their query, offered by GoPubMed's "top terms" and
Pingar's keywords. Some commented that year, publication and even the
entire faceted refinement column should not be displayed by default.
PubMed's option to refine by reviews was highly favored. Some explained
that besides offering comprehensive information, reviews help to discover
important papers by navigating through the references. Pingar DB's
document preview was preferred for general searches (baseline queries)
and Pingar QB's for specific searches (their own queries). Related searches
are not desired, except for browsing. Scientists dislike clicking on links
leading them to new or broader searches.
Table 2. Top and bottom ranked systems
6 CONCLUSIONS & FUTURE WORK

This paper demonstrates user preferences for different search features
depending on search types in the biomedical domain. Although the search
tasks bioscientists perform are not clearly distinct from each other, in the
future we would like to study which tasks to prioritize and how to integrate
features in the interface to allow optimization when switching search types.
7. Acknowledgments
We are extremely grateful to all participants for their contribution.
8. References
[1] Anick, P. and Kantamneni, R. 2008. A longitudinal study of
real-time search assistance adoption. In SIGIR’08: 701-702.
[2] Anick, P. 2003. Using terminological feedback for web
search refinement. In SIGIR’03: 88-95.
[3] Cutrell, E. and Guan, Z. 2007. An eye-tracking study of
information usage in web search. In CHI’07: 407-416.
[4] Divoli, A., Hearst, M.A. and Wooldridge, M.A. 2008.
Evidence for Showing Gene/Protein Name Suggestions in
Bioscience Literature Search, PSB 2008.
[5] Doms, A. and Schroeder, M. 2005. GoPubMed: Exploring
PubMed with the Gene Ontology. Nucl Acids Res, 33:783-786.
[7] Kellar, M., Watters, C. and Shepherd, M. 2007. A field study
characterizing web-based information seeking tasks. JASIST,
58(7), 999-1018.
[8] Lu, Z., Wilbur, J.W., et al. 2009. Finding query suggestions
for PubMed, AMIA Annu Symp Proc. 2009: 396–400.
[9] Schneider, A., Landefeld, R., Wermter, J. and Hahn, U.
(2009) Do users appreciate novel interface features for
literature search? Systems, Man and Cybernetics, SMC 2009.
[10] Tombros, A. and Sanderson, M. 1998. Advantages of query
biased summaries in IR. In SIGIR’98: 2-10.
[11] Tunkelang, D. Faceted Search. Morgan and Claypool, 2009.
[12] Tunkelang, D. 2006. Dynamic category sets: An approach
for facetted search. In SIGIR’06 Faceted Search Workshop.
The study presented here was originally published as a workshop paper entitled: “Search interface feature evaluation in biosciences”
by Anna Divoli and Alyona Medelyan.
It was presented at the HCIR 2011 workshop that took place at Google’s Mountain View campus.