Leveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAware (Lucidworks)
Spark can be used to improve the performance of importing and searching large datasets in Solr. Data can be imported from HDFS files into Solr in parallel using Spark, speeding up the import process. Spark can also be used to stream data from Solr into RDDs for further processing, such as aggregation, filtering, and joining with other data. Techniques like column-based denormalization and compressed storage of event data in Solr documents can reduce data volume and improve import and query speeds by orders of magnitude.
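The column-based denormalization idea can be sketched as follows. This is a minimal illustration, not the presenter's code. The field names (`ts`, `value`, `events_z`, `ts_min`, `ts_max`) are hypothetical, and plain Python dictionaries stand in for Solr documents. Instead of one document per event, all events of a series are packed into parallel columns and stored compressed in a single field, while a few small summary fields stay queryable. In a Spark job, a function like `pack_events` could be applied per partition (for example via `RDD.mapPartitions`) so that documents are built and sent to Solr in parallel; that wiring is assumed and not shown.

```python
import base64
import json
import zlib


def pack_events(series_id, events):
    """Pack many event dicts into one column-oriented, compressed document.

    Hypothetical schema: every event has 'ts' (epoch millis) and 'value' (float).
    """
    if not events:
        raise ValueError("need at least one event")
    # Column-based denormalization: one list per attribute instead of one doc per event.
    columns = {
        "ts": [e["ts"] for e in events],
        "value": [e["value"] for e in events],
    }
    raw = json.dumps(columns, separators=(",", ":")).encode("utf-8")
    return {
        "id": series_id,
        # Small, uncompressed summary fields that remain searchable/filterable.
        "count": len(events),
        "ts_min": min(columns["ts"]),
        "ts_max": max(columns["ts"]),
        # Compressed payload, base64-encoded so it fits a stored string field.
        "events_z": base64.b64encode(zlib.compress(raw, 9)).decode("ascii"),
    }


def unpack_events(doc):
    """Inverse of pack_events: restore the list of event dicts."""
    columns = json.loads(zlib.decompress(base64.b64decode(doc["events_z"])))
    return [{"ts": t, "value": v} for t, v in zip(columns["ts"], columns["value"])]


if __name__ == "__main__":
    events = [{"ts": 1_500_000_000_000 + i * 1000, "value": float(i % 10)}
              for i in range(10_000)]
    doc = pack_events("sensor-1", events)
    print("one JSON doc per event:", len(json.dumps(events)), "bytes")
    print("packed payload:", len(doc["events_z"]), "bytes")
    assert unpack_events(doc) == events
```

Consecutive timestamps and repeated values compress well, so the packed field is usually much smaller than one JSON document per event. The actual ratio depends on the data.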
Improving Enterprise Findability: Presented by Jayesh Govindarajan, Salesforce (Lucidworks)
1) Jayesh Govindarajan presented on improving enterprise search and findability at Salesforce. He discussed how enterprise search differs from consumer search, the challenges of enterprise findability, and machine learning approaches such as learning to rank (LETOR) that can address them.
2) Govindarajan explained that diversity of data, intentions, and customers makes enterprise search more complex than consumer search. Most enterprise search relies on simple ranking functions that may not reflect relevance well.
3) Machine learning algorithms like logistic regression and learning to rank can learn relevance from user behavior data like clicks and views. These algorithms output ranking models that can be deployed to search engines like Solr.
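A pointwise learning-to-rank model based on logistic regression can be sketched in a few lines. This is an illustrative assumption, not Salesforce's implementation. The three features (text-match score, recency, whether the user owns the record) and the tiny click log are hypothetical toy values. The model learns the probability of a click from feature vectors, then orders candidate documents by that probability.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, b, x):
    """Linear score w·x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_logistic(samples, epochs=500, lr=0.1):
    """Fit logistic regression with plain stochastic gradient descent.

    samples: list of (feature_vector, clicked) pairs, clicked in {0, 1}.
    """
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in samples:
            err = sigmoid(score(w, b, x)) - y  # gradient of log loss w.r.t. the score
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def rank(docs, w, b):
    """Order document ids by predicted click probability, highest first."""
    return sorted(docs, key=lambda d: sigmoid(score(w, b, docs[d])), reverse=True)


if __name__ == "__main__":
    # Hypothetical toy click log: [text_match, recency, owned_by_user] -> clicked?
    log = [
        ([0.9, 0.2, 1.0], 1),
        ([0.8, 0.9, 1.0], 1),
        ([0.6, 0.5, 1.0], 1),
        ([0.7, 0.1, 0.0], 0),
        ([0.3, 0.8, 0.0], 0),
        ([0.2, 0.3, 1.0], 0),
    ]
    w, b = train_logistic(log)
    candidates = {"a": [0.85, 0.4, 1.0], "b": [0.7, 0.4, 0.0], "c": [0.1, 0.1, 0.0]}
    print(rank(candidates, w, b))
```

Real systems typically train on far larger logs and often use pairwise or listwise objectives. Deploying learned weights to Solr would go through its Learning To Rank (LTR) module, which this sketch does not cover.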
This document calls for action to transform the scholarly publishing system to open access. It outlines the challenges of the current subscription model, including rising journal costs that outpace inflation. Only about 15% of articles are currently open access. The document proposes developing a common strategy across countries to renegotiate publishing contracts and redirect subscription funds to support open access publishing, including article processing charges (APCs) for gold open access. It highlights Austria's efforts to transform its contracts with major publishers to provide increased open access. International cooperation is needed, as no single country can change the system alone. A coalition of countries has endorsed a goal of 100% open access by 2020 or 2025. The document encourages librarians and others to join the movement for transformation.
The document discusses Austria's transition to open science. It notes that key organizations in Austria, including the Austrian Science Fund, support open access mandates and funding for open access publishing. Universities Austria, a voluntary initiative with 55 member institutions, aims to transition all publications to gold open access by 2025. The document provides recommendations to support this transition, such as reorganizing publishing contracts, supporting international cooperation on publishing models, and introducing open access policies. It emphasizes that openness is important for science and that stakeholders must work together to transition to more open systems.
The document summarizes Austria's transition to open access. It notes that in 2012, Austria had only a few institutional repositories and was not a strong player in open access. It describes the voluntary Austrian National Action Plan on Open Access, with 57 member institutions working towards the goal of 100% gold open access by 2025. The action plan provides recommendations such as reorganizing publishing contracts, supporting international cooperation through funding, and introducing open access policies. The overall goal is to advance open access and open science through these coordinated efforts.
Can Repositories be fun? Thinking about repositories (Patrick Danowski)
This document discusses ways to make repositories more user-friendly. It outlines some of the challenges with the current IST Austria repository, including minimal usage and a focus on technical metadata over user experience. Some suggestions to improve the repository include auto-filling known metadata, simplifying fields, responsive design, and better integrating the repository with other research profiling systems and publication databases. The overall message is that repositories need to have a modern design and focus on providing full services to satisfy users rather than just storing content.
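Auto-filling known metadata can be as simple as merging a deposit form with what the repository already knows about the depositor. The sketch below is minimal and rests on assumptions. The `known` profile dictionary (for example, data pulled from a research profiling system) is hypothetical, as are the field names. Values the user has already entered are never overwritten, and only the fields that are still empty are asked for.

```python
# Hypothetical: the fields a simplified deposit form actually asks for.
REQUIRED_FIELDS = ("title", "creator", "year", "affiliation")


def autofill(form, known):
    """Return a copy of `form` with empty fields filled from `known` metadata.

    Values the depositor has already typed are never overwritten.
    """
    filled = dict(form)
    for field, value in known.items():
        if not filled.get(field):
            filled[field] = value
    return filled


def missing_fields(form, required=REQUIRED_FIELDS):
    """List required fields that are still empty, so the UI only asks for those."""
    return [f for f in required if not form.get(f)]


if __name__ == "__main__":
    # Hypothetical profile data, e.g. from a research profiling system.
    known = {"creator": "Doe, Jane", "affiliation": "IST Austria"}
    form = {"title": "A study of repositories", "creator": "", "year": ""}
    filled = autofill(form, known)
    print(filled)
    print("still to ask:", missing_fields(filled))
```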
The document discusses the challenges of building an e-only library for researchers at IST Austria. It describes how the library has transitioned to primarily electronic resources over time, growing its e-journal collection from 500 titles in 2010 to over 5,800 in 2014. However, building a completely e-only library faces obstacles, as some important books are not available in digital formats, and researchers still appreciate printed materials for certain tasks. The document concludes that a hybrid model is necessary to meet all user needs.
The document discusses open bibliographic data and recommends libraries make their data openly available by addressing technical issues, clarifying data ownership, increasing transparency, and applying open licenses. It encourages libraries to export their data, publish it online, describe it using common standards, and promote it within the open data community to advance open access to bibliographic information.
This document discusses using personal social networks as collaborative filters to find relevant information. It outlines that while search engines may return a high volume of results, most are irrelevant. Social networks could provide higher precision by leveraging relationships and shared interests among friends. The document proposes four solutions for social filtering: tagging, recommender systems, social search engines that integrate friends' shared content, and asking friends directly using social media. However, it notes that social filtering comes with its own tradeoffs and is not a perfect solution.
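Social search re-ranking can be sketched by counting how many of the searcher's friends have shared or tagged each result. The data structures below are hypothetical stand-ins for a real social graph. Results endorsed by more friends move up. Ties keep the search engine's original order because Python's `sorted` is stable, even with `reverse=True`.

```python
def social_rerank(results, friends, shares):
    """Re-rank search results by how many friends shared or tagged each one.

    results: URLs in the search engine's original order.
    friends: ids of people in the searcher's network.
    shares:  id -> set of URLs that person has shared or tagged.
    """
    def endorsements(url):
        return sum(1 for f in friends if url in shares.get(f, set()))

    # sorted() is stable, so equally endorsed results keep the engine's order.
    return sorted(results, key=endorsements, reverse=True)


if __name__ == "__main__":
    results = ["u1", "u2", "u3", "u4"]
    friends = {"alice", "bob"}
    shares = {"alice": {"u3", "u4"}, "bob": {"u3"}, "carol": {"u1"}}  # carol is not a friend
    print(social_rerank(results, friends, shares))  # → ['u3', 'u4', 'u1', 'u2']
```

Counting friends is the crudest possible signal. Weighting by tie strength or topical overlap would be natural refinements, but the tradeoff noted above remains: friends cover only a small slice of what a search engine indexes.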
The document discusses opening up and sharing bibliographic data. It covers the history and development of the World Wide Web, the Semantic Web, and the open data movement. The presentation argues that libraries and cultural heritage institutions should make their bibliographic metadata openly available under open licenses in order to share information and resources with other institutions, users, and the world.
The document discusses open bibliographic data and new business models for libraries. Currently, libraries purchase bibliographic data or catalog materials themselves, sometimes resulting in redundant cataloging. The author proposes making bibliographic data openly available under public domain licenses to encourage reuse and new services around data curation, mapping, and scanning tables of contents on demand. Key challenges include determining an appropriate license and transforming data into the Web of Data. Several libraries have begun sharing their catalog data as open data.
Presentation given at the ASpB Conference in Karlsruhe, Germany (in German). ASpB = Arbeitsgemeinschaft der Spezial-Bibliotheken (Working Group of Special Libraries).
- Repositories provide early access to open data and services that enable discovery and reuse of content in one place. They allow for building collections, annotations, and persistent identifiers.
- Repositories could benefit from features that enable data exchange, object exchange, different interfaces, integration with other tools, linked data, and web services. They should also provide user control and help functionality.
- Lessons from successful Web 2.0 services include being open, enabling sharing and collaboration, and integrating into users' environments from the start.
14. Public Domain
• Reuse possible without problems
• whether in part or as a whole
• Free flow of data
• greatest reuse achieves greatest benefit
Wednesday, 2 December 2009
15. Public Domain Waiver
• Public domain not possible as a license in Germany
• CC0 / Public Domain Dedication and License (PDDL)
• Commitment that no rights of one's own are asserted and that all dependent licensing questions have been resolved
16. Navigation
• Release data into the public domain
• Use the CC0 or PDDL waiver
• Transform data into RDF
• Create outbound links
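The last two steps (transform data into RDF, create outbound links) can be sketched with plain string formatting that emits N-Triples. This is a minimal illustration under stated assumptions. The base URI `http://example.org/bib/`, the record fields, and the external URIs are hypothetical. The predicates are the Dublin Core terms `dcterms:title`, `dcterms:issued`, and `dcterms:creator`, plus `owl:sameAs`. A production pipeline would more likely use an RDF library such as rdflib.

```python
# Hypothetical base URI for the library's own records (assumption).
BASE = "http://example.org/bib/"
DCTERMS = "http://purl.org/dc/terms/"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def literal(text):
    """Serialize a plain string as an N-Triples literal, escaping special characters."""
    escaped = (text.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n")
                   .replace("\r", "\\r"))
    return f'"{escaped}"'


def record_to_ntriples(record, base=BASE):
    """Turn one bibliographic record (a dict) into N-Triples lines."""
    subject = f"<{base}{record['id']}>"
    triples = [f"{subject} <{DCTERMS}title> {literal(record['title'])} ."]
    if "issued" in record:
        triples.append(f"{subject} <{DCTERMS}issued> {literal(record['issued'])} .")
    # Outbound links: creators point to external authority URIs, not bare name strings.
    for uri in record.get("creator_uris", []):
        triples.append(f"{subject} <{DCTERMS}creator> <{uri}> .")
    # Outbound links: declare the same work in other datasets.
    for uri in record.get("same_as", []):
        triples.append(f"{subject} <{OWL_SAME_AS}> <{uri}> .")
    return "\n".join(triples) + "\n"


if __name__ == "__main__":
    record = {
        "id": "123",
        "title": "Open bibliographic data",
        "issued": "2009",
        "creator_uris": ["http://example.com/authority/person/1"],  # hypothetical
        "same_as": ["http://example.com/other-catalog/work/9"],     # hypothetical
    }
    print(record_to_ntriples(record), end="")
```

Linking creators to authority URIs rather than storing name strings is what connects a catalog to the wider Web of Data.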
17. Goal
• Library data and vocabularies as part of the Web of Data
18. Credits
• Image "Dead End" by http://www.flickr.com/photos/a_mason/19189587/
• Image "Ampel" (traffic light) by Elke Wetzig: http://commons.wikimedia.org/wiki/Image:Ampelmann_gruen.jpg