This document discusses search engines and how they work. It begins by explaining why custom search engines are built, such as to keep data local and customized or to improve performance. It then covers how search engines create indexes of terms mapped to document IDs to enable fast lookups. Building an index requires preprocessing documents through steps like tokenization, normalization, and filtering. The quality of search results depends on the preprocessing being identical for queries and documents. Open source tools can help build search indexes, while challenges include dealing with the tradeoffs of space, complexity and preprocessing time required for fast searches.
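The index described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation from the talk: the stop-word list, the function names, and the sample documents are all assumptions made for the example. It shows terms mapped to document IDs, and the same preprocessing function applied to both documents and queries.

```python
import re
from collections import defaultdict

# Illustrative stop-word list (an assumption, not taken from the talk).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in"}

def preprocess(text):
    """Tokenize, normalize (lowercase), and filter out stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def build_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in preprocess(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return IDs of documents containing every query term (AND semantics)."""
    terms = preprocess(query)  # same preprocessing as at index time
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical sample documents, keyed by document ID.
docs = {
    1: "The Quick Brown Fox",
    2: "A quick guide to search engines",
    3: "Brown bears and search",
}
index = build_index(docs)
```

Because `search` runs the query through `preprocess`, a query such as "QUICK search" is lowercased before lookup and matches document 2. If queries skipped that step, the uppercase token would not be found in the index at all.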
Introduction to Search Systems - ScaleConf Colombia 2017 (Toria Gibbs)
Often when a new user arrives on your website, the first place they go to find information is the search box! Whether they are searching for hotels on your travel site, products on your e-commerce site, or friends to connect with on your social media site, it is important to have fast, effective search in order to engage the user.
A Search Index is Not a Database Index - Full Stack Toronto 2017 (Toria Gibbs)
A search engine is not a database. Search systems are optimized for fast search using an internal data structure called an inverted index. Databases have a similar feature to allow quick access, also called an index, but it’s a totally different thing!
In this talk, Toria Gibbs will take you on a tour of the internals of a search index, comparing it to common implementations of indexing in relational databases. We’ll see how search engines can outperform databases and discuss the tradeoffs in implementing and maintaining such a system. No prior knowledge of database or search index implementations required; experience creating or querying database tables will be helpful.
Machine Learning in a Flash (Extended Edition 2): An Introduction to Neural N... (Kory Becker)
Learn the basics behind machine learning, neural networks, natural language processing, and clustering. In this presentation we’ll go over a handful of really quick machine learning algorithms. We’ll cover the difference between unsupervised and supervised learning in artificial intelligence, classification, clustering, and natural language processing to classify sentences as being about “eating”. We'll also see how to automatically categorize data under specific groups, using unsupervised learning, and apply topic detection to a finance data-set.
The document discusses odd behaviors in Python related to identity, mutability, and scope. It provides examples testing the identity and mutability of various Python objects. It also discusses issues that can arise from using mutable default arguments and provides tips on how to avoid these issues, such as using None as a default instead of a mutable object.
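The mutable-default pitfall mentioned above can be illustrated with a minimal sketch. The function names are hypothetical and are not taken from the summarized document.

```python
def append_shared(item, bucket=[]):
    # The default list is created once, when the function is defined,
    # so every call that omits `bucket` mutates the same object.
    bucket.append(item)
    return bucket

def append_fresh(item, bucket=None):
    # Using None as a sentinel gives each call its own new list.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket

first = append_shared(1)
second = append_shared(2)   # same list object as `first`
third = append_fresh(1)
fourth = append_fresh(2)    # independent of `third`
```

Here `first` and `second` are the same list, which now holds both items, while `third` and `fourth` are separate one-element lists.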
Query performance can either be a constant headache or the unsung hero of an application. MongoDB provides extremely powerful querying capabilities when used properly. As a member of the support team I will share common mistakes observed as well as tips and tricks for avoiding them.
Postgres the best tool you're already using (LiquidPlanner)
This document discusses using Postgres with Ruby on Rails to add advanced features like tagging, hierarchies, custom data, and full text search. It covers using Postgres arrays to model tagging and club hierarchies, hstore to store custom user-defined data, and full text search capabilities. ActiveRecord scopes and queries are demonstrated for each feature. Sanitizing full text search queries and indexing the tsvector column are also discussed. The document provides an example application of a hedgehog social network and responds to various user requests to showcase how these Postgres features can be implemented.
The document provides examples and explanations of three methods - in_groups, in_groups_of, and split - from the ActiveSupport::CoreExtensions::Array::Grouping module in Ruby on Rails. These methods allow splitting arrays into groups or chunks of a specified size. in_groups and in_groups_of split arrays into groups or chunks respectively, while split divides an array based on a delimiter or block condition.
How Search Engines Work (A Thing I Didn't Learn in University) (Toria Gibbs)
You probably learned about databases in university, but did you learn about search engines? The search bar is the most important feature of many websites... and yet most people don’t know how it really works!
Toria Gibbs didn’t know how search engines work either, until she landed a job doing search infrastructure. Extrapolating (a.k.a. guessing) from what she knew about databases worked for a while, but eventually she had to buckle down and learn the fundamentals.
In this talk, we’ll learn the basic implementation of a search engine. We’ll see how search engines can outperform databases in some ways (but not others!) and what trade-offs were made to achieve this fast performance.
You’ll walk away knowing when to add a search engine to your project, how to build it using open source tools, and how to ace a technical interview and succeed at your job, even when you don’t already know the domain!
The document discusses common mistakes made by workshop applicants including string concatenation versus string interpolation, control flow issues like nested if/else statements that could be simplified with guard clauses, N+1 query problems that can be solved with eager loading, passing too many options to methods, exposing secrets in code repositories, duplicating styles, and using incorrect syntax for the template language being used. It provides examples of better approaches for each case discussed.
The TRECVID 2016 instance retrieval task involved finding a specific person in a specific location within a BBC soap opera video collection. Participants were given example images and video shots of the target person and location, and asked to return ranked shots where the person appeared in the given location. A total of 13 teams participated, with the top approaches using CNNs to detect faces and traditional SIFT features to model locations. The new addition of video examples in addition to images helped performance. Presentations from 4 participating teams followed, describing their approaches to this instance search task.
In this talk we’ll cover the basics of search relevancy in Elasticsearch, from how relevancy is calculated and modeled to modifying query structure, setting up analyzer chains, and measuring incremental improvements. The talk will highlight several real-world relevancy scenarios encountered in the consulting work at KMW Technology, a leading provider of search professional services to major organizations.
Assumptions: Check yo'self before you wreck yourself (Erin Shellman)
Predicting the future is hard and it requires a lot of assumptions, also known as beliefs, also known as faith. In “Assumptions: Check yo self, before you wreck yo self” we explore the consequences of beliefs when constructing predictive models. We’ll walk through the process of developing a demand forecast for Evo, a Seattle-based outdoor recreation retailer, and discuss how assumptions influence the behavior of your application and ultimately the decisions you make.
Storing Time Series Metrics With Cassandra and Composite Columns (Joe Stein)
This document discusses storing and aggregating time series metrics in Cassandra using counters and composite columns. It provides an example schema using multiple column families partitioned by time period (day, hour, minute, second). Data is inserted by incrementing counters for composite column names representing the aggregated values. Retrieval involves multiget queries on ranges of composite column names to retrieve aggregated counts for a time period.
The document is a Microsoft presentation toolkit that includes slide templates, design variations, and photography to help users accelerate their presentation design process. The toolkit provides tips for using images and icons in slides and indicates that additional resources can be found on the last page. Sample slides are included that demonstrate different layouts, elements, and content that can be used to create a presentation.
The document discusses techniques for estimating work in Agile projects using story points and ideal days. It defines story points and ideal days, and explains how to assign estimates relatively by comparing stories rather than using specific units of time. The document also recommends estimating approaches like planning poker, re-estimating as stories change, and using the right units to keep estimates meaningful but relative.
Google INSTANT SEO -- Ecommerce Search Engine Optimization for Yahoo! Stores (Rob Snell)
The document provides tips and strategies for optimizing an ecommerce website and online store for search engine optimization (SEO). It discusses the importance of keywords, writing unique product descriptions, prioritizing SEO pages and keywords based on revenue, de-templatifying the store, and collecting converting keywords to drive additional sales. Specific on-page optimization techniques are also recommended, like including keywords in page names, captions, and link text.
Agile experiments in Machine Learning with F# (J On The Beach)
Just like traditional applications development, machine learning involves writing code. One aspect where the two differ is the workflow. While software development follows a fairly linear process (design, develop, and deploy a feature), machine learning is a different beast. You work on a single feature, which is never 100% complete. You constantly run experiments, and re-design your model in depth at a rapid pace. Traditional tests are entirely useless. Validating whether you are on the right track takes minutes, if not hours.
In this talk, we will take the example of a Machine Learning competition we recently participated in, the Kaggle Home Depot competition, to illustrate what "doing Machine Learning" looks like. We will explain the challenges we faced, and how we tackled them, setting up a harness to easily create and run experiments, while keeping our sanity. We will also draw comparisons with traditional software development, and highlight how some ideas translate from one context to the other, adapted to different constraints.
Website Personalisation DIY with Google Tag Manager - AllThingsData '18 (Johannes Radig)
Slides from my talk at AllThingsData Tel Aviv, presented on May 2nd 2018. Learn how to use Google Tag Manager to set up an A/B testing and website personalisation engine.
Crush Competitors with Deep On-Page SEO Tactics (PJ Howland)
This document discusses using TF-IDF (term frequency-inverse document frequency) analysis to improve on-page SEO and target featured snippets. It recommends identifying important pages to analyze with TF-IDF, running the analysis to find important keywords and topics, making recommendations to optimize pages based on the results, targeting associated featured snippets, and monitoring improved rankings and traffic. The document provides examples of using TF-IDF to optimize pages for specific keywords and increase their relevance to topics important to searchers and Google.
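As a rough illustration of the TF-IDF score itself, here is a minimal sketch that assumes one common variant of the formula, tf(t, d) · log(N / df(t)). The summarized document does not say which variant it uses, and the function name and sample pages are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: score} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores

# Hypothetical page texts; "store" appears on every page.
pages = ["red shoes store", "blue hats store", "green socks store"]
scores = tf_idf(pages)
```

Under this variant a term that appears on every page, such as "store" here, scores zero, while a term unique to one page scores highest. That is why the analysis surfaces page-specific keywords.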
This document discusses using Scrum with multiple product backlogs to manage maintenance projects at ADP with an extended team in India. It describes ADP's traditional module-based working model and challenges with an extended team. With Agile, the India team takes on deliverables broken into user stories across multiple backlogs. Each backlog is planned separately based on size, release date, and velocity. This allows parallel progress while maintaining Agility. Results included increased utilization, cross-functional ownership, and team confidence.
This document provides an introduction to Python for high school programmers. It covers background information on Python, key concepts like data types and operators, and basics of the language like variables, collections, control flow, and object-oriented programming. Code examples are included to demonstrate various features. The presentation aims to get students started with Python and provide an overview of what it can do.
This document provides tips and strategies for creating compelling content to optimize an e-commerce store for search engine optimization. It recommends telling stories with buyers' guides and reviews, conducting interviews to generate text content, writing unique product descriptions, and prioritizing and optimizing keywords based on revenue. Content should include photos, videos, and long-tail keyword phrases to provide more value than competitors.
This presentation provides valuable insights into effective cost-saving techniques on AWS. Learn how to optimize your AWS resources by rightsizing, increasing elasticity, picking the right storage class, and choosing the best pricing model. Additionally, discover essential governance mechanisms to ensure continuous cost efficiency. Whether you are new to AWS or an experienced user, this presentation provides clear and practical tips to help you reduce your cloud costs and get the most out of your budget.
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ... (alexjohnson7307)
Predictive maintenance is a proactive approach that anticipates equipment failures before they happen. At the forefront of this innovative strategy is Artificial Intelligence (AI), which brings unprecedented precision and efficiency. AI in predictive maintenance is transforming industries by reducing downtime, minimizing costs, and enhancing productivity.
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A... (Jeffrey Haguewood)
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on integration of Salesforce with Bonterra Impact Management.
Interested in deploying an integration with Salesforce for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Skybuffer SAM4U tool for SAP license adoption (Tatiana Kojar)
Manage and optimize your license adoption and consumption with SAM4U, an SAP free customer software asset management tool.
SAM4U, an SAP complimentary software asset management tool for customers, delivers a detailed and well-structured overview of license inventory and usage with a user-friendly interface. We offer a hosted, cost-effective, and performance-optimized SAM4U setup in the Skybuffer Cloud environment. You retain ownership of the system and data, while we manage the ABAP 7.58 infrastructure, ensuring fixed Total Cost of Ownership (TCO) and exceptional services through the SAP Fiori interface.
Trusted Execution Environment for Decentralized Process Mining (LucaBarbaro3)
Presentation of the paper "Trusted Execution Environment for Decentralized Process Mining" given during the CAiSE 2024 Conference in Cyprus on June 7, 2024.
Similar to Search Engines: How They Work and Why You Need Them (15)
5th LF Energy Power Grid Model Meet-up Slides (DanBrown980551)
5th Power Grid Model Meet-up
It is with great pleasure that we extend to you an invitation to the 5th Power Grid Model Meet-up, scheduled for 6th June 2024. This event will adopt a hybrid format, allowing participants to join us either through an online Microsoft Teams session or in person at TU/e located at Den Dolech 2, Eindhoven, Netherlands. The meet-up will be hosted by Eindhoven University of Technology (TU/e), a research university specializing in engineering science & technology.
Power Grid Model
The global energy transition is placing new and unprecedented demands on Distribution System Operators (DSOs). Alongside upgrades to grid capacity, processes such as digitization, capacity optimization, and congestion management are becoming vital for delivering reliable services.
Power Grid Model is an open source project from Linux Foundation Energy and provides a calculation engine that is increasingly essential for DSOs. It offers a standards-based foundation enabling real-time power systems analysis, simulations of electrical power grids, and sophisticated what-if analysis. In addition, it enables in-depth studies and analysis of the electrical power grid’s behavior and performance. This comprehensive model incorporates essential factors such as power generation capacity, electrical losses, voltage levels, power flows, and system stability.
Power Grid Model is currently being applied in a wide variety of use cases, including grid planning, expansion, reliability, and congestion studies. It can also help in analyzing the impact of renewable energy integration, assessing the effects of disturbances or faults, and developing strategies for grid control and optimization.
What to expect
For the upcoming meetup we are organizing, we have an exciting lineup of activities planned:
-Insightful presentations covering two practical applications of the Power Grid Model.
-An update on the latest advancements in Power Grid -Model technology during the first and second quarters of 2024.
-An interactive brainstorming session to discuss and propose new feature requests.
-An opportunity to connect with fellow Power Grid Model enthusiasts and users.
Main news related to the CCS TSI 2023 (2023/1695)Jakub Marek
An English 🇬🇧 translation of a presentation to the speech I gave about the main changes brought by CCS TSI 2023 at the biggest Czech conference on Communications and signalling systems on Railways, which was held in Clarion Hotel Olomouc from 7th to 9th November 2023 (konferenceszt.cz). Attended by around 500 participants and 200 on-line followers.
The original Czech 🇨🇿 version of the presentation can be found here: https://www.slideshare.net/slideshow/hlavni-novinky-souvisejici-s-ccs-tsi-2023-2023-1695/269688092 .
The videorecording (in Czech) from the presentation is available here: https://youtu.be/WzjJWm4IyPk?si=SImb06tuXGb30BEH .
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Digital Marketing Trends in 2024 | Guide for Staying AheadWask
https://www.wask.co/ebooks/digital-marketing-trends-in-2024
Feeling lost in the digital marketing whirlwind of 2024? Technology is changing, consumer habits are evolving, and staying ahead of the curve feels like a never-ending pursuit. This e-book is your compass. Dive into actionable insights to handle the complexities of modern marketing. From hyper-personalization to the power of user-generated content, learn how to build long-term relationships with your audience and unlock the secrets to success in the ever-shifting digital landscape.
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on automated letter generation for Bonterra Impact Management using Google Workspace or Microsoft 365.
Interested in deploying letter generation automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Building Production Ready Search Pipelines with Spark and MilvusZilliz
Spark is the widely used ETL tool for processing, indexing and ingesting data to serving stack for search. Milvus is the production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to Milvus vector database for search serving.
HCL Notes and Domino License Cost Reduction in the World of DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
Monitoring and Managing Anomaly Detection on OpenShift.pdfTosin Akinosho
Monitoring and Managing Anomaly Detection on OpenShift
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdfflufftailshop
When it comes to unit testing in the .NET ecosystem, developers have a wide range of options available. Among the most popular choices are NUnit, XUnit, and MSTest. These unit testing frameworks provide essential tools and features to help ensure the quality and reliability of code. However, understanding the differences between these frameworks is crucial for selecting the most suitable one for your projects.
Introduction of Cybersecurity with OSS at Code Europe 2024Hiroshi SHIBATA
I develop the Ruby programming language, RubyGems, and Bundler, which are package managers for Ruby. Today, I will introduce how to enhance the security of your application using open-source software (OSS) examples from Ruby and RubyGems.
The first topic is CVE (Common Vulnerabilities and Exposures). I have published CVEs many times. But what exactly is a CVE? I'll provide a basic understanding of CVEs and explain how to detect and handle vulnerabilities in OSS.
Next, let's discuss package managers. Package managers play a critical role in the OSS ecosystem. I'll explain how to manage library dependencies in your application.
I'll share insights into how the Ruby and RubyGems core team works to keep our ecosystem safe. By the end of this talk, you'll have a better understanding of how to safeguard your code.
16. id title price
1 red cat mittens 14.99
2 blue dog mittens 24.99
3 blue hat for cats 8.00
4 vacation hat for dog 12.99
5 cat hat 5.00
6 red and blue dog hat 10.49
7 kitten mittens 11.99
8 dog booties 11.99
17. id title price
1 red cat mittens 14.99
2 blue dog mittens 24.99
3 blue hat for cats 8.00
4 vacation hat for dog 12.99
5 cat hat 5.00
6 red and blue dog hat 10.49
7 kitten mittens 11.99
8 dog booties 11.99
cat
SELECT *
FROM items
WHERE title LIKE '%cat%'
19. n = items in database
m = max length of title strings
n·m
20. n = items in database
m = max length of title strings = 250
O(n)
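A minimal sketch of what the LIKE '%cat%' query has to do, assuming the table is a plain Python list and nothing is indexed. The names ITEMS and like_scan are made up for illustration. Every row is visited (n rows) and each title of up to m characters is scanned, which is where n·m comes from; with m capped at 250 the cost grows linearly in n.

```python
# Toy copy of the items table from the slides: (id, title, price).
ITEMS = [
    (1, "red cat mittens", 14.99),
    (2, "blue dog mittens", 24.99),
    (3, "blue hat for cats", 8.00),
    (4, "vacation hat for dog", 12.99),
    (5, "cat hat", 5.00),
    (6, "red and blue dog hat", 10.49),
    (7, "kitten mittens", 11.99),
    (8, "dog booties", 11.99),
]


def like_scan(items, needle):
    """Rough stand-in for WHERE title LIKE '%needle%': check every row."""
    matches = []
    for item_id, title, _price in items:  # n rows, no shortcut
        if needle in title:               # substring test over a title of up to m chars
            matches.append(item_id)
    return matches


cat_ids = like_scan(ITEMS, "cat")
print(cat_ids)
```

Note that id 4 ("vacation hat for dog") comes back too, because "cat" is a substring of "vacation".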
22. Why build search engines?
● Keep it local and customize it
● Improve performance
23. id title price
1 red cat mittens 14.99
2 blue dog mittens 24.99
3 blue hat for cats 8.00
4 vacation hat for dog 12.99
5 cat hat 5.00
6 red and blue dog hat 10.49
7 kitten mittens 11.99
8 dog booties 11.99
SELECT *
FROM items
WHERE title LIKE '%cat%'
24. id title price
1 red cat mittens 14.99
2 blue dog mittens 24.99
3 blue hat for cats 8.00
4 vacation hat for dog 12.99
5 cat hat 5.00
6 red and blue dog hat 10.49
7 kitten mittens 11.99
8 dog booties 11.99
● Search for “cat” incorrectly returns “vacation hat for dog”
SELECT *
FROM items
WHERE title LIKE '%cat%'
25. id title price
1 red cat mittens 14.99
2 blue dog mittens 24.99
3 blue hat for cats 8.00
4 vacation hat for dog 12.99
5 cat hat 5.00
6 red and blue dog hat 10.49
7 kitten mittens 11.99
8 dog booties 11.99
● Search for “cat” incorrectly returns “vacation hat for dog”
● Search for “cat” doesn’t return “kitten mittens”
SELECT *
FROM items
WHERE title LIKE '%cat%'
26. id title price
1 red cat mittens 14.99
2 blue dog mittens 24.99
3 blue hat for cats 8.00
4 vacation hat for dog 12.99
5 cat hat 5.00
6 red and blue dog hat 10.49
7 kitten mittens 11.99
8 dog booties 11.99
● Search for “cat” incorrectly returns “vacation hat for dog”
● Search for “cat” doesn’t return “kitten mittens”
● Search for “cats” doesn’t return “cat hat” or “red cat mittens”
SELECT *
FROM items
WHERE title LIKE '%cats%'
27. SELECT * FROM items
WHERE title LIKE 'cat' OR title LIKE 'cats'
OR title LIKE 'cat %' OR title LIKE 'cats %'
OR title LIKE '% cat' OR title LIKE '% cats'
OR title LIKE '% cat %' OR title LIKE '% cats %'
OR title LIKE '% cat.%' OR title LIKE '% cats.%'
OR title LIKE '%.cat %' OR title LIKE '%.cats %'
OR title LIKE '%.cat.%' OR title LIKE '%.cats.%'
OR title LIKE '% cat,%' OR title LIKE '% cats,%'
OR title LIKE '%,cat %' OR title LIKE '%,cats %'
OR title LIKE '%,cat,%' OR title LIKE '%,cats,%'
OR title LIKE '% cat-%' OR title LIKE '% cats-%'
OR title LIKE '%-cat %' OR title LIKE '%-cats %'
OR title LIKE '%-cat-%' OR title LIKE '%-cats-%'
...
28. Why build search engines?
● Keep it local and customize it
● Improve performance
● Improve quality of results
31. red [1, 6]
cat [1, 3, 5]
mitten [1, 2, 7]
blue [2, 3, 6]
hat [3, 4, 5, 6]
dog [2, 4, 6, 8]
vacation [4]
kitten [7]
boot [8]
id title price
1 red cat mittens 14.99
2 blue dog mittens 24.99
3 blue hat for cats 8.00
4 vacation hat for dog 12.99
5 cat hat 5.00
6 red and blue dog hat 10.49
7 kitten mittens 11.99
8 dog booties 11.99
32. red [1, 6]
cat [1, 3, 5]
mitten [1, 2, 7]
blue [2, 3, 6]
hat [3, 4, 5, 6]
dog [2, 4, 6, 8]
vacation [4]
kitten [7]
boot [8]
Inverted Index
33. Terminology
● A document is a single searchable unit
red [1, 6]
cat [1, 3, 5]
mitten [1, 2, 7]
blue [2, 3, 6]
hat [3, 4, 5, 6]
dog [2, 4, 6, 8]
vacation [4]
kitten [7]
boot [8]
7 kitten mittens 11.99
34. Terminology
● A document is a single searchable unit
● A field is a defined value in a document
red [1, 6]
cat [1, 3, 5]
mitten [1, 2, 7]
blue [2, 3, 6]
hat [3, 4, 5, 6]
dog [2, 4, 6, 8]
vacation [4]
kitten [7]
boot [8]
id title price
7 kitten mittens 11.99
35. Terminology
● A document is a single searchable unit
● A field is a defined value in a document
● A term is a value extracted from the source in order to build the index
red [1, 6]
cat [1, 3, 5]
mitten [1, 2, 7]
blue [2, 3, 6]
hat [3, 4, 5, 6]
dog [2, 4, 6, 8]
vacation [4]
kitten [7]
boot [8]
id title price
7 kitten mittens 11.99
36. Terminology
● A document is a single searchable unit
● A field is a defined value in a document
● A term is a value extracted from the source in order to build the index
● An inverted index is an internal data structure which maps terms to IDs
red [1, 6]
cat [1, 3, 5]
mitten [1, 2, 7]
blue [2, 3, 6]
hat [3, 4, 5, 6]
dog [2, 4, 6, 8]
vacation [4]
kitten [7]
boot [8]
37. Terminology
● A document is a single searchable unit
● A field is a defined value in a document
● A term is a value extracted from the source in order to build the index
● An inverted index is an internal data structure which maps terms to IDs
● An index is a collection of documents (including many inverted indexes)
red [1, 6]
cat [1, 3, 5]
mitten [1, 2, 7]
blue [2, 3, 6]
... ...
5.00 [5]
8.00 [3]
0-10.00 [3, 5]
11.99 [7, 8]
... ...
id title price
1 red cat mittens 14.99
2 blue dog mittens 24.99
... ... ...
38. Terminology
● A search index can have many inverted indexes
● A search engine can have many search indexes
items index
title inverted index
price inverted index
blog-posts index
title inverted index
post inverted index
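One way to picture the nesting is as plain dictionaries: engine, then search index, then one inverted index per field. This is only a sketch. ROWS, build_search_index, and the rule that the "0-10.00" bucket means "price at most 10.00" are assumptions for illustration, not how any particular engine stores its data.

```python
from collections import defaultdict

# A few rows from the items table in the slides.
ROWS = [
    {"id": 3, "title": "blue hat for cats", "price": 8.00},
    {"id": 5, "title": "cat hat", "price": 5.00},
    {"id": 7, "title": "kitten mittens", "price": 11.99},
    {"id": 8, "title": "dog booties", "price": 11.99},
]


def build_search_index(rows):
    """One search index holds one inverted index per field."""
    title_index = defaultdict(list)
    price_index = defaultdict(list)
    for row in rows:
        for word in row["title"].split():  # raw words only, no stemming or filtering
            title_index[word].append(row["id"])
        price_index[f"{row['price']:.2f}"].append(row["id"])  # exact-value term
        if row["price"] <= 10.00:  # assumed meaning of the range bucket
            price_index["0-10.00"].append(row["id"])
    return {"title": dict(title_index), "price": dict(price_index)}


# A search engine can hold many search indexes, keyed by name.
engine = {"items": build_search_index(ROWS)}
print(engine["items"]["price"])
```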
39. Did we solve it?
● Keep it local ✓ and customize it
● Improve performance
● Improve quality of results
40. red [1, 6]
cat [1, 3, 5]
mitten [1, 2, 7]
blue [2, 3, 6]
hat [3, 4, 5, 6]
dog [2, 4, 6, 8]
vacation [4]
kitten [7]
boot [8]
cat
42. red [1, 6]
cat [1, 3, 5]
mitten [1, 2, 7]
blue [2, 3, 6]
hat [3, 4, 5, 6]
dog [2, 4, 6, 8]
vacation [4]
kitten [7]
boot [8]
cat
id title price
1 red cat mittens 14.99
3 blue hat for cats 8.00
5 cat hat 5.00
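A sketch of the lookup step, assuming the inverted index and the documents both live in Python dictionaries (TITLE_INDEX, DOCS, and search are illustrative names). Only three terms are copied over to keep it short. The query term is found with a single dictionary lookup, then the matching rows are fetched by ID; no title is scanned.

```python
# Part of the title inverted index: term -> list of document IDs.
TITLE_INDEX = {
    "cat": [1, 3, 5],
    "hat": [3, 4, 5, 6],
    "dog": [2, 4, 6, 8],
}

# Document store: ID -> (title, price).
DOCS = {
    1: ("red cat mittens", 14.99),
    2: ("blue dog mittens", 24.99),
    3: ("blue hat for cats", 8.00),
    4: ("vacation hat for dog", 12.99),
    5: ("cat hat", 5.00),
    6: ("red and blue dog hat", 10.49),
    7: ("kitten mittens", 11.99),
    8: ("dog booties", 11.99),
}


def search(term):
    """Look the term up in the index, then fetch the matching documents."""
    ids = TITLE_INDEX.get(term, [])  # one hash lookup, independent of table size
    return [(doc_id, *DOCS[doc_id]) for doc_id in ids]


results = search("cat")
for row in results:
    print(row)
```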
49. Did we solve it?
● Keep it local ✓ and customize it
● Improve performance ✓
○ At the expense of space, complexity, and pre-processing effort
● Improve quality of results
51. red [1, 6]
cat [1, 3, 5]
mitten [1, 2, 7]
blue [2, 3, 6]
hat [3, 4, 5, 6]
dog [2, 4, 6, 8]
vacation [4]
kitten [7]
boot [8]
id title price
1 red cat mittens 14.99
2 blue dog mittens 24.99
3 blue hat for cats 8.00
4 vacation hat for dog 12.99
5 cat hat 5.00
6 red and blue dog hat 10.49
7 kitten mittens 11.99
8 dog booties 11.99
How did we do this??
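A rough sketch of how such an index could be built from the titles: tokenize, lowercase, drop stop words, stem, then record which documents each term appears in. The stop-word list and the suffix-stripping stem function are toy assumptions chosen to fit these eight titles; a real engine would use a proper analyzer.

```python
import re
from collections import defaultdict

TITLES = {
    1: "red cat mittens",
    2: "blue dog mittens",
    3: "blue hat for cats",
    4: "vacation hat for dog",
    5: "cat hat",
    6: "red and blue dog hat",
    7: "kitten mittens",
    8: "dog booties",
}

STOP_WORDS = {"for", "and"}  # assumed filter list


def stem(token):
    """Toy stemmer: strips 'ies' or a trailing 's'. Not a real algorithm."""
    if token.endswith("ies"):
        return token[:-3]
    if token.endswith("s"):
        return token[:-1]
    return token


def analyze(text):
    """Tokenize on letters/digits, lowercase, filter stop words, stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def build_inverted_index(titles):
    index = defaultdict(list)
    for doc_id in sorted(titles):
        # dict.fromkeys removes repeated terms within one title, keeping order
        for term in dict.fromkeys(analyze(titles[doc_id])):
            index[term].append(doc_id)
    return dict(index)


index = build_inverted_index(TITLES)
for term, ids in index.items():
    print(term, ids)
```

Because "vacation" stays a single token, it never lands in the "cat" list, and "red cat mittens" (id 1) is recorded under "mitten" along with ids 2 and 7.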
56. Quality Problems
1. “cat” search returned “vacation hat for dog”
id title price
4 vacation hat for dog 12.99
cat [1, 3, 5]
hat [4]
dog [4]
vacation [4]
57. Quality Problems
1. “cat” search returned “vacation hat for dog”
cat [1, 3, 5]
hat [4]
dog [4]
vacation [4]
cat
id title price
4 vacation hat for dog 12.99
58. Quality Problems
1. “cat” search returned “vacation hat for dog”
2. “cats” search does not return “red cat mittens”
59. Quality Problems
2. “cats” search does not return “red cat mittens”
id title price
1 red cat mittens 14.99
red [1]
cat [1]
mitten [1]
→
61. Quality Problems
2. “cats” search does not return “red cat mittens”
id title price
1 red cat mittens 14.99
red [1]
cat [1]
mitten [1]
cats → cat
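The fix only works if the query goes through the same pre-processing as the documents. A small sketch, assuming a toy analyze function (plural stripping plus a made-up stop-word list) and OR semantics for multi-word queries:

```python
STOP_WORDS = {"for", "and"}  # assumed filter list


def analyze(text):
    """Shared by indexing and querying: lowercase, filter, strip plural 's'."""
    terms = []
    for token in text.lower().split():
        if token in STOP_WORDS:
            continue
        if token.endswith("s") and len(token) > 3:
            token = token[:-1]  # toy plural stripping, not a real stemmer
        terms.append(token)
    return terms


# Index two titles using analyze().
index = {}
for doc_id, title in {1: "red cat mittens", 5: "cat hat"}.items():
    for term in analyze(title):
        index.setdefault(term, []).append(doc_id)


def search(query):
    """Run the query through the same analyze() before looking terms up."""
    ids = set()
    for term in analyze(query):
        ids.update(index.get(term, []))
    return sorted(ids)


print(search("cats"))     # the query "cats" is reduced to the term "cat"
print(index.get("cats"))  # the raw word is not in the index at all
```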
62. Quality Problems
1. “cat” search returned “vacation hat for dog”
2. “cats” search does not return “red cat mittens”
3. “cat” search does not return “kitten mittens”
63. Quality Problems
3. “cat” search does not return “kitten mittens”
id title price
7 kitten mittens 11.99
cat [7]
mitten [7]
64. Quality Problems
3. “cat” search does not return “kitten mittens”
cat [7]
mitten [7]
id title price
7 kitten mittens 11.99
cat
65. Quality Problems
3 ½ search for “kitten” still returns “kitten mittens”
cat [7]
mitten [7]
id title price
7 kitten mittens 11.99
kitten → cat
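A sketch of the synonym step, assuming a hypothetical one-entry SYNONYMS table applied inside the shared analyze function. Because both the title and the query are rewritten the same way, "kitten" and "cat" end up on the same term.

```python
SYNONYMS = {"kitten": "cat"}  # hypothetical synonym table


def analyze(text):
    """Lowercase, strip plural 's' (toy rule), then map synonyms."""
    terms = []
    for token in text.lower().split():
        if token.endswith("s") and len(token) > 3:
            token = token[:-1]
        terms.append(SYNONYMS.get(token, token))
    return terms


# Index document 7 from the slides.
index = {}
for term in analyze("kitten mittens"):
    index.setdefault(term, []).append(7)


def search(query):
    ids = set()
    for term in analyze(query):
        ids.update(index.get(term, []))
    return sorted(ids)


print(index)
print(search("cat"), search("kitten"))
```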
66. Did we solve it?
● Keep it local ✓ and customize it ✓
● Improve performance ✓
○ At the expense of space, complexity, and pre-processing effort
● Improve quality of results ✓
○ By performing special pre-processing steps
68. I want a search engine...
do I have to build it myself?
@scarletdrive
70. ● Inverted index
● Basic tokenization, normalization, and filters
● Replication, sharding, and distribution
● Caching and warming
● Advanced tokenization, normalization, and filters
● Plugins
● ...and more!
72. Which one should I pick?
● Most projects work well with either
● Getting configuration right is most important
● Test with your own data, your own queries
Side by Side with Elasticsearch and Solr by Rafał Kuć and Radu Gheorghe
https://berlinbuzzwords.de/14/session/side-side-elasticsearch-and-solr
https://berlinbuzzwords.de/15/session/side-side-elasticsearch-solr-part-2-performance-scalability
Solr vs. Elasticsearch by Kelvin Tan
http://solr-vs-elasticsearch.com/
73. Which one should I pick?
Better for advanced customization
Easier to learn, faster to start up, better docs
~ ~ WARNING: Toria’s personal opinion ~ ~
82. id title price
1 red cat mittens 14.99
3 blue hat for cats 8.00
5 cat hat 5.00
22 feather cat toy 7.99
124 cat and mouse t-shirt 24.50
128 cat t-shirt 31.80
329 “cats rule” sticker 0.99
420 catnip joint for cats 5.99
455 cat toy 7.00
... ... ...
When there are many results, what order should we display them in?
84. Relevance with tf-idf
TF(term) = # times this term appears in doc / total # terms in doc
IDF(term) = log_e(total number of docs / # docs which contain this term)
Query: “cat”
1. The orange cat is a very good cat.
2. My cat ate an orange.
3. Cats are the best and I will give every cat a special cat toy.
1. TF(cat) = 2/8 = 0.25
2. TF(cat) = 1/5 = 0.20
3. TF(cat) = 3/14 = 0.21
IDF(cat) = log_e(3/3)
Result order = [1, 3, 2]
85. Relevance with tf-idf
TF(term) = # times this term appears in doc / total # terms in doc
IDF(term) = log_e(total number of docs / # docs which contain this term)
Query: “cat”
1. The orange cat is a very good cat.
2. My cat ate an orange. Cat cat cat!
3. Cats are the best and I will give every cat a special cat toy.
1. TF(cat) = 2/8 = 0.25
2. TF(cat) = 4/8 = 0.50
3. TF(cat) = 3/14 = 0.21
IDF(cat) = log_e(3/3)
Result order = [2, 1, 3]
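The two orderings can be reproduced with a few lines, assuming a toy tokenizer that lowercases, keeps letters only, and strips a plural "s" so that "Cats" counts as "cat". Since every document contains "cat", IDF(cat) is the same for all three and the order is decided by TF alone. The names term_frequency and rank_by_tf are illustrative.

```python
import re


def term_frequency(term, text):
    """TF = occurrences of term / total terms, after toy normalization."""
    tokens = re.findall(r"[a-z]+", text.lower())
    tokens = [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]
    return tokens.count(term) / len(tokens)


def rank_by_tf(term, docs):
    """Return document IDs sorted by descending TF for the term."""
    return sorted(docs, key=lambda doc_id: term_frequency(term, docs[doc_id]),
                  reverse=True)


docs = {
    1: "The orange cat is a very good cat.",
    2: "My cat ate an orange.",
    3: "Cats are the best and I will give every cat a special cat toy.",
}
order_before = rank_by_tf("cat", docs)

# Same corpus, but document 2 repeats the keyword.
stuffed = dict(docs)
stuffed[2] = "My cat ate an orange. Cat cat cat!"
order_after = rank_by_tf("cat", stuffed)

print(order_before, order_after)
```

Repeating the keyword is enough to move document 2 to the top, which is why TF on its own is easy to game.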
86. Relevance with tf-idf
TF(term) = # times this term appears in doc / total # terms in doc
IDF(term) = log_e(total number of docs / # docs which contain this term)
Query: “orange cat”
1. The orange cat is a good cat.
2. My cat ate an orange.
(assume 100 records which all contain “cat” in them)
IDF(cat) = log_e(100/100) = 0.0
IDF(orange) = log_e(100/2) = 3.9
87. Relevance with tf-idf
TF(term) = # times this term appears in doc / total # terms in doc
IDF(term) = log_e(total number of docs / # docs which contain this term)
Query: “orange cat”
1. The orange cat is a good cat.
2. My cat ate an orange.
IDF(cat) = log_e(100/100) = 0.0
IDF(orange) = log_e(100/2) = 3.9
score(doc1) = score(cat, doc1) + score(orange, doc1) = 0.29*0.0 + 0.14*3.9 = 0.55
score(doc2) = score(cat, doc2) + score(orange, doc2) = 0.20*0.0 + 0.20*3.9 = 0.78
88. Relevance with tf-idf
TF(term) = # times this term appears in doc / total # terms in doc
IDF(term) = log_e(total number of docs / # docs which contain this term)
Query: “orange cat”
1. The orange cat is a good cat.
2. My cat ate an orange.
Result order = [2, 1]
IDF(cat) = log_e(100/100) = 0.0
IDF(orange) = log_e(100/2) = 3.9
score(doc1) = score(cat, doc1) + score(orange, doc1) = 0.29*0.0 + 0.14*3.9 = 0.55
score(doc2) = score(cat, doc2) + score(orange, doc2) = 0.20*0.0 + 0.20*3.9 = 0.78
3/7 = 0.43
2/5 = 0.40
1/7 = 0.14
1/5 = 0.20
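The "orange cat" scoring can be checked with a short sketch. It assumes the corpus statistics stated on the slides (100 documents, all containing "cat", 2 containing "orange") and sums TF × IDF over the query terms. TOTAL_DOCS, DOC_FREQ, and score are illustrative names. With unrounded arithmetic, doc 1 comes out at about 0.56 rather than 0.55, because the slide rounds TF and IDF before multiplying; the ranking is unchanged.

```python
import math
import re


def tf(term, text):
    """Occurrences of term divided by total number of terms in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return tokens.count(term) / len(tokens)


def idf(total_docs, docs_with_term):
    """Natural log of (total docs / docs containing the term)."""
    return math.log(total_docs / docs_with_term)


# Corpus statistics assumed on the slides, not computed from real data.
TOTAL_DOCS = 100
DOC_FREQ = {"cat": 100, "orange": 2}


def score(query, text):
    """Sum of TF * IDF over the query terms."""
    return sum(tf(t, text) * idf(TOTAL_DOCS, DOC_FREQ[t]) for t in query.split())


doc1 = "The orange cat is a good cat."
doc2 = "My cat ate an orange."

s1 = score("orange cat", doc1)
s2 = score("orange cat", doc2)
print(round(s1, 2), round(s2, 2))
```

Since "cat" appears everywhere, its IDF is 0 and it contributes nothing; the rarer term "orange" decides the order, and the shorter doc 2 wins.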
90. Relevance Challenges
● Prevent keyword stuffing or other “gaming the system”
● Phrase matching
● Fuzzy matching
● User factors: language, location
● Other factors: quality, recency, randomness, diversity
91. Interesting Challenges
● Scalability
● Relevance
● Query understanding
● Numeric range search
● Faceted search
● Autocomplete
We covered: We did not cover: