You probably learned about databases in university, but did you learn about search engines? The search bar is the most important feature of many websites... and yet most people don’t know how it really works!
Toria Gibbs didn’t know how search engines work either, until she landed a job doing search infrastructure. Extrapolating (a.k.a. guessing) from what she knew about databases worked for a while, but eventually she had to buckle down and learn the fundamentals.
In this talk, we’ll learn the basic implementation of a search engine. We’ll see how search engines can outperform databases in some ways (but not others!) and what trade-offs were made to achieve this fast performance.
You’ll walk away knowing when to add a search engine to your project, how to build it using open source tools, and how to ace a technical interview and succeed at your job, even when you don’t already know the domain!
Query performance can either be a constant headache or the unsung hero of an application. MongoDB provides extremely powerful querying capabilities when used properly. As a member of the support team I will share common mistakes observed as well as tips and tricks to avoiding them.
Machine Learning in a Flash (Extended Edition 2): An Introduction to Neural N...Kory Becker
Learn the basics behind machine learning, neural networks, natural language processing, and clustering. In this presentation we’ll go over a handful of really quick machine learning algorithms. We’ll cover the difference between unsupervised and supervised learning in artificial intelligence, classification, clustering, and natural language processing to classify sentences as being about “eating”. We'll also see how to automatically categorize data under specific groups, using unsupervised learning, and apply topic detection to a finance data-set.
Search Engines: How They Work and Why You Need ThemToria Gibbs
This document discusses search engines and how they work. It begins by explaining why custom search engines are built, such as to keep data local and customized or to improve performance. It then covers how search engines create indexes of terms mapped to document IDs to enable fast lookups. Building an index requires preprocessing documents through steps like tokenization, normalization, and filtering. The quality of search results depends on the preprocessing being identical for queries and documents. Open source tools can help build search indexes, while challenges include dealing with the tradeoffs of space, complexity and preprocessing time required for fast searches.
A Search Index is Not a Database Index - Full Stack Toronto 2017Toria Gibbs
A search engine is not a database. Search systems are optimized for fast search using an internal data structure called an inverted index. Databases have a similar feature to allow quick access, also called an index, but it’s a totally different thing!
In this talk, Toria Gibbs will take you on a tour of the internals of a search index, comparing it to common implementations of indexing in relational databases. We’ll see how search engines can outperform databases and discuss the tradeoffs in implementing and maintaining such a system. No prior knowledge of database or search index implementations required; experience creating or querying database tables will be helpful.
Introduction to Search Systems - ScaleConf Colombia 2017Toria Gibbs
Often when a new user arrives on your website, the first place they go to find information is the search box! Whether they are searching for hotels on your travel site, products on your e-commerce site, or friends to connect with on your social media site, it is important to have fast, effective search in order to engage the user.
The document discusses common mistakes made by workshop applicants including string concatenation versus string interpolation, control flow issues like nested if/else statements that could be simplified with guard clauses, N+1 query problems that can be solved with eager loading, passing too many options to methods, exposing secrets in code repositories, duplicating styles, and using incorrect syntax for the template language being used. It provides examples of better approaches for each case discussed.
The document discusses odd behaviors in Python related to identity, mutability, and scope. It provides examples testing the identity and mutability of various Python objects. It also discusses issues that can arise from using mutable default arguments and provides tips on how to avoid these issues, such as using None as a default instead of a mutable object.
Novel machine learning techniques comes from spending time with people that have distinct needs. This talk addresses how listening to end users can give rise to novel machine learning applications.
Query performance can either be a constant headache or the unsung hero of an application. MongoDB provides extremely powerful querying capabilities when used properly. As a member of the support team I will share common mistakes observed as well as tips and tricks to avoiding them.
Machine Learning in a Flash (Extended Edition 2): An Introduction to Neural N...Kory Becker
Learn the basics behind machine learning, neural networks, natural language processing, and clustering. In this presentation we’ll go over a handful of really quick machine learning algorithms. We’ll cover the difference between unsupervised and supervised learning in artificial intelligence, classification, clustering, and natural language processing to classify sentences as being about “eating”. We'll also see how to automatically categorize data under specific groups, using unsupervised learning, and apply topic detection to a finance data-set.
Search Engines: How They Work and Why You Need ThemToria Gibbs
This document discusses search engines and how they work. It begins by explaining why custom search engines are built, such as to keep data local and customized or to improve performance. It then covers how search engines create indexes of terms mapped to document IDs to enable fast lookups. Building an index requires preprocessing documents through steps like tokenization, normalization, and filtering. The quality of search results depends on the preprocessing being identical for queries and documents. Open source tools can help build search indexes, while challenges include dealing with the tradeoffs of space, complexity and preprocessing time required for fast searches.
A Search Index is Not a Database Index - Full Stack Toronto 2017Toria Gibbs
A search engine is not a database. Search systems are optimized for fast search using an internal data structure called an inverted index. Databases have a similar feature to allow quick access, also called an index, but it’s a totally different thing!
In this talk, Toria Gibbs will take you on a tour of the internals of a search index, comparing it to common implementations of indexing in relational databases. We’ll see how search engines can outperform databases and discuss the tradeoffs in implementing and maintaining such a system. No prior knowledge of database or search index implementations required; experience creating or querying database tables will be helpful.
Introduction to Search Systems - ScaleConf Colombia 2017Toria Gibbs
Often when a new user arrives on your website, the first place they go to find information is the search box! Whether they are searching for hotels on your travel site, products on your e-commerce site, or friends to connect with on your social media site, it is important to have fast, effective search in order to engage the user.
The document discusses common mistakes made by workshop applicants including string concatenation versus string interpolation, control flow issues like nested if/else statements that could be simplified with guard clauses, N+1 query problems that can be solved with eager loading, passing too many options to methods, exposing secrets in code repositories, duplicating styles, and using incorrect syntax for the template language being used. It provides examples of better approaches for each case discussed.
The document discusses odd behaviors in Python related to identity, mutability, and scope. It provides examples testing the identity and mutability of various Python objects. It also discusses issues that can arise from using mutable default arguments and provides tips on how to avoid these issues, such as using None as a default instead of a mutable object.
Novel machine learning techniques comes from spending time with people that have distinct needs. This talk addresses how listening to end users can give rise to novel machine learning applications.
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxSitimaJohn
Ocean Lotus cyber threat actors represent a sophisticated, persistent, and politically motivated group that poses a significant risk to organizations and individuals in the Southeast Asian region. Their continuous evolution and adaptability underscore the need for robust cybersecurity measures and international cooperation to identify and mitigate the threats posed by such advanced persistent threat groups.
Have you ever been confused by the myriad of choices offered by AWS for hosting a website or an API?
Lambda, Elastic Beanstalk, Lightsail, Amplify, S3 (and more!) can each host websites + APIs. But which one should we choose?
Which one is cheapest? Which one is fastest? Which one will scale to meet our needs?
Join me in this session as we dive into each AWS hosting service to determine which one is best for your scenario and explain why!
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
Introduction of Cybersecurity with OSS at Code Europe 2024Hiroshi SHIBATA
I develop the Ruby programming language, RubyGems, and Bundler, which are package managers for Ruby. Today, I will introduce how to enhance the security of your application using open-source software (OSS) examples from Ruby and RubyGems.
The first topic is CVE (Common Vulnerabilities and Exposures). I have published CVEs many times. But what exactly is a CVE? I'll provide a basic understanding of CVEs and explain how to detect and handle vulnerabilities in OSS.
Next, let's discuss package managers. Package managers play a critical role in the OSS ecosystem. I'll explain how to manage library dependencies in your application.
I'll share insights into how the Ruby and RubyGems core team works to keep our ecosystem safe. By the end of this talk, you'll have a better understanding of how to safeguard your code.
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on automated letter generation for Bonterra Impact Management using Google Workspace or Microsoft 365.
Interested in deploying letter generation automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Main news related to the CCS TSI 2023 (2023/1695)Jakub Marek
An English 🇬🇧 translation of a presentation to the speech I gave about the main changes brought by CCS TSI 2023 at the biggest Czech conference on Communications and signalling systems on Railways, which was held in Clarion Hotel Olomouc from 7th to 9th November 2023 (konferenceszt.cz). Attended by around 500 participants and 200 on-line followers.
The original Czech 🇨🇿 version of the presentation can be found here: https://www.slideshare.net/slideshow/hlavni-novinky-souvisejici-s-ccs-tsi-2023-2023-1695/269688092 .
The videorecording (in Czech) from the presentation is available here: https://youtu.be/WzjJWm4IyPk?si=SImb06tuXGb30BEH .
Dive into the realm of operating systems (OS) with Pravash Chandra Das, a seasoned Digital Forensic Analyst, as your guide. 🚀 This comprehensive presentation illuminates the core concepts, types, and evolution of OS, essential for understanding modern computing landscapes.
Beginning with the foundational definition, Das clarifies the pivotal role of OS as system software orchestrating hardware resources, software applications, and user interactions. Through succinct descriptions, he delineates the diverse types of OS, from single-user, single-task environments like early MS-DOS iterations, to multi-user, multi-tasking systems exemplified by modern Linux distributions.
Crucial components like the kernel and shell are dissected, highlighting their indispensable functions in resource management and user interface interaction. Das elucidates how the kernel acts as the central nervous system, orchestrating process scheduling, memory allocation, and device management. Meanwhile, the shell serves as the gateway for user commands, bridging the gap between human input and machine execution. 💻
The narrative then shifts to a captivating exploration of prominent desktop OSs, Windows, macOS, and Linux. Windows, with its globally ubiquitous presence and user-friendly interface, emerges as a cornerstone in personal computing history. macOS, lauded for its sleek design and seamless integration with Apple's ecosystem, stands as a beacon of stability and creativity. Linux, an open-source marvel, offers unparalleled flexibility and security, revolutionizing the computing landscape. 🖥️
Moving to the realm of mobile devices, Das unravels the dominance of Android and iOS. Android's open-source ethos fosters a vibrant ecosystem of customization and innovation, while iOS boasts a seamless user experience and robust security infrastructure. Meanwhile, discontinued platforms like Symbian and Palm OS evoke nostalgia for their pioneering roles in the smartphone revolution.
The journey concludes with a reflection on the ever-evolving landscape of OS, underscored by the emergence of real-time operating systems (RTOS) and the persistent quest for innovation and efficiency. As technology continues to shape our world, understanding the foundations and evolution of operating systems remains paramount. Join Pravash Chandra Das on this illuminating journey through the heart of computing. 🌟
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfMalak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
GraphRAG for Life Science to increase LLM accuracyTomaz Bratanic
GraphRAG for life science domain, where you retriever information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers
Monitoring and Managing Anomaly Detection on OpenShift.pdfTosin Akinosho
Monitoring and Managing Anomaly Detection on OpenShift
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
5th LF Energy Power Grid Model Meet-up SlidesDanBrown980551
5th Power Grid Model Meet-up
It is with great pleasure that we extend to you an invitation to the 5th Power Grid Model Meet-up, scheduled for 6th June 2024. This event will adopt a hybrid format, allowing participants to join us either through an online Mircosoft Teams session or in person at TU/e located at Den Dolech 2, Eindhoven, Netherlands. The meet-up will be hosted by Eindhoven University of Technology (TU/e), a research university specializing in engineering science & technology.
Power Grid Model
The global energy transition is placing new and unprecedented demands on Distribution System Operators (DSOs). Alongside upgrades to grid capacity, processes such as digitization, capacity optimization, and congestion management are becoming vital for delivering reliable services.
Power Grid Model is an open source project from Linux Foundation Energy and provides a calculation engine that is increasingly essential for DSOs. It offers a standards-based foundation enabling real-time power systems analysis, simulations of electrical power grids, and sophisticated what-if analysis. In addition, it enables in-depth studies and analysis of the electrical power grid’s behavior and performance. This comprehensive model incorporates essential factors such as power generation capacity, electrical losses, voltage levels, power flows, and system stability.
Power Grid Model is currently being applied in a wide variety of use cases, including grid planning, expansion, reliability, and congestion studies. It can also help in analyzing the impact of renewable energy integration, assessing the effects of disturbances or faults, and developing strategies for grid control and optimization.
What to expect
For the upcoming meetup we are organizing, we have an exciting lineup of activities planned:
-Insightful presentations covering two practical applications of the Power Grid Model.
-An update on the latest advancements in Power Grid -Model technology during the first and second quarters of 2024.
-An interactive brainstorming session to discuss and propose new feature requests.
-An opportunity to connect with fellow Power Grid Model enthusiasts and users.
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdfflufftailshop
When it comes to unit testing in the .NET ecosystem, developers have a wide range of options available. Among the most popular choices are NUnit, XUnit, and MSTest. These unit testing frameworks provide essential tools and features to help ensure the quality and reliability of code. However, understanding the differences between these frameworks is crucial for selecting the most suitable one for your projects.
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU und die Lizenzen nach dem CCB- und CCX-Modell sind für viele in der HCL-Community seit letztem Jahr ein heißes Thema. Als Notes- oder Domino-Kunde haben Sie vielleicht mit unerwartet hohen Benutzerzahlen und Lizenzgebühren zu kämpfen. Sie fragen sich vielleicht, wie diese neue Art der Lizenzierung funktioniert und welchen Nutzen sie Ihnen bringt. Vor allem wollen Sie sicherlich Ihr Budget einhalten und Kosten sparen, wo immer möglich. Das verstehen wir und wir möchten Ihnen dabei helfen!
Wir erklären Ihnen, wie Sie häufige Konfigurationsprobleme lösen können, die dazu führen können, dass mehr Benutzer gezählt werden als nötig, und wie Sie überflüssige oder ungenutzte Konten identifizieren und entfernen können, um Geld zu sparen. Es gibt auch einige Ansätze, die zu unnötigen Ausgaben führen können, z. B. wenn ein Personendokument anstelle eines Mail-Ins für geteilte Mailboxen verwendet wird. Wir zeigen Ihnen solche Fälle und deren Lösungen. Und natürlich erklären wir Ihnen das neue Lizenzmodell.
Nehmen Sie an diesem Webinar teil, bei dem HCL-Ambassador Marc Thomas und Gastredner Franz Walder Ihnen diese neue Welt näherbringen. Es vermittelt Ihnen die Tools und das Know-how, um den Überblick zu bewahren. Sie werden in der Lage sein, Ihre Kosten durch eine optimierte Domino-Konfiguration zu reduzieren und auch in Zukunft gering zu halten.
Diese Themen werden behandelt
- Reduzierung der Lizenzkosten durch Auffinden und Beheben von Fehlkonfigurationen und überflüssigen Konten
- Wie funktionieren CCB- und CCX-Lizenzen wirklich?
- Verstehen des DLAU-Tools und wie man es am besten nutzt
- Tipps für häufige Problembereiche, wie z. B. Team-Postfächer, Funktions-/Testbenutzer usw.
- Praxisbeispiele und Best Practices zum sofortigen Umsetzen
HCL Notes and Domino License Cost Reduction in the World of DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
2024 State of Marketing Report – by HubspotMarius Sescu
https://www.hubspot.com/state-of-marketing
· Scaling relationships and proving ROI
· Social media is the place for search, sales, and service
· Authentic influencer partnerships fuel brand growth
· The strongest connections happen via call, click, chat, and camera.
· Time saved with AI leads to more creative work
· Seeking: A single source of truth
· TLDR; Get on social, try AI, and align your systems.
· More human marketing, powered by robots
ChatGPT is a revolutionary addition to the world since its introduction in 2022. A big shift in the sector of information gathering and processing happened because of this chatbot. What is the story of ChatGPT? How is the bot responding to prompts and generating contents? Swipe through these slides prepared by Expeed Software, a web development company regarding the development and technical intricacies of ChatGPT!
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxSitimaJohn
Ocean Lotus cyber threat actors represent a sophisticated, persistent, and politically motivated group that poses a significant risk to organizations and individuals in the Southeast Asian region. Their continuous evolution and adaptability underscore the need for robust cybersecurity measures and international cooperation to identify and mitigate the threats posed by such advanced persistent threat groups.
Have you ever been confused by the myriad of choices offered by AWS for hosting a website or an API?
Lambda, Elastic Beanstalk, Lightsail, Amplify, S3 (and more!) can each host websites + APIs. But which one should we choose?
Which one is cheapest? Which one is fastest? Which one will scale to meet our needs?
Join me in this session as we dive into each AWS hosting service to determine which one is best for your scenario and explain why!
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
Introduction of Cybersecurity with OSS at Code Europe 2024Hiroshi SHIBATA
I develop the Ruby programming language, RubyGems, and Bundler, which are package managers for Ruby. Today, I will introduce how to enhance the security of your application using open-source software (OSS) examples from Ruby and RubyGems.
The first topic is CVE (Common Vulnerabilities and Exposures). I have published CVEs many times. But what exactly is a CVE? I'll provide a basic understanding of CVEs and explain how to detect and handle vulnerabilities in OSS.
Next, let's discuss package managers. Package managers play a critical role in the OSS ecosystem. I'll explain how to manage library dependencies in your application.
I'll share insights into how the Ruby and RubyGems core team works to keep our ecosystem safe. By the end of this talk, you'll have a better understanding of how to safeguard your code.
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on automated letter generation for Bonterra Impact Management using Google Workspace or Microsoft 365.
Interested in deploying letter generation automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Main news related to the CCS TSI 2023 (2023/1695)Jakub Marek
An English 🇬🇧 translation of a presentation to the speech I gave about the main changes brought by CCS TSI 2023 at the biggest Czech conference on Communications and signalling systems on Railways, which was held in Clarion Hotel Olomouc from 7th to 9th November 2023 (konferenceszt.cz). Attended by around 500 participants and 200 on-line followers.
The original Czech 🇨🇿 version of the presentation can be found here: https://www.slideshare.net/slideshow/hlavni-novinky-souvisejici-s-ccs-tsi-2023-2023-1695/269688092 .
The videorecording (in Czech) from the presentation is available here: https://youtu.be/WzjJWm4IyPk?si=SImb06tuXGb30BEH .
Dive into the realm of operating systems (OS) with Pravash Chandra Das, a seasoned Digital Forensic Analyst, as your guide. 🚀 This comprehensive presentation illuminates the core concepts, types, and evolution of OS, essential for understanding modern computing landscapes.
Beginning with the foundational definition, Das clarifies the pivotal role of OS as system software orchestrating hardware resources, software applications, and user interactions. Through succinct descriptions, he delineates the diverse types of OS, from single-user, single-task environments like early MS-DOS iterations, to multi-user, multi-tasking systems exemplified by modern Linux distributions.
Crucial components like the kernel and shell are dissected, highlighting their indispensable functions in resource management and user interface interaction. Das elucidates how the kernel acts as the central nervous system, orchestrating process scheduling, memory allocation, and device management. Meanwhile, the shell serves as the gateway for user commands, bridging the gap between human input and machine execution. 💻
The narrative then shifts to a captivating exploration of prominent desktop OSs, Windows, macOS, and Linux. Windows, with its globally ubiquitous presence and user-friendly interface, emerges as a cornerstone in personal computing history. macOS, lauded for its sleek design and seamless integration with Apple's ecosystem, stands as a beacon of stability and creativity. Linux, an open-source marvel, offers unparalleled flexibility and security, revolutionizing the computing landscape. 🖥️
Moving to the realm of mobile devices, Das unravels the dominance of Android and iOS. Android's open-source ethos fosters a vibrant ecosystem of customization and innovation, while iOS boasts a seamless user experience and robust security infrastructure. Meanwhile, discontinued platforms like Symbian and Palm OS evoke nostalgia for their pioneering roles in the smartphone revolution.
The journey concludes with a reflection on the ever-evolving landscape of OS, underscored by the emergence of real-time operating systems (RTOS) and the persistent quest for innovation and efficiency. As technology continues to shape our world, understanding the foundations and evolution of operating systems remains paramount. Join Pravash Chandra Das on this illuminating journey through the heart of computing. 🌟
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfMalak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
GraphRAG for Life Science to increase LLM accuracyTomaz Bratanic
GraphRAG for life science domain, where you retriever information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers
Monitoring and Managing Anomaly Detection on OpenShift.pdfTosin Akinosho
Monitoring and Managing Anomaly Detection on OpenShift
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
5th LF Energy Power Grid Model Meet-up SlidesDanBrown980551
5th Power Grid Model Meet-up
It is with great pleasure that we extend to you an invitation to the 5th Power Grid Model Meet-up, scheduled for 6th June 2024. This event will adopt a hybrid format, allowing participants to join us either through an online Mircosoft Teams session or in person at TU/e located at Den Dolech 2, Eindhoven, Netherlands. The meet-up will be hosted by Eindhoven University of Technology (TU/e), a research university specializing in engineering science & technology.
Power Grid Model
The global energy transition is placing new and unprecedented demands on Distribution System Operators (DSOs). Alongside upgrades to grid capacity, processes such as digitization, capacity optimization, and congestion management are becoming vital for delivering reliable services.
Power Grid Model is an open source project from Linux Foundation Energy and provides a calculation engine that is increasingly essential for DSOs. It offers a standards-based foundation enabling real-time power systems analysis, simulations of electrical power grids, and sophisticated what-if analysis. In addition, it enables in-depth studies and analysis of the electrical power grid’s behavior and performance. This comprehensive model incorporates essential factors such as power generation capacity, electrical losses, voltage levels, power flows, and system stability.
Power Grid Model is currently being applied in a wide variety of use cases, including grid planning, expansion, reliability, and congestion studies. It can also help in analyzing the impact of renewable energy integration, assessing the effects of disturbances or faults, and developing strategies for grid control and optimization.
What to expect
For the upcoming meetup we are organizing, we have an exciting lineup of activities planned:
-Insightful presentations covering two practical applications of the Power Grid Model.
-An update on the latest advancements in Power Grid -Model technology during the first and second quarters of 2024.
-An interactive brainstorming session to discuss and propose new feature requests.
-An opportunity to connect with fellow Power Grid Model enthusiasts and users.
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdfflufftailshop
When it comes to unit testing in the .NET ecosystem, developers have a wide range of options available. Among the most popular choices are NUnit, XUnit, and MSTest. These unit testing frameworks provide essential tools and features to help ensure the quality and reliability of code. However, understanding the differences between these frameworks is crucial for selecting the most suitable one for your projects.
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU und die Lizenzen nach dem CCB- und CCX-Modell sind für viele in der HCL-Community seit letztem Jahr ein heißes Thema. Als Notes- oder Domino-Kunde haben Sie vielleicht mit unerwartet hohen Benutzerzahlen und Lizenzgebühren zu kämpfen. Sie fragen sich vielleicht, wie diese neue Art der Lizenzierung funktioniert und welchen Nutzen sie Ihnen bringt. Vor allem wollen Sie sicherlich Ihr Budget einhalten und Kosten sparen, wo immer möglich. Das verstehen wir und wir möchten Ihnen dabei helfen!
Wir erklären Ihnen, wie Sie häufige Konfigurationsprobleme lösen können, die dazu führen können, dass mehr Benutzer gezählt werden als nötig, und wie Sie überflüssige oder ungenutzte Konten identifizieren und entfernen können, um Geld zu sparen. Es gibt auch einige Ansätze, die zu unnötigen Ausgaben führen können, z. B. wenn ein Personendokument anstelle eines Mail-Ins für geteilte Mailboxen verwendet wird. Wir zeigen Ihnen solche Fälle und deren Lösungen. Und natürlich erklären wir Ihnen das neue Lizenzmodell.
Nehmen Sie an diesem Webinar teil, bei dem HCL-Ambassador Marc Thomas und Gastredner Franz Walder Ihnen diese neue Welt näherbringen. Es vermittelt Ihnen die Tools und das Know-how, um den Überblick zu bewahren. Sie werden in der Lage sein, Ihre Kosten durch eine optimierte Domino-Konfiguration zu reduzieren und auch in Zukunft gering zu halten.
Diese Themen werden behandelt
- Reduzierung der Lizenzkosten durch Auffinden und Beheben von Fehlkonfigurationen und überflüssigen Konten
- Wie funktionieren CCB- und CCX-Lizenzen wirklich?
- Verstehen des DLAU-Tools und wie man es am besten nutzt
- Tipps für häufige Problembereiche, wie z. B. Team-Postfächer, Funktions-/Testbenutzer usw.
- Praxisbeispiele und Best Practices zum sofortigen Umsetzen
HCL Notes and Domino License Cost Reduction in the World of DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
2024 State of Marketing Report – by HubspotMarius Sescu
https://www.hubspot.com/state-of-marketing
· Scaling relationships and proving ROI
· Social media is the place for search, sales, and service
· Authentic influencer partnerships fuel brand growth
· The strongest connections happen via call, click, chat, and camera.
· Time saved with AI leads to more creative work
· Seeking: A single source of truth
· TLDR; Get on social, try AI, and align your systems.
· More human marketing, powered by robots
ChatGPT is a revolutionary addition to the world since its introduction in 2022. A big shift in the sector of information gathering and processing happened because of this chatbot. What is the story of ChatGPT? How is the bot responding to prompts and generating contents? Swipe through these slides prepared by Expeed Software, a web development company regarding the development and technical intricacies of ChatGPT!
Product Design Trends in 2024 | Teenage EngineeringsPixeldarts
The realm of product design is a constantly changing environment where technology and style intersect. Every year introduces fresh challenges and exciting trends that mold the future of this captivating art form. In this piece, we delve into the significant trends set to influence the look and functionality of product design in the year 2024.
How Race, Age and Gender Shape Attitudes Towards Mental HealthThinkNow
Mental health has been in the news quite a bit lately. Dozens of U.S. states are currently suing Meta for contributing to the youth mental health crisis by inserting addictive features into their products, while the U.S. Surgeon General is touring the nation to bring awareness to the growing epidemic of loneliness and isolation. The country has endured periods of low national morale, such as in the 1970s when high inflation and the energy crisis worsened public sentiment following the Vietnam War. The current mood, however, feels different. Gallup recently reported that national mental health is at an all-time low, with few bright spots to lift spirits.
To better understand how Americans are feeling and their attitudes towards mental health in general, ThinkNow conducted a nationally representative quantitative survey of 1,500 respondents and found some interesting differences among ethnic, age and gender groups.
Technology
For example, 52% agree that technology and social media have a negative impact on mental health, but when broken out by race, 61% of Whites felt technology had a negative effect, and only 48% of Hispanics thought it did.
While technology has helped us keep in touch with friends and family in faraway places, it appears to have degraded our ability to connect in person. Staying connected online is a double-edged sword since the same news feed that brings us pictures of the grandkids and fluffy kittens also feeds us news about the wars in Israel and Ukraine, the dysfunction in Washington, the latest mass shooting and the climate crisis.
Hispanics may have a built-in defense against the isolation technology breeds, owing to their large, multigenerational households, strong social support systems, and tendency to use social media to stay connected with relatives abroad.
Age and Gender
When asked how individuals rate their mental health, men rate it higher than women by 11 percentage points, and Baby Boomers rank it highest at 83%, saying it’s good or excellent vs. 57% of Gen Z saying the same.
Gen Z spends the most amount of time on social media, so the notion that social media negatively affects mental health appears to be correlated. Unfortunately, Gen Z is also the generation that’s least comfortable discussing mental health concerns with healthcare professionals. Only 40% of them state they’re comfortable discussing their issues with a professional compared to 60% of Millennials and 65% of Boomers.
Race Affects Attitudes
As seen in previous research conducted by ThinkNow, Asian Americans lag other groups when it comes to awareness of mental health issues. Twenty-four percent of Asian Americans believe that having a mental health issue is a sign of weakness compared to the 16% average for all groups. Asians are also considerably less likely to be aware of mental health services in their communities (42% vs. 55%) and most likely to seek out information on social media (51% vs. 35%).
AI Trends in Creative Operations 2024 by Artwork Flow.pdfmarketingartwork
Creative operations teams expect increased AI use in 2024. Currently, over half of tasks are not AI-enabled, but this is expected to decrease in the coming year. ChatGPT is the most popular AI tool currently. Business leaders are more actively exploring AI benefits than individual contributors. Most respondents do not believe AI will impact workforce size in 2024. However, some inhibitions still exist around AI accuracy and lack of understanding. Creatives primarily want to use AI to save time on mundane tasks and boost productivity.
Organizational culture includes values, norms, systems, symbols, language, assumptions, beliefs, and habits that influence employee behaviors and how people interpret those behaviors. It is important because culture can help or hinder a company's success. Some key aspects of Netflix's culture that help it achieve results include hiring smartly so every position has stars, focusing on attitude over just aptitude, and having a strict policy against peacocks, whiners, and jerks.
PEPSICO Presentation to CAGNY Conference Feb 2024Neil Kimberley
PepsiCo provided a safe harbor statement noting that any forward-looking statements are based on currently available information and are subject to risks and uncertainties. It also provided information on non-GAAP measures and directing readers to its website for disclosure and reconciliation. The document then discussed PepsiCo's business overview, including that it is a global beverage and convenient food company with iconic brands, $91 billion in net revenue in 2023, and nearly $14 billion in core operating profit. It operates through a divisional structure with a focus on local consumers.
Content Methodology: A Best Practices Report (Webinar)contently
This document provides an overview of content methodology best practices. It defines content methodology as establishing objectives, KPIs, and a culture of continuous learning and iteration. An effective methodology focuses on connecting with audiences, creating optimal content, and optimizing processes. It also discusses why a methodology is needed due to the competitive landscape, proliferation of channels, and opportunities for improvement. Components of an effective methodology include defining objectives and KPIs, audience analysis, identifying opportunities, and evaluating resources. The document concludes with recommendations around creating a content plan, testing and optimizing content over 90 days.
How to Prepare For a Successful Job Search for 2024Albert Qian
The document provides guidance on preparing a job search for 2024. It discusses the state of the job market, focusing on growth in AI and healthcare but also continued layoffs. It recommends figuring out what you want to do by researching interests and skills, then conducting informational interviews. The job search should involve building a personal brand on LinkedIn, actively applying to jobs, tailoring resumes and interviews, maintaining job hunting as a habit, and continuing self-improvement. Once hired, the document advises setting new goals and keeping skills and networking active in case of future opportunities.
A report by thenetworkone and Kurio.
The contributing experts and agencies are (in an alphabetical order): Sylwia Rytel, Social Media Supervisor, 180heartbeats + JUNG v MATT (PL), Sharlene Jenner, Vice President - Director of Engagement Strategy, Abelson Taylor (USA), Alex Casanovas, Digital Director, Atrevia (ES), Dora Beilin, Senior Social Strategist, Barrett Hoffher (USA), Min Seo, Campaign Director, Brand New Agency (KR), Deshé M. Gully, Associate Strategist, Day One Agency (USA), Francesca Trevisan, Strategist, Different (IT), Trevor Crossman, CX and Digital Transformation Director; Olivia Hussey, Strategic Planner; Simi Srinarula, Social Media Manager, The Hallway (AUS), James Hebbert, Managing Director, Hylink (CN / UK), Mundy Álvarez, Planning Director; Pedro Rojas, Social Media Manager; Pancho González, CCO, Inbrax (CH), Oana Oprea, Head of Digital Planning, Jam Session Agency (RO), Amy Bottrill, Social Account Director, Launch (UK), Gaby Arriaga, Founder, Leonardo1452 (MX), Shantesh S Row, Creative Director, Liwa (UAE), Rajesh Mehta, Chief Strategy Officer; Dhruv Gaur, Digital Planning Lead; Leonie Mergulhao, Account Supervisor - Social Media & PR, Medulla (IN), Aurelija Plioplytė, Head of Digital & Social, Not Perfect (LI), Daiana Khaidargaliyeva, Account Manager, Osaka Labs (UK / USA), Stefanie Söhnchen, Vice President Digital, PIABO Communications (DE), Elisabeth Winiartati, Managing Consultant, Head of Global Integrated Communications; Lydia Aprina, Account Manager, Integrated Marketing and Communications; Nita Prabowo, Account Manager, Integrated Marketing and Communications; Okhi, Web Developer, PNTR Group (ID), Kei Obusan, Insights Director; Daffi Ranandi, Insights Manager, Radarr (SG), Gautam Reghunath, Co-founder & CEO, Talented (IN), Donagh Humphreys, Head of Social and Digital Innovation, THINKHOUSE (IRE), Sarah Yim, Strategy Director, Zulu Alpha Kilo (CA).
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
The search marketing landscape is evolving rapidly with new technologies, and professionals, like you, rely on innovative paid search strategies to meet changing demands.
It’s important that you’re ready to implement new strategies in 2024.
Check this out and learn the top trends in paid search advertising that are expected to gain traction, so you can drive higher ROI more efficiently in 2024.
You’ll learn:
- The latest trends in AI and automation, and what this means for an evolving paid search ecosystem.
- New developments in privacy and data regulation.
- Emerging ad formats that are expected to make an impact next year.
Watch Sreekant Lanka from iQuanti and Irina Klein from OneMain Financial as they dive into the future of paid search and explore the trends, strategies, and technologies that will shape the search marketing landscape.
If you’re looking to assess your paid search strategy and design an industry-aligned plan for 2024, then this webinar is for you.
5 Public speaking tips from TED - Visualized summarySpeakerHub
From their humble beginnings in 1984, TED has grown into the world’s most powerful amplifier for speakers and thought-leaders to share their ideas. They have over 2,400 filmed talks (not including the 30,000+ TEDx videos) freely available online, and have hosted over 17,500 events around the world.
With over one billion views in a year, it’s no wonder that so many speakers are looking to TED for ideas on how to share their message more effectively.
The article “5 Public-Speaking Tips TED Gives Its Speakers”, by Carmine Gallo for Forbes, gives speakers five practical ways to connect with their audience, and effectively share their ideas on stage.
Whether you are gearing up to get on a TED stage yourself, or just want to master the skills that so many of their speakers possess, these tips and quotes from Chris Anderson, the TED Talks Curator, will encourage you to make the most impactful impression on your audience.
See the full article and more summaries like this on SpeakerHub here: https://speakerhub.com/blog/5-presentation-tips-ted-gives-its-speakers
See the original article on Forbes here:
http://www.forbes.com/forbes/welcome/?toURL=http://www.forbes.com/sites/carminegallo/2016/05/06/5-public-speaking-tips-ted-gives-its-speakers/&refURL=&referrer=#5c07a8221d9b
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
Everyone is in agreement that ChatGPT (and other generative AI tools) will shape the future of work. Yet there is little consensus on exactly how, when, and to what extent this technology will change our world.
Businesses that extract maximum value from ChatGPT will use it as a collaborative tool for everything from brainstorming to technical maintenance.
For individuals, now is the time to pinpoint the skills the future professional will need to thrive in the AI age.
Check out this presentation to understand what ChatGPT is, how it will shape the future of work, and how you can prepare to take advantage.
The document provides career advice for getting into the tech field, including:
- Doing projects and internships in college to build a portfolio.
- Learning about different roles and technologies through industry research.
- Contributing to open source projects to build experience and network.
- Developing a personal brand through a website and social media presence.
- Networking through events, communities, and finding a mentor.
- Practicing interviews through mock interviews and whiteboarding coding questions.
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
1. Core updates from Google periodically change how its algorithms assess and rank websites and pages. This can impact rankings through shifts in user intent, site quality issues being caught up to, world events influencing queries, and overhauls to search like the E-A-T framework.
2. There are many possible user intents beyond just transactional, navigational and informational. Identifying intent shifts is important during core updates. Sites may need to optimize for new intents through different content types and sections.
3. Responding effectively to core updates requires analyzing "before and after" data to understand changes, identifying new intents or page types, and ensuring content matches appropriate intents across video, images, knowledge graphs and more.
A brief introduction to DataScience with explaining of the concepts, algorithms, machine learning, supervised and unsupervised learning, clustering, statistics, data preprocessing, real-world applications etc.
It's part of a Data Science Corner Campaign where I will be discussing the fundamentals of DataScience, AIML, Statistics etc.
Time Management & Productivity - Best PracticesVit Horky
Here's my presentation on by proven best practices how to manage your work time effectively and how to improve your productivity. It includes practical tips and how to use tools such as Slack, Google Apps, Hubspot, Google Calendar, Gmail and others.
The six step guide to practical project managementMindGenius
The six step guide to practical project management
If you think managing projects is too difficult, think again.
We’ve stripped back project management processes to the
basics – to make it quicker and easier, without sacrificing
the vital ingredients for success.
“If you’re looking for some real-world guidance, then The Six Step Guide to Practical Project Management will help.”
Dr Andrew Makar, Tactical Project Management
2. 1 Introduction / who is this lady?
2 Text Search / inverted indexes
a Performance
b Quality
3 Relevance / quality part 2: ranking the results
4 Open Source Tools / free search engines!
5 Conclusion / bye
Agenda
2
3. Who is this lady?
3
Bachelor of
Computer
Science 2010
Toria Gibbs
@scarletdrive
17. 17
Assume we have a database...
@scarletdrive
id title price
1 red cat mittens 14.99
2 blue dog mittens 24.99
3 blue hat for cats 8.00
4 vacation hat for dog 12.99
5 cat hat 5.00
6 red and blue dog hat 10.49
7 kitten mittens 11.99
8 dog booties 11.99
18. 18
cat
id title price
1 red cat mittens 14.99
2 blue dog mittens 24.99
3 blue hat for cats 8.00
4 vacation hat for dog 12.99
5 cat hat 5.00
6 red and blue dog hat 10.49
7 kitten mittens 11.99
8 dog booties 11.99
SELECT *
FROM items
WHERE title LIKE ‘%cat%’
@scarletdrive
19. 19
cat
id title price
1 red cat mittens 14.99
2 blue dog mittens 24.99
3 blue hat for cats 8.00
4 vacation hat for dog 12.99
5 cat hat 5.00
6 red and blue dog hat 10.49
7 kitten mittens 11.99
8 dog booties 11.99
SELECT *
FROM items
WHERE title LIKE ‘%cat%’
@scarletdrive
24. 24
Let’s make it faster
@scarletdrive
id title price
1 red cat mittens 14.99
2 blue dog mittens 24.99
3 blue hat for cats 8.00
4 vacation hat for dog 12.99
5 cat hat 5.00
6 red and blue dog hat 10.49
7 kitten mittens 11.99
8 dog booties 11.99
We can look up an item by its
ID in constant time.
25. 25
Let’s make it faster
@scarletdrive
id title price
1 red cat mittens 14.99
2 blue dog mittens 24.99
3 blue hat for cats 8.00
4 vacation hat for dog 12.99
5 cat hat 5.00
6 red and blue dog hat 10.49
7 kitten mittens 11.99
8 dog booties 11.99
term ids
red [1, 6]
cat [1, 3, 5]
mitten [1, 2, 7]
blue [2, 3, 6]
hat [3, 4, 5, 6]
dog [2, 4, 6, 8]
vacation [4]
kitten [7]
boot [8]
26. 26
Inverted Index
@scarletdrive
term ids
red [1, 6]
cat [1, 3, 5]
mitten [1, 2, 7]
blue [2, 3, 6]
hat [3, 4, 5, 6]
dog [2, 4, 6, 8]
vacation [4]
kitten [7]
boot [8]
Map from words to
sets of IDs of records
which contained
those words
27. 27
cat
@scarletdrive
term ids
red [1, 6]
cat [1, 3, 5]
mitten [1, 2, 7]
blue [2, 3, 6]
hat [3, 4, 5, 6]
dog [2, 4, 6, 8]
vacation [4]
kitten [7]
boot [8]
30. 30
cat
@scarletdrive
term ids
red [1, 6]
cat [1, 3, 5]
mitten [1, 2, 7]
blue [2, 3, 6]
hat [3, 4, 5, 6]
dog [2, 4, 6, 8]
vacation [4]
kitten [7]
boot [8]
id title price
1 red cat mittens 14.99
3 blue hat for cats 8.00
5 cat hat 5.00
35. id title price
1 red cat mittens 14.99
2 blue dog mittens 24.99
3 blue hat for cats 8.00
4 vacation hat for dog 12.99
5 cat hat 5.00
6 red and blue dog hat 10.49
7 kitten mittens 11.99
8 dog booties 11.99
@scarletdrive
SELECT *
FROM items
WHERE title LIKE ‘%cat%’
36. id title price
1 red cat mittens 14.99
2 blue dog mittens 24.99
3 blue hat for cats 8.00
4 vacation hat for dog 12.99
5 cat hat 5.00
6 red and blue dog hat 10.49
7 kitten mittens 11.99
8 dog booties 11.99
@scarletdrive
SELECT *
FROM items
WHERE title LIKE ‘%cat%’
● Search for “cat” incorrectly
returns “vacation hat for dog”
37. id title price
1 red cat mittens 14.99
2 blue dog mittens 24.99
3 blue hat for cats 8.00
4 vacation hat for dog 12.99
5 cat hat 5.00
6 red and blue dog hat 10.49
7 kitten mittens 11.99
8 dog booties 11.99
● Search for “cat” incorrectly
returns “vacation hat for dog”
● Search for “cat” doesn’t return
“kitten mittens”
@scarletdrive
SELECT *
FROM items
WHERE title LIKE ‘%cat%’
38. id title price
1 red cat mittens 14.99
2 blue dog mittens 24.99
3 blue hat for cats 8.00
4 vacation hat for dog 12.99
5 cat hat 5.00
6 red and blue dog hat 10.49
7 kitten mittens 11.99
8 dog booties 11.99
● Search for “cat” incorrectly
returns “vacation hat for dog”
● Search for “cat” doesn’t return
“kitten mittens”
● Search for “cats” doesn’t return
“cat hat” or “red cat mittens”
@scarletdrive
SELECT *
FROM items
WHERE title LIKE ‘%cats%’
39. SELECT * FROM items
WHERE title LIKE ‘cat’ OR title LIKE ‘cats’
OR title LIKE ‘cat %’ OR title LIKE ‘cats %’
OR title LIKE ‘% cat’ OR title LIKE ‘% cats’
OR title LIKE ‘% cat %’ OR title LIKE ‘% cats %’
OR title LIKE ‘% cat.%’ OR title LIKE ‘% cats.%’
OR title LIKE ‘%.cat %’ OR title LIKE ‘%.cats %’
OR title LIKE ‘%.cat.%’ OR title LIKE ‘%.cats.%’
OR title LIKE ‘% cat,%’ OR title LIKE ‘% cats,%’
OR title LIKE ‘%,cat %’ OR title LIKE ‘%,cats %’
OR title LIKE ‘%,cat,%’ OR title LIKE ‘%,cats,%’
OR title LIKE ‘% cat-%’ OR title LIKE ‘% cats-%’
OR title LIKE ‘%-cat %’ OR title LIKE ‘%-cats %’
OR title LIKE ‘%-cat-%’ OR title LIKE ‘%-cats-%’
...
@scarletdrive
40. 40
How did we do this?
@scarletdrive
id title price
1 red cat mittens 14.99
2 blue dog mittens 24.99
3 blue hat for cats 8.00
4 vacation hat for dog 12.99
5 cat hat 5.00
6 red and blue dog hat 10.49
7 kitten mittens 11.99
8 dog booties 11.99
term ids
red [1, 6]
cat [1, 3, 5]
mitten [1, 2, 7]
blue [2, 3, 6]
hat [3, 4, 5, 6]
dog [2, 4, 6, 8]
vacation [4]
kitten [7]
boot [8]
47. 47
id title price
4 vacation hat for dog 12.99
5 cat hat 5.00
@scarletdrive
[“vacation”, “hat”, “for”, “dog”]
[“cat”, “hat”]
[“vacation”, “hat”, “dog”]
[“cat”, “hat”]
term ids
cat [5]
hat [4, 5]
dog [4]
vacation [4]
Tokenize it
Remove stop words
48. 48 @scarletdrive
term ids
cat [5]
hat [4, 5]
dog [4]
vacation [4]
cat
id title price
4 vacation hat for dog 12.99
5 cat hat 5.00
● Search for “cat” does not return “vacation hat for dog” due to tokenization
50. 50
id title price
3 blue hat for cats 12.99
5 cat hat 5.00
@scarletdrive
[“blue”, “hat”, “for”, “cats”]
[“cat”, “hat”]
[“blue”, “hat”, “cat”]
[“cat”, “hat”]
term ids
blue [3]
cat [3, 5]
hat [3, 5]
Tokenize it
Remove stop words
Stem it
[“blue”, “hat”, “for”, “cat”]
[“cat”, “hat”]
51. 51
id title price
3 blue hat for cats 12.99
5 cat hat 5.00
@scarletdrive
term ids
blue [3]
cat [3, 5]
hat [3, 5]
cats
???
53. 53
id title price
3 blue hat for cats 12.99
5 cat hat 5.00
@scarletdrive
term ids
blue [3]
cat [3, 5]
hat [3, 5]
cats
Stem it
cat
● Search for “cats” does return
“cat hat” due to stemming
54. @scarletdrive
Quality Problems
1. “cat” search returned “vacation hat for dog” ✓
2. “cats” search does not return “cat hat” ✓
3. “cat” search does not return “kitten mittens”
55. 55
id title price
5 cat hat 5.00
7 kitten mittens 11.99
@scarletdrive
[“cat”, “hat”]
[“kitten”, “mittens”]
[“cat”, “hat”]
[“cat”, “mitten”]
term ids
cat [5, 7]
hat [5]
mitten [7]
Tokenize it
Swap synonymsStem it
[“cat”, “hat”]
[“kitten”, “mitten”]
56. 56 @scarletdrive
cat
id title price
5 cat hat 5.00
7 kitten mittens 11.99
term ids
cat [5, 7]
hat [5]
mitten [7]
● Search for “cat” returns all
items with “cat” or “kitten” due
to synonyms
57. id title price
5 cat hat 5.00
7 kitten mittens 11.99
term ids
cat [5, 7]
hat [5]
mitten [7]
57 @scarletdrive
kitten
Swap synonym
cat
● Search for “kitten” returns all
items with “cat” or “kitten” due
to synonyms
58. @scarletdrive
Quality Problems
1. “cat” search returned “vacation hat for dog” ✓
2. “cats” search does not return “cat hat” ✓
3. “cat” search does not return “kitten mittens” ✓
59. Search engines provide faster and
better quality results than a
database for text search
@scarletdrive
60. 60
🙋🏽
How much disk space
will my new field use?
👩🏻🎓
I learned things!
I can help!
@scarletdrive
61. 61
��🏽
@scarletdrive
It’s a string field, but it’s only going
to be 100 characters long, max.
��
Can you tell me anything about the
characteristics of these strings?
62. 62 @scarletdrive
id title
1 red cat mittens
2 blue dog mittens
3 blue hat for cats
4 vacation hat for dog
5 cat hat
6 red and blue dog hat
7 kitten mittens
8 dog booties
term ids
red [1, 6]
cat [1, 3, 5, 7]
mitten [1, 2, 7]
blue [2, 3, 6]
hat [3, 4, 5, 6]
dog [2, 4, 6, 8]
vacation [4]
boot [8]
8 rows 8 rows
63. 63 @scarletdrive
id text
1 good dog
2 bad dog
3 good dog
4 bad dog
5 good dog
6 bad dog
7 good dog
8 bad dog
...
100 bad dog
term ids
good [1, 3, 5, 7, 9,
11, 13, 15, 17,
19, … 99]
bad [2, 4, 6, 8, 10,
12, 14, 16, 18,
20, … 100]
dog [1, 2, 3, 4, 5, 6,
7, 8, 9, 10, 11,
12, … 99, 100]
100 rows
3 rows
64. 64
��🏽
@scarletdrive
Why yes, they are categories
which are static and well-defined.
��
AWESOME.
categories
Pet Accessories
Pet Beds
term ids
pet ?
accessory ?
bed ?
67. 1 Introduction / who is this lady?
2 Text Search / inverted indexes
a Performance
b Quality
3 Relevance / quality part 2: ranking the results
4 Open Source Tools / free search engines!
5 Conclusion / bye
Agenda
67
69. @scarletdrive
id title price
1 red cat mittens 14.99
3 blue hat for cats 8.00
5 cat hat 5.00
22 feather cat toy 7.99
124 cat and mouse t-shirt 24.50
128 cat t-shirt 31.80
329 “cats rule” sticker 0.99
420 catnip joint for cats 5.99
455 cat toy 7.00
... ... ...
When there are
many results, what
order should we
display them in?
69
Relevance
71. @scarletdrive
TF(term) = # times this term appears in doc / total # terms in doc
IDF(term) = loge
(total number of docs / # docs which contain this term)
Relevance with tf-idf
1. The orange cat is a very good cat.
2. My cat ate an orange.
3. Cats are the best and I will give
every cat a special cat toy.
1. TF(cat) = 2/8 = 0.25
2. TF(cat) = 1/5 = 0.20
3. TF(cat) = 3/14 = 0.21
IDF(cat) = loge
(3/3)
Result order = [1, 3, 2]Query: cat
71
72. @scarletdrive
TF(term) = # times this term appears in doc / total # terms in doc
IDF(term) = loge
(total number of docs / # docs which contain this term)
Relevance with tf-idf
1. The orange cat is a very good cat.
2. My cat ate an orange. Cat cat cat!
3. Cats are the best and I will give
every cat a special cat toy.
1. TF(cat) = 2/8 = 0.25
2. TF(cat) = 4/8 = 0.50
3. TF(cat) = 3/14 = 0.21
IDF(cat) = loge
(3/3)
Result order = [2, 1, 3]Query: cat
72
73. @scarletdrive
TF(term) = # times this term appears in doc / total # terms in doc
IDF(term) = loge
(total number of docs / # docs which contain this term)
Relevance with tf-idf
1. The orange cat is a good cat.
2. My cat ate an orange.
(assume 100 records which all contain
“cat” in them)
Query: orange cat
73
74. @scarletdrive
TF(term) = # times this term appears in doc / total # terms in doc
IDF(term) = loge
(total number of docs / # docs which contain this term)
Relevance with tf-idf
1. The orange cat is a good cat.
2. My cat ate an orange.
Query: orange cat
IDF(cat) = loge
(100/100) = 0.0
IDF(orange) = loge
(100/2) = 3.9
score = score(cat, doc1) + s(orange, doc1)
score = score(cat, doc2) + s(orange, doc2)
74
75. @scarletdrive
TF(term) = # times this term appears in doc / total # terms in doc
IDF(term) = loge
(total number of docs / # docs which contain this term)
Relevance with tf-idf
1. The orange cat is a good cat.
2. My cat ate an orange.
Query: orange cat
IDF(cat) = loge
(100/100) = 0.0
IDF(orange) = loge
(100/2) = 3.9
score = score(cat, doc1) + s(orange, doc1) = 0.29*0.0 + 0.14*3.9 = 0.55
score = score(cat, doc2) + s(orange, doc2) = 0.20*0.0 + 0.20*3.9 = 0.78
75
76. @scarletdrive
TF(term) = # times this term appears in doc / total # terms in doc
IDF(term) = loge
(total number of docs / # docs which contain this term)
Relevance with tf-idf
1. The orange cat is a good cat.
2. My cat ate an orange.
Result order = [2, 1]Query: orange cat
IDF(cat) = loge
(100/100) = 0.0
IDF(orange) = loge
(100/2) = 3.9
score = score(cat, doc1) + s(orange, doc1) = 0.29*0.0 + 0.14*3.9 = 0.55
score = score(cat, doc2) + s(orange, doc2) = 0.20*0.0 + 0.20*3.9 = 0.78
1/7 = 0.14
1/5 = 0.20
76
77. 77
Better Relevance
● Phrase matching
● Fuzzy matching, spelling correction
● User factors: location, language
● Other factors: quality, recency, randomness
78. bm25
is the cool new thing
RIP tf-idf
@scarletdrive
https://elastic.co/blog/practical-bm25-part-2-the-bm25-algorithm-and-its-variables
83. Which one should I pick?
@scarletdrive
● Most projects work well with either
● Getting configuration right is most important
● Test with your own data, your own queries
Side by Side with Elasticsearch and Solr by Rafał Kuć and Radu Gheorghe
https://berlinbuzzwords.de/14/session/side-side-elasticsearch-and-solr
https://berlinbuzzwords.de/15/session/side-side-elasticsearch-solr-part-2-performance-scalability
Solr vs. Elasticsearch by Kelvin Tan
http://solr-vs-elasticsearch.com/
84. Which one should I pick?
Better for advanced
customization
Easier to learn, faster to
start up, better docs
~ ~ WARNING: Toria’s personal opinion ~ ~
@scarletdrive
86. 86
Recap
● Inverted index for text search
○ Faster than a database
○ Better quality than a database
● Ranking for relevance with tf-idf (or bm25)
● Solr and Elasticsearch are great open source solutions
@scarletdrive