Spark and Online Analytics: Spark Summit East talk by Shubham Chopra | Spark Summit
Apache Spark was designed as a batch analytics system. By caching RDDs, Spark speeds up jobs that iteratively process the same data. This pattern is also applicable to online analytics. We use Bloomberg’s Spark Server as a server runtime for online analytics. Our framework implements certain useful patterns applicable to online query processing and is centered on the idea of “Managed” DataFrames that can be refreshed and updated as per user requirements, without violating the immutability of RDDs/DataFrames. However, Spark presents significant challenges with respect to availability and resilience in an online setting where Spark is required to respond to queries with high SLAs. In this talk, we try to identify specific areas where slow-down or failures can result in the largest hits on online-query performance and potential solutions to address these.
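As a rough illustration of the "managed DataFrame" idea described above (this is a sketch of the concept, not Bloomberg's Spark Server code), a long-running Spark application can atomically swap a cached DataFrame behind a stable handle; the names `ManagedDataFrame` and `load_snapshot` and the data path are hypothetical:

```python
# Minimal sketch of a refreshable, cached DataFrame for online queries.
# Illustration of the idea only, not Bloomberg's Spark Server API;
# ManagedDataFrame, load_snapshot and the parquet path are hypothetical.
import threading
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("online-analytics-sketch").getOrCreate()

class ManagedDataFrame:
    """Holds a cached, immutable DataFrame that can be swapped atomically."""

    def __init__(self, loader):
        self._loader = loader          # function returning a fresh DataFrame
        self._lock = threading.Lock()
        self._df = None
        self.refresh()

    def refresh(self):
        new_df = self._loader().cache()
        new_df.count()                 # materialize the cache before swapping
        with self._lock:
            old_df, self._df = self._df, new_df
        if old_df is not None:
            old_df.unpersist()         # release the previous snapshot

    def current(self):
        with self._lock:
            return self._df            # queries always see a consistent snapshot

def load_snapshot():
    # Hypothetical loader; replace with the real source of the reference data.
    return spark.read.parquet("/data/reference/latest")

mdf = ManagedDataFrame(load_snapshot)
mdf.current().createOrReplaceTempView("reference")
spark.sql("SELECT count(*) FROM reference").show()
```

Each refresh builds a new immutable DataFrame, so existing queries keep their snapshot while new queries pick up the refreshed one.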
Building a real-time Tweet map with Flink in six weeks | Matthias Kricke
In this talk we present OSTMap, a tool built by six students over the course of six weeks. Each student spent as little as 5-10 hours per week and had no prior experience with Big Data or the frameworks used. We also present the concept of geo-temporal indices for our use case.
Consolidating MLOps at One of Europe’s Biggest Airports | Databricks
At Schiphol airport we run a lot of mission critical machine learning models in production, ranging from models that predict passenger flow to computer vision models that analyze what is happening around the aircraft. Especially now in times of Covid it is paramount for us to be able to quickly iterate on these models by implementing new features, retraining them to match the new dynamics and above all to monitor them actively to see if they still fit the current state of affairs.
To meet those needs we rely on MLFlow, which we have also integrated with many of our other systems: we have written Airflow operators for MLFlow to ease the retraining of our models, integrated MLFlow deeply with our CI pipelines, and connected it with our model monitoring tooling.
In this talk we will take you through the way we rely on MLFlow and how that enables us to release (sometimes) multiple versions of a model per week in a controlled fashion. With this set-up we are achieving the same benefits and speed as you have with a traditional software CI pipeline.
Zoltán Zvara - Advanced visualization of Flink and Spark jobs | Flink Forward
This document discusses developing an advanced visualization tool for Flink and Spark jobs that provides insight into data characteristics and the physical execution plan. It aims to help developers detect issues, understand distributed systems, and guide testing of adaptive partitioning techniques. The tool enhances existing metrics and APIs to visualize input/output patterns and physical tasks/subtasks. Future plans include public beta release and integrating dynamic repartitioning to mitigate data skew.
Just because you can, doesn’t mean you should. But in this case, you definitely should! Learn how this one weird trick (Jinja templating) will supercharge your analytics workflows and help you do more, better, faster with SQL.
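For readers curious what "Jinja templating for SQL" looks like in practice, here is a minimal, generic sketch; the table, column and parameter names are invented for illustration:

```python
# Minimal sketch of templating SQL with Jinja2; table/column names are invented.
from jinja2 import Template

query_template = Template("""
SELECT {{ dimension }},
       COUNT(*) AS events
FROM   {{ table }}
WHERE  event_date BETWEEN '{{ start_date }}' AND '{{ end_date }}'
GROUP  BY {{ dimension }}
ORDER  BY events DESC
""")

sql = query_template.render(
    dimension="country",
    table="analytics.page_views",
    start_date="2021-01-01",
    end_date="2021-01-31",
)
print(sql)  # hand the rendered SQL to your warehouse client of choice
```

The same template can be rendered with different dimensions or date ranges, which is the core of the workflow-supercharging trick described above.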
A unified analytics platform with Kafka and Flink | Stephan Ewen, Ververica | HostedbyConfluent
Apache Kafka and Apache Flink together are a winning stack for data analytics that is used by many companies across industries.
The two projects complement each other perfectly: Kafka offers a world-class log for event stream storage and transport, while Flink is a powerful system for analytics and applications on top of those event streams.
This talk will demonstrate how to use Kafka and Flink together for "unified analytics": Analytics that seamlessly combine processing of real-time data and historic data.
Using SQL as the language for our sample applications, we will walk through various scenarios for unified analytics, such as:
- Running the same query for processing real-time data from Kafka and for batch-accelerated processing of the historic data stored in Kafka.
- Writing queries that combine data in Kafka with tables in external systems (like S3)
- Switching between streams of historic data (from S3) and real-time streams in Kafka.
The audience will learn how combining real-time and historic data becomes convenient with Kafka and Flink together.
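A rough sketch of the "same query over Kafka" pattern with Flink SQL (here via PyFlink); the topic name, schema and connector options are assumptions for illustration, not taken from the talk:

```python
# Illustrative PyFlink sketch: one SQL query over a Kafka-backed table.
# Topic name, schema and connector options are assumptions, not from the talk.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# 'earliest-offset' replays the history retained in Kafka; switching to
# 'latest-offset' turns the very same query into a pure real-time query.
t_env.execute_sql("""
    CREATE TABLE orders (
        order_id STRING,
        amount   DOUBLE,
        ts       TIMESTAMP(3),
        WATERMARK FOR ts AS ts - INTERVAL '5' SECOND
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'orders',
        'properties.bootstrap.servers' = 'localhost:9092',
        'scan.startup.mode' = 'earliest-offset',
        'format' = 'json'
    )
""")

t_env.execute_sql("""
    SELECT window_start, window_end, SUM(amount) AS revenue
    FROM TABLE(TUMBLE(TABLE orders, DESCRIPTOR(ts), INTERVAL '1' MINUTES))
    GROUP BY window_start, window_end
""").print()
```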
Kafka for Real-Time Event Processing in Serverless Environments | confluent
(Jeff Sharpe + Alex Srisuwan, Capital One) Kafka Summit SF 2018
Using Kafka as a platform messaging bus is common, but bridging communication between real-time and asynchronous components can become complicated, especially when dealing with serverless environments. This has become increasingly common in modern banking where events need to be processed at near-real-time speed. Serverless environments are well-suited to address these needs, and Kafka remains an excellent solution for providing the reliable, resilient communication layer between serverless components and dedicated stream processing services.
In this talk, we will examine some of the strengths and weaknesses of using Kafka for real-time communication, some tips for efficient interactions with Kafka and AWS Lambda, and a number of useful patterns for maximizing the strengths of Kafka and serverless components.
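To make the Kafka-to-Lambda hand-off concrete, here is a minimal sketch of a Lambda handler for a Kafka event source (self-managed Kafka or MSK); the payload shape follows AWS's documented Kafka event format, while the business logic and return value are placeholders:

```python
# Minimal sketch of an AWS Lambda handler for a Kafka event source.
# The event shape (records keyed by "topic-partition", base64-encoded values)
# follows AWS's Kafka event-source format; the processing logic is a placeholder.
import base64
import json

def handler(event, context):
    processed = 0
    for topic_partition, records in event.get("records", {}).items():
        for record in records:
            payload = json.loads(base64.b64decode(record["value"]))
            # Placeholder business logic: a real function might call a
            # downstream service, write to a database, or publish a new event.
            print(topic_partition, record["offset"], payload)
            processed += 1
    # Returning normally lets the event-source mapping commit the batch.
    return {"batchItemCount": processed}
```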
Like many industries, banking is undergoing a fundamental change because of the software revolution. Banks no longer compete only on interest rates and having the best traders; these days, customer experience and having the best engineers are the focus. In this changing world, banks compete with new start-ups, the so-called fintechs, and with large platform organisations such as Google, Facebook and Apple. At ING, we believe that staying ahead of the game means changing how we interact with our customers: no longer the traditional model of waiting for customers to come to the bank through our website or apps, but actively reaching out to customers with information that is relevant to them, in order to make their financial lives frictionless. Many of these changes are driven by reacting to all events that are relevant to the customer, and using streaming analytics to reach out to the customer within milliseconds after the event occurs. Apache Flink is key for ING to achieve this. This presentation addresses how ING approaches the challenge, the role that Apache Flink plays, and the consequences regulations have on how we work with open source in general, and with Apache Flink (and data Artisans) in particular. This keynote takes place at Kino 3.
Sub-Second SQL Search, Aggregations and Joins with Kafka and Rockset | Dhruba... | HostedbyConfluent
We often need to build applications that analyze Kafka data to unlock the most value from event streams, so how can organizations build these real-time analytics applications? In this talk, we examine an indexing approach that enables fast SQL analytics on data from Kafka, without data flattening or denormalization. Rockset is the real-time indexing database that builds an inverted index, a columnar index and a row index on all fields of your Kafka messages, including nested fields and arrays. This Converged Index accelerates various types of analytic queries–search, aggregations and joins–without the need to denormalize or transform data for performance reasons. With indexing delivering significant gains in query performance, we also need to index new data in a timely manner. We discuss several strategies used for efficient ingestion and indexing from Kafka, including rollups, write optimizations on the underlying RocksDB storage engine, and the disaggregation of ingest and query compute.
Telling the LivePerson Technology Story at Couchbase [SF] 2013 | LivePerson
As part of Couchbase[SF]2013, Ido Shilon, R&D Group Leader at LivePerson, discusses LivePerson's project to re-architect their LiveEngage platform backend, with a focus on LivePerson's decision to use NoSQL technologies, the challenges encountered, and the lessons learned.
Video: http://www.youtube.com/watch?v=rYKWFmJEHX0
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ... | HostedbyConfluent
Apache Kafka users who want to leverage Google Cloud Platform's (GCPs) data analytics platform and open source hosting capabilities can bridge their existing Kafka infrastructure on-premise or in other clouds to GCP using Confluent's replicator tool and managed Kafka service on GCP. Using actual customer examples and a reference architecture, we'll showcase how existing Kafka users can stream data to GCP and use it in popular tools like Apache Beam on Dataflow, BigQuery, Google Cloud Storage (GCS), Spark on Dataproc, and Tensorflow for data warehousing, data processing, data storage, and advanced analytics using AI and ML.
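As a rough sketch of the kind of bridge described here, a Beam (Dataflow) pipeline can read the replicated Kafka topic and land it in BigQuery; the project, topic and table names are placeholders, and ReadFromKafka is Beam's cross-language Kafka transform (it needs a Java expansion service at runtime):

```python
# Illustrative Apache Beam pipeline: replicated Kafka topic -> BigQuery.
# Project, topic and table names are placeholders; ReadFromKafka is Beam's
# cross-language Kafka transform and needs a Java expansion service at runtime.
import json
import apache_beam as beam
from apache_beam.io.kafka import ReadFromKafka
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(streaming=True)  # run with --runner=DataflowRunner on GCP

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadKafka" >> ReadFromKafka(
            consumer_config={"bootstrap.servers": "broker:9092"},
            topics=["clickstream"],
        )
        | "DecodeJson" >> beam.Map(lambda kv: json.loads(kv[1].decode("utf-8")))
        | "WriteBQ" >> beam.io.WriteToBigQuery(
            table="my-project:analytics.clickstream",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
        )
    )
```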
Kafka, Killer of Point-to-Point Integrations, Lucian Lita | confluent
With 60+ products and over 24% of the US GDP flowing through it, system integration is a tough problem for Intuit. Seasonality, scale, and massive peaks in products like TurboTax, QuickBooks, and Mint.com add extra layers of difficulty when building shared data services around transaction and user graphs, clickstream processing, a/b testing, and personalization. To reduce complexity and latency, we’ve implemented Kafka as the backbone across these data services. This allows us to asynchronously trigger relevant processing, elegantly scaling up and down as needed around peaks, all without the need for point-to-point integrations.
In this talk, we share what we’ve learned about Kafka at Intuit and describe our data services architecture. We found that Kafka is invaluable in achieving a scalable, clean architecture, allowing engineering teams to focus less on integration and more on product development.
Kafka and Stream Processing, Taking Analytics Real-time, Mike Spicer | confluent
Stream processing analyzes data in motion before it is stored, allowing for real-time analytics with low latency. Kafka is well-suited for stream processing due to its speed, scalability, durability, and ability to act as a universal hub. Real-time analytics can handle many use cases like customer intelligence, IoT, and security. Examples include a telco using stream processing for real-time advertising and Thomson Reuters using it for news ingestion and analytics. Stream processing can analyze data from the edge to the center in real-time to detect and predict insights and enable immediate actions.
Siphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin Kumar | confluent
Siphon is a highly available and reliable distributed pub/sub system built using Apache Kafka. It is used to publish, discover and subscribe to near real-time data streams for operational and product intelligence. Siphon is used as a “Databus” by a variety of producers and subscribers in Microsoft, and is compliant with security and privacy requirements. It has a built-in Auditing and Quality control. This session will provide an overview of the use of Kafka at Microsoft, and then deep dive into Siphon. We will describe an important business scenario and talk about the technical details of the system in the context of that scenario. We will also cover the design and implementation of the service, the scale, and real world production experiences from operating the service in the Microsoft cloud environment.
Taboola's data processing architecture has evolved over time from directly writing to databases to using Apache Spark for scalable real-time processing. Spark allows Taboola to process terabytes of data daily across multiple data centers for real-time recommendations, analytics, and algorithm calibration. Key aspects of Taboola's architecture include using Cassandra for event storage, Spark for distributed computing, Mesos for cluster management, and Zookeeper for coordination across a large Spark cluster.
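As a small illustration of one piece of such an architecture (generic code, not Taboola's), Spark can read event data from Cassandra via the DataStax spark-cassandra-connector; the keyspace, table and column names below are invented:

```python
# Illustrative sketch: reading Cassandra-stored events into Spark for analytics.
# Keyspace/table/column names are invented; requires the spark-cassandra-connector package.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("events-analytics-sketch")
    .config("spark.cassandra.connection.host", "cassandra-host")
    .getOrCreate()
)

events = (
    spark.read
    .format("org.apache.spark.sql.cassandra")
    .options(keyspace="analytics", table="events")
    .load()
)

# Example aggregation: events per campaign per hour, as input for calibration jobs.
(
    events
    .groupBy("campaign_id", F.window("event_time", "1 hour"))
    .count()
    .show()
)
```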
At Netflix, we've spent a lot of time thinking about how we can make our analytics group move quickly. Netflix's Data Engineering & Analytics organization embraces the company's culture of "Freedom & Responsibility".
How does a company with a $40 billion market cap and $6 billion in annual revenue keep their data teams moving with the agility of a tiny company?
How do hundreds of data engineers and scientists make the best decisions for their projects independently, without the analytics environment devolving into chaos?
We'll talk about how Netflix equips its business intelligence and data engineers with:
the freedom to leverage cloud-based data tools - Spark, Presto, Redshift, Tableau and others - in ways that solve our most difficult data problems
the freedom to find and introduce the right software for the job - even if it isn't used anywhere else in-house
the freedom to create and drop new tables in production without approval
the freedom to choose when a question is a one-off, and when a question is asked often enough to require a self-service tool
the freedom to retire analytics and data processes whose value doesn't justify their support costs
Speaker Bios
Monisha Kanoth is a Senior Data Architect at Netflix, and was one of the founding members of the current streaming Content Analytics team. She previously worked as a big data lead at Convertro (acquired by AOL) and as a data warehouse lead at MySpace.
Jason Flittner is a Senior Business Intelligence Engineer at Netflix, focusing on data transformation, analysis, and visualization as part of the Content Data Engineering & Analytics team. He previously led the EC2 Business Intelligence team at Amazon Web Services and was a business intelligence engineer with Cisco.
Chris Stephens is a Senior Data Engineer at Netflix. He previously served as the CTO at Deep 6 Analytics, a machine learning & content analytics company in Los Angeles, and on the data warehouse teams at the FOX Audience Network and Anheuser-Busch.
Kafka Summit NYC 2017 - Every Message Counts: Kafka as a Foundation for Highl... | confluent
This document discusses Airbnb's use of Kafka as the foundation for its highly reliable logging system. It describes the types of data Airbnb collects, including product events, database exports, service events, and derived data. Airbnb uses a simple logging pipeline where events are delivered reliably through Kafka in real time. Key components of its production logging pipeline include Jitney for standardized messaging, a central schema repository, client SDKs, producer and consumer agents, and a self-service portal. The pipeline provides continuous integration through schema authoring, deployment, implementation, processing and storage, and monitoring. It handles large volumes of data reliably with 150 brokers processing 1 million messages per second and 10 billion events collected daily with very low
This document summarizes 5 papers related to big data architecture and deep learning. Paper 1 discusses the Lambda architecture for balancing real-time and batch data processing. Paper 2 introduces Delta Lake for efficient ACID-compliant storage over object stores. Paper 3 proposes the Lakehouse architecture which unifies data warehousing and analytics using Delta Lake. Paper 4 presents the Conformer model that combines transformers and convolutions for speech recognition. The last paper applies intent detection and slot filling to Vietnamese text using BERT. These papers are relevant to the author's graduation thesis on traffic prediction using speech data analysis.
Radical Speed for SQL Queries on Databricks: Photon Under the Hood | Databricks
Join this session to hear from the Photon product and engineering team talk about the latest developments with the project.
As organizations embrace data-driven decision-making, it has become imperative for them to invest in a platform that can quickly ingest and analyze massive amounts and types of data. With their data lakes, organizations can store all their data assets in cheap cloud object storage. But data lakes alone lack robust data management and governance capabilities. Fortunately, Delta Lake brings ACID transactions to your data lakes – making them more reliable while retaining the open access and low storage cost you are used to.
Using Delta Lake as its foundation, the Databricks Lakehouse platform delivers a simplified and performant experience with first-class support for all your workloads, including SQL, data engineering, data science & machine learning. With a broad set of enhancements in data access and filtering, query optimization and scheduling, as well as query execution, the Lakehouse achieves state-of-the-art performance to meet the increasing demands of data applications. In this session, we will dive into Photon, a key component responsible for efficient query execution.
Photon was first introduced at Spark and AI Summit 2020 and is written from the ground up in C++ to take advantage of modern hardware. It uses the latest techniques in vectorized query processing to capitalize on data- and instruction-level parallelism in CPUs, enhancing performance on real-world data and applications — all natively on your data lake. Photon is fully compatible with the Apache Spark™ DataFrame and SQL APIs to ensure workloads run seamlessly without code changes. Come join us to learn more about how Photon can radically speed up your queries on Databricks.
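To ground the Delta Lake portion of this description, here is a minimal PySpark sketch of an ACID write and an upsert (MERGE) on a Delta table; the path and column names are illustrative, and the delta-spark package is assumed to be installed:

```python
# Minimal Delta Lake sketch: ACID append plus an upsert (MERGE) on a data lake path.
# Paths and column names are illustrative; requires the delta-spark package.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("delta-sketch")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

updates = spark.createDataFrame(
    [(1, "alice", 42.0), (2, "bob", 7.5)], ["id", "name", "score"]
)

path = "/mnt/lake/scores"
if not DeltaTable.isDeltaTable(spark, path):
    updates.write.format("delta").save(path)          # transactional initial write
else:
    target = DeltaTable.forPath(spark, path)
    (
        target.alias("t")
        .merge(updates.alias("u"), "t.id = u.id")      # ACID upsert
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )
```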
How a distributed graph analytics platform uses Apache Kafka for data ingesti... | HostedbyConfluent
Using Kafka to stream data into TigerGraph, a distributed graph database, is a common pattern in our customers’ data architecture. In the TigerGraph database, the Kafka Connect framework was used to build the native S3 data loader. In TigerGraph Cloud, we will be building native integration with many data sources such as Azure Blob Storage and Google Cloud Storage, using Kafka as an integrated component for the Cloud Portal.
In this session, we will discuss both architectures: 1. the built-in Kafka Connect framework within the TigerGraph database; 2. using a Kafka cluster for cloud-native integration with other popular data sources. Demos will be provided for both data streaming processes.
Presentation on dogfooding data at Lyft by Mark Grover and Arup Malakar on Oct 25, 2017 at Big Analytics Meetup (https://www.meetup.com/SF-Big-Analytics/events/243896328/)
Empowering Zillow’s Developers with Self-Service ETL | Databricks
The document discusses Zillow's efforts to empower its developers with self-service extract, transform, load (ETL) tools. It describes two key components: Zetlas, a user-friendly tool that allows non-coding users to automate SQL workflows through a graphical interface; and Zagger, a developer-focused service that automates data engineering tasks and integrates with common ETL tools. The tools were developed to meet different user needs, lower the barrier to data work, and provide modular platforms that can be expanded over time. Zillow aims to continue growing adoption of these self-service ETL tools and unifying their backends.
Spark Summit 2017 - Transforming B2B sales with Spark-powered sales intelligence | Wei Di
B2B sales intelligence has become an integral part of LinkedIn’s business to help companies optimize resource allocation and design effective sales and marketing strategies. This new trend of data-driven approaches has “sparked” a new wave of AI and ML needs in companies large and small. Given the tremendous complexity that arises from the multitude of business needs across different verticals and product lines, Apache Spark, with its rich machine learning libraries, scalable data processing engine and developer-friendly APIs, has been proven to be a great fit for delivering such intelligence at scale.
See how LinkedIn is utilizing Spark for building sales intelligence products. This session will introduce a comprehensive B2B intelligence system built on top of various open source stacks. The system puts advanced data science to work in a dynamic and complex scenario, in an easily controllable and interpretable way. Balancing flexibility and complexity, the system can deal with various problems in a unified manner and yield actionable insights to empower successful business. You will also learn about some impactful Spark-ML powered applications such as prospect prediction and prioritization, churn prediction, model interpretation, as well as challenges and lessons learned at LinkedIn while building such a platform.
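As a generic illustration of the kind of Spark ML pipeline such a system builds on (the features, labels and data path are invented and are not LinkedIn's actual pipeline), a prospect/churn classifier might look like this:

```python
# Generic Spark ML sketch for a prospect/churn classifier; features, labels and
# the input path are invented for illustration, not LinkedIn's actual pipeline.
from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.feature import VectorAssembler
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("b2b-intelligence-sketch").getOrCreate()

accounts = spark.read.parquet("/data/accounts_features")  # hypothetical input
features = ["employee_count", "past_purchases", "site_visits", "email_engagement"]

pipeline = Pipeline(stages=[
    VectorAssembler(inputCols=features, outputCol="features"),
    LogisticRegression(featuresCol="features", labelCol="converted"),
])

train, test = accounts.randomSplit([0.8, 0.2], seed=42)
model = pipeline.fit(train)

auc = BinaryClassificationEvaluator(labelCol="converted").evaluate(model.transform(test))
print(f"Holdout AUC: {auc:.3f}")
```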
Correlate Log Data with Business Metrics Like a Jedi | Trevor Parsons
The Logentries and Hosted Graphite integration allows you to connect two of your favorite Ops tools to easily extract important data from your log files, visualize them as metrics, and share them in Hosted Graphite dashboards.
• Integrate your systems to extract the metrics you need, from both your applications and log data.
• Set up log metric dashboards based on common use cases (e.g. error tracking, performance, app usage).
• Get off the "complexity elevator" of hosting your own in-house logging or graphite solutions.
• Delight your team and organization with valuable metrics and performance insights.
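Under the hood, shipping a metric extracted from a log line to a Graphite-compatible endpoint comes down to the plaintext protocol sketched below; the host, port, metric path and log format are placeholders for your own Hosted Graphite setup:

```python
# Minimal sketch: extract a value from a log line and ship it to Graphite
# using the plaintext protocol (one "path value timestamp" line per metric).
# Host, port, metric path and log format are placeholders.
import re
import socket
import time

GRAPHITE_HOST, GRAPHITE_PORT = "graphite.example.com", 2003

def send_metric(path, value, timestamp=None):
    timestamp = int(timestamp or time.time())
    line = f"{path} {value} {timestamp}\n"
    with socket.create_connection((GRAPHITE_HOST, GRAPHITE_PORT), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))

log_line = "2021-03-01T12:00:00Z level=error service=checkout latency_ms=412"
match = re.search(r"latency_ms=(\d+)", log_line)
if match:
    send_metric("app.checkout.latency_ms", int(match.group(1)))
```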
Jamie Grier - Robust Stream Processing with Apache Flink | Flink Forward
The document discusses Apache Flink, an open source platform for distributed stream and batch data processing. It describes how Flink allows for stateful stream processing, windowed computations over streams, and robust handling of time, failures, planned downtimes, and reprocessing. The presentation provides an overview of these concepts and includes a code example and details on Flink's distributed and parallel deployment architecture.
Streaming datasets for personalization | Shriya Arora
Streaming applications have historically been complex to design and implement because of the significant infrastructure investment. However, recent active developments in various streaming platforms provide an easy transition to stream processing, and enable analytics applications/experiments to consume near real-time data without massive development cycles. In this session, we will present our experience on stream processing unbounded datasets in the personalization space. The datasets consisted of -- but were not limited to -- the stream of playback events that are used as feedback for all personalization algorithms. These datasets, when ultimately consumed by our machine learning models, directly affect the customer’s personalized experience. We’ll talk about the experiments we did to compare Apache Spark and Apache Flink, and the challenges we faced.
Lyft's streaming platform uses Apache Flink for stream processing and Apache Kafka for messaging. Flink was chosen for its capabilities around state management, exactly-once processing, and flexible APIs. Kafka was chosen for its durability, low latency, and consumer fanout. However, open problems remain around rescaling Kafka while preserving per-key ordering, enabling dynamic stream computations, long-term event storage, and zero downtime deployments. Lyft is working to solve these challenges as it builds out its next generation streaming platform.
Spline: Data Lineage For Spark Structured Streaming | Vaclav Kosar
Data lineage tracking is one of the significant problems that companies in highly regulated industries face. These companies are forced to have a good understanding of how data flows through their systems to comply with strict regulatory frameworks. Many of these organizations also utilize big and fast data technologies such as Hadoop, Apache Spark and Kafka. Spark has become one of the most popular engines for big data computing. In recent releases, Spark also provides the Structured Streaming component, which allows for real-time analysis and processing of streamed data from many sources. Spline is a data lineage tracking and visualization tool for Apache Spark. Spline captures and stores lineage information from internal Spark execution plans in a lightweight, unobtrusive and easy to use manner.
Additionally, Spline offers a modern user interface that allows non-technical users to understand the logic of Apache Spark applications. In this presentation we cover the support of Spline for Structured Streaming and we demonstrate how data lineage can be captured for streaming applications.
Presented at Spark Summit London 2018
Elasticsearch has always been fast, but required structuring and indexing your data up front. We're changing that with the introduction of runtime fields, which enable you to extract, calculate, and transform fields at query time. They can be defined after data is indexed or provided with your query, enabling new cost/storage/performance tradeoffs, and letting analysts gradually define fields over time.
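A quick sketch of a runtime field supplied with the query itself, using the Elasticsearch Python client; the index name, field names and Painless script are illustrative:

```python
# Sketch of an Elasticsearch runtime field defined at query time.
# Index name, field names and the Painless script are illustrative.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

response = es.search(
    index="web-logs",
    runtime_mappings={
        "response_time_s": {
            "type": "double",
            "script": {"source": "emit(doc['response_time_ms'].value / 1000.0)"},
        }
    },
    aggs={"avg_response_s": {"avg": {"field": "response_time_s"}}},
    size=0,
)
print(response["aggregations"]["avg_response_s"]["value"])
```

The field never has to be indexed up front; it is computed per query, which is the cost/storage/performance trade-off described above.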
Activate 2019 - Search and relevance at scale for online classifieds | Roger Rafanell Mas
A high-performing search service requires both an effective search infrastructure and high search relevance.
Seeking a fault-tolerant, self-healing and cost-effective search infrastructure at scale, we built a platform based on the Apache Solr search engine with light in-memory indexes, avoiding sharding and decreasing the overall infrastructure needs.
To populate the indexes, we use flexible ETL processes, keeping our product catalog and search indexes updated in near real time and distributed across high-performance database engines.
We aim for high search relevance precision and recall by applying query relaxation and boost solutions on top of the optimised platform.
https://www.activate-conf.com/speakers/detail/roger-rafanell
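As a generic illustration of the boosting and relaxation side of such work (the collection name, fields, boost values and match rules are invented, not the production configuration), an eDisMax request against Solr might look like this:

```python
# Generic Solr eDisMax query sketch with field boosts and a recency boost.
# Collection, field names and boost values are invented for illustration.
import requests

SOLR_SELECT = "http://localhost:8983/solr/classifieds/select"

params = {
    "q": "mountain bike",
    "defType": "edismax",
    "qf": "title^3 description^1",                     # search across boosted fields
    "bf": "recip(ms(NOW,listed_date),3.16e-11,1,1)",   # boost fresher listings
    "mm": "2<75%",                                     # min-should-match style relaxation
    "fl": "id,title,price",
    "rows": 20,
    "wt": "json",
}

response = requests.get(SOLR_SELECT, params=params, timeout=5)
for doc in response.json()["response"]["docs"]:
    print(doc["id"], doc["title"])
```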
The Evolution of Testing Methodology at AWS: From Status Quo to Formal Method... | C4Media
The document summarizes the evolution of testing methodology at AWS from a status quo approach to using formal methods with TLA+. It describes AWS services experiencing exponential growth and the need for better testing of distributed algorithms. It outlines their general test strategy and limitations of status quo test adequacy criteria. The document then details AWS' adoption of generative testing, in-process clusters, informal proofs, and formal specification with TLA+ to validate algorithm correctness. Real-world examples applying these methods to DynamoDB and other AWS services are provided.
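The "generative testing" step mentioned above can be illustrated with a small property-based test; this sketch uses the Python hypothesis library on an invented last-writer-wins register model and is not AWS's actual harness or algorithm:

```python
# Toy property-based (generative) test in the spirit described above.
# Uses the hypothesis library on an invented last-writer-wins register model;
# this is an illustration, not AWS's actual test harness or algorithms.
from hypothesis import given, strategies as st

class LWWRegister:
    """Last-writer-wins register: highest timestamp wins, ties broken by value."""

    def __init__(self):
        self.timestamp, self.value = -1, None

    def write(self, timestamp, value):
        if (timestamp, value) > (self.timestamp, self.value or ""):
            self.timestamp, self.value = timestamp, value

writes = st.lists(st.tuples(st.integers(min_value=0), st.text(min_size=1)), min_size=1)

@given(writes)
def test_replicas_converge_regardless_of_order(ops):
    # Property: two replicas that apply the same writes in any order converge.
    a, b = LWWRegister(), LWWRegister()
    for ts, val in ops:
        a.write(ts, val)
    for ts, val in reversed(ops):
        b.write(ts, val)
    assert (a.timestamp, a.value) == (b.timestamp, b.value)
```

Hypothesis generates many random operation sequences, which is the kind of test-adequacy improvement over hand-picked cases that the talk describes.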
The document provides information about Worldwide Intellectual Property Service (WIPS), including:
1. WIPS is a Korean company that offers online worldwide patent information services and has over 3,000 company and institution clients.
2. WIPS provides various patent search and analysis tools including integrated searches across multiple countries, advanced searches, and tools for viewing search results, analyzing patent families and citations.
3. WIPS collects patent data from several major jurisdictions including the US, Europe, Japan, and Korea and provides tools for downloading, analyzing, and monitoring patent information.
Behind the Wizard’s Curtain: Scalability and Security at Zuora (Subscribed13) | Zuora, Inc.
Ever wonder what's in the Zuora cloud? Join us and learn how Zuora has built a scalable and secure cloud based subscription billing management service. Hear from scalability, security and operations engineers and have your questions answered.
Enhancements on Spark SQL optimizer by Min Qiu | Spark Summit
This document summarizes enhancements made to Spark SQL's optimizer including rule-based optimizations and cost-based optimizations. Rule-based optimizations included join condition push down through predicate rewrite and join order adjustment, as well as data volume reduction through column pruning enhancements. Cost-based optimizations leveraged statistics and histograms to select optimal join types, partitions, and join orders. Future work focuses on enumerating the full space of query plans and improving estimation accuracy.
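To see these optimizer behaviours from the user's side, the sketch below (generic PySpark, not the patch set from the talk) shows predicate pushdown and column pruning reflected in the physical plan via explain(); the paths and columns are illustrative:

```python
# Generic PySpark sketch: observing predicate pushdown and column pruning
# in the optimized/physical plan. Paths and columns are illustrative; this is
# not the patch set discussed in the talk.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("optimizer-sketch").getOrCreate()

orders = spark.read.parquet("/data/orders")      # hypothetical Parquet dataset
customers = spark.read.parquet("/data/customers")

query = (
    orders
    .join(customers, "customer_id")              # join condition eligible for pushdown
    .where(F.col("order_date") >= "2021-01-01")  # filter pushed toward the Parquet scan
    .select("customer_id", "amount")             # unused columns pruned from the scan
    .groupBy("customer_id")
    .agg(F.sum("amount").alias("total"))
)

# The formatted plan shows PushedFilters and the pruned ReadSchema on the scans.
query.explain(mode="formatted")
```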
This document summarizes a presentation about streamlining Oracle service contracts for Crossbeam Systems. It introduces Protégé Software Services and Crossbeam, and discusses Crossbeam's history and business needs around improving visibility and processing of contract information. The presentation demonstrates an add-on application developed by Protégé that provides enhanced search, dashboard, drill-down and export/import features to optimize contract management in Oracle. It shows how the application improved customer service, increased revenue, and reduced costs for Crossbeam.
Ronald Hsu presented on Carousell's migration from a monolithic architecture to microservices. Some key points:
- Carousell is a mobile-first classifieds app with over 185 million listings across 20+ cities in 7 markets.
- The goals of migrating were to improve productivity, reduce dependencies and server costs, and handle higher traffic.
- The strategy involved developing services independently, tying them together gradually, and ensuring zero downtime during rollout.
- Challenges included stabilizing gRPC connections, handling side effects, switching feature flags for a short time, backfilling data, and balancing performance against good-enough initial logic.
(Mike Graham + Dan Carroll, Comcast) Kafka Summit SF 2018
Comcast manages over 2 million miles of fiber and coax, and over 40 million in home devices. This “outside plant” is subject to adverse conditions from severe weather to power grid outages to construction-related disruptions. Maintaining the health of this large and important infrastructure requires a distributed, scalable, reliable and fast information system capable of real-time processing and rapid analysis and response. Using Apache Kafka and the Kafka Streams Processor API, Comcast built an innovative new system for monitoring, problem analysis, metrics reporting and action response for the outside plant.
In this talk, you’ll learn how topic partitions, state stores, key mapping, source and sink topics and processors from the Kafka Streams Processor API work together to build a powerful dynamic system. We will dive into the details about the inner workings of the state store—how it is backed by a Kafka “changelog” topic, how it is scaled horizontally by partition and how the instances are rebuilt on startup or on processor failure. We will discuss how these state stores essentially become like materialized views in a SQL database but are updated incrementally as data flows through the system, and how this allows the developers to maintain the data in the optimal structures for performing the processing. The best part is that the data is readily available when needed by the processors. You will see how a REST API using Kafka Streams “interactive queries” can be used to retrieve the data in the state stores. We will explore the deployment and monitoring mechanisms used to deliver this system as a set of independently deployed components.
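Kafka Streams itself is a Java API, but the state-store pattern described here can be sketched in Python with the Faust library: a changelog-backed table is updated incrementally as events flow through, and a small HTTP view plays the role of "interactive queries" over the local state. Topic, field and application names below are invented, and this is an analogue of the pattern rather than Comcast's system:

```python
# Python analogue of the Kafka Streams state-store pattern, sketched with Faust.
# (Kafka Streams itself is Java; topic/field/app names here are invented.)
import faust

app = faust.App("plant-health-sketch", broker="kafka://localhost:9092")

class DeviceEvent(faust.Record):
    device_id: str
    status: str

events = app.topic("device-events", value_type=DeviceEvent)

# Changelog-backed table: Faust persists updates to a Kafka changelog topic,
# so the "materialized view" can be rebuilt on restart or failover.
error_counts = app.Table("device-error-counts", default=int, partitions=8)

@app.agent(events)
async def track_errors(stream):
    async for event in stream.group_by(DeviceEvent.device_id):
        if event.status == "ERROR":
            error_counts[event.device_id] += 1

# A small HTTP view plays the role of "interactive queries" over the local state.
@app.page("/errors/{device_id}/")
@app.table_route(table=error_counts, match_info="device_id")
async def get_errors(web, request, device_id):
    return web.json({"device_id": device_id, "errors": error_counts[device_id]})

if __name__ == "__main__":
    app.main()
```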
Using Deep Learning and Customized Solr Components to Improve search Relevanc... | Lucidworks
Gain insight into the state-of-the-art deep learning algorithms being used to power e-commerce search at Target and how to customize Solr to blend multiple ML signals at a large scale.
Speakers:
Aashish Dattani, Lead Data Engineer, Target
Richard Wang, Principal AI Scientist, Target
Sunil Srinivasan, Lead Engineer, Target
IBM Blockchain Platform - Architectural Good Practices v1.0Matt Lucas
This document discusses architectural good practices for blockchains and Hyperledger Fabric performance. It provides an overview of key concepts like transaction processing in Fabric and performance metrics. It also covers optimizing different parts of the Fabric network like client applications, peers, ordering service, and chaincode. The document recommends using tools like Hyperledger Caliper and custom test harnesses for performance testing and monitoring Fabric deployments. It highlights lessons learned from real projects around reusing connections and load balancing requests.
SplunkLive! Presentation - Data Onboarding with SplunkSplunk
- The data onboarding process involves systematically bringing new data sources into Splunk to make the data instantly usable and valuable for users
- The process includes pre-boarding activities like identifying the data, mapping fields, and building index-time and search-time configurations
- It also involves deploying any necessary infrastructure, deploying the configurations, testing and validating the data, and getting user approval before the process is complete
This document outlines an agenda for an advanced Splunk user training workshop. The workshop covers topics like field aliasing, common information models, event types, tags, dashboard customization, index replication for high availability, report acceleration, and lookups. It provides overviews and examples for each topic and directs attendees to additional documentation resources for more in-depth learning. The workshop also includes demonstrations of dashboard customization techniques and discusses support options through the Splunk community.
Creating a Project Plan for a Data Warehouse Testing AssignmentRTTS
This document provides guidance on creating a project plan for testing a data warehouse project. It discusses key aspects to consider such as reviewing documentation, estimating resources like test engineers, determining the number of ETL legs and release cycles, assessing test complexity, and ensuring the test automation tool QuerySurge is configured. An example project plan estimates the time to review documentation, identifies one test engineer and one ETL leg, plans for four release cycles, and provides estimates of 7 low-complexity, 21 medium-complexity, and 8 high-complexity tests.
A Practical Guide to Selecting a Stream Processing Technology confluent
Presented by Michael Noll, Product Manager, Confluent.
Why are there so many stream processing frameworks that each define their own terminology? Are the components of each comparable? Why do you need to know about spouts or DStreams just to process a simple sequence of records? Depending on your application’s requirements, you may not need a full framework at all.
Processing and understanding your data to create business value is the ultimate goal of a stream data platform. In this talk we will survey the stream processing landscape, the dimensions along which to evaluate stream processing technologies, and how they integrate with Apache Kafka. Particularly, we will learn how Kafka Streams, the built-in stream processing engine of Apache Kafka, compares to other stream processing systems that require a separate processing infrastructure.
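To make the library-versus-framework distinction concrete, here is a minimal Kafka Streams sketch: the application is an ordinary Java program that reads from and writes to Kafka topics, so scaling out simply means starting more instances of the same JVM. Topic names and the filter predicate are placeholders, not anything from the talk.

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class SimpleFilterApp {
    public static void main(String[] args) {
        // No separate processing cluster: the "cluster" is however many copies
        // of this program you choose to run.
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> records = builder.stream("input-records");
        records.filter((key, value) -> value != null && value.contains("ERROR"))
               .to("error-records");

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "simple-filter-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        new KafkaStreams(builder.build(), props).start();
    }
}
```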
Operationalizing Machine Learning—Managing Provenance from Raw Data to Predic...Databricks
Our team at Comcast is challenged with operationalizing predictive ML models to improve customer experience. Our goal is to eliminate bottlenecks in the process from model inception to deployment and monitoring.
Traditionally CI/CD manages code and infrastructure artifacts like container definitions. We want to extend it to support granular traceability, enabling tracking of ML models from use-case to feature/attribute selection, development of versioned datasets, model training code, model evaluation artifacts, model prediction deployment containers, and the sinks to which the predictions/outcomes are persisted. Our framework stack enables us to track models from use-case to deployments, manage and evaluate multiple models simultaneously in a live yet dark mode, and continue to monitor models in production against real-world outcomes using configurable policies.
The technologies/components which drive this vision are:
1. FeatureStore – Enables data scientists to reuse versioned features and review feature metrics by models. Self-Service capabilities allow all teams to onboard their events data into the feature store.
2. ModelRepository – Manages meta-data about models including pre-processing parameters (Ex. Scaling parameters for features), mapping to the features needed to execute the model, model discovery mechanisms, etc.
3. Spark on Alluxio – Alluxio provides the universal data plane on top of various under-stores (Ex. S3, HDFS, RDBMS). Apache Spark with its Data Sources API provides a unified query language which data scientists use to consume features to create training/validation/test datasets which are versioned and integrated into the full model pipeline using Ground-Context discussed next.
4. Ground-Context – This open-source vendor-neutral data context service enables full traceability from use-case, models, features, model to features mapping, versioned datasets, model training codebase, model deployment containers and prediction/outcome sinks. It integrates with the Feature-Store, Container Repository and Git to integrate data, code and run-time artifacts for CI/CD integration.
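As a hedged sketch of the Spark-on-Alluxio read path described in item 3 above, the snippet below reads a versioned feature set through an Alluxio path and derives a versioned training set from it. The paths, column names, and version layout are hypothetical, not Comcast's actual feature-store contract.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class VersionedFeatureRead {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("feature-read-demo")
                .getOrCreate();

        // Hypothetical layout: each versioned feature set is exposed as a path on
        // the Alluxio data plane, regardless of which under-store (S3, HDFS, ...)
        // actually holds the bytes.
        String featurePath = "alluxio://alluxio-master:19998/features/device_signals/v42";

        Dataset<Row> features = spark.read().parquet(featurePath);

        // A versioned training set is then a deterministic selection over
        // versioned inputs, which is what makes the lineage reproducible.
        Dataset<Row> training = features
                .filter("event_date >= '2020-01-01' AND event_date < '2020-02-01'")
                .select("account_id", "signal_strength", "outage_flag");

        training.write().parquet("alluxio://alluxio-master:19998/datasets/training/v7");
    }
}
```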
This document provides an overview of efficient query processing infrastructures for web search engines. It discusses how search engines use distributed architectures across many servers to efficiently process queries at large scale. It also describes how search engines employ various techniques like index compression, skipping, dynamic pruning, and learning to rank to efficiently evaluate queries while maintaining effectiveness. The goal is to provide concise yet relevant search results to users as fast as possible despite the massive scale of web data.
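As a toy illustration of the skipping idea mentioned above, the sketch below intersects two docID-sorted postings lists and gallops over runs of non-matching documents instead of scanning them one by one. Real engines implement this with on-disk skip pointers and combine it with dynamic pruning such as WAND or MaxScore, which this sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.List;

public class ConjunctionWithSkipping {

    // Advance within a docID-sorted postings list to the first entry >= target.
    // A doubling probe (the in-memory analogue of skip pointers) jumps over most
    // non-matching documents before a short linear scan finishes the job.
    static int advance(int[] postings, int from, int target) {
        int step = 1;
        int pos = from;
        while (pos + step < postings.length && postings[pos + step] < target) {
            pos += step;
            step *= 2;                       // gallop forward
        }
        while (pos < postings.length && postings[pos] < target) {
            pos++;                           // finish with a linear scan
        }
        return pos;
    }

    // Document-at-a-time intersection of two postings lists (an AND query).
    static List<Integer> intersect(int[] a, int[] b) {
        List<Integer> result = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {
                result.add(a[i]);
                i++; j++;
            } else if (a[i] < b[j]) {
                i = advance(a, i, b[j]);
            } else {
                j = advance(b, j, a[i]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[] rare = {3, 900, 4_000};
        int[] common = new int[5_000];
        for (int d = 0; d < common.length; d++) common[d] = d;
        System.out.println(intersect(rare, common)); // [3, 900, 4000]
    }
}
```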
This document provides an overview of various topics related to software project management. It begins with a list of suggested topics for discussion, such as challenges specific to software projects, quality measurements, and best practices in Pakistan. It then covers aspects of the software development lifecycle from planning and requirements through deployment and maintenance. Different project models like waterfall, evolutionary prototyping, and spiral development are described along with their advantages and disadvantages. Finally, it touches on using commercial off-the-shelf software.
The document discusses planning for software project management. It provides examples of potential topics that could be covered in project planning, such as challenges specific to software projects, quality measurements, and best practices in Pakistan. It also gives examples of time and resource allocation across different project phases. Potential project deliverables are outlined for each phase from concept exploration to deployment and maintenance. Finally, it discusses lifecycle planning and the importance of choosing an appropriate model based on project risks and requirements understanding.
Salesforce API access is nice, but often not enough. In the Big Data era, you are often required to replicate Salesforce data for offline processing. Implisit requires such replication for its data entry engine to automatically enter emails, events, contacts, and leads for its users. In order to do so, Implisit maintains a daily sync of over one billion Salesforce data records, while using no more than a few hundred API calls per Salesforce Org. Join us as we share the suggested architecture of such a replication mechanism, the best practices we developed over time, and the pitfalls to avoid.
Similar to Elevation Query Extension: Introducing Subselects into Lucene Queries (20)
Search is the Tip of the Spear for Your B2B eCommerce StrategyLucidworks
With ecommerce experiencing explosive growth, it seems intuitive that the B2B segment of that ecosystem is mirroring the same trajectory. That said, B2B has very different needs when it comes to transacting with the same style of experiences that we see in B2C. For instance, B2B ecommerce is about precision findability, whereas B2C customers can convert at higher rates when they’re just browsing online. In order for the B2B buying experience to be successful, search needs to be tuned to meet the unique needs of the segment.
In this webinar with Forrester senior analyst Joe Cicman, you’ll learn:
-Which verticals in B2B will drive the most growth, and how machine-learning powered personalization tactics can be deployed to support those specific verticals
-Why an omnichannel selling approach must be deployed in order to see success in B2B
-How deploying content search capabilities will support a longer sales cycle at scale
-What the next steps are to support a robust B2B commerce strategy supported by new technology
Speakers
Joe Cicman, Senior Analyst, Forrester
Jenny Gomez, VP of Marketing, Lucidworks
Customer loyalty starts with quickly responding to your customer’s needs. When it comes to resolving open support cases, time is of the essence. Time spent searching for answers adds up and creates inefficiencies in resolving cases at scale. Relevant answers need to be a few clicks away and easily accessible for agents directly from their service console.
We will explore how Lucidworks’ Agent Insights application automatically connects agents with the correct answers and resources. You’ll learn how to:
-Configure a proactive widget in an agent’s case view page to access resources across third-party systems (such as Sharepoint, Confluence, JIRA, Zendesk, and ServiceNow).
-Easily set up query pipelines to autonomously route assets and resources that are relevant to the case-at-hand—directly to the right agent.
-Identify subject matter experts within your support data and access tribal knowledge with lightning-fast speed.
How Crate & Barrel Connects Shoppers with Relevant ProductsLucidworks
Lunch and Learn during Retail TouchPoints #RIC21 virtual event.
***
Crate & Barrel’s previous search solution couldn’t provide its shoppers with an online search and browse experience consistent with the customer-centric Crate & Barrel brand. Meanwhile, Crate & Barrel merchandisers spent the bulk of their time manually creating and maintaining search rules. The search experience impacted customer retention, loyalty, and revenue growth.
Join this lunch & learn for an interactive chat on how Crate & Barrel partnered with Lucidworks to:
-Improve search and browse by modernizing the technology stack with ML-based personalization and merchandising solutions
-Enhance the experience for both shoppers and merchandisers
-Explore signals to transform the omnichannel shopping experience
Questions? Visit https://lucidworks.com/contact/
Learn how to guide customers to relevant products using eCommerce search, hyper-personalisation, and recommendations in our ‘Best-In-Class Retail Product Discovery’ webinar.
Nowadays, shoppers want their online experience to be engaging, inspirational and fulfilling. They want to find what they’re looking for quickly and easily. If the sought after item isn’t available, they want the next best product or content surfaced to them. They want a website to understand their goals as though they were talking to a sales assistant in person, in-store.
In this webinar, we explore IMRG industry data insights and a best-in-class example of retail product discovery. You’ll learn:
- How AI can drive increased revenue through hyper-personalised experiences
- How user intent can be easily understood and results displayed immediately
- How merchandisers can be empowered to curate results and product placement – all without having to rely on IT.
Presented by:
Dave Hawkins, Principal Sales Engineer - Lucidworks
Matthew Walsh, Director of Data & Retail - IMRG
Connected Experiences Are Personalized ExperiencesLucidworks
Many companies claim personalization and omnichannel capabilities are top priorities. Few are able to deliver on those experiences.
For a recent Lucidworks-commissioned study, Forrester Consulting surveyed 350+ global business decision-makers to see what gets in the way of achieving these goals. They discovered that inefficient technology, lack of behavioral insights, and failure to tie initiatives to enterprise-wide goals are some of the most frequent blockers to personalization success.
Join guest speaker, Forrester VP and Principal Analyst, Brendan Witcher, and Lucidworks CEO, Will Hayes, to hear the results of the Forrester Consulting study, how to avoid “digital blindness,” and how to apply VoC data in real-time to delight customers with personalized experiences connected across every touchpoint.
In this webinar, you’ll learn:
- Why companies who utilize real-time customer signals report more effective personalization
- How to connect employees and customers in a shared experience through search and browse
- How Lucidworks clients Lenovo, Morgan Stanley and Red Hat fast-tracked improvements in conversion, engagement and customer satisfaction
Featuring
- Will Hayes, CEO, Lucidworks
- Brendan Witcher, VP, Principal Analyst, Forrester
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Lucidworks
Intelligent Policing. Leveraging Data to more effectively Serve Communities.
Policing in the next decade is anticipated to be very different from historical methods. More data driven, more focused on the intricacies of communities they serve and more open and collaborative to make informed recommendations a reality. Whether it's social populations, NIBRS or organization improvement that's the driver, the IT requirement is largely the same. Provide 360 access to large volumes of siloed data to gain a full 360 understanding of existing connections and patterns for improved insight and recommendation.
Join us for a round table discussion of how the Toronto Police Service is better serving their community through deploying a unified intelligent data platform.
Data innovation improves officers' engagement with existing data and streamlines investigation workflows by enhancing collaboration. This improved visibility into existing police data allows for a more intelligent and responsive police force.
In this webinar, we'll cover:
-The technology needs of an intelligent police force.
-How a Global Search improves an officer's interaction with existing data.
Featuring:
-Simon Taylor, VP, Worldwide Channels & Alliances, Lucidworks
-Michael Cizmar, Managing Director, MC+A
-Ian Williams, Manager of Analytics & Innovation, Toronto Police Service
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...Lucidworks
Preparing for Peak in Ecommerce | eTail Asia 2020Lucidworks
This document provides a framework for prioritizing onsite search problems and key performance indicators (KPIs) to measure for e-commerce search optimization. It recommends prioritizing fixing searches that yield no results, improving relevance of results, and reducing false positives. The most essential KPIs to measure include query latency, throughput, result relevance through click-through rates and NDCG scores. The document also provides tips for self-benchmarking search performance and examples of search performance benchmarks across nine e-commerce sites from various industries.
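Of the KPIs listed, NDCG is the least self-explanatory, so here is a minimal sketch of how NDCG@k is commonly computed from graded relevance judgments; the grades in the example are invented.

```java
import java.util.Arrays;

public class NdcgExample {

    // DCG over graded relevance labels of a ranked list (log2 position discount).
    static double dcg(double[] gains) {
        double sum = 0.0;
        for (int i = 0; i < gains.length; i++) {
            sum += (Math.pow(2, gains[i]) - 1) / (Math.log(i + 2) / Math.log(2));
        }
        return sum;
    }

    // NDCG = DCG of the observed ranking divided by DCG of the ideal ranking.
    static double ndcg(double[] observed) {
        double[] ideal = observed.clone();
        Arrays.sort(ideal);
        for (int i = 0, j = ideal.length - 1; i < j; i++, j--) { // descending order
            double tmp = ideal[i]; ideal[i] = ideal[j]; ideal[j] = tmp;
        }
        double idealDcg = dcg(ideal);
        return idealDcg == 0 ? 0 : dcg(observed) / idealDcg;
    }

    public static void main(String[] args) {
        // Relevance grades (0-3) of the top five results for one query,
        // in the order the engine returned them.
        double[] grades = {3, 1, 0, 2, 0};
        System.out.printf("NDCG@5 = %.3f%n", ndcg(grades));
    }
}
```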
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Lucidworks
Wish your conversion rates were higher? Can’t figure out how to efficiently and effectively serve all the visitors on your site? Embarrassed by the quality of your product discovery experience? The bar is high and the influx of online shopping over recent months has reminded us that the opportunities are real. We’re all deep in holiday prep, but let’s take a few minutes to think about January 2021 and beyond. How can we position ourselves for success with our customers and against our competition?
Grab your lunch and let’s dive into three strategies that need to be part of your 2021 roadmap. You don’t need an army to get there. But you do need to take action and capitalize on the shoppers abandoning the product discovery journey on your site.
In this session, attendees will find out how to:
-Take control of merchandising at scale;
-Implement hands-free search relevancy; and
-Address personalization challenges.
AI-Powered Linguistics and Search with Fusion and RosetteLucidworks
For a personalized search experience, search curation requires robust text interpretation, data enrichment, relevancy tuning and recommendations. In order to achieve this, language and entity identification are crucial.
For teams working on search applications, advanced language packages allow them to achieve greater recall without sacrificing precision.
Join us for a guided tour of our new Advanced Linguistics packages, available in Fusion, thanks to the technology partnership between Lucidworks and Basistech.
We’ll explore the application of language identification and entity extraction in the context of search, along with practical examples of personalizing search and enhancing entity extraction.
In this webinar, we’ll cover:
-How Fusion uses the Rosette Basic Linguistics and Entity Extraction packages
-Tips for improving language identification and treatment as well as data enrichment for personalization
-Speech2 demo modeling Active Recommendation
-Use Rosette’s packages with Fusion Pipelines to build custom entities for specific domain use cases
Featuring:
-Radu Miclaus, Director of Product, AI and Cloud, Lucidworks
-Robert Lucarini, Senior Software Engineer, Lucidworks
-Nick Belanger, Solutions Engineer, Basis Technology
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentLucidworks
Before COVID-19, almost 80% of the US workforce worked in service jobs that involve in-person interaction with strangers. Now, leaders of service organizations must reshape their offerings during the pandemic and prepare for whatever the new normal turns out to be. Our three panelists will share ideas for adapting their service businesses, now that closer-than-six-feet isn’t an option.
Join Lucidworks as we talk shop with 3 service business leaders, covering:
-Common impacts of the pandemic on service businesses (and what to do about them),
-How service teams can maintain a human touch across virtual channels, and
-Plans for the future, before and after the pandemic subsides.
Featuring
-Sara Nathan, President & CEO, AMIGOS
-Anthony Carruesco, Founder, AC Fly Fishing
-Sara Bradley, Chef and Proprietor, Freight House
-Justin Sears, VP Product Marketing, Lucidworks
Webinar: Smart answers for employee and customer support after covid 19 - EuropeLucidworks
The COVID-19 pandemic has forced companies to support far more customers and employees through digital channels than ever before. Many are turning to chatbots to help meet increasing demand, but traditional rules-based approaches can’t keep up. Our new Smart Answers add-on to Lucidworks Fusion makes existing chatbots and virtual assistants more intelligent and more valuable to the people you serve.
Smart Answers for Employee and Customer Support After COVID-19Lucidworks
Watch our on-demand webinar showcasing Smart Answers on Lucidworks Fusion. This technology makes existing chatbots and virtual assistants more intelligent and more valuable to the people you serve.
In this webinar, we’ll cover:
-How search and deep learning extend conversational frameworks for improved experiences
-How Smart Answers improves customer care, call deflection, and employee self-service
-A live demo of Smart Answers for multi-channel self-service support
Applying AI & Search in Europe - featuring 451 ResearchLucidworks
In the current climate, it’s now more important than ever to digitally enable your workforce and customers.
Hear from Simon Taylor, VP Global Partners & Alliances, Lucidworks and Matt Aslett, Research Vice President, 451 Research to get the inside scoop on how industry leaders in Europe are developing and executing their digital transformation strategies.
In this webinar, we’ll discuss:
The top challenges and aspirations European business and technology leaders are solving using AI and search technology
Which search and AI use cases are making the biggest impact in industries such as finance, healthcare, retail and energy in Europe
What technology buyers should look for when evaluating AI and search solutions
Webinar: Accelerate Data Science with Fusion 5.1Lucidworks
This document introduces Fusion 5.1 and its new capabilities for integrating with data science tools like Tensorflow, Scikit-Learn, and Spacy.
It provides an overview of Fusion's capabilities for understanding content, users, and delivering insights at scale. The document then demonstrates Fusion's Jupyter Notebook integration for reading and writing data and running SQL queries.
Finally, it shows how Fusion integrates with Seldon Core to easily deploy machine learning models with tools like Tensorflow and Scikit-Learn. A live demo is provided of deploying a custom model and using it in Fusion's query and indexing pipelines.
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyLucidworks
In this webinar with 451 Research, you'll understand how retailers are using AI to predict customer intent and learn which key performance metrics are used by more than 120 online retailers in Lucidworks’ 2019 Retail Benchmark Survey.
In this webinar, you’ll learn:
● What trends and opportunities are facing the ecommerce industry in 2020
● Why search is the universal path to understanding customer intent
● How large online retailers apply AI to maximize the effectiveness of their personalization efforts
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Lucidworks
Nordstrom Rack | Hautelook curates and serves customers a wide selection of on-trend apparel, accessories, and shoes at an everyday savings of up to 75 percent off regular prices. With over a million visitors shopping across different platforms every day, and a realization that customers have become accustomed to robust and personalized search interactions, Nordstrom Rack | Hautelook launched an initiative over a year ago to provide data science-driven digital experiences to their customers.
In this session, we’ll discuss Nordstrom Rack | Hautelook’s journey of operationalizing a hefty strategy, optimizing a fickle infrastructure, and rallying troops around a single vision of building an expansible machine-learning driven product discovery engine.
The audience will learn about:
-The key technical challenges and outcomes that come with onboarding a solution
-The lessons learned of creating and executing operational design
-The use of Lucidworks Fusion to plug custom data science models into search and browse applications to understand user intent and deliver personalized experiences
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceLucidworks
Knowledge graphs and machine learning are on the rise as enterprises hunt for more effective ways to connect the dots between the data and the business world. With newer technologies, the digital workplace can dramatically improve employee engagement, data-driven decisions, and actions that serve tangible business objectives.
In this webinar, you will learn
-- Introduction to knowledge graphs and where they fit in the ML landscape
-- How breakthroughs in search affect your business
-- The key features to consider when choosing a data discovery platform
-- Best practices for adopting AI-powered search, with real-world examples
Webinar: Building a Business Case for Enterprise SearchLucidworks
The document discusses building a business case for enterprise search. It notes that 85% of information is unstructured data locked in various locations and applications. Many knowledge workers spend a significant portion of their day searching across multiple systems for information. The rise of unstructured data and AI capabilities can help organizations unlock value from their information assets. Effective enterprise search powered by AI can provide real-time intelligence, personalized information, and more efficient research to help knowledge workers.
Trusted Execution Environment for Decentralized Process MiningLucaBarbaro3
Presentation of the paper "Trusted Execution Environment for Decentralized Process Mining" given during the CAiSE 2024 Conference in Cyprus on June 7, 2024.
Unlock the Future of Search with MongoDB Atlas: Vector Search UnleashedMalak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
5th LF Energy Power Grid Model Meet-up SlidesDanBrown980551
5th Power Grid Model Meet-up
It is with great pleasure that we extend to you an invitation to the 5th Power Grid Model Meet-up, scheduled for 6th June 2024. This event will adopt a hybrid format, allowing participants to join us either through an online Microsoft Teams session or in person at TU/e located at Den Dolech 2, Eindhoven, Netherlands. The meet-up will be hosted by Eindhoven University of Technology (TU/e), a research university specializing in engineering science & technology.
Power Grid Model
The global energy transition is placing new and unprecedented demands on Distribution System Operators (DSOs). Alongside upgrades to grid capacity, processes such as digitization, capacity optimization, and congestion management are becoming vital for delivering reliable services.
Power Grid Model is an open source project from Linux Foundation Energy and provides a calculation engine that is increasingly essential for DSOs. It offers a standards-based foundation enabling real-time power systems analysis, simulations of electrical power grids, and sophisticated what-if analysis. In addition, it enables in-depth studies and analysis of the electrical power grid’s behavior and performance. This comprehensive model incorporates essential factors such as power generation capacity, electrical losses, voltage levels, power flows, and system stability.
Power Grid Model is currently being applied in a wide variety of use cases, including grid planning, expansion, reliability, and congestion studies. It can also help in analyzing the impact of renewable energy integration, assessing the effects of disturbances or faults, and developing strategies for grid control and optimization.
What to expect
For the upcoming meetup we are organizing, we have an exciting lineup of activities planned:
-Insightful presentations covering two practical applications of the Power Grid Model.
-An update on the latest advancements in Power Grid Model technology during the first and second quarters of 2024.
-An interactive brainstorming session to discuss and propose new feature requests.
-An opportunity to connect with fellow Power Grid Model enthusiasts and users.
Taking AI to the Next Level in Manufacturing.pdfssuserfac0301
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
6. Ideas and approaches to help build your organization's AI strategy.
This presentation provides valuable insights into effective cost-saving techniques on AWS. Learn how to optimize your AWS resources by rightsizing, increasing elasticity, picking the right storage class, and choosing the best pricing model. Additionally, discover essential governance mechanisms to ensure continuous cost efficiency. Whether you are new to AWS or an experienced user, this presentation provides clear and practical tips to help you reduce your cloud costs and get the most out of your budget.
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxSitimaJohn
Ocean Lotus cyber threat actors represent a sophisticated, persistent, and politically motivated group that poses a significant risk to organizations and individuals in the Southeast Asian region. Their continuous evolution and adaptability underscore the need for robust cybersecurity measures and international cooperation to identify and mitigate the threats posed by such advanced persistent threat groups.
Driving Business Innovation: Latest Generative AI Advancements & Success StorySafe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
Monitoring and Managing Anomaly Detection on OpenShift.pdfTosin Akinosho
Monitoring and Managing Anomaly Detection on OpenShift
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
Dive into the realm of operating systems (OS) with Pravash Chandra Das, a seasoned Digital Forensic Analyst, as your guide. 🚀 This comprehensive presentation illuminates the core concepts, types, and evolution of OS, essential for understanding modern computing landscapes.
Beginning with the foundational definition, Das clarifies the pivotal role of OS as system software orchestrating hardware resources, software applications, and user interactions. Through succinct descriptions, he delineates the diverse types of OS, from single-user, single-task environments like early MS-DOS iterations, to multi-user, multi-tasking systems exemplified by modern Linux distributions.
Crucial components like the kernel and shell are dissected, highlighting their indispensable functions in resource management and user interface interaction. Das elucidates how the kernel acts as the central nervous system, orchestrating process scheduling, memory allocation, and device management. Meanwhile, the shell serves as the gateway for user commands, bridging the gap between human input and machine execution. 💻
The narrative then shifts to a captivating exploration of prominent desktop OSs, Windows, macOS, and Linux. Windows, with its globally ubiquitous presence and user-friendly interface, emerges as a cornerstone in personal computing history. macOS, lauded for its sleek design and seamless integration with Apple's ecosystem, stands as a beacon of stability and creativity. Linux, an open-source marvel, offers unparalleled flexibility and security, revolutionizing the computing landscape. 🖥️
Moving to the realm of mobile devices, Das unravels the dominance of Android and iOS. Android's open-source ethos fosters a vibrant ecosystem of customization and innovation, while iOS boasts a seamless user experience and robust security infrastructure. Meanwhile, discontinued platforms like Symbian and Palm OS evoke nostalgia for their pioneering roles in the smartphone revolution.
The journey concludes with a reflection on the ever-evolving landscape of OS, underscored by the emergence of real-time operating systems (RTOS) and the persistent quest for innovation and efficiency. As technology continues to shape our world, understanding the foundations and evolution of operating systems remains paramount. Join Pravash Chandra Das on this illuminating journey through the heart of computing. 🌟
Best 20 SEO Techniques To Improve Website Visibility In SERPPixlogix Infotech
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
HCL Notes and Domino License Cost Reduction in the World of DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
HCL Notes and Domino License Cost Reduction in the World of DLAU (German-language edition)panagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU and the CCB and CCX licensing model have been a hot topic for many in the HCL community since last year. As a Notes or Domino customer, you may be struggling with unexpectedly high user counts and license fees. You may be wondering how this new type of licensing works and what benefits it brings you. Above all, you certainly want to stay within your budget and save costs wherever possible. We understand that, and we want to help you with it!
We will explain how to resolve common configuration problems that can cause more users to be counted than necessary, and how to identify and remove superfluous or unused accounts in order to save money. There are also some approaches that can lead to unnecessary spending, for example using a person document instead of a mail-in for shared mailboxes. We will show you such cases and their solutions. And of course we will explain the new licensing model.
Join this webinar, in which HCL Ambassador Marc Thomas and guest speaker Franz Walder introduce you to this new world. It will give you the tools and know-how to keep on top of things. You will be able to reduce your costs through an optimized Domino configuration and keep them low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, such as team mailboxes, functional/test users, etc.
- Practical examples and best practices to implement right away
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on integration of Salesforce with Bonterra Impact Management.
Interested in deploying an integration with Salesforce for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Deep Dive: Getting Funded with Jason Lemkin, Founder & CEO @ SaaStr
Elevation Query Extension: Introducing Subselects into Lucene Queries
4. Speaker Slide
ROBERT KIRCHGESSNER
Search Technology Architect
Wolters Kluwer
EXPERIENCE
• Search Algorithms Development
• Content Analysis
• Entity Recognition
• Solr plugins / extensions
• Strong software development experience for about 14 years in different commercial projects
• Last 4 years focused on search, particularly with Apache Solr and cloud-based solutions for it, including availability and scalability.
• Customers: Wolters Kluwer, TRAFIGURA, Daadkracht...
NAZAR SENIUK
Lead Software Engineer
EPAM
6. Some background
• Developing search applications for legal market since 2003
• Inhomogeneous, structured content, rich metadata (laws, cases, commentaries)
• Use of metadata for ranking is essential for good results
• Up to 30% of queries contain legal / other entities
• Relying on query cooking using entity recognition in the user input
• Combining with full text search and tuning the results becomes a challenge
7. Example
User input: § 123 BGB
Transformed to queries Q1, Q2, Q3, Q4
Expected output:
• § 123 BGB (law document)
• Legal commentary A to § 123 BGB (promoted content)
• Legal commentary B to § 123 BGB (promoted content)
• Some latest cases based on § 123 BGB (relevant content)
• Full text (or whatever needed)
How to achieve?
8. Requirements
CONTENT STRUCTURE
• Handle entities in the user input properly: legal citations, locations, dates, names
– e.g. place the correct document cited in the query on the top
– given a book title place an entry document (table of contents) on the top
• Top (1-5) hits expected to be unambiguous
• Use the top slots efficiently (10-100 hits)
• Keep balance between numerous document types (legal cases) and relevant or promoted document types
Generally more precise control of what is going on in the top 10
9. Possible solutions
• Boost factors on queries, terms, documents
• Sort fields
• Ranking functions
• Function queries
• Reranking (in Solr or application)
• Filtering
• Multiple requests
10. Works, but…
• Some are too complex
• Some are too slow
• Others are not reliable
• Missing a concept of subquery:
– tracking from which subquery a document is coming from
• Missing LIMIT as in SQL
11. Example continued
User input: § 123 BGB
Transformed to queries Q1, Q2, Q3, Q4
Expected output:
• § 123 BGB (law document)
• Legal commentary A to § 123 BGB (promoted content)
• Legal commentary B to § 123 BGB (promoted content)
• Some latest cases based on § 123 BGB (relevant content)
• Full text (or whatever needed)
Want the request to look like: Q1 << Q2 << Q3 << Q4
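A hedged sketch of what issuing such a request from a SolrJ client might look like is shown below. The << operator between subqueries comes from the slide; the {!eq} parser label, field names, and document-type values are placeholders, since the concrete syntax is whatever the solr-eq plugin (linked later in the deck) actually registers.

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ElevationQueryRequest {
    public static void main(String[] args) throws Exception {
        HttpSolrClient solr =
                new HttpSolrClient.Builder("http://localhost:8983/solr/legal").build();

        // Q1 << Q2 << Q3 << Q4 from the slide, spelled out for the "§ 123 BGB" input.
        // Field names and the {!eq} parser label are placeholders.
        SolrQuery query = new SolrQuery();
        query.setQuery("{!eq}"
                + "norm_citation:\"§ 123 BGB\" AND doctype:law"            // Q1: the cited law itself
                + " << norm_citation:\"§ 123 BGB\" AND doctype:commentary" // Q2: promoted commentaries
                + " << norm_citation:\"§ 123 BGB\" AND doctype:case"       // Q3: relevant cases
                + " << text:(§ 123 BGB)");                                 // Q4: full-text fallback
        query.setRows(20);

        QueryResponse response = solr.query(query);
        response.getResults().forEach(doc ->
                System.out.println(doc.getFieldValue("id") + " " + doc.getFieldValue("doctype")));

        solr.close();
    }
}
```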
12. Elevation query
Initial Idea / Specification
Given a list of queries Q1, Q2, …, QN produce a result fulfilling the conditions:
• All the documents of Qn are placed before the documents of Qm for m>n
• Each hit should occur in the leftmost possible subset
• No duplication of hits
• Meaningful scores
• Correct faceting
13. Elevation query
Additional requirements / expectations
• One request / one pass search
• Usable via some new syntax / parser support
• Implemented as plugin
Furthermore it should be possible to
• impose a limit on the results of each subquery
• provide a sort parameter for each subquery
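Here is a small, self-contained sketch of the result contract described on slides 12-13: hits ordered by subquery, deduplicated into the leftmost matching subquery, and capped by a per-subquery limit. It is not the plugin's implementation, which does this in a single pass inside a Lucene collector with one priority queue per subquery; the sketch only makes the observable behavior concrete.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ElevationContractSketch {

    // Given the hit lists of subqueries Q1..QN (each already in its own sort
    // order), produce one list in which every hit of Qn precedes every hit of
    // Qm for m > n, each document appears only in the leftmost subquery that
    // matched it, and each subquery contributes at most `limit` documents.
    static List<Integer> merge(List<List<Integer>> hitsPerSubquery, int limit) {
        Set<Integer> seen = new LinkedHashSet<>();
        List<Integer> merged = new ArrayList<>();
        for (List<Integer> hits : hitsPerSubquery) {
            int taken = 0;
            for (int docId : hits) {
                if (taken == limit) break;      // per-subquery LIMIT
                if (seen.add(docId)) {          // dedupe: leftmost subquery wins
                    merged.add(docId);
                    taken++;
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<Integer>> hits = List.of(
                List.of(7),            // Q1: the cited law document
                List.of(12, 15, 7),    // Q2: promoted commentaries (7 already taken by Q1)
                List.of(3, 15, 42, 99) // Q3: relevant cases
        );
        System.out.println(merge(hits, 2));  // [7, 12, 15, 3, 42]
    }
}
```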
16. Implementation status
• https://github.com/rokirx/solr-eq
• Working
– Collector logic / multiple queues
– Sort and limit parameter per subquery
– Parser support
• In testing
– Correct scoring
– Faceting
– Multiple sort fields per subquery
• Works with 6.4, 7.6, 8.0, 8.2
17. Case Study: Autosuggest
User Input tax
• Assumptions on the relevancy of completion:
– Highest priority if the term at the beginning and exact match, eg tax relief
– Lower priority exact match but term not at the beginning, eg income tax
– Lowest priority prefix match anywhere in the phrase, eg estate taxes
• Map this condition to queries:
– Term at the beginning of a phrase and exact match: ^tax$
– Exact match in the middle of a phrase: tax$
– Prefix match (edge n-gram): tax
18. Case Study: Autosuggest
User Input tax
• Resulting query: ^tax$ << tax$ << tax guarantees the specified behavior
• Additional benefit: optimize the performance by cancelling out subqueries
– If the exact hit count is not necessary
– And the minimum required number of hits in the preceding queues is collected
– Stop fetching the docs from lower priority queue by cancelling them out of the collector/scorer
– Without missing out on any relevant documents
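A minimal sketch of the cancellation condition described on this slide: once the higher-priority queues already hold enough hits to fill the requested page and no exact total hit count is required, a lower-priority subquery can stop collecting. The method and its inputs are illustrative, not the plugin's actual internals.

```java
public class SubqueryCancellation {

    // A lower-priority subquery can stop collecting once every slot it could
    // fill is already covered by higher-priority subqueries, unless exact
    // total hit counts are still required for every subquery.
    static boolean canCancel(int[] collectedPerSubquery, int subqueryIndex,
                             int rowsRequested, boolean needExactCount) {
        if (needExactCount) {
            return false;
        }
        int collectedAbove = 0;
        for (int i = 0; i < subqueryIndex; i++) {
            collectedAbove += collectedPerSubquery[i];
        }
        return collectedAbove >= rowsRequested;
    }

    public static void main(String[] args) {
        int[] collected = {6, 4, 0};  // hits so far for ^tax$, tax$, tax
        // 10 suggestions requested and approximate counts are acceptable:
        System.out.println(canCancel(collected, 2, 10, false)); // true: the prefix query can stop
        System.out.println(canCancel(collected, 2, 10, true));  // false: exact counts still needed
    }
}
```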
19. Potential benefits
• Reduce the number of search requests
• Reduce the complexity of the architecture
• Additional dimension to control rank
• Pluggable, easy to evaluate
• Improve performance through runtime subquery cancellation
20. Summary
It is technically possible to implement a concept of subquery into Solr/Lucene
• Single request / one pass collection of results
• Individual limits on each subquery
• Individual sort parameters on each subquery
• Optimization if no total hits number needed
– cancel lower prioritized subqueries during evaluation without affecting top hits
• Plugin