Talk given by Dr. Han Xiao at LF OSS EU 2019 Summit (LF AI track). GNES source code is available at https://github.com/gnes-ai/gnes/ More information can be found in Han's blog: https://hanxiao.github.io
Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms (Han Xiao)
This document introduces Fashion-MNIST, a new dataset created by Zalando Research as a drop-in replacement for the MNIST dataset for benchmarking machine learning algorithms. Fashion-MNIST consists of 60,000 training images and 10,000 test images of 10 fashion product categories, formatted similarly to MNIST. It was created to address issues with MNIST being too easy and not representative of modern computer vision tasks. The dataset has gained popularity in the machine learning community as an alternative to MNIST, with over 2,000 stars on GitHub and being supported by several machine learning libraries.
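Since Fashion-MNIST deliberately keeps MNIST's IDX file layout, the "formatted similarly to MNIST" claim can be made concrete with a small header parser. This is a minimal sketch, not a full loader; `parse_idx_header` and the synthetic header bytes below are my own illustration of the published IDX layout (two zero bytes, a dtype code, a dimension count, then one big-endian uint32 per dimension):

```python
import struct

def parse_idx_header(buf: bytes):
    """Parse the header of an IDX file (the format MNIST and
    Fashion-MNIST share): 2 zero bytes, a dtype code, the number
    of dimensions, then one big-endian uint32 per dimension."""
    zero1, zero2, dtype_code, ndims = struct.unpack(">BBBB", buf[:4])
    assert zero1 == 0 and zero2 == 0, "not an IDX file"
    dims = struct.unpack(">" + "I" * ndims, buf[4:4 + 4 * ndims])
    return dtype_code, dims

# Synthetic header matching Fashion-MNIST's train-images file:
# dtype 0x08 (unsigned byte), 3 dimensions: 60000 x 28 x 28.
header = struct.pack(">BBBBIII", 0, 0, 0x08, 3, 60000, 28, 28)
dtype_code, dims = parse_idx_header(header)
print(dtype_code, dims)  # 8 (60000, 28, 28)
```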
Domain Specific Languages and C++ Code Generation (Ovidiu Farauanu)
Florentin Picioroaga presents at C++ Meetup Iasi: Domain Specific Languages and C++ code generation, on how to build your own DSL in Xtext. Live demo included. https://github.com/Iasi-C-CPP-Developers-Meetup/presentations-code-samples/tree/master/filorom/dsl
This PowerPoint presentation covers the history of Python programming, its features, strengths, applications, and related careers, and describes which global leaders use Python.
Processing malaria HTS results using KNIME: a tutorial (Greg Landrum)
Walks through a couple of KNIME Workflows for working with HTS Data.
The workflows are derived from the work described in this publication: https://f1000research.com/articles/6-1136/v2
This document provides an overview of data visualization in Python. It discusses popular Python libraries and modules for visualization such as Matplotlib, Seaborn, Pandas, NumPy, Plotly, and Bokeh. It also covers different types of visualization plots, such as bar charts, line graphs, pie charts, scatter plots, and histograms, and how to create them in Python using the mentioned libraries. The document is divided into sections on visualization libraries, a version-by-version overview of updates to plots, and examples of various plot types created in Python.
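As a taste of what these libraries do underneath: the counting step behind a histogram plot can be sketched in a few lines of dependency-free Python. `histogram_counts` is a hypothetical helper, not part of Matplotlib; Matplotlib's `plt.hist` performs this kind of binning and then draws the bars:

```python
def histogram_counts(values, n_bins, lo=None, hi=None):
    """Bucket values into n_bins equal-width bins -- the counting
    step a histogram plot performs before drawing the bars."""
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Clamp the right edge so v == hi lands in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts

data = [1, 2, 2, 3, 3, 3, 9]
# Bins of width 2 over [1, 9]: [1,3) [3,5) [5,7) [7,9]
print(histogram_counts(data, 4))  # [3, 3, 0, 1]
```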
Getting Productive: My Journey with Grakn and Graql (Vaticle)
Over many weeks and months, I have been learning about Grakn Labs and its two core components, Grakn and Graql.
As a polyglot developer, specialising in Java/JVM tech, I have explored GraalVM, an enhanced JVM with runtime performance being its primary goal.
My first instinct was to run Grakn on GraalVM and then run GraknLabs’ benchmark suite to measure runtime performance against both the traditional JVM and GraalVM.
We will see how we can use and speak Graql much as we would speak to another human. We will explore English-to-Graql and vice versa, using natural language to communicate with this novel graph engine called Grakn (this all started from a meetup and a few GitHub issues).
Nuxeo Semantic ECM: from Scribo and Stanbol to valuable applications (Nuxeo)
Work on integrating semantic technologies developed in several R&D projects is now progressing at full speed. Expect to see creative new uses of semantic technologies in Nuxeo open source content management products in 2011!
TensorFlow meetup: Keras - Pytorch - TensorFlow.js (Stijn Decubber)
Slides from the TensorFlow meetup hosted on October 9th at the ML6 offices in Ghent. Join our Meetup group for updates and future sessions: https://www.meetup.com/TensorFlow-Belgium/
For the full video of this presentation, please visit:
https://www.embedded-vision.com/platinum-members/embedded-vision-alliance/embedded-vision-training/videos/pages/may-2018-embedded-vision-summit-trevett
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Neil Trevett, President of the Khronos Group and Vice President at NVIDIA, presents the "APIs for Accelerating Vision and Inferencing: Options and Trade-offs" tutorial at the May 2018 Embedded Vision Summit.
The landscape of SDKs, APIs and file formats for accelerating inferencing and vision applications continues to rapidly evolve. Low-level compute APIs, such as OpenCL, Vulkan and CUDA are being used to accelerate inferencing engines such as OpenVX, CoreML, NNAPI and TensorRT. Inferencing engines are being fed via neural network file formats such as NNEF and ONNX. Some of these APIs, like OpenCV, are vision-specific, while others, like OpenCL, are general-purpose. Some engines, like CoreML and TensorRT, are supplier-specific, while others, such as OpenVX, are open standards that any supplier can adopt. Which ones should you use for your project?
In this presentation, Trevett presents the current landscape of APIs, file formats and SDKs for inferencing and vision acceleration, explaining where each one fits in the development flow. Trevett also highlights where these APIs overlap and where they complement each other, and previews some of the latest developments in these APIs.
The document provides details on Lenovo's new distributed storage solution for SAP HANA based on SUSE Enterprise Storage (SES). It discusses the architecture, which includes Lenovo hardware, SUSE Enterprise Storage software, and configuration files. The solution offers various performance levels depending on hardware configuration. Lenovo performs testing and guarantees required SAP HANA performance. The solution provides benefits over traditional storage such as lower TCO, no vendor lock-in, and support from Lenovo and SUSE.
This document contains an agenda for a webinar on the age of language models in NLP. The agenda includes discussions on word embeddings, sequence modeling, advanced language models like BERT and Transformers, attention mechanisms, and case studies. It also provides overviews of Tyrone systems' high-performance AI platform using NVIDIA A100 GPUs and its Tyrone Kubyts technology for revolutionizing deep learning environments.
Redfish and python-redfish for Software Defined Infrastructure (Bruno Cornec)
How the new Redfish protocol will help achieve the promises of a Software Defined Infrastructure, and which new projects, such as python-redfish and Alexandria, are needed to support it.
This document provides an overview and summary of Red Hat Storage and Inktank Ceph. It discusses Red Hat acquiring Inktank Ceph in April 2014 and the future of Red Hat Storage having two flavors - Gluster edition and Ceph edition. Key features of Red Hat Storage 3.0 include enhanced data protection with snapshots, cluster monitoring, and deep Hadoop integration. The document also introduces Inktank Ceph Enterprise v1.2 and discusses Ceph components like RADOS, LIBRADOS, RBD, RGW and how Ceph can be used with OpenStack.
Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo (Linaro)
Short
The growing amount of data captured by sensors and the real time constraints imply that not only big data analytics but also Machine Learning (ML) inference shall be executed at the edge. The multiple options for neural network acceleration in Arm-based platforms provide an unprecedented opportunity for new intelligent devices. It also raises the risk of fragmentation and duplication of efforts when multiple frameworks shall support multiple accelerators.
Andrea Gallo, Linaro VP of Segment Groups, will summarise the existing NN frameworks, accelerator solutions, and will describe the efforts underway in the Arm ecosystem.
Abstract
The dramatically growing amount of data captured by sensors and the ever more stringent requirements for latency and real time constraints are paving the way for edge computing, and this implies that not only big data analytics but also Machine Learning (ML) inference shall be executed at the edge. The multiple options for neural network acceleration in recent Arm-based platforms provide an unprecedented opportunity for new intelligent devices with ML inference. It also raises the risk of fragmentation and duplication of efforts when multiple frameworks shall support multiple accelerators.
Andrea Gallo, Linaro VP of Segment Groups, will summarise the existing NN frameworks, model description formats, accelerator solutions, low cost development boards and will describe the efforts underway to identify the best technologies to improve the consolidation and enable the competitive innovative advantage from all vendors.
Audience
The session will be useful for everyone from executives to engineers. Executives will gain a deeper understanding of the issues and opportunities. Engineers at NN acceleration IP design houses will take away ideas for how to collaborate with the open source community in their area of expertise, and how to evaluate the performance of and accelerate multiple NN frameworks without modifying them for each new IP, whether targeting edge computing gateways, smart devices or simple microcontrollers.
Benefits to the Ecosystem
The AI deep learning neural network ecosystem is just getting started, and it has implications for open source similar to those of GPU and video accelerators in the early days, with user space drivers, binary blobs, proprietary APIs and all possible ways to protect their IP. The session will outline a proposal for a collaborative ecosystem effort to create a common framework to manage multiple NN accelerators while avoiding modifying the deep learning frameworks with multiple forks.
The most hated thing a developer can imagine is writing documentation. On the other hand, nothing compares with well-sorted documentation when you want to change or extend something, or just want to get back into the topic. We all know there is no single right way to do documentation, but there are a number of principles and to-dos which make it much easier for you. This talk is not about tools like phpDocumentor, nor is it about promoting one particular style of documentation. It is about some of the thoughts you should go through before and while writing documentation.
The Nuxeo team is involved in 3 different cooperative R&D projects, as well as other internal or collaborative endeavors, which all aim at expanding the scope, performance and ease of use of the Nuxeo Platform, and to keep up with the ever changing needs of ECM in the face of Enterprise 2.0.
Transforming Application Delivery with PaaS and Linux Containers (Giovanni Galloro)
This document discusses Red Hat OpenShift Enterprise and how it helps with application delivery using Platform as a Service (PaaS) and Linux containers. It covers OpenShift's architecture using Linux containers, Docker, Kubernetes, and RHEL Atomic Host. It also discusses OpenShift's application deployment flow, adoption trends, and challenges with container adoption as well as Red Hat's strategy to address these challenges through container certification and simplifying adoption for partners.
Final proposal: Implement and create a new documentation toolchain (Paramkusham Shruthi)
The document outlines a proposal to implement a new documentation tool chain for CentOS. The tool chain would make it easier for contributors to submit short-form documentation articles and push them to relevant upstream projects. It would synchronize content between git.centos.org and GitHub, support common markup formats, convert formats, tag content by upstream project, and include documentation on using the system. The proposal includes an implementation plan and timeline spread over 12 weeks.
Substrate: A framework to efficiently build blockchains (servicesNitor)
Substrate is an open and interoperable blockchain framework that helps developers focus on the business logic of the chain and easily build multiple blockchains. Read our blog to learn more about it.
Spark Pipelines in the Cloud with Alluxio with Gene Pang (Spark Summit)
Organizations commonly use Apache Spark to gain actionable insight from their large amounts of data. Often, these analytics are in the form of data processing pipelines, where there are a series of processing stages, each stage performs a particular function, and the output of one stage is the input of the next stage. There are several examples of pipelines, such as log processing, IoT pipelines, and machine learning. The common attribute among different pipelines is the sharing of data between stages. It is also common for Spark pipelines to process data stored in the public cloud, such as Amazon S3, Microsoft Azure Blob Storage, or Google Cloud Storage. The global availability and cost effectiveness of these public cloud storage services make them the preferred storage for data. However, running pipeline jobs while sharing data via cloud storage can be expensive in terms of increased network traffic, and slower data sharing and job completion times. Using Alluxio, a memory-speed virtual distributed storage system, enables sharing data between different stages or jobs at memory speed. By reading and writing data in Alluxio, the data can stay in memory for the next stage of the pipeline, and this results in great performance gains. In this talk, we discuss how Alluxio can be deployed and used with a Spark data processing pipeline in the cloud. We show how pipeline stages can share data with Alluxio memory for improved performance benefits, and how Alluxio can improve completion times and reduce performance variability for Spark pipelines in the cloud.
Comparing IaaS: VMware vs OpenStack vs Google’s Ganeti (Giuseppe Paterno')
No matter if you are a lone system administrator or the CTO of the largest carrier in the world, finding out what’s out there is a jungle. Is VMware still the leader? I’ve heard about OpenStack; how mature is it? And what is this “Ganeti” I’ve never heard of?
Well, here I am. Guess what: you’re not the only one asking these questions. I have traveled most of Europe listening to the world’s most famous enterprises, banks and telcos, and have also been in contact with many vendors’ labs, from San Francisco to Munich.
In this presentation I just wish to give a quick overview of the state-of-the-art in the IaaS and virtualization world. This is not a sales or marketing presentation: no vaporware, just pure and real experience from the field.
Enjoy the slides and stay tuned on my twitter channel on @gpaterno
This slideshow gives feedback about using Linux in industrial projects. It is part of a conference talk given by our company, CIO Informatique Industrielle, at ERTS 2008, the European Embedded Real Time Software Congress in Toulouse.
Spark Pipelines in the Cloud with Alluxio (Alluxio, Inc.)
Gene Pang presented on using Alluxio to improve the performance of data pipelines running on Spark in the cloud. Alluxio provides an in-memory filesystem that allows data to be shared between stages of a pipeline faster than using cloud storage. Experiments show Alluxio can provide over 9x speedup for a log analysis pipeline in AWS compared to using just S3 storage. Alluxio's fast durable writes feature writes data synchronously to memory and asynchronously to storage, improving write performance without sacrificing fault tolerance.
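The "fast durable writes" idea described above (acknowledge a write once the data is in memory, persist asynchronously) can be modeled in-process. This is a toy sketch of the write-behind pattern, not Alluxio's implementation; all class and method names are invented for illustration:

```python
import threading
import queue

class WriteBehindStore:
    """Toy model of 'fast durable writes': put() returns as soon as
    the data is in the fast in-memory tier, while a background thread
    drains a queue and persists to the slow durable tier."""

    def __init__(self):
        self.memory = {}             # fast tier, written synchronously
        self.durable = {}            # slow tier, written asynchronously
        self._pending = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def put(self, key, value):
        self.memory[key] = value     # caller sees this immediately
        self._pending.put((key, value))

    def _drain(self):
        while True:
            key, value = self._pending.get()
            self.durable[key] = value   # simulated slow persistence
            self._pending.task_done()

    def flush(self):
        self._pending.join()         # wait until everything is durable

store = WriteBehindStore()
store.put("stage1-output", b"partition-0")   # returns immediately
store.flush()                                # now also durable
print("stage1-output" in store.durable)      # True
```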
Plebeia, a new storage for Tezos blockchain state (Jun Furuse)
Plebeia is a new storage for Tezos blockchain state based on a binary Merkle Patricia trie. It provides compact Merkle proofs and an append-only storage design. Plebeia trees can be updated functionally using a zipper-based cursor and stored in a fixed-size cell format on disk for efficient reads and writes. Integration of Plebeia into the Tezos node is ongoing to validate its correctness and improve storage and processing performance over the current context system.
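The compact Merkle proofs mentioned above rest on a standard idea: an inclusion proof is just the chain of sibling hashes from a leaf to the root. The sketch below shows the generic binary-Merkle version only; Plebeia's actual Patricia-trie encoding differs, and all function names here are my own:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Root of a binary Merkle tree (odd levels padded by
    duplicating the last node)."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves, index):
    """Sibling hashes from leaf to root -- the compact inclusion
    proof a Merkle-tree store can hand out."""
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        # Record whether our node is the right child, plus its sibling.
        proof.append((index % 2, level[index ^ 1]))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify(leaf, proof, root):
    acc = h(leaf)
    for is_right, sibling in proof:
        acc = h(sibling + acc) if is_right else h(acc + sibling)
    return acc == root

leaves = [b"a", b"b", b"c", b"d"]
root = merkle_root(leaves)
print(verify(b"c", merkle_proof(leaves, 2), root))  # True
```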
Conf42-Python-Building Apache NiFi 2.0 Python Processors
https://www.conf42.com/Python_2024_Tim_Spann_apache_nifi_2_processors
Building Apache NiFi 2.0 Python Processors
Abstract
Let’s enhance real-time streaming pipelines with smart Python code. Adding code for vector databases and LLM.
Summary
Tim Spann: I'm going to be talking today about building Apache NiFi 2.0 Python processors. One of the main purposes of supporting Python in the streaming tool Apache NiFi is to interface with new machine learning, AI, and GenAI. He says Python is a real game changer for Cloudera.
You're just going to add some metadata around it. It's a great way to pass a file along without changing it too substantially. We really need you to have Python 3.10 and again JDK 21 on your machine. You've got to be smart about how you use these models.
There are a ton of Python processors available. You can use them in multiple ways. We're still in the early world of Python processors, so now's the time to start putting yours out there. I'd love to see a lot of people write their own.
When we are parsing documents here, again, this is the Python one; I'm picking PDF. There are lots of different things you could do. If you're interested in writing your own Python code for Apache NiFi, definitely reach out. And thanks.
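For a feel of what such a processor looks like, the sketch below models the "add metadata, pass the content through unchanged" pattern from the talk. The base classes are stand-ins for NiFi's real `nifiapi` module (so the snippet runs standalone), and `AttachWordCount` is a hypothetical processor of my own, not one shipped with NiFi:

```python
# Stand-ins for NiFi's Python extension API (nifiapi.flowfiletransform),
# so the transform logic below runs outside NiFi. In a real processor
# you would import FlowFileTransform / FlowFileTransformResult instead.
class FlowFileTransformResult:
    def __init__(self, relationship, contents=None, attributes=None):
        self.relationship = relationship
        self.contents = contents
        self.attributes = attributes or {}

class FlowFileTransform:
    def __init__(self, **kwargs):
        pass

class AttachWordCount(FlowFileTransform):
    """Hypothetical processor: leaves the flowfile content untouched
    and only adds metadata -- the 'pass a file along without changing
    it' pattern from the talk."""

    def transform(self, context, flowfile):
        text = flowfile.getContentsAsBytes().decode("utf-8")
        return FlowFileTransformResult(
            relationship="success",
            contents=None,  # None = keep the original content
            attributes={"word.count": str(len(text.split()))},
        )

# Minimal fake flowfile to exercise the processor standalone.
class FakeFlowFile:
    def __init__(self, data): self.data = data
    def getContentsAsBytes(self): return self.data

result = AttachWordCount().transform(None, FakeFlowFile(b"hello nifi world"))
print(result.attributes["word.count"])  # 3
```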
HCL Notes and Domino License Cost Reduction in the World of DLAU (panagenda)
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able to lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc.
- Practical examples and best practices to implement right away
Generating privacy-protected synthetic data using Secludy and Milvus (Zilliz)
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
Conf42-Python-Building Apache NiFi 2.0 Python Processors
https://www.conf42.com/Python_2024_Tim_Spann_apache_nifi_2_processors
Building Apache NiFi 2.0 Python Processors
Abstract
Let’s enhance real-time streaming pipelines with smart Python code. Adding code for vector databases and LLM.
Summary
Tim Spann: I'm going to be talking today, be building Apache 9520 Python processors. One of the main purposes of supporting Python in the streaming tool Apache Nifi is to interface with new machine learning and AI and Gen AI. He says Python is a real game changer for Cloudera.
You're just going to add some metadata around it. It's a great way to pass a file along without changing it too substantially. We really need you to have Python 310 and again JDK 21 on your machine. You got to be smart about how you use these models.
There are a ton of python processors available. You can use them in multiple ways. We're still in the early world of Python processors, so now's the time to start putting yours out there. Love to see a lot of people write their own.
When we are parsing documents here, again, this is the Python one I'm picking PDF. Lots of different things you could do. If you're interested on writing your own python code for Apache Nifi, definitely reach out and thank.
Ähnlich wie GNES is Generic Neural Elastic Search (OSSEU19 Lyon) (20)
HCL Notes and Domino License Cost Reduction in the World of DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
Generating privacy-protected synthetic data using Secludy and MilvusZilliz
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
Main news related to the CCS TSI 2023 (2023/1695)Jakub Marek
An English 🇬🇧 translation of a presentation to the speech I gave about the main changes brought by CCS TSI 2023 at the biggest Czech conference on Communications and signalling systems on Railways, which was held in Clarion Hotel Olomouc from 7th to 9th November 2023 (konferenceszt.cz). Attended by around 500 participants and 200 on-line followers.
The original Czech 🇨🇿 version of the presentation can be found here: https://www.slideshare.net/slideshow/hlavni-novinky-souvisejici-s-ccs-tsi-2023-2023-1695/269688092 .
The videorecording (in Czech) from the presentation is available here: https://youtu.be/WzjJWm4IyPk?si=SImb06tuXGb30BEH .
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...Tatiana Kojar
Skybuffer AI, built on the robust SAP Business Technology Platform (SAP BTP), is the latest and most advanced version of our AI development, reaffirming our commitment to delivering top-tier AI solutions. Skybuffer AI harnesses all the innovative capabilities of the SAP BTP in the AI domain, from Conversational AI to cutting-edge Generative AI and Retrieval-Augmented Generation (RAG). It also helps SAP customers safeguard their investments into SAP Conversational AI and ensure a seamless, one-click transition to SAP Business AI.
With Skybuffer AI, various AI models can be integrated into a single communication channel such as Microsoft Teams. This integration empowers business users with insights drawn from SAP backend systems, enterprise documents, and the expansive knowledge of Generative AI. And the best part of it is that it is all managed through our intuitive no-code Action Server interface, requiring no extensive coding knowledge and making the advanced AI accessible to more users.
Programming Foundation Models with DSPy - Meetup SlidesZilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
5th LF Energy Power Grid Model Meet-up SlidesDanBrown980551
5th Power Grid Model Meet-up
It is with great pleasure that we extend to you an invitation to the 5th Power Grid Model Meet-up, scheduled for 6th June 2024. This event will adopt a hybrid format, allowing participants to join us either through an online Mircosoft Teams session or in person at TU/e located at Den Dolech 2, Eindhoven, Netherlands. The meet-up will be hosted by Eindhoven University of Technology (TU/e), a research university specializing in engineering science & technology.
Power Grid Model
The global energy transition is placing new and unprecedented demands on Distribution System Operators (DSOs). Alongside upgrades to grid capacity, processes such as digitization, capacity optimization, and congestion management are becoming vital for delivering reliable services.
Power Grid Model is an open source project from Linux Foundation Energy and provides a calculation engine that is increasingly essential for DSOs. It offers a standards-based foundation enabling real-time power systems analysis, simulations of electrical power grids, and sophisticated what-if analysis. In addition, it enables in-depth studies and analysis of the electrical power grid’s behavior and performance. This comprehensive model incorporates essential factors such as power generation capacity, electrical losses, voltage levels, power flows, and system stability.
Power Grid Model is currently being applied in a wide variety of use cases, including grid planning, expansion, reliability, and congestion studies. It can also help in analyzing the impact of renewable energy integration, assessing the effects of disturbances or faults, and developing strategies for grid control and optimization.
What to expect
For the upcoming meetup we are organizing, we have an exciting lineup of activities planned:
-Insightful presentations covering two practical applications of the Power Grid Model.
-An update on the latest advancements in Power Grid -Model technology during the first and second quarters of 2024.
-An interactive brainstorming session to discuss and propose new feature requests.
-An opportunity to connect with fellow Power Grid Model enthusiasts and users.
Skybuffer SAM4U tool for SAP license adoptionTatiana Kojar
Manage and optimize your license adoption and consumption with SAM4U, an SAP free customer software asset management tool.
SAM4U, an SAP complimentary software asset management tool for customers, delivers a detailed and well-structured overview of license inventory and usage with a user-friendly interface. We offer a hosted, cost-effective, and performance-optimized SAM4U setup in the Skybuffer Cloud environment. You retain ownership of the system and data, while we manage the ABAP 7.58 infrastructure, ensuring fixed Total Cost of Ownership (TCO) and exceptional services through the SAP Fiori interface.
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxSitimaJohn
Ocean Lotus cyber threat actors represent a sophisticated, persistent, and politically motivated group that poses a significant risk to organizations and individuals in the Southeast Asian region. Their continuous evolution and adaptability underscore the need for robust cybersecurity measures and international cooperation to identify and mitigate the threats posed by such advanced persistent threat groups.
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on automated letter generation for Bonterra Impact Management using Google Workspace or Microsoft 365.
Interested in deploying letter generation automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
GraphRAG for Life Science to increase LLM accuracyTomaz Bratanic
GraphRAG for life science domain, where you retriever information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers
Trusted Execution Environment for Decentralized Process MiningLucaBarbaro3
Presentation of the paper "Trusted Execution Environment for Decentralized Process Mining" given during the CAiSE 2024 Conference in Cyprus on June 7, 2024.
Digital Marketing Trends in 2024 | Guide for Staying AheadWask
https://www.wask.co/ebooks/digital-marketing-trends-in-2024
Feeling lost in the digital marketing whirlwind of 2024? Technology is changing, consumer habits are evolving, and staying ahead of the curve feels like a never-ending pursuit. This e-book is your compass. Dive into actionable insights to handle the complexities of modern marketing. From hyper-personalization to the power of user-generated content, learn how to build long-term relationships with your audience and unlock the secrets to success in the ever-shifting digital landscape.
Dive into the realm of operating systems (OS) with Pravash Chandra Das, a seasoned Digital Forensic Analyst, as your guide. 🚀 This comprehensive presentation illuminates the core concepts, types, and evolution of OS, essential for understanding modern computing landscapes.
Beginning with the foundational definition, Das clarifies the pivotal role of OS as system software orchestrating hardware resources, software applications, and user interactions. Through succinct descriptions, he delineates the diverse types of OS, from single-user, single-task environments like early MS-DOS iterations, to multi-user, multi-tasking systems exemplified by modern Linux distributions.
Crucial components like the kernel and shell are dissected, highlighting their indispensable functions in resource management and user interface interaction. Das elucidates how the kernel acts as the central nervous system, orchestrating process scheduling, memory allocation, and device management. Meanwhile, the shell serves as the gateway for user commands, bridging the gap between human input and machine execution. 💻
The narrative then shifts to a captivating exploration of prominent desktop OSs, Windows, macOS, and Linux. Windows, with its globally ubiquitous presence and user-friendly interface, emerges as a cornerstone in personal computing history. macOS, lauded for its sleek design and seamless integration with Apple's ecosystem, stands as a beacon of stability and creativity. Linux, an open-source marvel, offers unparalleled flexibility and security, revolutionizing the computing landscape. 🖥️
Moving to the realm of mobile devices, Das unravels the dominance of Android and iOS. Android's open-source ethos fosters a vibrant ecosystem of customization and innovation, while iOS boasts a seamless user experience and robust security infrastructure. Meanwhile, discontinued platforms like Symbian and Palm OS evoke nostalgia for their pioneering roles in the smartphone revolution.
The journey concludes with a reflection on the ever-evolving landscape of OS, underscored by the emergence of real-time operating systems (RTOS) and the persistent quest for innovation and efficiency. As technology continues to shape our world, understanding the foundations and evolution of operating systems remains paramount. Join Pravash Chandra Das on this illuminating journey through the heart of computing. 🌟
Ivanti’s Patch Tuesday breakdown goes beyond patching your applications and brings you the intelligence and guidance needed to prioritize where to focus your attention first. Catch early analysis on our Ivanti blog, then join industry expert Chris Goettl for the Patch Tuesday Webinar Event. There we’ll do a deep dive into each of the bulletins and give guidance on the risks associated with the newly-identified vulnerabilities.
4. hxiao87 hanxiao @hxiao
GNES is Generic Neural Elastic Search
GNES [jee-nes] is a cloud-native semantic search system based on deep neural networks. GNES enables large-scale indexing and semantic search for text-to-text, image-to-image, video-to-video and any-to-any content forms.
- Cloud-Native
- Semantic Search based on DNN
- End-to-End Generic Solution
9. Preliminaries: Neural, Elastic, and Search
Find semantically similar text in a large database
■ Microservices are a software development technique, a variant of the service-oriented architecture (SOA) architectural style, that structures an application as a collection of loosely coupled services.
■ Microservices, also known as the microservice architecture, is an architectural style that structures an application as a collection of services that are.
■ Microservices architecture is a term used to describe the practice of breaking up an application into a series of smaller, more specialised parts, each of which communicate with one another across common interfaces such as APIs and REST interfaces like HTTP.
10. Preliminaries: Neural, Elastic, and Search
Find semantically similar text in a large database
- How to quantize the semantics? Use a vector representation of the doc, produced by a state-of-the-art NLP model.
- How to store the semantics? Use vector indexing (e.g. Faiss) on a faster, lighter, distributed database.
- How to define similarity? Use distance metrics (L2, Hamming, etc.).
- Does it work on super-long/short documents as well? Segment long documents into sentences; apply domain/app-specific preprocessing.
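The distance metrics named above are easy to make concrete. The following is a toy sketch, not GNES internals: the 4-dimensional "semantic" vectors are invented for illustration.

```python
import math

def l2(a, b):
    """Euclidean (L2) distance between two dense float vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hamming(a, b):
    """Hamming distance between two binary codes of equal length."""
    return sum(x != y for x, y in zip(a, b))

# Toy 4-dimensional "semantic" vectors for two documents and a query.
doc1 = [0.1, 0.9, 0.0, 0.3]
doc2 = [0.8, 0.1, 0.7, 0.2]
query = [0.2, 0.8, 0.1, 0.3]

# Nearest-neighbour search: the document with the smallest distance
# to the query is the semantically closest one.
best = min([doc1, doc2], key=lambda d: l2(query, d))
print(best is doc1)  # doc1 is closer to the query
```

In practice the brute-force `min()` scan is replaced by an approximate index such as Faiss, but the distance semantics are the same.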
11. Preliminaries: Neural, Elastic, and Search
Find semantically similar text, images and videos in a large database
- How to quantize the semantics? Use a vector representation of the image/video, produced by a state-of-the-art CV model.
- How to store the semantics? Use vector indexing (e.g. Faiss) on a faster, lighter, distributed database.
- How to define similarity? Use distance metrics (L2, Hamming, etc.).
- Does it work on large/small images and long/short videos as well? Segment the image/video into patches; apply domain/app-specific preprocessing.
12. Preliminaries: Neural, Elastic, and Search
Find semantically similar text, images and videos in a large database
The same questions, now mapped onto three components:
- Encoder: answers "How to quantize the semantics?" with a vector representation of the image/video, produced by a state-of-the-art CV model.
- Indexer: answers "How to store the semantics?" and "How to define similarity?" with vector indexing (e.g. Faiss), distance metrics (L2, Hamming, etc.) and a faster, lighter, distributed database.
- Preprocessor: answers "Does it work on large/small images and long/short videos as well?" by segmenting the image/video into patches, with domain/app-specific preprocessing.
13. A good neural search is only possible when document and query are comparable semantic units.
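One way to read this: a short query should be matched against sentence-level chunks, not against a whole document. A minimal illustration with a naive regex splitter (this is not GNES's preprocessor):

```python
import re

def split_sentences(doc):
    """Naively segment a document into sentence-level chunks."""
    return [s.strip() for s in re.split(r'(?<=[.!?])\s+', doc) if s.strip()]

doc = ("GNES is a cloud-native semantic search system. "
       "It is based on deep neural networks. "
       "It supports text, image and video.")

chunks = split_sentences(doc)
# A short query is now compared against units of similar granularity,
# instead of against the whole multi-sentence document.
print(len(chunks))  # 3
```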
17. Runtime in GNES
- Typical ML system: Train → Inference
- Typical search system: Index → Query
- GNES: Train → Index → Query
Train-time is not for everyone. Most users will just use our pretrained model from GNES Hub.
20. Four fundamental microservices
To summarize, we have four fundamental components in GNES:
- Preprocessor: transforms a real-world object into a list of workable semantic units, aka chunks;
- Encoder: represents each chunk as a vector;
- Indexer: stores the vectors in memory/on disk in a way that allows fast access;
- Router: forwards messages between microservices, e.g. for batching, mapping, reducing.
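The division of labour among these components can be sketched as plain functions in a toy pipeline. The names mirror the GNES components, but none of this is the GNES API; the character-count "embedding" merely stands in for a real neural encoder:

```python
# Toy end-to-end pipeline mirroring Preprocessor -> Encoder -> Indexer.

def preprocess(doc):
    """Preprocessor: split a raw document into chunks (here: lines)."""
    return [line for line in doc.splitlines() if line.strip()]

def encode(chunk):
    """Encoder: map a chunk to a fixed-size vector (26 letter counts)."""
    vec = [0] * 26
    for ch in chunk.lower():
        if 'a' <= ch <= 'z':
            vec[ord(ch) - ord('a')] += 1
    return vec

class Indexer:
    """Indexer: store (vector, chunk) pairs and retrieve the nearest chunk."""
    def __init__(self):
        self.entries = []

    def add(self, vec, chunk):
        self.entries.append((vec, chunk))

    def query(self, vec):
        dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
        return min(self.entries, key=lambda e: dist(e[0], vec))[1]

index = Indexer()
for chunk in preprocess("hello world\nsemantic search\nneural networks"):
    index.add(encode(chunk), chunk)

# Even a misspelled query lands on the closest stored chunk.
print(index.query(encode("semantik serch")))  # -> semantic search
```

In GNES each of these roles runs as its own containerized microservice, with the Router moving messages between them.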
21. Understanding how GNES works is basically knowing which microservice does what logic at which runtime, and designing the corresponding workflow.
24. Highlights
Cloud-native: GNES is all-in-microservice: the encoder, indexer, preprocessor and router all run statelessly and independently in their own containers. Scaling, load balancing and automated recovery come off-the-shelf in GNES.
(Diagram: a monolith with the encoder, indexer and preprocessor coupled in one process, versus the GNES microservice architecture with each component in its own container.)
30. With vs. Without Code/Model Separation
Without separation: a change on the model means updating encoder.py, rebuilding the project into a package, deploying the new package, taking the old version offline, and only then serving users.

With separation: a change on the model means updating the YAML config, then rolling out and serving users.
✓ Immutable codebase
✓ Minimum rollout time
✓ Version-controlled model
✓ Ease of A/B testing and side-by-side comparison
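The right-hand workflow boils down to keeping code immutable and moving every model choice into a version-controlled config. A minimal sketch of the idea; the config keys (`model_path`, `batch_size`, `pooling`) are invented for illustration, not GNES's schema:

```python
import json

class Encoder:
    """An immutable encoder whose behavior is driven entirely by config."""
    def __init__(self, config):
        self.model_path = config["model_path"]
        self.batch_size = config["batch_size"]
        self.pooling = config["pooling"]

    def describe(self):
        return f"{self.model_path} (pooling={self.pooling}, batch={self.batch_size})"

# Rolling out a new model = shipping a new config, not a new package.
config_v1 = json.loads('{"model_path": "bert-base", "batch_size": 32, "pooling": "mean"}')
config_v2 = json.loads('{"model_path": "bert-large", "batch_size": 16, "pooling": "cls"}')

old, new = Encoder(config_v1), Encoder(config_v2)
# Both versions can run side by side for an A/B test.
print(old.describe())
print(new.describe())
```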
32. Challenges to AI OSS maintainers
What is the most sustainable way to incorporate the latest NLP/CV/AI models into a framework? As the developer of bert-as-service (one of the most popular AI OSS projects of 2018), I was often asked by the community:
"Han, can you support model X and make it X-as-a-service?"
33. Challenges to AI OSS maintainers
Popular design philosophies of an AI framework:
- Rewrite the code and claim it is better than the original;
- Wrap the code (e.g. C → Python) and provide a better interface.
bert-as-service
34. Not sustainable because you can't match the speed of AI
(Image: "AI development nowadays" vs. "you as an OSS maintainer")
35. Not sustainable because you can't handle the dependencies
Four pieces are required to run an AI model:
- dependencies: packages or libraries required to run the algorithm, e.g. ffmpeg, libcuda, tensorflow;
- code: the implementation of the logic, which can be written in Python, C, Java or Scala with the help of TensorFlow, PyTorch, etc.;
- a small config file: the arguments abstracted from the logic for better flexibility during training and inference, e.g. batch_size, index_strategy and model_path;
- big data files: the serialization of the model structure and the learned parameters, e.g. a pretrained VGG/BERT model.
42. Demo: Build a Poem Semantic Search
https://github.com/gnes-ai/demo-poems-ir
43. Demo: Build a Poem Semantic Search
Steps:
1. Define the workflow
   a. What microservices do I need?
   b. How should they connect with each other?
2. Specify each microservice
   a. YAML config
   b. Additional Python files / Dockerfile
(Workflow diagram: Preprocessor → Encoder → Vector-Indexer / Doc-Indexer, connected by a Router.)
45. Specify each component
All microservices start from a base image: gnes/gnes:latest-alpine
Encoder design:
- use pytorch-transformers;
- needs GPU and CUDA support;
- needs the pretrained model downloaded in advance;
- configure the encoder and pooling strategy.
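Such an encoder spec could look roughly like the following YAML. The component names and keys here are illustrative, not copied verbatim from GNES:

```yaml
# Hypothetical encoder spec: model and pooling live in config, not code.
!PipelineEncoder
components:
  - !TransformerEncoder
    parameters:
      model_name: bert-base-uncased   # pretrained model, downloaded in advance
      pooling_strategy: REDUCE_MEAN   # how token vectors are pooled into one
      use_cuda: true                  # needs GPU and CUDA support
```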
56. GNES Flow is to GNES what Keras is to TensorFlow
Motivation:
- a readable and brief idiom to define pipelines: index, query, train, etc.;
- make GNES easier to debug locally.
57. GNES Flow highlights
- chain multiple add() calls to build a pipeline;
- use self-defined names instead of ports to refer to a service;
- modify a pipeline's component via set();
- run a pipeline on multiple orchestration layers, e.g. multi-thread, multi-process, Docker Swarm, Kubernetes;
- serialize/deserialize a pipeline to/from a binary file, an SVG image, or Docker Swarm/Kubernetes config files.
61. Use the index flow
# Run the index flow locally with the multiprocessing backend:
with flow.build(backend='process') as f:
    f.index(txt_file='poems.txt', batch_size=20)

# The same flow, unchanged, deployed on Docker Swarm:
with flow.build(backend='swarm') as f:
    f.index(bytes_gen=read_flowers(), batch_size=64)
66. GNES is …
- Cloud-native, all-in-microservice
- A generic semantic search solution using DNN
- An elastic workflow optimized and tailored for search scenarios
- A different mindset for building sustainable AI OSS
- Growing with the community

GNES is NOT ...
- Yet another collection of AI algorithms
- A generic framework for doing every ML task (e.g. clustering)
68. GNES resources
Version: v0.0.46
Github: https://github.com/gnes-ai/gnes
5 direct contributors from Tencent
2 community contributors
Homepage: https://gnes.ai
Docs: https://doc.gnes.ai
PyPI: https://pypi.org/project/gnes
Docker Hub: https://cloud.docker.com/u/gnes/repository/docker/gnes/gnes
GNES Board: https://board.gnes.ai
Blog: https://hanxiao.github.io
Call for more contributors!
69. My other open-source projects
Fashion-MNIST
- Most popular AI open-source project of 2017 (0.3% chance)
- Google Scholar: > 1300 publications

bert-as-service
- Most popular AI open-source project of 2018 (0.22% chance)