Using Derivative-Free Optimization Methods in the Hadoop Cluster with TeraSort (Anhanguera Educacional S/A)
We implemented the MapReduce cluster benchmark TeraSort with a derivative-free optimization (DFO) method, using the benchmark's runtime as the objective function. Each iteration of the DFO method tries new values for the Hadoop configuration parameters. Because these parameters are specified within the framework, we used the Chef server and client tools, which assist in cluster configuration, to ensure the TeraSort application ran properly under each configuration. The Chef server acts as a hub for configuration data...
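The tuning loop described above can be sketched in a few lines. This is a minimal sketch, not the paper's implementation: `run_terasort` is a hypothetical stand-in for deploying a configuration (e.g., via Chef) and timing a real TeraSort run, replaced here with a synthetic runtime model so the example is self-contained, and coordinate search is just one simple derivative-free method among many.

```python
# Derivative-free coordinate search over Hadoop-style parameters.
# run_terasort() is a hypothetical stand-in for configuring the cluster
# and timing a TeraSort run; here it is a synthetic bowl-shaped runtime
# model (fastest at io_sort_mb=512, reduce_tasks=32).

def run_terasort(config):
    return ((config["io_sort_mb"] - 512) / 100) ** 2 + \
           ((config["reduce_tasks"] - 32) / 8) ** 2 + 100.0

def coordinate_search(objective, config, steps, iterations=20):
    """Each iteration probes +/- one step along every parameter and keeps
    any move that lowers the measured runtime (no derivatives needed)."""
    best = dict(config)
    best_time = objective(best)
    for _ in range(iterations):
        improved = False
        for key, step in steps.items():
            for delta in (+step, -step):
                trial = dict(best)
                trial[key] += delta
                t = objective(trial)
                if t < best_time:
                    best, best_time, improved = trial, t, True
        if not improved:  # no neighbor improves: stop at this step size
            break
    return best, best_time

best, best_time = coordinate_search(
    run_terasort,
    config={"io_sort_mb": 100, "reduce_tasks": 8},
    steps={"io_sort_mb": 50, "reduce_tasks": 4},
)
print(best, best_time)
```

Each "objective evaluation" in the real setting is a full benchmark run, which is why DFO methods that need few evaluations per iteration are attractive here.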
Alex Smola at AI Frontiers: Scalable Deep Learning Using MXNet (AI Frontiers)
In this talk at the AI Frontiers Conference, Alex Smola gives a brief overview of the features used to scale deep learning with MXNet. MXNet relies on a mix of declarative and imperative programming to achieve efficiency while allowing significant flexibility for the user. It uses a distributed (key, value) store for synchronization between GPUs and between machines. It also separates a highly efficient execution engine from the language bindings, achieving a high degree of flexibility across languages while offering a native feel in each of them. Alex also briefly discusses how Amazon AWS can help deploy deep learning models and outlines steps on the future roadmap.
Deep learning continues to push the state of the art in domains such as computer vision, natural language understanding and recommendation engines. One of the key reasons for this progress is the availability of highly flexible and developer friendly deep learning frameworks. During this workshop, members of the Amazon Machine Learning team will provide a short background on Deep Learning focusing on relevant application domains and an introduction to using the powerful and scalable Deep Learning framework, MXNet. At the end of this tutorial you’ll gain hands on experience targeting a variety of applications including computer vision and recommendation engines as well as exposure to how to use preconfigured Deep Learning AMIs and CloudFormation Templates to help speed your development.
CIFAR-10 for DAWNBench: Wide ResNets, Mixup Augmentation and "Super Convergence" (Thom Lane)
Summary of the models and methods used for the DAWNBench CIFAR-10 challenge. Starting from the high-level architecture of ResNets, we review Basic vs. Bottleneck blocks, pre-activation blocks, and Wide ResNets. After a brief mention of the PyramidNet, ResNeXt, and DenseNet models, we look at regularization techniques such as Mixup. And we finish with a review of cyclical learning rates and the phenomenon of "Super Convergence".
MXNet Gluon API was used for the implementations.
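Mixup, mentioned above, is simple enough to sketch directly. The talk's implementation used MXNet Gluon; the version below is a framework-agnostic NumPy sketch of the standard recipe: draw one mixing coefficient from a Beta distribution and blend each example, and its one-hot label, with a randomly paired partner.

```python
import numpy as np

def mixup_batch(x, y_onehot, alpha=0.2, rng=None):
    """Mixup augmentation: blend each example with a random partner.
    x: (batch, ...) inputs; y_onehot: (batch, classes) labels."""
    rng = rng or np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)       # one mixing coefficient per batch
    perm = rng.permutation(len(x))     # random partner for each example
    x_mixed = lam * x + (1 - lam) * x[perm]
    y_mixed = lam * y_onehot + (1 - lam) * y_onehot[perm]
    return x_mixed, y_mixed, lam

# Tiny demonstration batch: two 2x2 "images", two classes.
x = np.array([[[0.0, 0.0], [0.0, 0.0]],
              [[1.0, 1.0], [1.0, 1.0]]])
y = np.array([[1.0, 0.0],
              [0.0, 1.0]])
xm, ym, lam = mixup_batch(x, y)
print(lam, ym.sum(axis=1))  # mixed label rows still sum to 1
```

With a small alpha (0.2 is a common choice), the Beta distribution concentrates near 0 and 1, so most mixed examples stay close to one of the originals.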
Early Results of OpenMP 4.5 Portability on NVIDIA GPUs & CPUs (Jeff Larkin)
This talk was presented at the DOE Centers of Excellence Performance Portability Workshop in August 2017. In this talk I explore the current status of 4 OpenMP 4.5 compilers for NVIDIA GPUs and CPUs from the perspective of performance portability between compilers and between the GPU and CPU.
Presentation from DICE Coder's Day (2010 November) by Andreas Fredriksson in the Frostbite team.
Goes into detail about Scope Stacks, a systems-programming tool for memory layout that provides:
• Deterministic memory map behavior
• Single-cycle allocation speed
• Regular C++ object life cycle for objects that need it
This makes it very suitable for games.
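The real system is C++, but the allocation discipline behind the bullet points above can be modeled in a few lines of Python (a conceptual sketch, not Frostbite's implementation): a linear allocator hands out bump-pointer allocations, and a scope records the allocator offset on entry plus the finalizers to run when it unwinds.

```python
class LinearAllocator:
    """Bump-pointer arena: allocation is a single add (O(1))."""
    def __init__(self, size):
        self.size = size
        self.offset = 0

    def alloc(self, nbytes):
        if self.offset + nbytes > self.size:
            raise MemoryError("arena exhausted")
        addr = self.offset
        self.offset += nbytes
        return addr

class Scope:
    """Records the arena offset on entry; on exit, runs finalizers in
    reverse order (the regular C++ object life cycle) and rewinds the
    pointer, so the memory map is fully deterministic."""
    def __init__(self, allocator):
        self.allocator = allocator
        self.finalizers = []

    def __enter__(self):
        self.mark = self.allocator.offset
        return self

    def new(self, nbytes, finalizer=None):
        addr = self.allocator.alloc(nbytes)
        if finalizer is not None:
            self.finalizers.append(finalizer)
        return addr

    def __exit__(self, *exc):
        for fin in reversed(self.finalizers):
            fin()
        self.allocator.offset = self.mark  # "freeing" is one assignment

arena = LinearAllocator(1024)
log = []
with Scope(arena) as s:
    a = s.new(64, finalizer=lambda: log.append("destroy A"))
    b = s.new(128, finalizer=lambda: log.append("destroy B"))
print(arena.offset, log)  # pointer rewound; B destroyed before A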
The past few years have seen a sharp increase in the complexity of rendering algorithms used in modern game engines. Large portions of the rendering work are increasingly written in GPU computing languages, and decoupled from the conventional “one-to-one” pipeline stages for which shading languages were designed. Following Tim Foley’s talk from SIGGRAPH 2016’s Open Problems course on shading language directions, we explore example rendering algorithms that we want to express in a composable, reusable and performance-portable manner. We argue that a few key constraints in GPU computing languages inhibit these goals, some of which are rooted in hardware limitations. We conclude with a call to action detailing specific improvements we would like to see in GPU compute languages, as well as the underlying graphics hardware.
This talk was originally given at SIGGRAPH 2017 by Andrew Lauritzen (EA SEED) for the Open Problems in Real-Time Rendering course.
Talk by Graham Wihlidal (Frostbite Labs) at GDC 2017.
Checkerboard rendering is a relatively new technique, popularized recently by the introduction of the PlayStation 4 Pro. Many modern game engines are adding support for it right now, and in this talk, Graham will present an in-depth look at the new implementation in Frostbite, which is used in shipping titles like 'Battlefield 1' and 'Mass Effect Andromeda'. Despite being conceptually simple, checkerboard rendering requires a deep integration into the post-processing chain, in particular temporal anti-aliasing, dynamic resolution scaling, and poses various challenges to existing effects. This presentation will cover the basics of checkerboard rendering, explain the impact on a game engine that powers a wide range of titles, and provide a detailed look at how the current implementation in Frostbite works, including topics like object id, alpha unrolling, gradient adjust, and a highly efficient depth resolve.
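The core idea described above can be sketched without any graphics API (a hypothetical illustration, not Frostbite's implementation): each frame shades only the pixels on one parity of a checkerboard, and alternating parity per frame means two consecutive frames together cover the full grid, with the missing half reconstructed from the previous frame during post-processing.

```python
def checkerboard_mask(width, height, frame_parity):
    """Pixels shaded this frame: those where (x + y) % 2 matches the
    frame's parity. The other half is reconstructed from the previous
    frame (combined with temporal anti-aliasing) in a real engine."""
    return {(x, y) for y in range(height) for x in range(width)
            if (x + y) % 2 == frame_parity}

even = checkerboard_mask(4, 4, 0)
odd = checkerboard_mask(4, 4, 1)
print(len(even), len(odd), len(even | odd))  # 8 8 16
```

Each frame thus shades only half the pixels, which is where the performance win comes from; the engineering effort goes into the reconstruction and its interaction with effects like dynamic resolution scaling.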
Accelerating microbiome research with OpenACC (Igor Sfiligoi)
Presented at OpenACC Summit 2020.
UniFrac is a commonly used metric in microbiome research for comparing microbiome profiles to one another. Computing UniFrac on modest sample sizes used to take a workday on a server-class CPU-only node, while modern datasets would require a large compute cluster to be feasible. After porting to GPUs using OpenACC, the same modest sample size now takes only a few minutes on a single NVIDIA V100 GPU, while modern datasets can be processed on a single GPU in hours. The OpenACC programming model made porting the code to GPUs extremely simple; the first prototype was completed in just over a day. Getting full performance did, however, take much longer, since proper memory access is fundamental for this application.
GPUIterator: Bridging the Gap between Chapel and GPU Platforms (Akihiro Hayashi)
The ACM SIGPLAN 6th Annual Chapel Implementers and Users Workshop (CHIUW2019) co-located with PLDI 2019 / ACM FCRC 2019.
PGAS (Partitioned Global Address Space) programming models were originally designed to facilitate productive parallel programming at both the intra-node and inter-node levels in homogeneous parallel machines. However, there is a growing need to support accelerators, especially GPU accelerators, in heterogeneous nodes in a cluster. Among high-level PGAS programming languages, Chapel is well suited for this task due to its use of locales and domains to help abstract away low-level details of data and compute mappings for different compute nodes, as well as for different processing units (CPU vs. GPU) within a node. In this paper, we address some of the key limitations of past approaches on mapping Chapel onto GPUs as follows. First, we introduce a Chapel module, GPUIterator, which is a portable programming interface that supports GPU execution of a Chapel forall loop. This module makes it possible for Chapel programmers to easily use hand-tuned native GPU programs/libraries, which is an important requirement in practice since there is still a big performance gap between compiler-generated GPU code and hand-tuned GPU code; hand-optimization of CPU-GPU data transfers is also an important contributor to this performance gap. Second, though Chapel programs are regularly executed on multi-node clusters, past work on GPU enablement of Chapel programs mainly focused on single-node execution. In contrast, our work supports execution across multiple CPU+GPU nodes by accepting Chapel's distributed domains. Third, our approach supports hybrid execution of a Chapel parallel (forall) loop across both GPU and CPU cores, which is beneficial for specific platforms. Our preliminary performance evaluations show that the use of the GPUIterator is a promising approach for Chapel programmers to easily utilize a single or multiple CPU+GPU node(s) while maintaining portability.
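The hybrid CPU+GPU execution in the third contribution rests on a simple idea that can be sketched outside Chapel (a hypothetical Python illustration; function and parameter names are not from the GPUIterator module): split the forall iteration range between the CPU and GPU portions according to a tunable ratio.

```python
def split_iterations(n, cpu_percent):
    """Split an iteration range [0, n) between a CPU portion and a GPU
    portion: the first cpu_percent of iterations run on CPU cores, the
    remainder is handed to the GPU kernel."""
    cut = (n * cpu_percent) // 100
    return range(0, cut), range(cut, n)

cpu_part, gpu_part = split_iterations(1000, 25)
print(len(cpu_part), len(gpu_part))  # 250 750
```

Tuning the ratio per platform is what makes hybrid execution pay off: a fast GPU with slow transfers may want a large CPU share, while a compute-bound kernel may want `cpu_percent` near zero.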
Landuse Classification from Satellite Imagery using Deep Learning (DataWorks Summit)
With the abundance of remote sensing satellite imagery, the possibilities are endless as to the kind of insights that can be derived from them. One such use is to determine land use for agriculture and non-agricultural purposes.
In this talk, we’ll be looking at leveraging Sentinel-2 satellite imagery data along with OpenStreetMap labels to be able to classify land use as agricultural or non-agricultural.
Sentinel-2 data has a 10-meter resolution in RGB bands and is well suited for land use classification. Using these two datasets, many different machine learning tasks can be performed, such as image segmentation into two classes (farm land and non-farm land) or the more challenging task of identifying the crop type cultivated in each field.
For this talk, we’ll be looking at leveraging convolutional neural networks (CNNs) built with Apache MXNet to train deep learning models for land use classification. We’ll be covering the different deep learning architectures considered for this particular use case along with the appropriate metrics.
We’ll be leveraging streaming pipelines built on Apache Flink and Apache NiFi for model training and inference. Developers will come away with a better understanding of how to analyze satellite imagery and of the different deep learning architectures, along with their pros and cons, for land use classification. Speakers: Suneel Marthi and Chris Olivier (Software Development Engineer, Amazon Web Services).
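For a two-class land-use problem like the one above, the "appropriate metrics" typically include precision, recall, and F1 for the positive class (the talk does not name its exact metrics; this is an illustrative sketch).

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for a two-class labeling task such as
    agricultural (1) vs. non-agricultural (0) land use."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred)
             if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred)
             if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# 5 image tiles: ground truth vs. model prediction.
p, r, f = precision_recall_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(p, r, f)
```

Accuracy alone can mislead here because agricultural and non-agricultural tiles are rarely balanced across a satellite scene; per-class precision and recall expose that imbalance.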
This session presents a detailed programmer oriented overview of our SPU based shading system implemented in DICE's Frostbite 2 engine and how it enables more visually rich environments in BATTLEFIELD 3 and better performance over traditional GPU-only based renderers. We explain in detail how our SPU Tile-based deferred shading system is implemented, and how it supports rich material variety, High Dynamic Range Lighting, and large amounts of light sources of different types through an extensive set of culling, occlusion and optimization techniques.
Discovering Your AI Super Powers - Tips and Tricks to Jumpstart your AI Projects (Wee Hyong Tok)
In this session, we will share cutting-edge deep learning innovations and present emerging trends in the AI community. This session is for data scientists and developers who have a keen interest in getting started on an AI project and want to learn the tools of the trade. We will draw on practical experience from working on various AI projects and share the key learnings and pitfalls.
Developing and Deploying Deep Learning Based Computer Vision Systems - Alka N... (CodeOps Technologies LLP)
Deep Learning is enabling a wide range of computer vision applications, from advanced driver assistance systems to sophisticated medical diagnostic devices. However, designing and deploying these applications involves many challenges, such as handling large datasets, developing optimized models, effectively performing GPU computing, and efficiently deploying deep learning models to embedded boards like NVIDIA Jetson. This session illustrates how MATLAB supports all phases of this workflow, from algorithm design to automatically generating portable and optimized CUDA code, helping engineers and scientists address the commonly observed challenges in the deep learning workflow.
Locating objects in images (“detection”) quickly and efficiently enables object tracking and counting applications on embedded visual sensors (fixed and mobile). By 2012, progress on techniques for detecting objects in images – a topic of perennial interest in computer vision – had plateaued, and techniques based on histogram of oriented gradients (HOG) were state of the art. Soon, though, convolutional neural networks (CNNs), in addition to classifying objects, were also beginning to become effective at simultaneously detecting objects. Research in CNN-based object detection was jump-started by the groundbreaking region-based CNN (R-CNN). We’ll follow the evolution of neural network algorithms for object detection, starting with R-CNN and proceeding to Fast R-CNN, Faster R-CNN, “You Only Look Once” (YOLO), and up to the Single Shot MultiBox Detector (SSD). In this talk, we’ll examine the successive innovations in performance and accuracy embodied in these algorithms – which is a good way to understand the insights behind effective neural-network-based object localization. We’ll also contrast bounding-box approaches with pixel-level segmentation approaches and present pros and cons.
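All of the bounding-box detectors named above are evaluated, and trained, around one shared primitive: Intersection-over-Union (IoU) between two boxes. A minimal sketch:

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes (x1, y1, x2, y2).
    Used to match predictions against ground truth at evaluation time
    and, inside detectors like Faster R-CNN and SSD, to assign anchor
    boxes to objects during training."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)   # intersection corners
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union else 0.0

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap 1, union 7 -> ~0.143
```

A detection typically counts as a true positive only if its IoU with a ground-truth box exceeds a threshold (0.5 is the classic choice), which is what makes the successive accuracy claims of these detectors comparable.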
https://imatge.upc.edu/web/publications/region-oriented-convolutional-networks-object-retrieval
BSc thesis by Eduard Fontdevila advised by Amaia Salvador and Xavier Giró-i-Nieto.
EET UPC, June 2015.
Lecturer: Jhen-Wei Huang, Solution Architect, AWS
Artificial Intelligence (AI) and deep learning are now ready to power your business, just as they power much of the innovation at Amazon.com: autonomous drones and robots, Amazon Alexa, Amazon Go, and many other hard and important business problems. Come and learn why and how to get started with deep learning, and what you can expect from a future with better AI in the cloud and on the edge.
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon University (MLconf)
Fast, Cheap and Deep – Scaling Machine Learning: Distributed high throughput machine learning is both a challenge and a key enabling technology. Using a Parameter Server template we are able to distribute algorithms efficiently over multiple GPUs and in the cloud. This allows us to design very fast recommender systems, factorization machines, classifiers, and deep networks. This degree of scalability allows us to tackle computationally expensive problems efficiently, yielding excellent results e.g. in visual question answering.
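The parameter-server template mentioned above can be modeled in miniature (a single-process conceptual sketch, not the distributed system itself): workers pull the current weights, compute gradients locally, and push them back, while the server applies the updates.

```python
class ParameterServer:
    """Minimal single-process model of the parameter-server template:
    in the real distributed setting, pull/push are network operations
    and many workers push gradients concurrently."""
    def __init__(self, weights, lr=0.1):
        self.weights = dict(weights)
        self.lr = lr

    def pull(self):
        # Worker fetches a snapshot of the current weights.
        return dict(self.weights)

    def push(self, grads):
        # Server applies a gradient update from a worker.
        for k, g in grads.items():
            self.weights[k] -= self.lr * g

server = ParameterServer({"w": 1.0})
# Two sequential "worker" steps on a toy objective 0.5 * w**2,
# whose gradient is simply w.
for _ in range(2):
    w = server.pull()
    server.push({"w": w["w"]})
print(server.weights)  # weight decays toward the minimum at 0
```

The appeal of the template is that this pull/push interface stays the same whether the "workers" are threads, GPUs on one machine, or machines in the cloud.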
AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverability (Amazon Web Services)
Deep learning continues to push the state of the art in domains such as video analytics, computer vision, and speech recognition. Deep networks are powered by amazing levels of representational power, feature learning, and abstraction. This approach comes at the cost of a significant increase in required compute power, which makes the AWS cloud an excellent environment for training. Innovators in this space are applying deep learning to a variety of applications. One such innovator, Vilynx, a startup based in Palo Alto, realized that the current pre-roll advertising-based models for mobile video weren’t returning publishers' desired levels of engagement. In this session, we explain the algorithmic challenges of scaling across multiple nodes, and what Intel is doing on AWS to overcome them. We describe the benefits of using AWS CloudFormation to set up a distributed training environment for deep networks. We also showcase Vilynx’s contributions to video discoverability, and explain how Vilynx uses AWS tools to understand video content. This session is sponsored by Intel.
"Wix Engineering Media AI Photo Studio", Mykola Mykhailych (Fwdays)
In this talk, we will review components of the Wix Engineering AI-based image processing toolbox: super-resolution, automatic enhancement, cutout, etc. We'll share insights about models in production and their metrics. We'll show how ML models can improve user experience and make core products even more helpful.
Using The New Flash Stage3D Web Technology To Build Your Own Next 3D Browser ... (Daosheng Mu)
Game Developer Conference China (2012). Programming track.
This talk covers how to use the Stage3D APIs to build a 3D web game engine and discusses some approaches to optimizing it.
Le Song, Assistant Professor, College of Computing, Georgia Institute of Technology (MLconf)
Understanding Deep Learning for Big Data: The complexity and scale of big data impose tremendous challenges for their analysis. Yet, big data also offer us great opportunities. Some nonlinear phenomena, features, or relations, which are not clear or cannot be inferred reliably from small and medium data, now become clear and can be learned robustly from big data. Typically, the form of the nonlinearity is unknown to us and needs to be learned from data as well. Being able to harness the nonlinear structures in big data could allow us to tackle problems that were impossible before, or to obtain results far better than the previous state of the art.
Nowadays, deep neural networks are the methods of choice for large-scale nonlinear learning problems. What makes deep neural networks work? Is there any general principle for tackling high-dimensional nonlinear problems that we can learn from deep neural networks? Can we design competitive or better alternatives based on such knowledge? To make progress on these questions, my machine learning group performed both theoretical and experimental analysis of existing and new deep learning architectures, investigating three crucial aspects: the usefulness of the fully connected layers, the advantage of the feature learning process, and the importance of compositional structures. Our results point to some promising directions for future research and provide guidelines for building new deep learning models.
UiPath Test Automation using UiPath Test Suite series, part 4 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Communications Mining Series - Zero to Hero - Session 1DianaGray10
This session provides introduction to UiPath Communication Mining, importance and platform overview. You will acquire a good understand of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
GridMate - End to end testing is a critical piece to ensure quality and avoid...ThomasParaiso2
End to end testing is a critical piece to ensure quality and avoid regressions. In this session, we share our journey building an E2E testing pipeline for GridMate components (LWC and Aura) using Cypress, JSForce, FakerJS…
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionAggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
2. Motivation and Objectives
• Search large unannotated datasets of 1M+ images for object categories
• Do so in real-time and without any prior knowledge
7. Proposed Solution
• Bootstrap training using images from the web
• Use highly compact ConvNet features + compression as the basis of an OTF system
• Plus: novel GPU architecture for iterative on-the-fly learning
8. Architecture Outline
[Diagram: query term (e.g. 'Car') → Google Image Search sourced training images → image encoder → positive features φ(I+), combined with a fixed negative pool of precomputed features φ(I−) → linear SVM → model w → ranking of the target dataset (Flickr, Pinterest, etc.) by wᵀφ(It) over precomputed features φ(It)]
9. Need for Speed
[Diagram: the same pipeline as the architecture outline, with the final stage — ranking the target dataset by wᵀφ(It) — highlighted]
Ranking is the most critical stage.
10. Fast Ranking = Compact Representation
Must compute wᵀx for all image features in the dataset, giving complexity O(ND) (N = # images in the test set, D = dimensionality of the image representation), so it is important to reduce representation dimensionality:
• Obtain a 128-D representation from the CNN (488 MB / 1M images)
• Then compress further using binarization (122 MB / 1M images)
• Or using product quantization (30.5 MB / 1M images)
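The scoring step above is a single matrix–vector product. A minimal NumPy sketch (variable names are ours, not the authors'; a small N stands in for the 1M-image set) shows both the O(ND) ranking and the storage arithmetic behind the 488 MB figure:

```python
import numpy as np

# Hypothetical sketch: rank all precomputed features against the SVM
# weight vector w with one matrix-vector product, O(ND) overall.
N, D = 10_000, 128                                  # small stand-in for the 1M set
rng = np.random.default_rng(0)
X = rng.standard_normal((N, D)).astype(np.float32)  # precomputed 128-D features
w = rng.standard_normal(D).astype(np.float32)       # trained SVM weights

scores = X @ w                    # wᵀφ(It) for every image
ranking = np.argsort(-scores)     # image indices, best match first

# Uncompressed float32 storage for the full 1M-image set:
mib_per_million = 1_000_000 * D * 4 / 2**20
print(round(mib_per_million, 1))  # 488.3 (matches the 488 MB figure)
```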
15. Compression
• Binarization by embedding into Hamming space, e : ℝᴰ → 𝔹ᴹ with bᵢ = sgn(U xᵢ), where M > D and U is obtained by taking the first D columns of the QR-decomposition of a random M × M matrix
• Product quantization
[Diagram: product quantization of a D-dimensional vector; labels D, S, d, Q]
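The binarization step above can be sketched in a few lines of NumPy (an illustrative toy, not the authors' implementation; the perturbation sizes are our choice): an orthogonal projection U from the QR-decomposition of a random M × M matrix, followed by the sign function.

```python
import numpy as np

# Hypothetical sketch of b = sgn(U x): embed D-dim features into an
# M-bit Hamming space (M > D) using a random orthogonal projection.
rng = np.random.default_rng(0)
D, M = 128, 256
Q, _ = np.linalg.qr(rng.standard_normal((M, M)))
U = Q[:, :D]                       # first D columns -> M x D projection

def binarize(x):
    """Map a D-dim float vector to M bits via sgn(U x)."""
    return U @ x >= 0              # boolean array of length M

def hamming(a, b):
    return int(np.count_nonzero(a != b))

x = rng.standard_normal(D)
b_orig = binarize(x)
b_near = binarize(x + 0.05 * rng.standard_normal(D))  # small perturbation
b_far = binarize(rng.standard_normal(D))              # unrelated vector

# Nearby vectors should agree on far more bits than unrelated ones
print(hamming(b_orig, b_near), hamming(b_orig, b_far))
```

Hamming distances between such codes can then be computed with cheap bitwise operations, which is what makes the 122 MB representation fast to rank.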
16. Evaluation Dataset
• PASCAL VOC 2007: 10,000 annotated images
• MIRFLICKR-1M: 1M unannotated images
• Want to evaluate CNN features for real-world photo retrieval
• Disjoint from ImageNet (as the CNN was trained on that) + with less focus on fine-grained retrieval
18. Evaluation Dataset
• Using the MIRFLICKR-1M dataset as distractors
• Remove false negatives and evaluate Precision @ K, where K = 100
• Or evaluate Precision @ K over MIRFLICKR-1M directly
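For concreteness, Precision @ K is simply the fraction of the top-K ranked images that are true positives. A toy sketch (function and variable names are ours):

```python
# Hypothetical sketch of the Precision @ K metric: the fraction of the
# top-K ranked images that belong to the relevant (positive) set.
def precision_at_k(ranked_ids, positive_ids, k=100):
    positives = set(positive_ids)
    top_k = ranked_ids[:k]
    return sum(1 for i in top_k if i in positives) / len(top_k)

# Toy example: 3 of the top 4 results are relevant
ranking = [7, 2, 9, 4, 1]
relevant = {2, 7, 9}
print(precision_at_k(ranking, relevant, k=4))  # 0.75
```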
20. Retrieval Results
Results for two sample classes over VOC + distractor data (retrieve ~500 images from within 1M images – TP are 0.05% of the dataset):
• CNN 128 (Prec. 0.32 @ 100)
• CNN 128 (Prec. 0.77 @ 100)
24. VOC vs Google Training
• 'Chair' – CNN 128: Prec. 0.92 @ 100 (VOC training) vs. Prec. 0.86 @ 100 (Google training)
• 'Train' – CNN 128: Prec. 1.0 @ 100 (VOC training) vs. Prec. 1.0 @ 100 (Google training)
25. Instances & Faces too
[Diagram, instances pipeline: Root SIFT extractor ψ(I) → xᵢ, with the features passed through a VQ encoder and a Hamming encoder, matched against the target dataset ψ(It) with spatial verification, taking the max over the N training images for the ranking]
[Diagram, faces pipeline: face extractor ψ(I) → If over the N training images, pre-trained face CNN features φ(If+), linear SVM with negative pool φ(I−) → model w → ranking of face tracks φ(It) in the target dataset]
26. Live Demo
1. Landing page – user enters a text query term and selects a search modality (e.g. 'forest' using object category search)
2. Querying – a live view of images downloaded from Google Image Search as they are used to construct a visual appearance model on-the-fly
3. Ranked results – a ranked list of visually matching images is displayed within 1–30 secs of entering the cold query
Can try out the system live over a dataset of 5M+ images sourced from BBC News footage at:
http://varro3.robots.ox.ac.uk:9090
27. ConvNet-based Architecture
Question: how can we adapt the standard GPU ConvNet pipeline for on-the-fly search?
We want:
• simultaneous feature computation / model training
• highly parallel operation by using a GPU-bound architecture
• Libraries such as Caffe allow for fast computation of ConvNet features entirely on the GPU
29. ConvNet-based Architecture
[Diagram: a CPU frontend feeds a batch sampler (batch size B) with Google Image Search training images for the query (e.g. 'Sheep') and a fixed negative pool of precomputed CNN features; the GPU backend pushes the B/2 positive RGB images through the conv and fc stacks to get CNN features, pairs them with B/2 negatives, and applies an SVM loss layer with subgradient ∇ = (1/B) Σᵢ₌₁..B 𝟙[yᵢ wᵀxᵢ < 1] yᵢ xᵢ]
30. ConvNet-based Architecture
[Diagram: as slide 29, with an image buffer added between the CPU frontend and the batch sampler]
31. ConvNet-based Architecture
[Diagram: as slides 29–30, with the learned model w additionally applied every τ secs, via an inner product layer, to the precomputed CNN features of the target dataset (MIRFLICKR) to produce the current ranking]
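The SVM loss layer above can be sketched on the CPU with NumPy (a toy stand-in, not the Caffe implementation; the synthetic feature distributions and learning rate are our choices): each batch of B features, half positive and half negative, contributes the hinge subgradient (1/B) Σᵢ 𝟙[yᵢ wᵀxᵢ < 1] yᵢ xᵢ used to update the ranking model w.

```python
import numpy as np

# Hypothetical sketch of mini-batch SVM training on precomputed features.
rng = np.random.default_rng(0)
D, B, steps, lr = 128, 32, 200, 0.5

# Synthetic stand-ins for positive (web-sourced) and negative-pool features
mu = rng.standard_normal(D)
def sample_batch():
    pos = mu + rng.standard_normal((B // 2, D))    # B/2 positives
    neg = -mu + rng.standard_normal((B // 2, D))   # B/2 negatives
    X = np.vstack([pos, neg])
    y = np.concatenate([np.ones(B // 2), -np.ones(B // 2)])
    return X, y

w = np.zeros(D)
for _ in range(steps):
    X, y = sample_batch()
    violated = (y * (X @ w)) < 1                   # indicator 1[y w^T x < 1]
    grad = (violated[:, None] * y[:, None] * X).sum(axis=0) / B
    w += lr * grad                                 # push violating examples past the margin

# w should now score positives above negatives
X, y = sample_batch()
print(np.mean(np.sign(X @ w) == y))  # close to 1.0
```

Because only a dot product and a masked sum per batch are needed, the same update maps naturally onto the GPU-bound loss layer in the diagram.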
32. Retrieval Results
• Images are fed into the network at a rate of 12 per second
• Dataset is ranked with the current model every ~0.2 seconds
• Most rankings stabilise in under 1 second
[Plot: Precision@100 vs. time in seconds (0–1.8 s) for 10, 20 and 30 training images; example queries 'sofa', 'sheep', 'bus' and 'horse' shown at 0.15 s, 0.36 s, 0.54 s and 0.73 s]
43. Continued Work
Currently working on the following extensions:
• How to select negative training images more intelligently (e.g. selection of the most discriminative negative images per query from a larger 1M+ pool of non-class images)
• How to establish a confidence measure for images in the output ranking, so we know when a query works well or not, and can source training images more intelligently
• Query attribute refinement (sporty + car)
44. Related Publications
• "On-the-fly Learning for Visual Search of Large-scale Image and Video Datasets", IJMIR 2015. Ken Chatfield, Relja Arandjelovic, Omkar Parkhi, Andrew Zisserman
• "Efficient On-the-fly Category Retrieval using ConvNets and GPUs", ACCV 2014. Ken Chatfield, Karen Simonyan, Andrew Zisserman
• "Return of the Devil in the Details: Delving Deep into Convolutional Nets", BMVC 2014. Ken Chatfield, Karen Simonyan, Andrea Vedaldi, Andrew Zisserman (Best Paper Prize)
http://www.robots.ox.ac.uk/~ken