Analysis on Implementation of different CNN Architectures on FPGAs | Undergrad Thesis - BITS F421T Thesis | Author: Prayag Mohanty | BITS Pilani KK Birla Goa Campus
DOI: 10.13140/RG.2.2.31819.77604
Convolutional Neural Networks (CNNs) are a class of neural networks that are exceptionally effective on grid-structured data such as images and signals. With the advent of Artificial Intelligence, the use of Field-Programmable Gate Arrays (FPGAs) in high-performance computing has garnered significant attention. This thesis investigates the performance and resource utilization of different CNN models implemented on FPGAs, with the primary objective of identifying CNN models well suited to FPGA deployment based on performance, resource utilization, and other relevant parameters. Two prominent CNN models, AlexNet and MobileNet, were chosen for analysis, and both were implemented on an FPGA platform. Resource utilization metrics, including logic slices, memory blocks, and DSP slices, were monitored to assess the hardware requirements of each model. The evaluation results demonstrate that MobileNet exhibits significantly lower resource utilization than AlexNet while maintaining a commendable level of performance, suggesting that MobileNet is the more efficient option for deploying CNN models on FPGAs with limited hardware resources. AlexNet, on the other hand, offers superior performance at the expense of higher resource consumption, making it a suitable choice for applications where performance is paramount and resources are less restricted. This analysis provides valuable insight into the suitability of different CNN models for FPGA implementation based on their performance and resource utilization characteristics.
Keywords: Convolutional Neural Networks, FPGA, Performance, Resource Utilization, AlexNet, MobileNet
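The efficiency gap summarized above stems largely from MobileNet's use of depthwise-separable convolutions in place of standard convolutions. As a rough, self-contained sketch (this is an illustrative parameter count, not the thesis's FPGA measurements, and the layer sizes are hypothetical), the weight counts of a standard 3x3 convolution and a MobileNet-style depthwise-separable block can be compared:

```python
def conv_params(k, c_in, c_out):
    # Weights in a standard k x k convolution (biases ignored).
    return k * k * c_in * c_out

def dw_separable_params(k, c_in, c_out):
    # MobileNet-style block: a depthwise k x k convolution (one filter per
    # input channel) followed by a 1 x 1 pointwise convolution across channels.
    return k * k * c_in + c_in * c_out

# Illustrative layer with 256 input and 256 output channels.
std = conv_params(3, 256, 256)          # 589,824 weights
dws = dw_separable_params(3, 256, 256)  # 67,840 weights
print(f"standard: {std}, separable: {dws}, ratio: {std / dws:.1f}x")
```

For a 3x3 kernel the separable form needs roughly 8 to 9 times fewer weights, which is one reason such layers tend to consume fewer memory blocks and DSP slices when mapped onto an FPGA.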
RESEARCH PROPOSAL ON ENHANCING AUTOMATIC IMAGE CAPTIONING SYSTEM LSTM.pdf | MUHUMUZAONAN1
In this research study, the researchers aim to investigate and address these challenges by proposing techniques and architectures for enhancing image captioning systems using CNNs and LSTMs. Specifically, the researchers will focus on developing a system that generates accurate and semantically meaningful captions for a wide range of images. By doing so, the researchers aim to contribute to the development of more effective and reliable image captioning systems.
- Jyoti Tyagi submitted a project titled "Modification in the behavior of Nova Filter Scheduler to enhance the performance in OpenStack cloud" as part of her MSc in Cloud Computing at the National College of Ireland.
- The project aims to assess the behavior of the nova scheduler in OpenStack with the goal of reducing latency when placing virtual machines. It analyzes the default filters of the nova scheduler to understand limitations and attempts to improve performance and resource allocation.
- An experiment was performed to modify essential filters in the scheduler to prove the concept. Various scheduling filters were analyzed to check their impact on performance by changing default values and metrics.
The document describes a project submission for a Masters student's research dissertation on modifying the nova scheduler in OpenStack to enhance performance. The student aims to reduce latency when launching virtual machines by analyzing how scheduling is affected by factors like CPU cores, memory allocation, and disk usage. The submission sheet provides details of the student's name, ID, program of study, module, supervisor, project title, word count, and certification that the work is original. It also includes submission instructions.
Sam Dunham is an electrical engineer looking for new career opportunities. He has a Master's degree in Electrical Engineering from Southern Illinois University Edwardsville with a 3.9 GPA. Currently, he works as an Analog Validation Engineer at Intel Corporation, where he validates serial I/O interfaces like PCIe and Ethernet. Previously, he was a research assistant at Southern Illinois University Edwardsville, where he designed and optimized circuits. He has strong skills in programming, IC design, and working in teams.
John Arigho (X00075278) Final Project [Porcine Vertebra Simulation] (Print) | John Arigho
This document outlines a project to develop a finite element analysis (FEA) model of a porcine vertebra from CT scan data and validate it against experimental compression test data. The project aims to investigate open source software for medical image processing and FEA mesh generation in order to create a geometrically accurate FEA mesh model of the vertebra. The model will then be loaded and tested in compression simulations in ANSYS to validate the material properties by comparing the results to postgraduate compression test data on porcine vertebrae specimens. Further validation methods such as modeling specific loading conditions and strain measurements will also be explored.
This document is a project report submitted by four students - Apeksha A. Jain, Rohit M. Kulkarni, Soham C. Wadekar, and Kedar D. Wagholikar - for their Bachelor of Engineering degree. The report details a project on dynamic routing of packets in wireless sensor networks conducted under the guidance of Prof. G.R. Pathak. The project aims to implement clustering in a wireless sensor network and analyze the effects of increasing cluster size on cluster head energy. It further aims to implement an energy efficient dynamic algorithm to re-elect cluster heads periodically in order to save energy. The report presents the background, problem statement, project planning, analysis, design,
Efficient Planning and Offline Routing Approaches for IP Networks | EM Legacy
Pre-print archive of my dissertation on Efficient Planning and Offline Routing Approaches for IP Networks, Department of Communication Networks, Technische Universitaet Hamburg-Harburg, 2006.
This document is a thesis submitted by Shereef B. M. Shehata to Concordia University in 1997 for the degree of Doctor of Philosophy in Electrical and Computer Engineering. The thesis proposes a technique for high level synthesis of digital signal processing cores targeting field programmable gate arrays (FPGAs). The technique aims to optimize the total execution time of the synthesized architecture using integer linear programming while accounting for the structural characteristics of FPGAs early in the synthesis process. This includes optimizing interconnect usage and estimating system clock duration.
The document discusses migrating a mobile core application from its native infrastructure to a simplified infrastructure (SI). It aims to analyze whether the SI can maintain the same level of availability as the native infrastructure without major architecture changes. The author conducted a theoretical study of the native infrastructure and identified functions relying on non-IP interfaces. The study analyzed how these functions could become unavailable on the SI. Laboratory tests were performed on an SI prototype to verify proposals for maintaining availability. The results confirmed the SI could achieve higher availability than the native infrastructure if the proposals are successfully implemented.
The document discusses optimizing and implementing the ParaDiS dislocation dynamics simulation code on a parallel cluster. Key optimization techniques included loop unrolling, SIMD intrinsics for vector calculations, a write buffer, and further OpenMP optimizations. Simulation results using the optimized code supported experimental data and models of strain hardening in materials.
This master's thesis explores integrating a home automation architecture based on EnOcean technology with a telecommunications architecture. It develops an EnOcean resource adaptor for a JAIN SLEE application server to communicate with an EnOcean gateway. It then develops a JAIN service that allows a SIP user agent to control and monitor home automation devices like lamps, motion sensors, and energy meters through an interactive voice response system. The goal is to demonstrate a method for remote management and control of EnOcean-based smart home devices through a telecommunications network and SIP.
This thesis presents a generalized Monte Carlo tool for investigating the properties of materials using a non-parabolic band structure model. The tool allows users to define new material parameters and properties by making every parameter a variable. It incorporates various scattering mechanisms and uses an analytic band structure model, making it fast. The tool has been integrated with the Rappture interface and deployed on nanoHUB.org for broad accessibility. Results from the tool closely match experimental data for common semiconductors like silicon, germanium, and gallium arsenide, demonstrating its versatility. The user-friendly interface allows defining materials and obtaining accurate results without coding.
Design and Development of a Knowledge Community System | Huu Bang Le Phan
The document is a dissertation submitted by Le Phan Huu Bang to the Department of Computer Science at the National University of Singapore in 2008/2009 describing the design and development of a Knowledge Community System (K-Comm). The dissertation includes chapters on introducing knowledge and the need for knowledge sharing, reviewing existing literature, providing an overview of the K-Comm system and its features, and describing the implementation of K-Comm.
Design and Simulation of Local Area Network Using Cisco Packet Tracer | Abhi abhishek
This document describes a project to design and simulate a local area network (LAN) for a college using Cisco Packet Tracer. The project aims to study network topologies, design a topology for the college, configure IP addresses and subnets, and simulate packet transmission between departments. It will examine concepts like topology design, IP addressing, and using virtual LANs to separate departmental traffic. The results will provide insights into network simulation and performance analysis.
This project aims to design and implement a system called PHAMONAS to provide soil nutrient availability data. The system will have four subsystems: 1) detection of pH and moisture levels for up to six research nurseries, 2) communication of data to an online database via GPRS, 3) storage of data in the online database, and 4) a decision-making subsystem to provide nutrient availability and treatment recommendations to users. The system seeks to enable more efficient agronomy research by automating data collection, storage, and analysis at a lower cost compared to current manual methods. It will allow research facilities to more easily obtain nutrient data, compare treatments, and share results.
An investigation into the physical build and psychological aspects of an inte... | Jessica Navarro
This dissertation investigates creating an interactive information point and examines the psychological effects on users. The student aims to build an animatronic information point that tracks objects and interacts with users. Research covers object tracking hardware/software, human-computer interaction, and effects of anthropomorphism. The student will create a physical animatronic head, programming in LabVIEW and Roborealm, conduct user testing via questionnaire, and analyze the results. The dissertation aims to determine if a more lifelike interactive information point improves the user experience of conveying information.
Makgopa Setati_Machine Learning for Decision Support in Distributed Systems_M... | Makgopa Gareth Setati
This document presents research on using machine learning techniques like artificial neural networks and genetic algorithms to optimize a system for predicting time series data, specifically daily stock prices. Neural networks are used for prediction, while genetic algorithms are used to optimize both the input data and neural network architecture to improve predictions. The results show that the machine learning approaches help refine the predictive capability of the system. The document aims to contribute to research on applying machine learning in distributed networks for decision support.
A Software Approach for Lower Power Consumption.pdf | HanaTiti
This document summarizes related work on software power optimization techniques, including instruction scheduling for low power consumption. It discusses previous research on software power estimation models, energy code generation, reducing memory access, symbolic algebra optimization, and list scheduling algorithms for low power. The document analyzes various approaches for optimizing power through software techniques such as instruction selection, register allocation, and reordering instructions to minimize overhead costs between pairs of instructions.
Accelerated Prototyping of Cyber Physical Systems in an Incubator Context | Sreyas Sriram
The document summarizes the prototyping history of a microscope prototype developed by a startup incubated at a technology business incubator. It describes the evolution from initial prototypes (V1 and V2) composed primarily of 3D printed parts to later prototypes (V3, static prototype, skeleton BOM, mechatronics BOM) composed mainly of commercial off-the-shelf components. This shift led to an increase in the total number of components from around 40 to over 200. The increased use of standardized, replaceable parts like fasteners contributed to making the design more modular. Analysis of contributions by student designers found that the prototyping process, which began in 2017, involved over 700 days of combined work until completion
This project describes integrating wind power into a DC microgrid that stores and transforms power. A microgrid consists of distributed energy sources like wind turbines and solar PV systems connected to electrical loads. The project simulates connecting a wind turbine to an asynchronous machine, rectifier, and DC bus using Simulink. Operational optimization of the microgrid is analyzed to minimize costs and emissions while maintaining supply-demand balance and battery state of charge. Integration of the DC microgrid is proposed and simulation results are presented.
This document discusses using an artificial neural network to predict the success of a logistics network. It begins by introducing the research purpose of analyzing logistics network performance using ANN techniques. It then provides details on the neural network model and methodology, including analyzing sample networks to find shortest paths and relationships between parameters. The research implementation section describes using MATLAB to set up example neural networks for sample logistics networks and comparing predicted outputs to actual industry data. The conclusion suggests that neural networks can provide intelligent predictions for logistics networks given sufficient historical data.
An investigation into the critical factors involved in developing a Knowledge... | Gowri Shankar
This document is the final report submitted by Gnanamoorthy Gowri Shankar for the course BN89 Master in Project Management at Queensland University of Technology. The report investigates the critical factors involved in developing a knowledge management system in project-based manufacturing industries, using the automobile components manufacturing industry as a case study. The report includes an introduction outlining the problem statement, objectives and approach. It also includes a literature review, research methodology, results and discussion, conclusion, and recommendations. The aim is to identify issues influencing knowledge management implementation and provide recommendations to improve knowledge management practices in the industry.
ILIC Dejan - MSc: Secure Business Computation by using Garbled Circuits in a ... | Dejan Ilic
This thesis introduces a web-based system for the secure evaluation of economic functions, named Secure Business Computation (SBC), in the manner suggested by Yao (1982).
The document proposes developing software to enable secure and authorized dynamic group resource management. It aims to implement attribute-based access control and dynamic delegation of access rights to address limitations in existing group-centric applications. The research plan involves three phases: literature review and requirements analysis; core implementation of access control and delegation features; and testing, performance analysis, and real-world deployment. The proposed software would facilitate secure collaboration and resource sharing for educational institutions and organizations.
E.Leute: Learning the impact of Learning Analytics with an authentic dataset | Hendrik Drachsler
Nowadays, data sets of the interactions of users and their corresponding demographic data are becoming more and more valuable for companies and for academic institutions like universities when optimizing their key performance indicators. Whether it is to develop a model that predicts the optimal learning path for a student or to sell customers additional products, data sets to train these models are in high demand. Despite the importance of and need for big data sets, it still has not become apparent to every decision-maker how crucial such data sets are for the future success of their operations.
The objective of this thesis is to demonstrate the use of a data set gathered from the virtual learning environment of a distance-learning university by answering a selection of questions in Learning Analytics. To that end, a real-world data set was analyzed and the selected questions were answered using state-of-the-art machine learning algorithms.
This document is the final report for a third year project on designing a wearable kinetic energy harvester. It acknowledges those who assisted with the project and declares that the report contains only the author's original work. The project aimed to create a device that converts arm movements into usable energy through electromagnetic induction using magnets and coils. Tests confirmed the implemented circuit was successful in harvesting kinetic energy from arm motions and charging a battery according to the project's objectives of creating an efficient and cost-effective wearable energy harvester. Final assembly and testing was planned for after submission of the report.
This thesis presents new techniques to diagnose process variability in industrial plasma etching systems. Optical emission spectroscopy and electrical sensors are used to measure plasma parameters during etching. Statistical analysis methods like principal component analysis and gradient boosting trees are applied to sensor data to correlate variability in etch rate with measurements. A case study of an etching process is presented where time series OES data is analyzed across process steps. Variability due to the "first wafer effect" is investigated. Process excursions are also examined by studying shifts in principal components. The combination of in-situ sensors and statistical analysis provides a powerful tool for process engineers to diagnose sources of variability.
"Touch-Me-Not" by Ismat Chughtai: A Critical Analysis | Prayag Mohanty
"Touch-Me-Not" by Ismat Chughtai is a daring and provocative short story that delves into themes of gender, power dynamics, and societal expectations in mid-20th century Indian society. Set against the backdrop of a conservative Muslim household, the story follows the protagonist, Sultana, a young woman who rebels against the traditional roles imposed upon her.
Chughtai's narrative challenges traditional gender norms by portraying Sultana as a defiant figure who refuses to conform to the patriarchal expectations placed upon her. Through her character, Chughtai explores the complexities of female desire and agency in a society that seeks to suppress them.
The title "Touch-Me-Not" serves as a metaphor for Sultana's resistance to being confined or controlled by others. It symbolizes her determination to assert her autonomy and challenge the restrictive norms of her environment.
Chughtai's writing is characterized by its boldness and frankness, tackling taboo subjects with unflinching honesty. She exposes the hypocrisy and double standards inherent in a society that places undue emphasis on female chastity and obedience.
Overall, "Touch-Me-Not" offers a thought-provoking critique of gender roles and societal expectations, while also celebrating the courage and resilience of those who dare to defy them.
Periodic Styles in Indian Traditional Art - Mughal, Kangra, Miniature | Prayag Mohanty
Periodic Styles in Indian Traditional Art showcase the rich cultural heritage and artistic excellence that has evolved over centuries. Among these, three prominent styles stand out: Mughal, Kangra, and Miniature. Each style reflects distinct influences, techniques, and thematic representations, contributing to the diverse tapestry of Indian art.
1. **Mughal Art:**
Mughal art flourished during the Mughal Empire (16th to 18th centuries) under the patronage of emperors like Akbar, Jahangir, and Shah Jahan. It is characterized by its intricate detailing, vivid colors, and a fusion of Persian, Islamic, and Indian styles. Mughal paintings often depict historical events, court scenes, flora, fauna, and portraits of rulers and nobility. Artists employed techniques like meticulous brushwork, precise draftsmanship, and the extensive use of gold leaf to create opulent and lifelike compositions.
2. **Kangra Art:**
Kangra painting originated in the Kangra Valley of Himachal Pradesh during the 17th to 19th centuries. It embodies the serene beauty of nature, love, and devotion. Kangra paintings are renowned for their delicate lines, pastel hues, and ethereal imagery, often depicting scenes from Hindu mythology, particularly the love stories of Radha and Krishna. Artists of Kangra school mastered the art of portraying emotions through subtle facial expressions and graceful gestures. The use of natural pigments derived from minerals and plants imparts a soft and luminous quality to these exquisite works of art.
3. **Miniature Art:**
Miniature painting is a meticulous and intricate art form that reached its pinnacle during the Mughal and Rajput periods (16th to 19th centuries). Miniatures are characterized by their diminutive size and elaborate detailing. Artists employed fine brushes, often made from squirrel hair, to create miniature masterpieces on materials such as paper, ivory, or cloth. Themes ranged from courtly scenes, religious narratives, and portraits to landscapes and flora. Miniatures are distinguished by their vibrant colors, intricate patterns, and meticulous attention to detail, showcasing the artist's skill and imagination within a confined space.
These Periodic Styles in Indian Traditional Art not only serve as visual representations of history, culture, and mythology but also as enduring testaments to the artistic genius and creativity of Indian artisans through the ages. Each style encapsulates its own unique blend of techniques, themes, and aesthetics, contributing to the rich tapestry of India's artistic heritage.
Accelerated Prototyping of Cyber Physical Systems in an Incubator ContextSreyas Sriram
The document summarizes the prototyping history of a microscope prototype developed by a startup incubated at a technology business incubator. It describes the evolution from initial prototypes (V1 and V2) composed primarily of 3D printed parts to later prototypes (V3, static prototype, skeleton BOM, mechatronics BOM) composed mainly of commercial off-the-shelf components. This shift led to an increase in the total number of components from around 40 to over 200. The increased use of standardized, replaceable parts like fasteners contributed to making the design more modular. Analysis of contributions by student designers found that the prototyping process, which began in 2017, involved over 700 days of combined work until completion
This project describes integrating wind power into a DC microgrid that stores and transforms power. A microgrid consists of distributed energy sources like wind turbines and solar PV systems connected to electrical loads. The project simulates connecting a wind turbine to an asynchronous machine, rectifier, and DC bus using Simulink. Operational optimization of the microgrid is analyzed to minimize costs and emissions while maintaining supply-demand balance and battery state of charge. Integration of the DC microgrid is proposed and simulation results are presented.
This document discusses using an artificial neural network to predict the success of a logistics network. It begins by introducing the research purpose of analyzing logistics network performance using ANN techniques. It then provides details on the neural network model and methodology, including analyzing sample networks to find shortest paths and relationships between parameters. The research implementation section describes using MATLAB to set up example neural networks for sample logistics networks and comparing predicted outputs to actual industry data. The conclusion suggests that neural networks can provide intelligent predictions for logistics networks given sufficient historical data.
An investigation into the critical factors involved in developing a Knowledge...Gowri Shankar
This document is the final report submitted by Gnanamoorthy Gowri Shankar for the course BN89 Master in Project Management at Queensland University of Technology. The report investigates the critical factors involved in developing a knowledge management system in project-based manufacturing industries, using the automobile components manufacturing industry as a case study. The report includes an introduction outlining the problem statement, objectives and approach. It also includes a literature review, research methodology, results and discussion, conclusion, and recommendations. The aim is to identify issues influencing knowledge management implementation and provide recommendations to improve knowledge management practices in the industry.
ILIC Dejan - MSc: Secure Business Computation by using Garbled Circuits in a ...Dejan Ilic
This thesis introduces a web based system for secure evaluation of economic function, named Secure Business Computation (SBC), in the manner suggested by Yao 1982
The document proposes developing software to enable secure and authorized dynamic group resource management. It aims to implement attribute-based access control and dynamic delegation of access rights to address limitations in existing group-centric applications. The research plan involves three phases: literature review and requirements analysis; core implementation of access control and delegation features; and testing, performance analysis, and real-world deployment. The proposed software would facilitate secure collaboration and resource sharing for educational institutions and organizations.
E.Leute: Learning the impact of Learning Analytics with an authentic datasetHendrik Drachsler
Nowadays, data sets of the interactions of users and their corresponding demographic data are becoming more and more valuable for companies and academic institutions like universities
when optimizing their key performance indicators. Whether it is to develop a model to predict the optimal learning path for a student or to sell customers additional products, data sets to
train these models are in high demand. Despite the importance and need for big data sets it still has not become apparent to every decision-maker how crucial data sets like these are for the
future success of their operations.
The objective of this thesis is to demonstrate the use of a data set, gathered from the virtual learning environment of a distance learning university, by answering a selection of questions in
Learning Analytics. Therefore, a real-world data set was analyzed and the selected questions were answered by using state-of-the-art machine learning algorithms.
This document is the final report for a third year project on designing a wearable kinetic energy harvester. It acknowledges those who assisted with the project and declares that the report contains only the author's original work. The project aimed to create a device that converts arm movements into usable energy through electromagnetic induction using magnets and coils. Tests confirmed the implemented circuit was successful in harvesting kinetic energy from arm motions and charging a battery according to the project's objectives of creating an efficient and cost-effective wearable energy harvester. Final assembly and testing was planned for after submission of the report.
This thesis presents new techniques to diagnose process variability in industrial plasma etching systems. Optical emission spectroscopy and electrical sensors are used to measure plasma parameters during etching. Statistical analysis methods like principal component analysis and gradient boosting trees are applied to sensor data to correlate variability in etch rate with measurements. A case study of an etching process is presented where time series OES data is analyzed across process steps. Variability due to the "first wafer effect" is investigated. Process excursions are also examined by studying shifts in principal components. The combination of in-situ sensors and statistical analysis provides a powerful tool for process engineers to diagnose sources of variability.
Ähnlich wie Analysis on Implementation of different CNN Architectures on FPGAs | Undergrad Thesis - BITS F421T Thesis | Author: Prayag Mohanty |BITS Pilani KK Birla Goa Campus (20)
"Touch-Me-Not" by Ismat Chughtai: A Critical AnalysisPrayag Mohanty
"Touch-Me-Not" by Ismat Chughtai is a daring and provocative short story that delves into themes of gender, power dynamics, and societal expectations in mid-20th century Indian society. Set against the backdrop of a conservative Muslim household, the story follows the protagonist, Sultana, a young woman who rebels against the traditional roles imposed upon her.
Chughtai's narrative challenges traditional gender norms by portraying Sultana as a defiant figure who refuses to conform to the patriarchal expectations placed upon her. Through her character, Chughtai explores the complexities of female desire and agency in a society that seeks to suppress them.
The title "Touch-Me-Not" serves as a metaphor for Sultana's resistance to being confined or controlled by others. It symbolizes her determination to assert her autonomy and challenge the restrictive norms of her environment.
Chughtai's writing is characterized by its boldness and frankness, tackling taboo subjects with unflinching honesty. She exposes the hypocrisy and double standards inherent in a society that places undue emphasis on female chastity and obedience.
Overall, "Touch-Me-Not" offers a thought-provoking critique of gender roles and societal expectations, while also celebrating the courage and resilience of those who dare to defy them.
Periodic Styles in Indian Traditional Art - Mughal, Kangra, MiniaturePrayag Mohanty
Periodic Styles in Indian Traditional Art showcase the rich cultural heritage and artistic excellence that has evolved over centuries. Among these, three prominent styles stand out: Mughal, Kangra, and Miniature. Each style reflects distinct influences, techniques, and thematic representations, contributing to the diverse tapestry of Indian art.
1. **Mughal Art:**
Mughal art flourished during the Mughal Empire (16th to 18th centuries) under the patronage of emperors like Akbar, Jahangir, and Shah Jahan. It is characterized by its intricate detailing, vivid colors, and a fusion of Persian, Islamic, and Indian styles. Mughal paintings often depict historical events, court scenes, flora, fauna, and portraits of rulers and nobility. Artists employed techniques like meticulous brushwork, precise draftsmanship, and the extensive use of gold leaf to create opulent and lifelike compositions.
2. **Kangra Art:**
Kangra painting originated in the Kangra Valley of Himachal Pradesh during the 17th to 19th centuries. It embodies the serene beauty of nature, love, and devotion. Kangra paintings are renowned for their delicate lines, pastel hues, and ethereal imagery, often depicting scenes from Hindu mythology, particularly the love stories of Radha and Krishna. Artists of Kangra school mastered the art of portraying emotions through subtle facial expressions and graceful gestures. The use of natural pigments derived from minerals and plants imparts a soft and luminous quality to these exquisite works of art.
3. **Miniature Art:**
Miniature painting is a meticulous and intricate art form that reached its pinnacle during the Mughal and Rajput periods (16th to 19th centuries). Miniatures are characterized by their diminutive size and elaborate detailing. Artists employed fine brushes, often made from squirrel hair, to create miniature masterpieces on materials such as paper, ivory, or cloth. Themes ranged from courtly scenes, religious narratives, and portraits to landscapes and flora. Miniatures are distinguished by their vibrant colors, intricate patterns, and meticulous attention to detail, showcasing the artist's skill and imagination within a confined space.
These Periodic Styles in Indian Traditional Art not only serve as visual representations of history, culture, and mythology but also as enduring testaments to the artistic genius and creativity of Indian artisans through the ages. Each style encapsulates its own unique blend of techniques, themes, and aesthetics, contributing to the rich tapestry of India's artistic heritage.
Pattachitra - Elements of Design | Intro to Contemporary Arts PresentationPrayag Mohanty
Step into the vibrant world of Pattachitra, an ancient Indian art form that marries storytelling with intricate designs and vibrant colors. In this presentation, we delve deep into the heart of Pattachitra, unraveling its mysteries through the lens of design elements.
As the lights dim, the canvas comes to life, adorned with tales of gods, goddesses, and everyday life, intricately depicted through bold lines and vivid hues. Each stroke of the brush is a testament to the skill and creativity of the artist, conveying emotions and narratives with unparalleled precision.
Our journey begins by dissecting the foundational elements of Pattachitra design. We explore the significance of line, the backbone of this art form, which guides the eye and delineates shapes with grace and fluidity. From delicate curves to bold, decisive strokes, each line tells a story of its own, weaving together a tapestry of tradition and innovation.
Next, we immerse ourselves in the world of color, where every hue holds meaning and symbolism. From the fiery reds symbolizing passion and courage to the serene blues evoking tranquility and spirituality, we decode the language of color in Pattachitra, understanding its role in conveying mood, emotion, and narrative.
Moving forward, we examine the intricate patterns and motifs that adorn Pattachitra compositions. From floral motifs representing nature's bounty to geometric patterns symbolizing cosmic order, these elements add depth and texture to the artwork, inviting viewers to explore their intricacies and unravel their hidden meanings.
But Pattachitra is more than just lines, colors, and patterns—it is a living, breathing art form that reflects the culture, beliefs, and traditions of its creators. As we conclude our presentation, we reflect on the timeless beauty and enduring legacy of Pattachitra, celebrating its ability to transcend time and space, and inspire generations to come.
Join us on this enlightening journey as we unravel the mysteries of Pattachitra through the lens of design elements, gaining a deeper appreciation for this exquisite art form and the artisans who bring it to life.
This document delves into the captivating realm of Modern Indian Art, elucidating its underlying principles of design. Offering a comprehensive exploration, it navigates through the intricate tapestry of artistic expression that characterizes the contemporary Indian art scene. From vibrant color schemes to dynamic compositions, from traditional motifs to avant-garde innovations, this document unveils the fundamental principles that shape and define modern Indian artistic endeavors. Through insightful analysis and illustrative examples, it seeks to unravel the essence of Indian artistry, offering a rich tapestry of inspiration for artists, enthusiasts, and scholars alike. Whether one seeks to understand the fusion of tradition and modernity or explore the diverse cultural influences that permeate Indian art, this document serves as an indispensable guide to unraveling the mysteries of Modern Indian Art's design principles.
PYAAR KA PUNCHNAMA 2024 | BITS Goa Quiz Club | Modern Love Quiz | QM: Prayag ...Prayag Mohanty
Do you believe in 'Ishq wala love' or are you the 'Oyo room' kinda person?
Modern day love is a spectrum which nobody understands everything but yet there is something for everyone.
BITS Goa Quiz Club returns with "Pyaar ka Punchnama", a banger of a set on modern-day love and the traditions surrounding the recently concluded valentines week.
Open to all (this includes couples, singles & hopeful romantics, even potential third-wheelers 😏)
Date: 22 Feb'24
Time: 7 PM
Venue: DLT-5
See you there!!
https://www.instagram.com/p/C3mk-Ykt4GT/?igsh=aHB2Z29rdnNtOG02
AN ANALYSIS ON SEABORNE VESSEL TRAFFIC & ECONOMY | Maritime Studies Prayag Mohanty
Maritime trade is an activity that involves the transportation of goods and people on a body of water. It used to be conducted by ships traveling on open seas, but with the advent of globalization, seaways have become more bustling. The global economy has evolved into a massive interdependent system where all countries participate in many facets of our lives. One way for this system to function effectively is for each country to have a strong maritime industry which is capable of offering food products and energy sources, as well as creating jobs.
To find the answer to the question – How do they contribute to more trade, investments and opportunities? We have used our own analysis methods to analyze cargo and tanker traffic data over 14 days and over two years. Also, it is evident that when comparing containerized cargo and tanker movements conveyed by vessels of different sizes (from bulk carrier to container vessel) both have various advantages like enhanced speed (containerized cargo), reduced turnaround time (containerized cargo), etc.
Quizzinga- Biz & Inno Quiz | Coalescence'23 | BITS Goa Quiz Club | QM:Prayag...Prayag Mohanty
Get ready to test yourself in the world of business, from start ups to product innovation as we bring you our very own Business and Innovation quiz! Join in teams of three to showcase your trivia and creativity skills! With a prize pool of 50K, this is the most lucrative place to flex your quiz skills!
First Prize: 30k
Second Prize: 15k
Third Prize: 5k
Date: 16th Sep
BITS Goa Quiz Club x BITS Goa Culinary Club
present
POTLUCK, A Food & Culinary Quiz open to all culinary connoisseurs! 🧇🍕
Join us at for a fun time to get your brain pickin'🧠 on food, gastronomy, gourmet and everything in between
Open to all BITSians !!
Venue: LT-1
Date: 2nd October
Time: 5:00 PM
🍴🧁 Unleash Your Inner Foodie: Embark on a Culinary Quiz Adventure! 🍔🍣
Calling all food enthusiasts, aspiring chefs, and curious minds with a taste for knowledge! Are you ready to tantalize your taste buds and put your culinary wisdom to the test? Look no further! Join us for an exciting and mouthwatering culinary quiz presentation on SlideShare that promises to challenge your food knowledge and leave you craving for more!
🍳 What Awaits You in the Culinary Quiz:
Prepare yourself for a delectable smorgasbord of food-related questions from around the world! From iconic dishes and culinary techniques to food history and exotic ingredients, our quiz will take you on a journey of flavors and aromas, all while enhancing your understanding of the culinary arts.
Aero Quiz 2022 | QM: Prayag Mohanty | BITS Pilani KK Birla Goa Campus | BITS ...Prayag Mohanty
Do you love watching planes takeoff the tarmac? Or enjoy being updated with the latest trends in tech?
BITS Goa Quiz Club in collaboration with Aerodynamics Club is delighted to announce its latest quiz - Just Wing It!
About:
Time: 7 - 8 PM
Date: 15th October
Venue: DLT8
Topic: Aerodynamics, aviation (broadly, across themes and genres)
Guidelines:
- Open to all BITSians
- Teams of 1-3, individual participation is allowed
- Carry along a pen & a sheet of paper
Register now, link in bio
Expect some good trivia & fun !!
See you there!!
Calling all aviation enthusiasts, engineering aficionados, and science buffs! Are you ready to delve into the world of aerodynamics and uncover the secrets behind flight? Look no further! Join us for an exhilarating aerodynamics quiz presentation on SlideShare that will take you on an educational journey through the fascinating realm of air and wings!
🔍 What Awaits You in the Quiz:
Prepare to be amazed by a comprehensive collection of aerodynamics questions that cover everything from the principles of lift and drag to the evolution of aircraft design. Our quiz has been meticulously curated to appeal to learners of all levels, from beginners to seasoned aviation experts.
Development Economics Assignment (NAFTA + MERCOSUR): A look at the Industrial...Prayag Mohanty
The 21st century has been a tumultuous one for the industrial sector of different countries. The population has seen a slow but a definite rise, so has Gross Domestic Product (GDP) over the past 2 decades. The report below collects and analyses the parameters of the industrial sectors of NAFTA and MERCOSUR countries.
The North American Free Trade Agreement (NAFTA) was implemented in 1994 to encourage trade between the U.S., Mexico, and Canada. NAFTA reduced or eliminated tariffs on imports and exports between the three participating countries, creating a huge free-trade zone.
MERCOSUR, also known as the Common Market of the South, is a trade bloc agreement that exists between the following South American countries: Argentina, Brazil, Paraguay, Uruguay, and Venezuela. The trade bloc was established under the Treaty of Asuncion in March 1991; it was then expanded under the 1994 Treaty of Ouro Preto, which set up a formal customs union.
The main objective of Mercosur is to bring about the free movement of goods, capital, services, and people among its member states. In addition to the four founding members of Mercosur and Venezuela, there are five countries with associate member status. These countries are Bolivia, Chile, Colombia, Ecuador, and Peru. As associate members, they can join free-trade agreements but do not receive the benefits of the customs union.
In the report, we have considered the following variables for analysing the data: 1. Population changes
2. GDP changes
3. Wages changes
4. L changes
5. Femaledistribution
6. Unemploymentrate
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...IJECEIAES
Medical image analysis has witnessed significant advancements with deep learning techniques. In the domain of brain tumor segmentation, the ability to
precisely delineate tumor boundaries from magnetic resonance imaging (MRI)
scans holds profound implications for diagnosis. This study presents an ensemble convolutional neural network (CNN) with transfer learning, integrating
the state-of-the-art Deeplabv3+ architecture with the ResNet18 backbone. The
model is rigorously trained and evaluated, exhibiting remarkable performance
metrics, including an impressive global accuracy of 99.286%, a high-class accuracy of 82.191%, a mean intersection over union (IoU) of 79.900%, a weighted
IoU of 98.620%, and a Boundary F1 (BF) score of 83.303%. Notably, a detailed comparative analysis with existing methods showcases the superiority of
our proposed model. These findings underscore the model’s competence in precise brain tumor localization, underscoring its potential to revolutionize medical
image analysis and enhance healthcare outcomes. This research paves the way
for future exploration and optimization of advanced CNN models in medical
imaging, emphasizing addressing false positives and resource efficiency.
artificial intelligence and data science contents.pptxGauravCar
What is artificial intelligence? Artificial intelligence is the ability of a computer or computer-controlled robot to perform tasks that are commonly associated with the intellectual processes characteristic of humans, such as the ability to reason.
› ...
Artificial intelligence (AI) | Definitio
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsVictor Morales
K8sGPT is a tool that analyzes and diagnoses Kubernetes clusters. This presentation was used to share the requirements and dependencies to deploy K8sGPT in a local environment.
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...IJECEIAES
Climate change's impact on the planet forced the United Nations and governments to promote green energies and electric transportation. The deployments of photovoltaic (PV) and electric vehicle (EV) systems gained stronger momentum due to their numerous advantages over fossil fuel types. The advantages go beyond sustainability to reach financial support and stability. The work in this paper introduces the hybrid system between PV and EV to support industrial and commercial plants. This paper covers the theoretical framework of the proposed hybrid system including the required equation to complete the cost analysis when PV and EV are present. In addition, the proposed design diagram which sets the priorities and requirements of the system is presented. The proposed approach allows setup to advance their power stability, especially during power outages. The presented information supports researchers and plant owners to complete the necessary analysis while promoting the deployment of clean energy. The result of a case study that represents a dairy milk farmer supports the theoretical works and highlights its advanced benefits to existing plants. The short return on investment of the proposed approach supports the paper's novelty approach for the sustainable electrical system. In addition, the proposed system allows for an isolated power setup without the need for a transmission line which enhances the safety of the electrical network
International Conference on NLP, Artificial Intelligence, Machine Learning an...gerogepatton
International Conference on NLP, Artificial Intelligence, Machine Learning and Applications (NLAIM 2024) offers a premier global platform for exchanging insights and findings in the theory, methodology, and applications of NLP, Artificial Intelligence, Machine Learning, and their applications. The conference seeks substantial contributions across all key domains of NLP, Artificial Intelligence, Machine Learning, and their practical applications, aiming to foster both theoretical advancements and real-world implementations. With a focus on facilitating collaboration between researchers and practitioners from academia and industry, the conference serves as a nexus for sharing the latest developments in the field.
Advanced control scheme of doubly fed induction generator for wind turbine us...IJECEIAES
This paper describes a speed control device for generating electrical energy on an electricity network based on the doubly fed induction generator (DFIG) used for wind power conversion systems. At first, a double-fed induction generator model was constructed. A control law is formulated to govern the flow of energy between the stator of a DFIG and the energy network using three types of controllers: proportional integral (PI), sliding mode controller (SMC) and second order sliding mode controller (SOSMC). Their different results in terms of power reference tracking, reaction to unexpected speed fluctuations, sensitivity to perturbations, and resilience against machine parameter alterations are compared. MATLAB/Simulink was used to conduct the simulations for the preceding study. Multiple simulations have shown very satisfying results, and the investigations demonstrate the efficacy and power-enhancing capabilities of the suggested control system.
Analysis on Implementation of different CNN Architectures on FPGAs | Undergrad Thesis - BITS F421T Thesis | Author: Prayag Mohanty |BITS Pilani KK Birla Goa Campus
1. Analysis on Implementation of different
CNN Architectures on FPGAs
UNDERGRADUATE THESIS
Submitted in partial fulfillment of the requirements
of BITS F421T Thesis
By
PRAYAG MOHANTY
ID No. 2020A3PS0566G
Under the supervision of:
Dr. AMALIN PRINCE A.
BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE PILANI, GOA CAMPUS
December 2023
2. Declaration of Authorship
I, Prayag Mohanty, declare that this Undergraduate Thesis titled, ‘Analysis on
implementation of different CNN Architectures on FPGA’ and the work presented in it
are my own. This was undertaken in the First Semester of 2023-24. I confirm that:
● This research was primarily conducted while I was a candidate for a research
degree at this University.
● Any portions of this thesis previously submitted for a degree or qualification at
this or another institution are explicitly identified.
● I consistently and clearly credit any consulted published works of others.
● All quotations are attributed to their original sources. With the exception of such
quotations, the content of this thesis is entirely my own original work.
● I have expressed my gratitude for all significant sources of assistance.
● If the thesis draws on work I conducted collaboratively with others, I have clearly
outlined each individual's contribution, including my own.
Signed:
Date: 12 / 12 / 23
3. Certificate
This is to certify that the thesis entitled “Analysis on implementation of different CNN Architectures on FPGA”, submitted by Prayag Mohanty (ID No. 2020A3PS0566G) in partial fulfillment of the requirements of BITS F421T Thesis, embodies the work done by him under my supervision.
_____________________________
Supervisor
Dr. Amalin Prince A.
Professor, Dept. of EEE
BITS-Pilani K.K.Birla Goa Campus
Date: 12 / 12 / 23
4. “Knowledge is a tool, best shared. So is my thesis :) ”
-Prayag Mohanty
5. BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE PILANI, K.K.BIRLA GOA
CAMPUS
Abstract
Bachelor of Engineering (Hons.)
Analysis on implementation of different CNN Architectures on FPGA
by Prayag Mohanty
Convolutional Neural Networks (CNNs) are a class of neural networks that are exceptionally effective on structured data such as images and signals. With the advent of Artificial Intelligence, the use of Field-Programmable Gate Arrays (FPGAs) in high-performance computing has garnered significant attention. This thesis investigates the performance and resource utilization of various CNN models implemented on FPGAs. The primary objective is to identify optimal CNN models for FPGA deployment based on their performance, resource utilization, and other relevant parameters. Two prominent CNN models, AlexNet and MobileNet, were chosen for analysis, and both were implemented on an FPGA platform. Resource utilization metrics, including logic slices, memory blocks, and DSP slices, were monitored to assess the hardware requirements of each model. The evaluation results demonstrate that MobileNet exhibits significantly lower resource utilization than AlexNet while maintaining a commendable level of performance, suggesting that MobileNet is the more efficient option for deploying CNN models on FPGAs with limited hardware resources. AlexNet, on the other hand, offers superior performance at the expense of higher resource consumption, making it a suitable choice for applications where performance is paramount and resources are less restricted. This analysis provides valuable insights into the suitability of different CNN models for FPGA implementation based on their performance and resource utilization characteristics.
Keywords: Convolutional Neural Networks, FPGA, Performance, Resource Utilization,
AlexNet, MobileNet
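MobileNet's lower resource utilization stems largely from its depthwise-separable convolutions, which replace one dense k×k convolution with a per-channel k×k convolution followed by a 1×1 pointwise convolution. A minimal back-of-the-envelope sketch of the resulting weight-count gap (the 3×3, 256-in/256-out layer size is an illustrative assumption, not one of the layers profiled in this thesis):

```python
# Back-of-the-envelope comparison of weight counts for a standard 3x3
# convolution versus the depthwise-separable form used by MobileNet.
# Layer size (3x3 kernel, 256 input / 256 output channels) is an
# illustrative assumption; biases are ignored in both counts.

def standard_conv_weights(k, c_in, c_out):
    """Weights in a standard k x k convolution."""
    return k * k * c_in * c_out

def depthwise_separable_weights(k, c_in, c_out):
    """Weights in a depthwise k x k conv plus a 1x1 pointwise conv."""
    return k * k * c_in + c_in * c_out

std = standard_conv_weights(3, 256, 256)        # 589,824 weights
sep = depthwise_separable_weights(3, 256, 256)  # 67,840 weights
print(f"standard: {std}, separable: {sep}, ratio: {std / sep:.1f}x")
# prints: standard: 589824, separable: 67840, ratio: 8.7x
```

For this assumed layer size the separable form needs roughly 8.7× fewer weights, which is the kind of saving that tends to show up as reduced DSP-slice and memory-block usage on an FPGA.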
6. Acknowledgements
The journey of completing this thesis has been a rewarding but challenging one, and I would like to express my heartfelt gratitude to those who have supported me throughout the process. First and foremost, I want to thank my family for their unwavering love and support. Their constant encouragement and belief in me have been instrumental in helping me overcome obstacles and persevere through difficulties. I am especially grateful for the sacrifices they made to enable me to pursue my educational goals. I extend my sincere thanks to my relatives and friends for their encouragement and understanding. I owe a debt of immense gratitude to my thesis supervisor, Professor Amalin Prince A., whose guidance, expertise, and patience have been invaluable in shaping my research and helping me refine my work. I am deeply grateful for the insightful feedback, constructive criticism, and unwavering support I received throughout the research process. Finally, I would like to express my sincere appreciation to my institute, BITS Pilani KK Birla Goa Campus, whose excellent academic environment, equipment, and dedicated faculty have provided me with the foundation and resources necessary to conduct my research.
Thank you all for your invaluable contributions.
Contents
Declaration of Authorship i
Certificate ii
Abstract iv
Acknowledgements v
Contents vi
List of Figures viii
List of Tables ix
Abbreviations x
1 Introduction 1
1.1 Motivation.................................................................................................................. 1
1.2 Scope & Structure......................................................................................................2
2 Fundamentals 3
2.1 Current Work............................................................................................................ 3
2.1.1 Theoretical Background...................................................................................3
2.1.2 FPGA................................................................................................................ 3
2.2 Literature Review..................................................................................................... 4
2.2.1 AlexNet.........................................................................................................10
2.2.3 ResNet...........................................................................................................11
2.2.4 MobileNet..................................................................................................... 14
3 Design and Implementation 15
3.1 Design......................................................................................................................15
3.2 Implementation.......................................................................................................16
4 Hardware Implementation 20
4.1 Design Methodology................................................................................................20
4.2 HLS Methodology....................................................................................................20
4.3 Design Overview .................................................................................................... 20
4.4 Caching Strategy.....................................................................................................21
vi
List of Tables
Table 1: Specifications of Zedboard ........................................................................... 19
Table 2: Resource Utilization of Final Design ............................................................. 21
Table 3: Hardware execution times of each AlexNet Layer ........................................ 24
Table 4: Simulation Model vs Hardware Implementation .......................................... 25
Table 5: Comparing AlexNet vs MobileNet ................................................................. 25
Table 6: Comparison of other works to this work ....................................................... 26
Abbreviations
CNN Convolutional Neural Networks
FPGA Field-Programmable Gate Arrays
AI Artificial Intelligence
ML Machine Learning
HLS High-Level Synthesis
DSP Digital Signal Processing
Dedicated to my family, friends, relatives, and electronics.
1. Introduction
1.1 Motivation
The field of high-performance computing (HPC) has witnessed a significant shift in recent
years, driven by the ever-increasing demand for processing power across diverse application
domains. This growth is fueled by advancements in various fields, including science,
engineering, finance, and healthcare, each requiring the ability to analyze and process
massive datasets in real-time. To address this growing demand, researchers have turned to
Field-Programmable Gate Arrays (FPGAs) as a promising alternative to traditional CPUs
and GPUs.
FPGAs offer several key advantages over traditional computing architectures. Their
reconfigurable nature allows them to be tailored to specific tasks, leading to significant
performance improvements compared to general-purpose CPUs. Additionally, FPGAs excel
in energy efficiency due to their parallel processing capabilities and optimized hardware
design. This combination of performance and efficiency makes FPGAs ideal candidates for
accelerating computationally intensive workloads in HPC.
Over the past few decades, the field of Artificial Intelligence (AI) has experienced
tremendous progress, revolutionizing numerous aspects of our lives. From image and
speech recognition to natural language processing and autonomous vehicles, AI has
demonstrably impacted various industries and scientific domains. This rapid advancement
is fueled by the increasing availability of computing resources and data, enabling the
development and deployment of complex machine learning algorithms and neural
networks.
However, the growing demand for AI applications necessitates the development of efficient
and scalable neural networks. Traditional software-based implementations often struggle to
handle the demands of real-time processing and resource limitations on mobile and
embedded systems. This is where FPGAs present a compelling solution. With their inherent
parallelism and hardware flexibility, FPGAs can be leveraged to implement efficient neural
networks that deliver superior performance and energy savings compared to software-based
approaches.
The motivation for this project stems from the desire to explore the potential of FPGAs in
accelerating Convolutional Neural Networks (CNNs), a class of neural networks widely
used in various AI applications, particularly image and video processing. CNNs excel in
extracting features and identifying patterns in images, making them instrumental for tasks
such as image recognition, object detection, and image segmentation.
My primary objective is to analyze and compare different CNN architectures available for
implementation on FPGAs. This analysis focuses on key performance metrics like resource
utilization, scalability, and real-time processing capabilities. The ultimate goal is to identify
and optimize a CNN model that delivers the best performance on the Zedboard, a popular
FPGA development platform.
Additionally, the potential for deploying CNNs on low-resource systems like smartphones
motivates this project. This enables the processing of sensitive data directly on the device,
eliminating the need for internet data transmission and ensuring data privacy.
Furthermore, integrating CNNs into embedded systems opens up exciting possibilities for
real-time applications in areas like robotics, autonomous vehicles, and smart home
technologies.
By exploring the implementation of various CNNs on FPGAs, this project aims to
contribute to the development of efficient and scalable AI solutions for resource-constrained
environments. The insights and findings will provide valuable knowledge and pave the way
for future research in the field of hardware-accelerated AI.
1.2 Scope & Structure
The prospect of creating a whole framework capable of analyzing data in real time
piqued my interest. However, due to the task's complexity and my limited prior
experience with neural networks, the scope was reduced to the following points.
1. The data set was restricted to numbers. This is a simple, good starting point before
moving on to other forms of information like written language, signals, etc.
2. Only individual pre-existing images were used for static analysis. The main
reason for this decision is because, while there exist Neural Networks capable of properly
analyzing video, their complexity has risen and the analysis for their use in embedded
systems has not yet been fully established, which would add an additional risk to the
project.
The project needs to be broken down into two independent sub-problems that can be tackled
separately. However, when combined, they will provide the desired overall outcome.
1. This work aims to develop a system configured to run as many layers as desired and
test it using a currently defined CNN configuration, AlexNet. This type of system would
allow a developer to scale a design to fit any size of FPGA.
2. Comparing two CNN architectures, AlexNet and MobileNet, on the basis of
measurable parameters like performance, speed, DSP slices, LUTs, etc., on a Zedboard.
This would help determine the suitability of these models on a sample Zedboard.
2. Fundamentals
2.1 Current Work
2.1.1 Theoretical Background
Convolutional Neural Networks (CNNs) are a type of artificial intelligence that fall within
the field of machine learning and are also categorized as a deep learning technique.
Neural networks: Inspired by the human brain, neural networks are computational
structures composed of interconnected nodes called neurons. These neurons receive and
process information from each other, mimicking the way synapses in the brain facilitate
communication. This intricate network of connections, numbering in the millions, underlies
the complex thought processes and behavior observed in humans and other intelligent
beings.
Artificial neural networks mimic the way neurons interact: each building block (usually
referred to as a neuron) receives several inputs, scales them by learned weights, and
produces an output that is sent on to several other building blocks.
Fig. 1 shows the architecture of a neuron.
Fig.1 Neuron Architecture (Reddy, 2019)
A neuron receives multiple inputs, such as pixel values or sound data, depending on the
application. It multiplies the inputs (say x) by suitable weights (w) and adds a bias (b),
producing the output σ(w·x + b), where σ is a non-linear activation function.
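As a concrete, hypothetical illustration, this computation can be sketched in a few lines of Python, taking σ to be the sigmoid function; the inputs, weights, and bias below are made-up values:

```python
import math

def neuron(x, w, b):
    """Weighted sum of inputs plus bias, passed through a sigmoid activation."""
    z = sum(xi * wi for xi, wi in zip(x, w)) + b
    return 1.0 / (1.0 + math.exp(-z))  # sigma(w.x + b)

# Example: two inputs with illustrative weights and bias
out = neuron(x=[0.5, -1.0], w=[0.8, 0.2], b=0.1)  # z = 0.3, out ~ 0.574
```

During training, it is the values of w and b that are adjusted; the structure of the computation stays fixed.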
Functionality: Neural networks excel at classifying inputs into predetermined categories.
This ability stems from the weights assigned to each neuron within the network. A crucial step
called training determines the specific combination of weights that enables accurate
classification. During this phase, the network receives numerous inputs with known
outputs, and the weights are adjusted iteratively until an optimal configuration is achieved.
Topology: To provide all neurons with a suitable structure for analyzing input data, they
can be organized in various ways. In our project, we will focus on networks where neurons
are arranged in ordered layers, only receiving input from the preceding layer and sending
output to the subsequent one. Consequently, the network's topology is defined by how the
layers are interconnected and the operations performed within each layer, often utilizing
previously learned weights.
Convolutional Neural Networks (CNNs) are a special type of neural network that is
particularly effective at working with 2D data, like images. They are commonly used for
tasks like identifying objects in images or labeling scenes.
Imagine a 256x256 image with three color channels (RGB). Feeding this pixel data into a
conventional neural network would require millions of weights, due to the typical
connectivity between neurons across layers. However, CNNs leverage the inherent spatial
locality of information in images. For instance, to identify a car in an image, analyzing
pixels in the top-right corner isn't crucial. Features like edges, lines, circles, and contours
provide enough context.
This is where convolutional layers come in. These specialized layers replace fully-connected
layers, allowing the network to focus on local information and extract meaningful features.
Each convolutional layer receives a stack of images as input and generates another stack as
output. These layers utilize small filters (kernels) to scan the input and extract features.
These filters, equipped with learned weights, help the network recognize patterns and
objects in the images.
In essence, CNNs employ convolutional layers to efficiently capture key features in images,
facilitating accurate image understanding and classification.
Convolutional Layer Details:
● Each layer receives a stack of ch_in 2D images with dimensions h_in × w_in,
referred to as input feature maps.
● Each layer outputs a stack of ch_out 2D images with dimensions h_out × w_out,
called output feature maps.
● Each layer utilizes a stack of ch_in × ch_out kernels (2D filters) with dimensions k × k
(typically ranging from 1×1 to 11×11) containing the trained weights.
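The relationship between input and output dimensions follows the standard convolution formula: the output size in each spatial dimension is floor((in + 2·pad − k) / stride) + 1. A small helper (the function and variable names here are ours) makes this concrete:

```python
def conv_out_size(in_size, k, stride=1, pad=0):
    """Spatial output size of a convolution: floor((in + 2*pad - k) / stride) + 1."""
    return (in_size + 2 * pad - k) // stride + 1

# A 3x3 kernel with padding 1 and stride 1 preserves the input size:
same = conv_out_size(32, k=3, pad=1)        # 32
# AlexNet's first layer (11x11 kernel, stride 4, 227x227 working size) gives 55:
h_out = conv_out_size(227, k=11, stride=4)  # 55
```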
By focusing on local information and utilizing efficient convolutional layers, CNNs achieve
exceptional performance in image-related tasks, solidifying their position as a powerful tool
for image processing and computer vision applications.
Fig.2 Layers in a CNN model (Goodfellow,2016)
Activation and Pooling
Activation: Each linear activation is then passed through a non-linear activation function.
This stage, also known as the "detector stage," introduces non-linearity into the network,
allowing it to learn complex relationships between features. A popular choice for the
activation function is the rectified linear unit (ReLU), which outputs the input value if it is
positive, and zero otherwise.
Pooling: This stage further modifies the layer's output by applying a pooling function.
Pooling functions summarize the output within a specific neighborhood, often reducing the
dimensionality of the data. Common pooling functions include:
● Max pooling: Replaces each output with the maximum value within its rectangular
neighborhood.
● Average pooling: Replaces each output with the average value within its rectangular
neighborhood.
● L2-norm pooling: Replaces each output with the L2 norm of the values within its
rectangular neighborhood.
● Weighted average pooling: Replaces each output with a weighted average based on
the distance from the central pixel.
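The two most common of these, max and average pooling, can be sketched in pure Python as follows (the 4×4 input values are made up; a 2×2 window with a stride equal to the window size is assumed, as is typical):

```python
def pool2d(image, size=2, mode="max"):
    """Downsample a 2D list by taking the max (or average) over size x size windows."""
    h, w = len(image), len(image[0])
    out = []
    for i in range(0, h - size + 1, size):
        row = []
        for j in range(0, w - size + 1, size):
            window = [image[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            row.append(max(window) if mode == "max"
                       else sum(window) / len(window))
        out.append(row)
    return out

img = [[1, 3, 2, 0],
       [4, 2, 1, 1],
       [0, 1, 5, 6],
       [2, 2, 7, 8]]
```

Applying `pool2d(img)` reduces the 4×4 grid to 2×2, keeping only the strongest response in each neighborhood.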
By performing these stages sequentially, CNNs extract and learn features from input data,
enabling them to perform complex tasks like image recognition and natural language
processing [4], as shown in Fig. 3 below.
Fig.3: A typical convolutional neural network layer's components (Goodfellow, 2016)
Convolutional networks (ConvNets) can be described using two distinct sets of terminology.
Left-hand View: This perspective treats the ConvNet as a collection of relatively complex layers, each containing
multiple "stages." Each kernel tensor directly corresponds to a network layer in this interpretation.
Right-hand View: This perspective presents the ConvNet as a sequence of simpler layers. Every processing step
within the network is considered its own individual layer. Consequently, not every "layer" possesses learnable
parameters.[4]
Practical Convolution
Convolution in the context of neural networks transcends a singular operation. It involves
the parallel application of multiple convolutions, leveraging the strength of extracting
diverse features across multiple spatial locations. A single kernel can only identify one type
of feature, limiting the richness of extracted information. By employing multiple kernels in
parallel, the network extracts a broader spectrum of features, enhancing its
representational power.
Neural networks often handle data with a richer structure than mere grids of real values.
The input typically consists of "vector-valued observations," where each data point holds
additional information beyond a single value. For instance, a color image presents red,
green, and blue intensity values at each pixel, creating a 3-dimensional tensor. One index
denotes the different channels (red, green, blue), while the other two specify the spatial
coordinates within each channel[4].
Software implementations of convolution often employ "batch mode," processing multiple
data samples simultaneously. This introduces an additional dimension (the "batch axis") to
the tensor, representing different examples within the batch. For clarity, we will disregard
the batch axis in our subsequent discussion [4].
A crucial element of convolutional networks is "multi-channel convolution," where both the
input and output possess multiple channels. This multi-channel nature introduces an
interesting property: the linear operations involved are not guaranteed to be commutative,
even with the implementation of "kernel flipping." Commutativity only holds true when
each operation involves the same number of input and output channels.
To illustrate these concepts, consider a 3-channel color image as the input to a convolutional
layer with multiple kernels. Each kernel extracts a specific type of feature from each
channel, resulting in multiple "feature maps." These feature maps, when combined, form
the output of the convolution operation[4].
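The multi-channel case described above can be made concrete with a short sketch: it convolves a ch_in-channel input with a bank of ch_out kernels, summing over input channels so that each kernel produces one output feature map (pure Python, stride 1, no padding; the function and variable names are ours):

```python
def conv2d_multichannel(inputs, kernels):
    """inputs: ch_in x H x W, kernels: ch_out x ch_in x k x k.
    Returns ch_out feature maps, each (H-k+1) x (W-k+1)."""
    ch_in, H, W = len(inputs), len(inputs[0]), len(inputs[0][0])
    k = len(kernels[0][0])
    out = []
    for kern in kernels:                      # one output feature map per kernel
        fmap = [[sum(inputs[c][i + di][j + dj] * kern[c][di][dj]
                     for c in range(ch_in)
                     for di in range(k)
                     for dj in range(k))
                 for j in range(W - k + 1)]
                for i in range(H - k + 1)]
        out.append(fmap)
    return out
```

For a 3-channel color image, `ch_in` would be 3, and each kernel would combine the red, green, and blue planes into a single feature map.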
Training a Neural Network
Because training is computationally expensive, there are frameworks and tools available to
help with this process. Two popular ones are Caffe and TensorFlow.
In this thesis, different frameworks were explored progressively, starting with simpler
ones and moving towards more advanced ones, as we had limited prior knowledge.
There exist two primary forms of training for neural networks:
1. Full training: In situations where an ample amount of data is accessible, it is possible
to train all the network weights to enhance results tailored to the specific application.
2. Transfer learning: Frequently, insufficient data is available to train all the weights
from the ground up. In such instances, a prevalent strategy involves employing a
pre-trained network designed for a distinct application. The majority of layer weights are
repurposed, with only the final layer being adjusted to align with the requirements of the
new application.
2.1.2 FPGAs
Field-Programmable Gate Arrays (FPGAs) are a type of integrated circuit that can be
reprogrammed and reconfigured countless times after they have been manufactured.
These devices form the foundation of reconfigurable computing, a computing approach
that emphasizes splitting applications into parallel, application-specific pipelines. FPGAs
have reconfigurable logic resources like LUTs (Look-Up Tables), DSP slices (Digital
Signal Processing blocks), and BRAMs (Block RAMs). These resources can be connected and configured
in various ways, allowing the implementation of different electronic circuits. The allure
of reconfigurable computing lies in its ability to merge the rapidity of hardware with the
adaptability of software, essentially bringing together the most advantageous features of
both hardware and software.
Harnessing the computational power of FPGAs takes a leap forward with distributed
computing. This strategy clusters FPGAs, dividing problems into smaller tasks for
parallel processing. By working as a team, this distributed network unlocks significant
performance gains through parallelization.
This approach offers key benefits:
● Scalability: Easily add FPGAs to the cluster as computational demands grow.
● Efficiency: Shared resources and coordinated tasks optimize resource utilization.
● Flexibility: Adapt and optimize the configuration to meet specific needs.
● Performance: Parallelization boosts processing speed for quicker results.
Distributed FPGAs hold promise in various fields:
● HPC: Solve complex scientific and engineering problems faster.
● AI: Train and deploy AI models with the necessary power and scalability.
● Real-Time Applications: Meet the demanding requirements of latency-sensitive
fields like robotics and autonomous systems.
High-Level Synthesis and FPGAs
For over three decades, engineers have relied on Hardware Description Languages (HDLs)
to design electronic circuits implemented in FPGAs. This approach, while established,
requires a significant investment of time and expertise. Writing detailed descriptions of
each hardware component can be tedious and demands a deep understanding of the
underlying hardware structure.
However, a fresh and promising paradigm shift has emerged in recent years: High-Level
Synthesis (HLS). This innovative approach leverages the familiarity and convenience of
high-level languages like C to design hardware. Dedicated tools then translate this
high-level code into an equivalent hardware description in a lower-level language, known as
Register Transfer Level (RTL).
Several compelling advantages make HLS an increasingly attractive choice for hardware
design:
● Maturity and Stability: HLS tools have evolved significantly, offering improved
reliability and a clearer understanding of the generated hardware behavior.
● Efficiency and Performance: HLS can often produce hardware that rivals, or even
surpasses, the efficiency achieved by manually crafted HDL code. This efficiency
gain, combined with the significantly faster development cycle, makes HLS a
compelling option.
Given these undeniable benefits, HLS has been chosen as the technology of choice for this
thesis, paving the way for a more efficient and accessible approach to FPGA design.
2.2 Literature Review
Deep Learning
Deep learning utilizes artificial neural networks, inspired by the human brain, to perform
machine learning tasks. These networks consist of multiple layers organized hierarchically,
enabling them to learn complex patterns from data.
Each layer progressively builds upon the knowledge acquired by the previous layer. The
initial layers extract fundamental features, like edges or lines, from the input data.
Subsequent layers combine these basic features into more complex shapes and objects,
culminating in the identification of the desired target.
Imagine training a deep learning model to recognize cats in images. The initial layers
would learn to detect edges and lines, the building blocks of shapes. Moving up the
hierarchy, the network would combine these basic elements into more complex features,
like ovals and rectangles, which could represent whiskers, paws, and tails. Finally, the
topmost layers would recognize these combined features as specific to cats, allowing the
network to differentiate them from other animals.
While focusing on cat identification, the network simultaneously learns about other
objects present in the training data. This allows it to generalize its knowledge and apply it
to other contexts, recognizing cats in diverse environments and situations. This
hierarchical learning process, where simple features are gradually combined to form
complex representations, is the core of deep learning's success. It allows the network to
effortlessly handle complex tasks, making it a powerful tool for various applications.
Zynqnet, derived from the SqueezeNet topology [8] initially designed for embedded
systems, is tailored to be FPGA-friendly through modifications made during its
development. The topology comprises an initial convolutional layer, 8 identical fire
modules (each containing 3 convolutional layers), and a final classification layer.
Notably, efforts were made to align hyperparameters with power-of-two values.
Key points of improvement in this thesis include:
1. HW Definition: Zynqnet's original hardware accelerator is only partially implemented
on the Xilinx Zynq board [6], working closely with an ARM processor. In contrast, the
presented accelerator is fully hardware-designed, adapting to runtime layer variations
without software intervention.
2. Fixed Point: To mitigate FPGA overhead, fixed-point computations replace the 32-bit
floating-point implementation used in Zynqnet. The Ristretto tool [7] guides bit width and
fractional bits, applying manual fine-tuning.
3. Data vs. Mem: Significant size and memory reductions occur by reducing classification
items and employing 8-bit fixed-point weights. This optimization simplifies the system,
eliminating external memory access and prioritizing computation speed over memory
volume in the accelerator.
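The fixed-point idea can be illustrated with a minimal quantize/dequantize round trip. This is a generic sketch of signed fixed-point conversion with saturation, not the actual procedure used by the Ristretto tool; the bit widths are illustrative:

```python
def to_fixed(x, frac_bits=4, total_bits=8):
    """Quantize a float to a signed fixed-point integer code, saturating on overflow."""
    scale = 1 << frac_bits
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, round(x * scale)))

def to_float(q, frac_bits=4):
    """Recover the (approximate) real value from its fixed-point code."""
    return q / (1 << frac_bits)

# 0.30 with 4 fractional bits is stored as round(0.30 * 16) = 5,
# which dequantizes to 5 / 16 = 0.3125 -- the difference is the quantization error.
q = to_fixed(0.30)
```

Choosing the split between integer and fractional bits per layer is exactly the trade-off the bit-width analysis has to resolve: too few fractional bits loses precision, too few integer bits causes saturation.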
2.2.1 AlexNet
Introduced in 2012, AlexNet is a pioneering deep learning architecture developed using
the ImageNet database. Its authors trained a deep convolutional neural network on 1.2
million high-resolution images, each resized to 224x224 RGB pixels (Li, F., et al., 2017).
Achieving a top-1 error rate of 37.5% and a top-5 error rate of 17.0%, the network
comprised 60 million parameters, 650,000 neurons, five convolutional layers (with ReLU
activations and interspersed max-pooling layers), three fully connected layers, and a
1000-way softmax classifier [9]. The architecture, illustrated below, marked the first
prominent use of the Rectified Linear Unit as an activation function, deviating from the
conventional sigmoid activation. This groundbreaking implementation won the ImageNet
LSVRC-2012 competition; the entire network was trained on two GTX 580 GPUs.
Fig.4: Visual representation of AlexNet architecture.
Illustration shows the layers used and their interconnectivity.(Krizhevsky, 2012)
Fig.5: Visual representation of AlexNet architecture.
Illustration shows the layers used and their interconnectivity. (Li, F., et. al, 2017)
2.2.2 VGGNet
In 2014, Karen Simonyan and Andrew Zisserman, researchers at the University of Oxford's
Visual Geometry Group, introduced VGGNet, a groundbreaking architecture that
significantly improved upon the capabilities of its predecessor, AlexNet. VGGNet's key
innovation was its increased depth, achieved by adding more convolutional layers. These
layers utilized smaller receptive fields, primarily 3x3 and 1x1 filters, enabling the network
to extract more detailed and nuanced features from the input images.
Simonyan and Zisserman tested various configurations of their network, all adhering to a
general design but differing in depth. They experimented with 11, 13, and 19 weight layers,
with each depth further divided into sub-configurations. Among these configurations,
VGG16 and VGG19 emerged as the top performers. VGG16 achieved a
top-1 error rate of 27.3% and a top-5 error rate of 8.1%. VGG19, with
its increased depth, further improved upon these results, achieving a top-1 error rate
of 25.5% and a top-5 error rate of 8.0% [7].
As expected, the increased depth of VGG16 and VGG19 led to a significant rise in the
number of parameters. VGG16 boasts 138 million parameters, while VGG19 possesses an
even more impressive 144 million parameters.
Figure 6 provides a visual comparison of VGG16, VGG19, and their predecessor AlexNet,
highlighting the significant architectural advancements made by VGGNet. This innovative
architecture ultimately led to its success in the 2014 ImageNet LSVRC challenge,
solidifying its place as a landmark achievement in the field of deep learning.
Fig. 6: Visual representation of VGGnet architecture & AlexNet (right)
Illustration shows the layers used and their interconnectivity.
(Li, F., et. al, 2017)
2.2.3 ResNet
In 2015, a team from Microsoft, including Kaiming He, Xiangyu Zhang, Shaoqing Ren, and
Jian Sun, developed the ResNet architecture as an enhancement to VGGnet. Recognizing
the importance of network depth for accuracy, they addressed the "vanishing gradient"
problem during backpropagation by introducing "deep residual learning." This novel
framework incorporated "Shortcut Connections," hypothesized to simplify training and
optimization while overcoming the gradient issue (He et al., 2015; Li, F., et al., 2017).
Fig.7: Residual Learning: a building block of the ResNet architecture. (He et. al., 2015)
For their experiments, they constructed a 34-layer plain network with no shortcut
connections and a 34-layer network with shortcut connections, a ResNet. They also
configured several networks with incrementally increasing layer counts, from 34 up to
152 layers. Overall, the 34-layer ResNet outperformed the 34-layer plain network, and
the top-5 error rate achieved by the 152-layer network in the 2015 ImageNet LSVRC
competition was 3.57%. This network architecture won the 2015 ImageNet LSVRC
challenge. (Li, F., et. al, 2017)
Fig.8 represents the building block of the ResNet architecture.
2.2.4 MobileNet
MobileNets, which were originally developed by Google for mobile and embedded vision
applications [12], are distinguished by their use of depth-wise separable convolutions,
which reduce trainable parameters when compared to networks with regular convolutions
of the same depth. MobileNetv2 introduced linear bottlenecks and inverted residuals,
resulting in lightweight deep neural networks that are ideal for the scenario under
consideration in this work.
Fig. 9: Visual representation of MobileNet architecture
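The parameter saving from depth-wise separable convolution is easy to quantify: a standard convolution needs ch_in × ch_out × k × k weights, whereas the separable version needs ch_in × k × k (depth-wise) plus ch_in × ch_out (1×1 point-wise) weights, ignoring biases. A short sketch with illustrative channel counts:

```python
def standard_conv_params(ch_in, ch_out, k):
    """Weight count of a standard convolution (biases ignored)."""
    return ch_in * ch_out * k * k

def separable_conv_params(ch_in, ch_out, k):
    """Weight count of a depth-wise separable convolution (biases ignored)."""
    depthwise = ch_in * k * k       # one k x k filter per input channel
    pointwise = ch_in * ch_out      # 1x1 convolution mixing the channels
    return depthwise + pointwise

# e.g. 3x3 kernels, 128 -> 128 channels:
std = standard_conv_params(128, 128, 3)   # 147,456 weights
sep = separable_conv_params(128, 128, 3)  #  17,536 weights (~8.4x fewer)
```

This roughly k²-fold reduction is what makes MobileNet attractive for resource-constrained targets such as the Zedboard.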
2.2.5 Other work
Several research groups have explored implementing Convolutional Neural Networks
(CNNs) on FPGAs, achieving impressive results in terms of performance and efficiency.
Here's a summary of five notable works:
1. Real-Time Video Object Recognition System (Neurocoms, South Korea, 2015)
● Architecture: Custom 5-layer CNN developed in Matlab.
● Input: Grayscale images (28x28).
● Platform: Xilinx KC705 evaluation board.
● Frequency: 250MHz.
● Power consumption: 3.1 watts.
● Resource utilization: 42,616 LUTs, 32 BRAMs, 326 DSP48s.
● Data format: 16-bit fixed point.
● Performance: Focused on frames per second.
Figure 10: Neurocoms work using 6 neurons and 2 receptor units
(Ahn, B,2015)
This paper describes a real-time video object recognition system implemented on an FPGA.
The system consists of a receiver, a feature map, and a detector. The receiver decodes and
pre-processes the video stream, the feature map extracts features using a CNN, and the
detector identifies objects by comparing the features to a database.
Key takeaways:
● Real-time performance
● Very efficient (3.1 watts power consumption)
● FPGA implementation enables high performance
2. Small CNN Implementation (Institute of Semiconductors, Chinese Academy of
Sciences, Beijing, China, 2015)
● Architecture: 3 convolutional layers with activation, 2 pooling layers, 1 softmax
classifier.
● Input: 32x32 images.
● Platform: Altera Arria V FPGA board.
● Frequency: 50MHz.
● Data format: 8-bit fixed point.
● Performance: Focused on images per second.
Figure 11: Chinese Academy logic architecture
(Li, H et.al., 2015)
This paper presents a small CNN implementation on an FPGA. The CNN consists of three
convolutional layers with activation, two pooling layers, and a softmax classifier. The input
images are 32x32 and the data format is 8-bit fixed point. The CNN is implemented on an
Altera Arria V FPGA board and operates at a frequency of 50MHz.
Key takeaways:
● The CNN achieves a rate of 50 frames per second, which is sufficient for real-time
video processing.
● The CNN achieves an accuracy of 93.6% on the MNIST handwritten digit
classification task.
● The CNN uses 118K LUTs, 112K BRAMs, and 13K DSPs.
3. Angel-Eye System (Tsinghua University and Stanford University, 2016)
● Architecture: Array of custom processing elements.
● Platform: Xilinx Zynq XC7Z045.
● Frequency: 150MHz.
● Data format: 16-bit fixed point.
● Power consumption: 9.63 watts.
● Performance: 187.80 GOP/s (VGG16 ConvNet).
● Custom compiler: Minimizes external memory access.
Figure 12: Angel-Eye (Left) Angel-Eye architecture. (Right) Processing Element
(Guo, K. et. al., 2016)
4. Customized Software Tools for CNN Accelerator (Purdue University, 2016)
● Platform: Xilinx Kintex-7 XC7K325T.
● Performance: 58-115 GFLOPS.
● Architecture: Custom software tools for optimization.
● Data format: Not specified.
5. Scalable FPGA Implementation of CNN (Arizona State University, 2016)
● Platform: Stratix-V GXA7.
● Frequency: 100MHz.
● Data format: 16-bit fixed point.
● Power consumption: 19.5 watts.
● Performance: 114.5 GOP/s.
● Resource utilization: 256 DSPs, 112K LUTs, 2,330 BRAMs.
● Shared multiplier bank: Optimizes multiplication operations.
Summary
One major challenge in deploying Deep Learning (DL) models on FPGAs has been their
limited design size. The inherent trade-off between reconfigurability and density restricts
the implementation of large neural networks on FPGAs. However, advancements in
fabrication technology, particularly the use of smaller feature sizes, are enabling denser
FPGAs. Additionally, the integration of specialized computational units alongside the
general FPGA fabric enhances processing capabilities. These advancements are paving the
way for the implementation of complex DL models on single FPGA systems, opening up
new possibilities for hardware-accelerated AI.
3. Design and Implementation
3.1 Design
Let's delve deeper into the individual layers of a convolutional neural network:
1. Input: The network begins with the input image, typically represented as a 3D matrix
with dimensions representing width, height, and color channels (e.g., RGB). In this case,
the image size is 32x32 pixels with three color channels.
2. Convolutional Layer: This layer applies filters to the input image, extracting features
through localized dot product calculations. Applying 12 filters would result in a new 3D
volume with dimensions 32x32x12, where each element represents the activation of a
specific feature at a specific location.
3. ReLU Layer: The rectified linear unit (ReLU) layer applies a non-linear activation
function, typically max(0,x), to each element in the previous volume. This introduces
non-linearity and sparsity into the feature representation, enhancing the network's ability
to learn complex patterns. The volume size remains unchanged (32x32x12).
4. Max Pooling Layer: This layer performs downsampling by selecting the maximum value
within a predefined neighborhood in the input volume. By reducing the spatial dimensions
(e.g., by a factor of 2), the network can achieve translational invariance and reduce
computational complexity. In this case, the resulting volume would be 16x16x12.
5. Affine/Fully Connected Layer: This layer connects all neurons in the previous volume to
each output neuron, essentially performing a weighted sum followed by a bias addition.
This final step calculates the class scores for each possible category, resulting in a 1x1x10
volume where each element represents the score for a specific class.
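The shape transformations in steps 1 through 5 can be traced with a few lines of C++. This is an illustrative sketch, not code from the thesis; the helper names are invented, and the convolution assumes stride 1 with "same" zero padding so that spatial size is preserved, matching the 32x32x12 example above.

```cpp
#include <cassert>

// Tensor shape: width, height, channels.
struct Shape { int w, h, c; };

// Convolution with stride 1 and "same" padding keeps the spatial size;
// the channel count becomes the number of filters applied.
Shape conv_same(Shape in, int numFilters) { return {in.w, in.h, numFilters}; }

// ReLU is elementwise, so the shape is unchanged.
Shape relu(Shape in) { return in; }

// 2x2 max pooling halves the width and height.
Shape max_pool2(Shape in) { return {in.w / 2, in.h / 2, in.c}; }

// A fully connected layer collapses the volume to one score per class.
Shape affine(Shape /*in*/, int numClasses) { return {1, 1, numClasses}; }
```

Starting from the 32x32x3 input, conv_same with 12 filters yields 32x32x12, ReLU leaves it unchanged, max_pool2 yields 16x16x12, and affine with 10 classes yields the final 1x1x10 score volume.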
Sequential Processing and Parameter Learning:
Convolutional Neural Networks transform the input image through a series of layers,
gradually extracting features and building increasingly complex representations. While
some layers like ReLU and Max Pooling operate with fixed functions, others like
Convolutional and Fully Connected layers involve trainable parameters (weights and
biases). These parameters are adjusted through gradient descent optimization during
training, allowing the network to learn optimal representations based on labeled data.
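The parameter-update step described above can be sketched in a few lines. This is a toy example under stated assumptions: a single weight and a quadratic loss L(w) = (w - t)^2, both invented for illustration, not the training procedure used for the networks in this thesis.

```cpp
#include <cassert>
#include <cmath>

// Gradient-descent update for a single trainable parameter:
// move the weight a small step against the gradient of the loss.
double sgd_step(double weight, double gradient, double learningRate) {
    return weight - learningRate * gradient;
}

// The toy quadratic loss L(w) = (w - target)^2 has gradient 2*(w - target);
// repeated updates drive the weight toward the target value.
double train_toward(double weight, double target, double lr, int steps) {
    for (int i = 0; i < steps; ++i)
        weight = sgd_step(weight, 2.0 * (weight - target), lr);
    return weight;
}
```

In a real network the same update is applied to every weight and bias, with gradients computed by backpropagation over labeled data.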
3.2 Implementation
Having covered the foundational aspects of Deep Learning and reviewed prominent Deep
Convolutional Neural Network architectures, along with their implementations on FPGA,
let's delve into the specifics of this design. This section outlines the implementation of Deep
Convolutional Neural Networks in FPGA, discussing similarities & distinctions from prior
works, design goals, and the tools employed. Following this, we provide an overview of the
overall architecture intended for implementation on the FPGA. Due to a focus on hardware
implementation and constraints in time, along with the availability of pre-existing, trained
image system data for CNN, code was sourced from the internet. Finally, we
comprehensively examine four key sub-designs: the Convolutional/Affine Layer, ReLU
Layer, Max Pooling Layer, and Softmax Layer.
Similarities
In scrutinizing previous works where groups implemented DCNNs on FPGAs, numerous
similarities emerge between their implementations and the present work. Certain aspects
of DCNNs are inherently common across designs aimed at accelerating DCNNs.
Consequently, essential elements like required layers (e.g., convolution, ReLu, max pool)
and adder trees for summing channel products will not be explicitly discussed in this
section.
Bus Protocol
Firstly, prior works showcase designs employing sub-module intercommunication. Several
designs that utilized separate sub-modules in their overall architecture employed a
communication bus protocol. This approach leverages existing intellectual property from
FPGA manufacturers such as Intel or AMD, allowing the focus to be on the DCNN portion
of the task rather than the infrastructure. Additionally, hardware microprocessors or
implemented co-processors can communicate with the submodules, providing valuable
insights for both software and hardware developers during debugging and verification. The
drawback, however, is that a bus protocol introduces additional overhead to the design due
to handshaking between sub-modules for reliable communication. Moreover, the presence of
the bus protocol necessitates more signal routing, utilizing overall FPGA resources and
potentially leading to increased dwell time with no task being executed. Despite these
drawbacks, effective management can be achieved by carefully planning the overall design's
concept of operations.
DSP Slices
A prevalent aspect shared among prior works and the present study involves the utilization
of Digital Signal Processing (DSP) Slices. These dedicated hardware components excel in
performing multiply and add operations for both floating-point and fixed-precision
numbers. DSP slices outperform equivalent custom arithmetic described in hardware
description language (HDL), so designs benefit from making full use of the available DSP
slices, improving speed, especially in Deep Convolutional Neural Networks (DCNNs).
Data Format
In the software domain, Deep Learning research typically employs 64-bit double-precision
floating-point numbers for weight data. While some works have employed 32-bit
single-precision numbers, there is mounting evidence that reducing the bit width and
changing the number format can significantly improve overall performance. A common
alteration is the use of 16-bit fixed-precision numbers. Alternatively, truncating the 32-bit
single-precision number to a 16-bit "Half" precision number is proposed, presenting a
potentially more effective design.
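The 16-bit truncation idea can be illustrated as follows. This is a hedged sketch with invented helper names, not the thesis implementation: it keeps the upper 16 bits of an IEEE-754 single, which preserves the sign and the full 8-bit exponent while dropping 16 mantissa bits (the bfloat16 layout); a true IEEE half-precision format would instead re-encode with a 5-bit exponent.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Keep only the upper 16 bits of a 32-bit float:
// sign + 8 exponent bits + top 7 mantissa bits survive.
uint16_t to_trunc16(float x) {
    uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);       // reinterpret without UB
    return static_cast<uint16_t>(bits >> 16);
}

// Expand back to a 32-bit float; the discarded mantissa bits become zero.
float from_trunc16(uint16_t h) {
    uint32_t bits = static_cast<uint32_t>(h) << 16;
    float x;
    std::memcpy(&x, &bits, sizeof bits);
    return x;
}
```

Values whose mantissa fits in the surviving 7 bits (such as 1.0 or -2.5) round-trip exactly; all others lose only low-order precision, which is the trade-off the text describes.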
Scalability
Scalability, a crucial feature in previous works and in this study, concerns navigating
through the CNN. As witnessed in other works, the increasing size of software
implementations of DCNNs, exemplified by the 152-layer ResNet design, poses a challenge
for FPGA implementation. To address this, strategies involve implementing reusable
designs capable of performing the functions of all necessary layers in the DCNN
architecture.
Simple Interface
Unlike many previous works, considerable effort has been invested in creating a custom
compiler to completely describe a Deep Convolutional Neural Network in this design. The
aim is to make the DCNN accessible to both software and hardware designers by making
FPGA hardware programmable. The FPGA can be commanded through function calls in the
microprocessor, performing register writes to the FPGA implementation.
Flexible Design
Unlike prior works where CNN designs are tailored to specific hardware boards, this work
aims for a configurable number of DSPs depending on the FPGA in use. Each layer in the
CNN is modular and can interact through a bus protocol, allowing developers to insert
multiple instances of the Convolutional Layer, Affine Layer, and Max Pooling Layer.
Tools
Throughout the development process of implementing a CNN on an FPGA, various tools
were employed. The choice of utilizing Xilinx chips was influenced by their extensive usage
and the author's prior experience with Xilinx products. Consequently, the tools selected for
this development were drawn from the diverse set offered by AMD Xilinx. The central
design environment was the Xilinx Vivado 2021.3 package (refer to Figure 13), serving as
the primary design hub throughout the developmental phase. Within Vivado, each neural
network layer type was crafted as an AXI-capable submodule. Additionally, Vivado
facilitated integration with pre-existing Xilinx Intellectual Property (IP), such as the Zynq
SoC Co-Processor and Memory Interface Generator (MIG). Lastly, Vivado acted as a
platform for software development, enabling the creation of straightforward software to run
on the Zynq SoC.
Fig.13 Xilinx Vivado 2021.3 IDE
Hardware: The FPGA platform chosen was the Zedboard. Digilent's Zedboard Development
Kit contains an FPGA consisting of a matrix of programmable logic blocks and
programmable interconnections. The Zedboard is built around a Xilinx Zynq-7000 SoC,
which combines a dual-core ARM Cortex-A9 processor with FPGA fabric.
Fig. 14: Digilent Zedboard Avnet AES-series Evaluation Kit, Zynq-7000 System-on-Chip (SoC) (www.digilent.com)
Table 1: Specifications for Zedboard (www.digilent.com)

SPECIFICATION        | DESCRIPTION
SoC Options          | XC7Z020-CLG484-1
Memory               | 512 MB DDR3; 256 Mb Quad-SPI Flash
Video Display        | 1080p HDMI; 8-bit VGA; 128 x 32 OLED
User Inputs          | 8 user switches and 7 user push buttons
Audio                | I2S audio CODEC
Analog               | XADC header
Configuration Memory | 256 Mb Quad-SPI Flash; SD card; onboard USB JTAG
Power                | 12 VDC
Certification        | CE, RoHS
Dimensions           | 5.3" x 6.3"
Ethernet             | 10/100/1000 Ethernet
USB                  | USB 2.0
Communications       | USB 2.0; USB-UART; 10/100/1000 Ethernet
User I/O             | (See User Inputs)
Other                | PetaLinux BSP
4. Hardware Implementation
4.1 Design Methodology
In extensive code projects with multiple instances and increasing complexity, defining the
order and scope of steps, along with how they will be executed, is crucial—comprising the
project's methodology.
4.2 HLS Methodology
As detailed in Section 2.2, High Level Synthesis (HLS) is chosen for hardware
implementation due to its suitability. Xilinx® Vivado HLS is employed in this project,
following its three-step methodology:
1. Software Simulation: This involves testing code execution using a regular software
compiler and CPU, aided by a test bench.
2. Synthesis: Generating the HDL files that implement the code, guided by HLS pragmas.
This step is executed only after successful software simulation.
3. Co-Simulation: The most significant step, testing synthesized code functionality using a
hardware simulation. It leverages the test bench from software simulation, comparing
outputs and ensuring hardware-software consistency.
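As a concrete illustration of this flow, the toy kernel below (a hypothetical example, not a layer from this design) is plain C++ that a software compiler can execute for step 1; under Vivado HLS the same source, with the pipeline pragma enabled, would be synthesized to HDL in step 2 and re-verified against the same test bench in co-simulation for step 3.

```cpp
#include <cassert>

// Toy HLS-style kernel: elementwise multiply-add over a fixed-size array.
// Under Vivado HLS, a pipeline pragma inside the loop would request an
// initiation interval of one cycle per element:
//   #pragma HLS PIPELINE II=1
// (kept as a comment here so an ordinary software compiler is unaffected).
void scale_and_add(const int in[8], int out[8], int scale, int bias) {
    for (int i = 0; i < 8; ++i) {
        out[i] = in[i] * scale + bias; // one multiply-add per element
    }
}
```

The same test bench that checks this function on a CPU is reused unchanged for co-simulation, which is exactly the hardware-software consistency step 3 describes.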
Table 2: Simulation Model vs Hardware Implementation

Layer   | SIM FOPs  | HW FOPs   | Diff
CONV1   | 0.7407 G  | 0.73530 G | 0.74%
CONV2   | 126.897 M | 113.796 M | 12.89%
CONV3   | 35.158 M  | 29.106 M  | 27.66%
CONV4   | 26.645 M  | 20.830 M  | 27.91%
CONV5   | 26.574 M  | 20.763 M  | 27.99%
AFFINE1 | 176.322 M | 113.884 M | 54.83%
AFFINE2 | 87.677 M  | 38.077 M  | 130.26%
AFFINE3 | 33.919 M  | 20.229 M  | 83.23%
4.3 Design Overview
Fig.15 Final Top-Level FPGA Design
Before delving into the pipelined core and other enhancements, understanding the module's
top-level functionality is vital. The system comprises three modules and a group of
memories:
1. Pipelined Core: This module serves as the computational powerhouse, receiving layer
parameters, weight information, and input data from the Flow Control module. It executes
the necessary calculations and generates the desired outputs.
2. Convolution Flow Control: This module acts as the conductor, ensuring the proper
execution of the network topology. It determines whether update or classification tasks are
required and orchestrates access to all memory units and relevant layer parameters.
3. Memory Controller: This module acts as the memory interface, deciphering read/write
positions for data exchange with the memory units. It receives instructions from both the
Flow Control and Pipelined Core modules, ensuring smooth data flow and efficient memory
utilization.
By understanding the interactions and responsibilities of these modules, we gain a clear
understanding of how the system operates as a whole. This high-level perspective provides
a valuable foundation for delving deeper into the specific details of the individual
components and their contributions to the overall system performance.
4.4 Caching Strategy
Careful loop ordering and data-reuse optimization allow reused information to be stored
locally, avoiding the overhead of repeated memory accesses. Caches are needed for the
kernel and bias, the output, and the input.
1. Kernel and Bias Caches: Simplest caches loaded at the beginning and updated during
channel changes.
2. Output Cache: More complex due to irregular access pattern, loading bias and
computing ReLU for performance maximization.
3. Input Cache: Most complex, addressing reuse issues with a group of multiple registers
that displace information every iteration.
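The register-based input cache in point 3 can be sketched as a one-dimensional sliding window. This is an illustrative model with invented names, not the actual HLS code: each iteration the registers displace by one position, so every input element is fetched from memory once but reused K times by the compute step.

```cpp
#include <cassert>

const int K = 3; // window size, e.g. the kernel width

// A group of K registers that shift by one element per iteration.
struct Window {
    int reg[K] = {0, 0, 0};
    // Displace the registers and admit one new input element.
    void shift_in(int x) {
        for (int i = 0; i < K - 1; ++i) reg[i] = reg[i + 1];
        reg[K - 1] = x;
    }
};

// Dot product of the cached window with a kernel: the compute step
// reuses the registered inputs instead of re-reading memory.
int window_dot(const Window& w, const int kernel[K]) {
    int acc = 0;
    for (int i = 0; i < K; ++i) acc += w.reg[i] * kernel[i];
    return acc;
}
```

In hardware the shift is a wire-level rotation completed in one cycle, which is why this cache is the most complex of the three but also the one that saves the most memory traffic.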
Memory Controller: Arrays Merging
Adapting access patterns between layers and facilitating simultaneous access to multiple
elements are essential for varying memory requirements.
Fixed-Point Implementation
Following Ristretto's fixed-point analysis of the network, the bit width and the number of
fractional bits are defined. Xilinx® Vivado HLS provides a fixed-point arithmetic type for
this purpose (ap_fixed<bit width, frac bits>). Because Vivado HLS requires the number of
fractional bits to be defined at compile time, runtime reconfiguration is managed using
integers and bit shifts for the fixed-point operations.
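The integer-and-bit-shift scheme can be sketched as follows. This is a minimal model with invented helper names standing in for ap_fixed behavior: a value x is stored as the integer part of x * 2^fracBits, and a product is rescaled with a right shift.

```cpp
#include <cassert>
#include <cstdint>

// Encode a real value as an integer scaled by 2^fracBits (truncating).
int32_t to_fixed(double x, int fracBits) {
    return static_cast<int32_t>(x * (1 << fracBits));
}

// Decode back to a real value.
double from_fixed(int32_t v, int fracBits) {
    return static_cast<double>(v) / (1 << fracBits);
}

// Fixed-point multiply: widen, multiply, then shift back down,
// because the raw product carries a scale of 2^(2*fracBits).
int32_t fixed_mul(int32_t a, int32_t b, int fracBits) {
    int64_t wide = static_cast<int64_t>(a) * b;
    return static_cast<int32_t>(wide >> fracBits);
}
```

Because fracBits is an ordinary runtime variable here, the format can be changed per layer without recompiling, which is the reconfigurability the text attributes to this trick.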
Following is a detailed explanation of the four test benches shown in the diagrams:
1. Convolutional/Affine Layer Virtual Memory Test Bench
This test bench verifies the functionality of the convolutional/affine layer implementation
by comparing its outputs to the expected outputs generated by a reference software model.
The test bench loads the input and kernel data into virtual memory and then performs the
convolutions/affine operations. The outputs are then compared to the expected outputs to
ensure that the implementation is correct.
2. Convolutional/Affine Layer Block RAM Test Bench
This test bench is similar to the virtual memory test bench, but it stores the input and
kernel data in block RAM instead of virtual memory. This test bench is useful for verifying
the performance of the convolutional/affine layer implementation, as it can achieve higher
throughput by avoiding the overhead of accessing virtual memory.
3. Max Pool Layer Virtual Memory Test Bench
This test bench verifies the functionality of the max pool layer implementation by
comparing its outputs to the expected outputs generated by a reference software model. The
test bench loads the input data into virtual memory and then performs the max pooling
operation. The outputs are then compared to the expected outputs to ensure that the
implementation is correct.
4. Max Pool Layer Block RAM Test Bench
This test bench is similar to the virtual memory test bench, but it stores the input data in
block RAM instead of virtual memory. This test bench is useful for verifying the
performance of the max pool layer implementation, as it can achieve higher throughput by
avoiding the overhead of accessing virtual memory.
The diagram shows the four test benches connected to a common input and output
interface. This allows the test benches to be easily swapped in and out, depending on the
layer being tested.
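The compare-against-a-reference pattern these four test benches share can be sketched as follows. The functions are hypothetical stand-ins: max_pool_ref plays the role of the reference software model, and max_pool_dut the implementation under test (here it simply calls the reference, since the real design under test is the FPGA layer).

```cpp
#include <cassert>

const int N = 8;

// Reference ("golden") model: 1-D max pooling, window 2, stride 2.
void max_pool_ref(const int in[N], int out[N / 2]) {
    for (int i = 0; i < N / 2; ++i)
        out[i] = in[2 * i] > in[2 * i + 1] ? in[2 * i] : in[2 * i + 1];
}

// Stand-in for the implementation under test (the FPGA layer in practice).
void max_pool_dut(const int in[N], int out[N / 2]) {
    max_pool_ref(in, out); // placeholder: identical behavior here
}

// Run both on the same input and compare element by element,
// exactly as the test benches above do with their expected outputs.
bool outputs_match(const int in[N]) {
    int expected[N / 2], actual[N / 2];
    max_pool_ref(in, expected);
    max_pool_dut(in, actual);
    for (int i = 0; i < N / 2; ++i)
        if (expected[i] != actual[i]) return false;
    return true;
}
```

The virtual-memory and block-RAM variants differ only in where the input arrays live; the comparison logic is identical, which is what lets the benches be swapped in and out behind a common interface.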
Input and Output Interface: This interface provides a common way to load input data into
the test benches and to read the output data back. The interface can be implemented using
a variety of methods, such as FIFO buffers or DMA (direct memory access) transfers.
Virtual Memory: Virtual memory is used to store the input and kernel data for the
convolutional/affine layer and the max pool layer virtual memory test benches. Virtual
memory allows the test benches to access large amounts of data without having to load it
all into physical memory at once.
Block RAM: Block RAM is used to store the input data for the convolutional/affine layer and
the max pool layer block RAM test benches. Block RAM is a type of on-chip memory that is
faster than virtual memory, but it has a limited capacity.
Test Bench Control Logic:
The test bench control logic is responsible for loading the input and kernel data into the test
benches, performing the convolutions/affine operations or the max pooling operation, and
comparing the outputs to the expected outputs. The test bench control logic can be
implemented using a variety of different methods, such as a finite state machine, a
microcontroller, or a software program.
The four test benches described above are essential tools for verifying the functionality and
performance of convolutional neural network implementations on FPGAs. By using these
test benches, designers can ensure that their implementations are correct and that they
meet the desired performance requirements.
Performance Evaluation and Analysis
After implementing all optimization techniques, the accelerator was ready to classify
images using trained network weights. To simulate the hardware behavior, Xilinx® Vivado
HLS Co-simulation was employed. Images from the validation dataset, which achieved 73%
accuracy with Ristretto, were evaluated. The simulation process, spanning over 185 hours,
resulted in an overall 58% accuracy, requiring 26 million cycles per image.
With a relatively small critical path, a 100MHz clock can be utilized, enabling the
processing of approximately 4 frames per second. These results are deemed successful, as
the achieved accuracy meets the project's minimum threshold, and the performance
surpasses the lower limit by nearly fourfold. Consequently, no further modifications are
required, and the accelerator is prepared for deployment.
Table 3 Resource Utilization of Final Design (AlexNet)
Resource Utilization Optimization
While the accelerator described in Section 3.2 is functional and implementable, the
pipelined core's low resource footprint (35 DSPs, 41,000 Flip-flops, and 36,500 LUTs) allows
for potential modifications or duplications to reduce the pipeline depth. This situation is
particularly suited to HLS optimization, which can sometimes surpass human design
capabilities (see Table 3).
Initially, Vivado generated two core instances with a 4-stage pipeline, requiring 26,596,261
cycles, due to different memory inputs. To improve this design, various configurations were
explored using the function_instantiate pragma, creating four core instances. By sharing
resources effectively, only 15% more DSPs, 27% more flip-flops, and 33% more LUTs were
utilized compared to the double-mode core implementation. This configuration enabled
reducing two out of the four pipelines by one stage each. However, this modification
resulted in a negligible 0.2% performance improvement, ultimately leading to its rejection.
Here are some parameters to compare different CNN implementations on FPGA:
Resource | Utilization | Available | Utilization %
LUT      | 36527       | 53200     | 68.66
LUTRAM   | 2594        | 46200     | 5.61
FF       | 41198       | 106400    | 38.72
BRAM     | 54          | 140       | 38.57
DSP      | 35          | 220       | 15.91
IO       | 69          | 285       | 24.21
BUFG     | 7           | 32        | 21.88
MMCM     | 2           | 10        | 20.00
PLL      | 1           | 10        | 10.00
● Throughput: The number of inputs that can be processed per unit time; an
important measure of the performance of a CNN implementation on an FPGA,
quantified here in floating-point operations per second (FOPS).
● Latency: Latency is the time taken by the CNN to process one input data.
● Resource utilization: It is an important parameter to measure the efficiency of a
CNN implementation on an FPGA.
● Power consumption: It is a crucial parameter to measure the energy efficiency
of a CNN implementation on an FPGA.
● Accuracy: It is an important parameter to measure the effectiveness of a CNN
implementation on an FPGA.
● Flexibility: Flexibility is the ability of the CNN implementation to adapt to
different CNN models and configurations.
● Ease of use: How readily the implementation can be programmed, integrated, and
debugged by software and hardware developers.
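The first two parameters reduce to simple arithmetic, sketched below. The figures in the assertions are hypothetical except where noted; for instance, the 26 million cycles per image at 100 MHz reported earlier correspond to 0.26 s of latency, roughly 4 frames per second.

```cpp
#include <cassert>

// Throughput in operations per second: total operations divided by time.
double fops(double totalOps, double seconds) {
    return totalOps / seconds;
}

// Latency per input: clock cycles divided by clock frequency in Hz.
double latency_seconds(double cyclesPerImage, double clockHz) {
    return cyclesPerImage / clockHz;
}
```

These two formulas are how the FOPS and timing columns in the tables that follow relate to one another.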
Table 4: Hardware execution times of each AlexNet layer

Layer   | Total Time    | FOPS      | Sim Timestamp (Epoch / Cycle)
CONV1   | 71.19867 ms   | 0.7456 G  | 0x1161e / 0x43
CONV2   | 547.75371 ms  | 108.806 M | 0x85ba9 / 0x47
CONV3   | 463.77690 ms  | 24.858 M  | 0x713a0 / 0x5a
CONV4   | 697.86214 ms  | 16.551 M  | 0xaa606 / 0x0e
CONV5   | 466.75725 ms  | 16.543 M  | 0x71f45 / 0x19
AFFINE1 | 796.44032 ms  | 110.922 M | 0xc2718 / 0x20
AFFINE2 | 1018.89052 ms | 33.446 M  | 0xf8c0a / 0x34
AFFINE3 | 4.68226 ms    | 17.769 M  | 0x124a / 0x1a

Table 4 shows that the convolutional layers (CONV1-CONV5) account for just over half of
the total execution time, with the fully connected layers (AFFINE1-AFFINE3), dominated
by AFFINE1 and AFFINE2, consuming most of the remainder. Both layer types perform
large numbers of floating-point operations, and the fully connected layers additionally
demand substantial memory bandwidth to stream their weights.
Note that FOPS is a layer's operation count divided by its execution time, so for a given
operation count a longer-running layer sustains a lower FOPS figure.
Here are some specific observations from the table:
● CONV4 has the longest execution time among the convolutional layers, at
697.86214 ms, while CONV1 is the fastest at 71.19867 ms even though it performs
the most operations and sustains the highest throughput (0.7456 GFOPS).
● Among the fully connected layers, AFFINE1 achieves the highest throughput, at
110.922 MFOPS, while AFFINE3 finishes fastest (4.68226 ms) because it performs
by far the fewest operations.
Summing the per-layer times in Table 4 gives a total of roughly 4.07 seconds per image,
which corresponds to about 0.25 frames per second when the layers execute strictly
back-to-back.
Table 5: AlexNet vs MobileNet

Layer   | Ops To Perform | AlexNet FOPS | MobileNet FOPS | Difference
CONV1   | 210249696      | 0.7407 G     | 2.9530 G       | 34878.85
CONV2   | 62332672       | 126.897 M    | 113.796 M      | 287.09
CONV3   | 13498752       | 35.158 M     | 29.106 M       | 42.42
CONV4   | 14537088       | 26.645 M     | 20.830 M       | 30.31
CONV5   | 9691392        | 26.574 M     | 20.763 M       | 30.15
AFFINE1 | 90701824       | 176.322 M    | 113.884 M      | 115.9
AFFINE2 | 38797312       | 87.677 M     | 38.077 M       | 24.67
AFFINE3 | 94720          | 33.919 M     | 20.229 M       | 19.76
It is important to note that the performance of a CNN implementation on an FPGA can be
affected by a variety of factors, such as the FPGA platform, the CNN architecture, and the
optimization techniques used. Table 5 compares only two specific CNN implementations on
one specific FPGA platform.
            | Guo, K.      | Ma, Y.       | Zhang, C.    | Espinosa, M. | This Work
            | et al., 2016 | et al., 2016 | et al., 2015 | 2019         |
FPGA        | Zynq         | Stratix-V    | Virtex7      | Artix7       | Zedboard Zynq
            | XC7Z045      | GXA7         | VX485T       | XC7A200T     | AES-Z7EV
Clock Freq  | 150 MHz      | 100 MHz      | 100 MHz      | 100 MHz      | 100 MHz
Data format | 16-bit fixed | Fixed (8-16b)| 32-bit float | 32-bit float | 32-bit fixed
Power       | 9.63 W       | 19.5 W       | 18.61 W      | 1.5 W        | 0.9 W
            | (measured)   | (measured)   | (measured)   | (estimated)  | (estimated)
FF          | 127653       | ?            | 205704       | 103610       | 41198
LUT         | 182616       | 121000       | 186251       | 91865        | 36527
BRAM        | 486          | 1552         | 1024         | 139.5        | 54
DSP         | 780          | 256          | 2240         | 119          | 35
Performance | 187.80 GFOPS | 114.5 GFOPS  | 61.62 GFOPS  | 2.93 GFOPS   | 0.74 GFOPS

Table 6: Comparison of other works to this work. (AlexNet)
Methods of Improvement / Scope
This implementation of a Convolutional Neural Network in an AlexNet configuration is a
first-pass attempt and leaves a lot of room for improvement and optimization. There are
several ways the performance of this implementation could be increased, which would be
areas for future work. Looking at Table 6, we can see the differences in resource utilization
and performance between other recent works and this one. Although this implementation
achieved lower GFOPS performance, it consumes far fewer chip resources than any of the
other implementations, and its estimated power consumption is also far lower.
5. Conclusions
5.1 Results
While Deep Learning and Convolutional Neural Networks (CNNs) have traditionally
resided within the realm of Computer Science, with massive computations performed on
GPUs housed in desktop computers, their increasing power demands raise concerns about
efficiency. Existing FPGA implementations for CNNs primarily focus on accelerating the
convolutional layer and often have rigid structures limiting their flexibility.
This work aims to address these limitations by proposing a scalable and modular FPGA
implementation for CNNs. Unlike existing approaches, this design seeks to configure the
system for running an arbitrary number of layers, offering greater flexibility and
adaptability.
The proposed architecture was evaluated on publicly available CNN architectures like
AlexNet, ResNet, and MobileNet on a Zedboard platform. Performance analysis revealed
MobileNet as the fastest among the three, achieving an accuracy of 47.5%. This
demonstrates the system's potential for efficient and adaptable execution of diverse CNN
architectures.
This work paves the way for further research in scalable and flexible FPGA
implementations for CNNs, offering promising avenues for resource-efficient deep learning
beyond traditional computing platforms.
Appendix
// CNN sample layer model
module Layer_1
#(
    parameter NN             = 30,
    parameter numWeight      = 784,
    parameter dataWidth      = 16,
    parameter layerNum       = 1,
    parameter sigmoidSize    = 10,
    parameter weightIntWidth = 4,
    parameter actType        = "relu"
)
(
    input                     clk,
    input                     rst,
    input                     weightValid,
    input                     biasValid,
    input  [31:0]             weightValue,
    input  [31:0]             biasValue,
    input  [31:0]             config_layer_num,
    input  [31:0]             config_neuron_num,
    input                     x_valid,
    input  [dataWidth-1:0]    x_in,
    output [NN-1:0]           o_valid,
    output [NN*dataWidth-1:0] x_out
);

    // First neuron instance; the full layer replicates this for neurons 1..NN-1.
    neuron #(
        .numWeight(numWeight),
        .layerNo(layerNum),
        .neuronNo(0),
        .dataWidth(dataWidth),
        .sigmoidSize(sigmoidSize),
        .weightIntWidth(weightIntWidth),
        .actType(actType),
        .weightFile("w_1_0.mif"),
        .biasFile("b_1_0.mif")
    ) n_0 (
        .clk(clk),
        .rst(rst),
        .myinput(x_in),
        .weightValid(weightValid),
        .biasValid(biasValid),
        .weightValue(weightValue),
        .biasValue(biasValue),
        .config_layer_num(config_layer_num),
        .config_neuron_num(config_neuron_num),
        .myinputValid(x_valid),
        .out(x_out[0*dataWidth+:dataWidth]),
        .outvalid(o_valid[0])
    );

endmodule
Due to space constraints, all the data, references and code can be accessed here: Thesis_Appendix
Bibliography
[1] D. M. Harris and S. L. Harris, Digital Design and Computer Architecture. Elsevier,
(2007)
[2] S.Authors, History of artificial intelligence
[3] Farabet, C., Martini, B., Akselrod, P., Talay, S., LeCun, Y., Culurciello, E.: Hardware
accelerated convolutional neural networks for synthetic vision systems. In: Circuits
and Systems (ISCAS), Proceedings of 2010 IEEE International Symposium on. pp.
257–260. IEEE (2010)
[4] Goodfellow, I., & Bengio, Y., & Courville, A Convolutional Networks. In Dietterich,
T.,(Ed.), Deep Learning(326-339). Cambridge, Massachusetts: The MIT Press.(2016)
[5] D. Gschwend, Zynqnet: An fpga-accelerated embedded convolutional neural network.
[6] Xilinx (2017). Zynq-7000 All Programmable SoC Family Product Tables and Product
Selection Guide. Retrieved from
https://www.xilinx.com/support/documentation/selection-guides/zynq-7000-product-se
lection-guide.pdf
[7] Neris, R., Rodríguez, A., & Guerra, R. (2022). FPGA-Based Implementation of a
CNN Architecture for the On-Board Processing of Very High-Resolution Remote
Sensing Images. IEEE Journal of Selected Topics in Applied Earth Observations
and Remote Sensing, Vol. 15.
[8] Iandola, F. N., Han, S., Moskewicz, M. W., Ashraf, K., Dally, W. J., & Keutzer, K.
(2016). SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB
model size. arXiv:1602.07360.
[9] Krizhevsky, A., Sutskever, I., & Hinton, G. (2012). ImageNet Classification with Deep
Convolutional Neural Networks. Advances in Neural Information Processing Systems,
25 (NIPS 2012). Retrieved from
https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neu
ral-networks.pdf
[10] Zisserman, A. & Simonyan, K. (2014). Very Deep Convolutional Networks For
Large-Scale Image Recognition. Retrieved from https://arxiv.org/pdf/1409.1556.pdf
[11] Li, F., et. al. CNN Architectures [PDF document]. Retrieved from Lecture Notes Online
Website: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture9.pdf
[12] Qiao, Y., & Shen, J., & Xiao, T., & Yang, Q., & Wen, M., & Zhang, C. FPGA-
accelerated deep convolutional neural networks for high throughput and energy
efficiency. Concurrency and Computation Practice and Experience. John Wiley & Sons
Ltd.(May 06, 2016).
[13] Lacey, G., & Taylor, G., & Areibi, S. Deep Learning on FPGAs: Past, Present and
Future. Cornell University Library. https://arxiv.org/abs/1602.04283 (Feb. 13, 2016)
[14] Gomez, P. Implementation of a Convolutional Neural Network (CNN) on a FPGA for
Sign Language's Alphabet recognition. Archivo Digital UPM. Retrieved December 6,
2023, from https://oa.upm.es/53784/1/TFG_PABLO_CORREA_GOMEZ.pdf (2018, July)
[15] Espinosa, M. A. Implementation of Convolutional Neural Networks in FPGA for Image
Classification. ScholarWorks. Retrieved December 6, 2023, from
https://scholarworks.calstate.edu/downloads/hd76s209r (2019, Spring)
[16] He, K., Zhang, X., Ren, S., & Sun, J. (2015). Deep Residual Learning for Image
Recognition.Retrieved from https://arxiv.org/pdf/1512.03385.pdf
[17] Reddy, G. (2019, January 1). FPGA Implementation of Multiplier-Accumulator Unit
using Vedic multiplier and Reversible gates. Semantic Scholar.
https://www.semanticscholar.org/paper/FPGA-Implementation-of-Multiplier-Accumul
ator-Unit-Rajesh-Reddy/edab41b3600b2b51d6887042487bac32c80182b5
[18] Guo, K., & Sui, L., & Qiu, J., & Yao, S., & Han, S., & Wang, Y., & Yang, H. (July. 13,
2016). Angel-Eye: A Complete Design Flow for Mapping CNN onto Customized
Hardware. IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 2016,
pp.24-29. doi:10.1109/ISVLSI.2016.129
[19] Ahn, B. (Oct. 01, 2015). Real-time video object recognition using convolutional neural
networks. International Joint Conference on Neural Networks (IJCNN), 2015.
doi:10.1109/IJCNN.2015.7280718