Convolutional neural networks (CNNs) are a type of deep neural network used for image and video recognition. CNNs consist of three main layers: convolutional layers that apply filters to input images via small kernels; pooling layers that downsample outputs from convolutional layers; and fully connected layers where every neuron is connected to the previous layer to perform classification. Kernels, stride, and padding are used in convolutional layers to preserve spatial dimensions while performing convolutions. Max and average pooling are used to select maximum or average values in pooling layers.