A Typical Convolutional Neural Network (CNN) Architecture

A note on the common architecture of the CNN!


CNNs (a.k.a. ConvNets) have become a standard tool for computer vision tasks. In this write-up, I want to give you an intuition for the common architecture of Convolutional Neural Networks (CNNs).

A basic CNN will have the following blocks:

  1. Convolutional layer(s)
  2. Pooling layer(s)
  3. Fully connected layer

[Figure: CNN architecture]

Let's go a little deeper into these blocks, one by one.

1. Convolutional layer(s)

The convolutional layer, the backbone of the whole CNN, extracts features from images using filters. These filters learn both low-level features, such as lines and edges, and high-level features, such as a face, an ear, or a nose. The high-level features are what later become useful for image recognition.

Convolution works like this: we slide the filter across the image and, at each position, multiply every filter weight by the pixel underneath it and add up the products. That sum becomes one pixel of the output feature map. We repeat the process until the filter has been slid over the whole image.
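To make the multiply-and-sum step concrete, here is a minimal sketch in plain Python. It assumes a single-channel image, no padding, and a stride of 1, and the `convolve2d` helper and the hand-written edge filter are my own illustrations (a real CNN learns its filter weights during training).

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    img_h, img_w = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    out_h, out_w = img_h - k_h + 1, img_w - k_w + 1
    output = []
    for row in range(out_h):
        out_row = []
        for col in range(out_w):
            # Multiply each filter weight by the pixel under it, then sum.
            total = 0
            for i in range(k_h):
                for j in range(k_w):
                    total += image[row + i][col + j] * kernel[i][j]
            out_row.append(total)
        output.append(out_row)
    return output


# A 4x4 image whose left half is dark (0) and right half is bright (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# A hand-written vertical-edge filter (illustrative, not learned).
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = convolve2d(image, kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The output is non-zero only in the middle column, which is exactly where the image goes from dark to bright. That is what "detecting an edge" means here.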

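Most popular deep learning frameworks, such as TensorFlow and PyTorch, let you create a convolutional layer in one line of code. This is how you would create a convolutional layer in TensorFlow, here with 32 filters of size 3x3 to match the first layer of the full model at the end of this article.

import tensorflow as tf

# 32 filters, each 3x3 (values taken from the full model at the end)
tf.keras.layers.Conv2D(32, (3,3), activation='relu')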

The output of the convolutional layer is a stack of feature maps, and the depth of that stack depends directly on the number of filters in the layer. If the layer has 32 filters, you will get 32 feature maps at the output. How do we deal with that many dimensions?
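As a quick sanity check on those numbers, here is a small sketch that computes the output shape of a convolutional layer. The `conv_output_shape` helper is hypothetical (my own name), and it assumes a stride of 1 and no padding, which are the Keras defaults for `Conv2D`.

```python
def conv_output_shape(height, width, kernel_size, num_filters):
    """Output shape (height, width, channels) for stride 1 and no padding."""
    # The filter cannot hang over the border, so each side shrinks by kernel_size - 1.
    out_h = height - kernel_size + 1
    out_w = width - kernel_size + 1
    # One feature map per filter.
    return (out_h, out_w, num_filters)


# A 28x28 image passed through 32 filters of size 3x3.
shape = conv_output_shape(28, 28, kernel_size=3, num_filters=32)
print(shape)  # (26, 26, 32)
```

So a single 28x28 image turns into 26 * 26 * 32 = 21,632 values after one layer, which is why we want to shrink things down next.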

2. Pooling Layers

A pooling layer is introduced in the network to compress or shrink these feature maps. There are various pooling options, but max pooling is the most common choice: it keeps only the largest value in each small window, so the feature maps get smaller while the strongest responses are retained.
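Here is a rough sketch of what max pooling does to a single feature map, in plain Python. It assumes non-overlapping 2x2 windows, and the `max_pool2d` helper and the sample values are made up for illustration.

```python
def max_pool2d(feature_map, pool_size=2):
    """Keep the largest value in each non-overlapping pool_size x pool_size window."""
    out_h = len(feature_map) // pool_size
    out_w = len(feature_map[0]) // pool_size
    pooled = []
    for row in range(out_h):
        pooled_row = []
        for col in range(out_w):
            # Collect the values covered by the current window.
            window = [
                feature_map[row * pool_size + i][col * pool_size + j]
                for i in range(pool_size)
                for j in range(pool_size)
            ]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 2, 4, 6],
]
pooled = max_pool2d(feature_map)
print(pooled)  # [[4, 5], [3, 7]]
```

The 4x4 map becomes 2x2, so we keep a quarter of the values but still hold on to the strongest one from each region.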

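This is how you would implement the max pooling layer in TensorFlow. The 2x2 window used here is the Keras default pool size.

tf.keras.layers.MaxPooling2D(pool_size=(2, 2))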

The output of the pooling layers is a set of reduced-size feature maps. How does the network make sense of what these features represent?

3. Fully Connected Layer (FC)

At the end of a typical ConvNet, there is a fully connected layer whose job is to map the feature maps produced by the last pooling layer to the label of the original image.

Take an example. If the input image is a human, the final high-level features will be something like ears, a nose, eyes, a face, etc. (whatever makes a human a human). Once the neural network has learned these features, they need to be mapped to a label, and that is what the fully connected layers are for.

They are typically made of densely connected layers. There is no restriction on how many dense layers you can have, but be careful that the last dense layer has the right number of neurons and an activation function appropriate for the task at hand.

This is how you can implement an FC block with two dense layers. I am assuming that we are building a classifier with 10 categories (say, Fashion MNIST), so the last layer has 10 neurons and uses softmax as its activation function. Softmax turns the raw outputs into a probability for each of the 10 categories, and the category with the highest probability is the prediction.

tf.keras.layers.Dense(64, activation='relu'), 
tf.keras.layers.Dense(10, activation='softmax')
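To see what the softmax activation on that last layer does, here is a minimal sketch in plain Python. The `softmax` helper is my own, and the 10 raw scores are made-up numbers standing in for the output of the final dense layer.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    # Subtracting the max keeps exp() from overflowing; it does not change the result.
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up raw scores, one per category (10 categories, as in Fashion MNIST).
logits = [0.5, 2.0, -1.0, 0.1, 3.5, 0.0, 1.2, -0.3, 0.7, 0.2]
probabilities = softmax(logits)

# The predicted category is the index with the highest probability.
predicted_class = probabilities.index(max(probabilities))
print(predicted_class)  # 4
```

Note that softmax itself gives one probability per category. Picking the single predicted category is a separate step, done here by taking the index of the largest probability.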

A CNN may have multiple blocks of convolutional and max pooling layers. The right number of these layers depends on the scope of the task at hand and the size of the dataset. As you can see in the example below of a full network for a Fashion classifier, other layers may be used too: Flatten converts the feature maps into a single vector, because that is the format a Dense layer expects, and Dropout is used for regularization.

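cnn_model = tf.keras.models.Sequential([
    # Block 1: convolution followed by max pooling
    tf.keras.layers.Conv2D(32, (3,3), activation='relu', input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D((2,2)),
    # Block 2
    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D((2,2)),
    # Flatten the feature maps into a vector for the Dense layers
    tf.keras.layers.Flatten(),
    # Regularization; the 0.5 rate and this position are assumptions
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])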
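If the Flatten step mentioned above feels abstract, this small sketch shows the idea in plain Python. The `flatten` helper and the tiny 2x2x3 input are my own illustration, assuming the feature maps are stored as height x width x channels.

```python
def flatten(feature_maps):
    """Unroll nested [height][width][channels] lists into one flat vector."""
    return [value for row in feature_maps for pixel in row for value in pixel]


# A tiny stack of feature maps: 2 rows, 2 columns, 3 channels.
feature_maps = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
]
vector = flatten(feature_maps)
print(vector)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
```

Nothing is learned or lost here. The same 12 values are simply laid out in one row so that every one of them can be connected to each neuron in the dense layer.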

Until the next time, may your models always generalize on the new data!
