Trending February 2024 # Beginner’s Guide To Object Detection For Computer Vision Project # Suggested March 2024 # Top 4 Popular

You are reading the article Beginner’s Guide To Object Detection For Computer Vision Project updated in February 2024 on the website We hope that the information we have shared is helpful to you. If you find the content interesting and meaningful, please share it with your friends and continue to follow and support us for the latest updates. Suggested March 2024 Beginner’s Guide To Object Detection For Computer Vision Project

Object detection can be defined as a computer vision technique which aims to identify and locate objects on an image or a video. Computers might be able to process information way faster than humans, however, it is still difficult for computers to detect various objects on an image or video. The reason for this is that the computer interprets the majority of the outputs in the binary language only. This article aims to briefly discuss:

The basics of object detection

The object detection models

The benefits of object detection

The challenges and solutions

Before we get to the points above, we need to understand the difference between image classification and object detection. Beginners tend to confuse these two.

Difference between Object Detection and Image Classification

Let us break down these techniques, to know the difference between them. When you look at a picture of a dog you can instantly say it’s an image of an animal i.e. tell what the image is about. This is what image classification is all about. As long as there is only one object,

Object Detection Models

Now that we are clear with the definition of Object Detection, let’s have a look at some popular Object Detection models.

 R-CNN, Faster R-CNN, Mask R-CNN

The most popular object detection models belong to the family of regional based CNN models. This model has revolutionized the way the world of Object Detection used to work. In the past few years, they’ve not only become more accurate but more efficient too.


There are a plethora of models belonging to the single shot detector family which were published in 2024. Although SSDs are faster than CNN models, their accuracy rate is much lower than that of the CNNs. YOLO or you only look once, is quite different from region-based algorithms. Just like SDDs, yolo is faster than R-CNNs but lags behind because of low accuracy. For mobile or embedded devices, SDDs are the perfect choice.


In recent years, these object detection models are gaining more popularity. CentreNet follows a key point-based approach for object detection. When compared with SSD or R-CNN approaches, this model proves to be more efficient and as well as more accurate. The only drawback of this method is slow training process.

Benefits of Object detection to Real-world

Object detection is completely inter-linked with other similar computer vision techniques such as image segmentation and image recognition that assist us to understand and analyze the scenes in videos and images. Nowadays, several real-world use cases are implemented in the market of object detection which make a tremendous impact on different industries. Here we’ll specifically examine how object detection applications have impacted in the following areas.

Self-driving cars

The primary reason behind the success of autonomous vehicles is real-time object detection artificial intelligence based models. These systems allow us to locate, identify and track the objects around them, for the purpose of safety and efficiency.

Video Surveillance

Real-time object detection and tracking the movements of objects allow video surveillance cameras to track the record of scenes of a particular location such as an airport. This state-of-the-art technique accurately recognizes and locates several instances of a given object in the video. In real-time, as the object moves through a given scene or across the particular frame, the system stores the information with real-time tracking feeds.

Crowd Counting

For heavily populated areas such as shopping malls, airports, city squares and theme parks, this application performs unbelievably well. Generally, this object detection application proves to be helpful to large enterprises and municipalities for tracking road traffic, violation of laws and number of vehicles passing in a particular time frame.

Anomaly detection

There are several anomaly detection applications available for different industries which use object detection. For instance, in agriculture, object detection models can accurately recognize and find the potential instances of plant disease. With the help of this, farmers will get notified and they will be able to prevent their crops from such threats. As another example, this model has been used to identify the skin infections and symptomatic lesions. Some applications are already built for skin care and acne treatment using object detection models. Keep in mind, there are some problems encountered while creating any kind of object detection model. However, solutions are also available to limit the challenges.

Challenges and Solutions of Object detection Modelling Dual Synchronization

The first challenge for object detection is to classify the image and position of the object, which is known as

Solution: Regional based Convolutional neural networks displays one class of object detection framework that consist of region generation proposals where objects are likely to be located, followed by CNN models processing to classify and rectify the object locations. Fast-R CNN model can improve the initial results with R-CNN. As its name denotes, this Fast R-CNN model provides tremendous speed, but accuracy also improves only because the localization and object classification tasks are optimized using a multi-task loss function.

Real-time detection speed 

Fast speed of object detection algorithms has always been a major problem to classify and localize the crucial objects accurately at same time to meet the real-time video processing. Over the years, several algorithms improved the test time from 0.02 frames per second to 155 fps.

Solution: Faster R-CNN and Fast R-CNN models aim to speed up the original speed of R-CNN approach. Because R-CNN uses the selective search to produce 2000 candidate regions of interest and passes through each CNN based model individually, that may cause a heavy bottleneck since the model processing gets down. Whereas, Fast R-CNN model transmits the whole image through CNN base once and then matches the ROIs created with selective search to feature map, considering 20-fold reduction in processing time.

Multiple aspects ratios and spatial scales

For several object detection applications, items of interest may appear in   huge range of aspect ratios and sizes. Researchers proved numerous methods to ensure the detection algorithms which are able to recognize different objects at different views and scales.

Solution: Rather than selective search, faster R-CNN has been updated with a region proposal network that uses a small sliding window over the picture’s convolutional feature map to produce candidate regions of interest. Several regions of Interests can be predicted at different positions and described relative to reference anchor boxes. The size and shape of these anchor boxes are selected to span a range of aspect ratios and different scales. It lets several types of objects identify with a hope that bounding box coordinates do not need to be adjusted during the localization task.

Limited data

One of the undeniable facts to be considered is the limited amount of annotated data which becomes a hurdle to build an application. These datasets are specifically containing ground truth examples for dozens to hundreds of objects, while image classification datasets include approximately 100,000 different classes.

Final thought

You're reading Beginner’s Guide To Object Detection For Computer Vision Project

Deep Learning For Computer Vision – Introduction To Convolution Neural Networks


The power of artificial intelligence is beyond our imagination. We all know robots have already reached a testing phase in some of the powerful countries of the world. Governments, large companies are spending billions in developing this ultra-intelligence creature. The recent existence of robots have gained attention of many research houses across the world.

Does it excite you as well ? Personally for me, learning about robots & developments in AI started with a deep curiosity and excitement in me! Let’s learn about computer vision today.

The earliest research in computer vision started way back in 1950s. Since then, we have come a long way but still find ourselves far from the ultimate objective. But with neural networks and deep learning, we have become empowered like never before.

Applications of deep learning in vision have taken this technology to a different level and made sophisticated things like self-driven cars possible in near future. In this article, I will also introduce you to Convolution Neural Networks which form the crux of deep learning applications in computer vision.

Note: This article is inspired by Stanford’s Class on Visual Recognition. Understanding this article requires prior knowledge of Neural Networks. If you are new to neural networks, you can start here. Another useful resource on basics of deep learning can be found here.

You can also learn Convolutional neural Networks in a structured and comprehensive manner by enrolling in this free course: Convolutional Neural Networks (CNN) from Scratch

Table of Contents

Challenges in Computer Vision

Overview of Traditional Approaches

Review of Neural Networks Fundamentals

Introduction to Convolution Neural Networks

Case Study: Increasing power of of CNNs in IMAGENET competition

Implementing CNNs using GraphLab (Practical in Python)

1. Challenges in Computer Vision (CV)

As the name suggests, the aim of computer vision (CV) is to imitate the functionality of human eye and brain components responsible for your sense of sight.

Doing actions such as recognizing an animal, describing a view, differentiating among visible objects are really a cake-walk for humans. You’d be surprised to know that it took decades of research to discover and impart the ability of detecting an object to a computer with reasonable accuracy.

Let’s get familiar with it a bit more:

Object detection is considered to be the most basic application of computer vision. Rest of the other developments in computer vision are achieved by making small enhancements on top of this. In real life, every time we(humans) open our eyes, we unconsciously detect objects.

Since it is super-intuitive for us, we fail to appreciate the key challenges involved when we try to design systems similar to our eye. Lets start by looking at some of the key roadblocks:

Variations in Viewpoint

The same object can have different positions and angles in an image depending on the relative position of the object and the observer.

There can also be different positions. For instance look at the following images:

Though its obvious to know that these are the same object, it is not very easy to teach this aspect to a computer (robots or machines).

Difference in Illumination

Though this image is so dark, we can still recognize that it is a cat. Teaching this to a computer is another challenge.

Hidden parts of images

Here, only the face of the puppy is visible and that too partially, posing another challenge for the computer to recognize.

Background Clutter

If you observe carefully, you can find a man in this image. As simple as it looks, it’s an uphill task for a computer to learn.

These are just some of the challenges which I brought up so that you can appreciate the complexity of the tasks which your eye and brain duo does with such utter ease. Breaking up all these challenges and solving individually is still possible today in computer vision. But we’re still decades away from a system which can get anywhere close to our human eye (which can do everything!).

This brilliance of our human body is the reason why researchers have been trying to break the enigma of computer vision by analyzing the visual mechanics of humans or other animals. Some of the earliest work in this direction was done by Hubel and Weisel with their famous cat experiment in 1959. Read more about it here.

This was the first study which emphasized the importance of edge detection for solving the computer vision problem. They were rewarded the nobel prize for their work.

Before diving into convolutional neural networks, lets take a quick overview of the traditional or rather elementary techniques used in computer vision before deep learning became popular.

2. Overview of Traditional Approaches

Various techniques, other than deep learning are available enhancing computer vision. Though, they work well for simpler problems, but as the data become huge and the task becomes complex, they are no substitute for deep CNNs. Let’s briefly discuss two simple approaches.

KNN (K-Nearest Neighbours)

Each image is matched with all images in training data. The top K with minimum distances are selected. The majority class of those top K is predicted as output class of the image.

Various distance metrics can be used like L1 distance (sum of absolute distance), L2 distance (sum of squares), etc.


Here the same dog is on right side in first image and left side in second. Though its the same image, KNN would give highly non-zero distance for the 2 images.

Similar to above, other challenges mentioned in section 1 will be faced by KNN.

Linear Classifiers

They use a parametric approach where each pixel value is considered as a parameter.

It’s like a weighted sum of the pixel values with the dimension of the weights matrix depending on the number of outcomes.

Intuitively, we can understand this in terms of a template. The weighted sum of pixels forms a template image which is matched with every image. This will also face difficulty in overcoming the challenges discussed in section 1 as single template is difficult to design for all the different cases.

I hope this gives some intuition into the challenges faced by approaches other than deep learning. Please note that more sophisticated techniques can be used than the ones discussed above but they would rarely beat a deep learning model.

3. Review of Neural Networks Fundamentals

Let’s discuss some properties of a neural networks. I will skip the basics of neural networks here as I have already covered that in my previous article – Fundamentals of Deep Learning – Starting with Neural Networks.

Once your fundamentals are sorted, let’s learn in detail some important concepts such as activation functions, data preprocessing, initializing weights and dropouts.

Activation Functions

There are various activation functions which can be used and this is an active area of research. Let’s discuss some of the popular options:

Sigmoid Function

Sigmoid activation, also used in logistic regression regression, squashes the input space from (-inf,inf) to (0,1)

But it has various problems and it is almost never used in CNNs:

Saturated neurons kill the gradient

If you observe the above graph carefully, if the input is beyond -5 or 5, the output will be very close to 0 and 1 respectively. Also, in this region the gradients are almost zero. Notice that the tangents in this region will be almost parallel to x-axis thus ~0 slope.

As we know that gradients get multiplied in back-propogation, so this small gradient will virtually stop back-propogation into further layers, thus killing the gradient.

Outputs are not zero-centered

As you can see that all the outputs are between 0 and 1. As these become inputs to the next layer, all the gradients of the next layer will be either positive or negative. So the path to optimum will be zig-zag. I will skip the mathematics here. Please refer the stanford class referred above for details.

Taking the exp() is computationally expensive

Though not a big drawback, it has a slight negative impact

tanh activation

It is always preferred over sigmoid because it solved problem #2, i.e. the outputs are in range (-1,1).

But it will still result in killing the gradient and thus not recommended choice.

 ReLU (Rectified Linear Unit)

Gradient won’t saturate in the positive region

Computationally very efficient as simple thresholding is required

Empirically found to converge faster than sigmoid or tanh.

Output is not zero-centered and always positive

Gradient is killed for x<0. Few techniques like leaky ReLU and parametric ReLU are used to overcome this and I encourage you to find these

Gradient is not defined at x=0. But this can be easily catered using sub-gradients and posts less practical challenges as x=0 is generally a rare case

To summarize, ReLU is mostly the activation function of choice. If the caveats are kept in mind, these can be used very efficiently.

Data Preprocessing

For images, generally the following preprocessing steps are done:

Same Size Images: All images are converted to the same size and generally in square shape.

Mean Centering: For each pixel, its mean value among all images can be subtracted from each pixel. Sometimes (but rarely) mean centering along red, green and blue channels can also be done

Note that normalization is generally not done in images.

Weight Initialization

There can be various techniques for initializing weights. Lets consider a few of them:

All zeros

This is generally a bad idea because in this case all the neuron will generate the same output initially and similar gradients would flow back in back-propagation

The results are generally undesirable as network won’t train properly.

Gaussian Random Variables

The weights can be initialized with random gaussian distribution of 0 mean and small standard deviation (0.1 to 1e-5)

This works for shallow networks, i.e. ~5 hidden layers but not for deep networks

In case of deep networks, the small weights make the outputs small and as you move towards the end, the values become even smaller. Thus the gradients will also become small resulting in gradient killing at the end.

Note that you need to play with the standard deviation of the gaussian distribution which works well for your network.

Xavier Initialization

It suggests that variance of the gaussian distribution of weights for each neuron should depend on the number of inputs to the layer.

The recommended variance is square root of inputs. So the numpy code for initializing the weights of layer with n inputs is: np.random.randn(n_in, n_out)*sqrt(1/n_in)

A recent research suggested that for ReLU neurons, the recommended update is: np.random.randn(n_in, n_out)*sqrt(2/n_in). Read this blog post for more details.

One more thing must be remembered while using ReLU as activation function. It is that the weights initialization might be such that some of the neurons might not get activated because of negative input. This is something that should be checked. You might be surprised to know that 10-20% of the ReLUs might be dead at a particular time while training and even in the end.

These were just some of the concepts I discussed here. Some more concepts can be of importance like batch normalization, stochastic gradient descent, dropouts which I encourage you to read on your own.

4. Introduction to Convolution Neural Networks

Before going into the details, lets first try to get some intuition into why deep networks work better.

As we learned from the drawbacks of earlier approaches, they are unable to cater to the vast amount of variations in images. Deep CNNs work by consecutively modeling small pieces of information and combining them deeper in network.

One way to understand them is that the first layer will try to detect edges and form templates for edge detection. Then subsequent layers will try to combine them into simpler shapes and eventually into templates of different object positions, illumination, scales, etc. The final layers will match an input image with all the templates and the final prediction is like a weighted sum of all of them. So, deep CNNs are able to model complex variations and behaviour giving highly accurate predictions.

There is an interesting paper on visualization of deep features in CNNs which you can go through to get more intuition – Understanding Neural Networks Through Deep Visualization.

For the purpose of explaining CNNs and finally showing an example, I will be using the CIFAR-10 dataset for explanation here and you can download the data set from here. This dataset has 60,000 images with 10 labels and 6,000 images of each type. Each image is colored and 32×32 in size.

A CNN typically consists of 3 types of layers:

Convolution Layer

Pooling Layer

Fully Connected Layer

You might find some batch normalization layers in some old CNNs but they are not used these days. We’ll consider these one by one.

Convolution Layer

Since convolution layers form the crux of the network, I’ll consider them first. Each layer can be visualized in the form of a block or a cuboid. For instance in the case of CIFAR-10 data, the input layer would have the following form:

Here you can see, this is the original image which is 32×32 in height and width. The depth here is 3 which corresponds to the Red, Green and Blue colors, which form the basis of colored images. Now a convolution layer is formed by running a filter over it. A filter is another block or cuboid of smaller height and width but same depth which is swept over this base block. Let’s consider a filter of size 5x5x3.

We start this filter from the top left corner and sweep it till the bottom left corner. This filter is nothing but a set of eights, i.e. 5x5x3=75 + 1 bias = 76 weights. At each position, the weighted sum of the pixels is calculated as WTX + b and a new value is obtained. A single filter will result in a volume of size 28x28x1 as shown above.

Note that multiple filters are generally run at each step. Therefore, if 10 filters are used, the output would look like:

Here the filter weights are parameters which are learned during the back-propagation step. You might have noticed that we got a 28×28 block as output when the input was 32×32. Why so? Let’s look at a simpler case.

Suppose the initial image had size 6x6xd and the filter has size 3x3xd. Here I’ve kept the depth as d because it can be anything and it’s immaterial as it remains the same in both. Since depth is same, we can have a look at the front view of how filter would work:

Here we can see that the result would be 4x4x1 volume block. Notice there is a single output for entire depth of the each location of filter. But you need not do this visualization all the time. Let’s define a generic case where image has dimension NxNxd and filter has FxFxd. Also, lets define another term stride (S) here which is the number of cells (in above matrix) to move in each step. In the above case, we had a stride of 1 but it can be a higher value as well. So the size of the output will be:

output size = (N – F)/S + 1

You can validate the first case where N=32, F=5, S=1. The output had 28 pixels which is what we get from this formula as well. Please note that some S values might result in non-integer result and we generally don’t use such values.

Let’s consider an example to consolidate our understanding. Starting with the same image as before of size 32×32, we need to apply 2 filters consecutively, first 10 filters of size 7, stride 1 and next 6 filters of size 5, stride 2. Before looking at the solution below, just think about 2 things:

What should be the depth of each filter?

What will the resulting size of the images in each step.

Here is the answer:

Notice here that the size of the images is getting shrunk consecutively. This will be undesirable in case of deep networks where the size would become very small too early. Also, it would restrict the use of large size filters as they would result in faster size reduction.

To prevent this, we generally use a stride of 1 along with zero-padding of size (F-1)/2. Zero-padding is nothing but adding additional zero-value pixels towards the border of the image.

Consider the example we saw above with 6×6 image and 3×3 filter. The required padding is (3-1)/2=1. We can visualize the padding as:

Here you can see that the image now becomes 8×8 because of padding of 1 on each side. So now the output will be of size 6×6 same as the original image.

Now let’s summarize a convolution layer as following:

Input size: W1 x H1 x D1


K: #filters

F: filter size (FxF)

S: stride

P: amount of padding

Output size: W2 x H2 x D2




#parameters = (F.F.D).K + K

F.F.D : Number of parameters for each filter (analogous to volume of the cuboid)

(F.F.D).K : Volume of each filter multiplied by the number of filters

+K: adding K parameters for the bias term

Some additional points to be taken into consideration:

K should be set as powers of 2 for computational efficiency

F is generally taken as odd number

F=1 might sometimes be used and it makes sense because there is a depth component involved

Filters might be called kernels sometimes

Having understood the convolution layer, lets move on to pooling layer.

Pooling Layer

When we use padding in convolution layer, the image size remains same. So, pooling layers are used to reduce the size of image. They work by sampling in each layer using filters. Consider the following 4×4 layer. So if we use a 2×2 filter with stride 2 and max-pooling, we get the following response:

Here you can see that 4 2×2 matrix are combined into 1 and their maximum value is taken. Generally, max-pooling is used but other options like average pooling can be considered.

Fully Connected Layer

At the end of convolution and pooling layers, networks generally use fully-connected layers in which each pixel is considered as a separate neuron just like a regular neural network. The last fully-connected layer will contain as many neurons as the number of classes to be predicted. For instance, in CIFAR-10 case, the last fully-connected layer will have 10 neurons.

5. Case Study: AlexNet

I recommend reading the prior section multiple times and getting a hang of the concepts before moving forward.

In this section, I will discuss the AlexNet architecture in detail. To give you some background, AlexNet is the winning solution of IMAGENET Challenge 2012. This is one of the most reputed computer vision challenge and 2012 was the first time that a deep learning network was used for solving this problem.

Also, this resulted in a significantly better result as compared to previous solutions. I will share the network architecture here and review all the concepts learned above.

The detailed solution has been explained in this paper. I will explain the overall architecture of the network here. The AlexNet consists of a 11 layer CNN with the following architecture:

Here you can see 11 layers between input and output. Lets discuss each one of them individually. Note that the output of each layer will be the input of next layer. So you should keep that in mind.

Layer 0: Input image

Size: 227 x 227 x 3

Note that in the paper referenced above, the network diagram has 224x224x3 printed which appears to be a typo.

Layer 1: Convolution with 96 filters, size 11×11, stride 4, padding 0

Size: 55 x 55 x 96

(227-11)/4 + 1 = 55 is the size of the outcome

96 depth because 1 set denotes 1 filter and there are 96 filters

Layer 2: Max-Pooling with 3×3 filter, stride 2

Size: 27 x 27 x 96

(55 – 3)/2 + 1 = 27 is size of outcome

depth is same as before, i.e. 96 because pooling is done independently on each layer

Layer 3: Convolution with 256 filters, size 5×5, stride 1, padding 2

Size: 27 x 27 x 256

Because of padding of (5-1)/2=2, the original size is restored

256 depth because of 256 filters

Layer 4: Max-Pooling with 3×3 filter, stride 2

Size: 13 x 13 x 256

(27 – 3)/2 + 1 = 13 is size of outcome

Depth is same as before, i.e. 256 because pooling is done independently on each layer

Layer 5: Convolution with 384 filters, size 3×3, stride 1, padding 1

Size: 13 x 13 x 384

Because of padding of (3-1)/2=1, the original size is restored

384 depth because of 384 filters

Layer 6: Convolution with 384 filters, size 3×3, stride 1, padding 1

Size: 13 x 13 x 384

Because of padding of (3-1)/2=1, the original size is restored

384 depth because of 384 filters

Layer 7: Convolution with 256 filters, size 3×3, stride 1, padding 1

Size: 13 x 13 x 256

Because of padding of (3-1)/2=1, the original size is restored

256 depth because of 256 filters

Layer 8: Max-Pooling with 3×3 filter, stride 2

Size: 6 x 6 x 256

(13 – 3)/2 + 1 = 6 is size of outcome

Depth is same as before, i.e. 256 because pooling is done independently on each layer

Layer 9: Fully Connected with 4096 neuron

In this later, each of the 6x6x256=9216 pixels are fed into each of the 4096 neurons and weights determined by back-propagation.

Layer 10: Fully Connected with 4096 neuron

Similar to layer #9

Layer 11: Fully Connected with 1000 neurons

This is the last layer and has 1000 neurons because IMAGENET data has 1000 classes to be predicted.

I understand this is a complicated structure but once you understand the layers, it’ll give you a much better understanding of the architecture. Note that you fill find a different representation of the structure if you look at the AlexNet paper. This is because at that GPUs were not very powerful and they used 2 GPUs for training the network. So the work processing was divided between the two.

ZFNet: winner of 2013 challenge

GoogleNet: winner of 2014 challenge

VGGNet: a good solution from 2014 challenge

ResNet: winner of 2024 challenge designed by Microsoft Research Team

This video gives a brief overview and comparison of these solutions towards the end.

6. Implementing CNNs using GraphLab

Having understood the theoretical concepts, lets move on to the fun part (practical) and make a basic CNN on the CIFAR-10 dataset which we’ve downloaded before.

I’ll be using GraphLab for the purpose of running algorithms. Instead of GraphLab, you are free to use alternatives tools such as Torch, Theano, Keras, Caffe, TensorFlow, etc. But GraphLab allows a quick and dirty implementation as it takes care of the weights initializations and network architecture on its own.

We’ll work on the CIFAR-10 dataset which you can download from here. The first step is to load the data. This data is packed in a specific format which can be loaded using the following code:

import pandas as pd import numpy as np import cPickle #Define a function to load each batch as dictionary: def unpickle(file): fo = open(file, 'rb') dict = cPickle.load(fo) fo.close() return dict #Make dictionaries by calling the above function: batch1 = unpickle('data/data_batch_1') batch2 = unpickle('data/data_batch_2') batch3 = unpickle('data/data_batch_3') batch4 = unpickle('data/data_batch_4') batch5 = unpickle('data/data_batch_5') batch_test = unpickle('data/test_batch') #Define a function to convert this dictionary into dataframe with image pixel array and labels: def get_dataframe(batch): df = pd.DataFrame(batch['data']) df['image'] = df.as_matrix().tolist() df.drop(range(3072),axis=1,inplace=True) df['label'] = batch['labels'] return df #Define train and test files: train = pd.concat([get_dataframe(batch1),get_dataframe(batch2),get_dataframe(batch3),get_dataframe(batch4),get_dataframe(batch5)],ignore_index=True) test = get_dataframe(batch_test)

We can verify this data by looking at the head and shape of data as follow:

print train.head()

print train.shape, test.shape

Since we’ll be using graphlab, the next step is to convert this into a graphlab SFrame and run neural network. Let’s convert the data first:

import graphlab as gl gltrain = gl.SFrame(train) gltest = gl.SFrame(test) model = gl.neuralnet_classifier.create(gltrain, target='label', validation_set=None)

Here it used a simple fully connected network with 2 hidden layers and 10 neurons each. Let’s evaluate this model on test data.


As you can see that we have a pretty low accuracy of ~15%. This is because it is a very fundamental network. Lets try to make a CNN now. But if we go about training a deep CNN from scratch, we will face the following challenges:

The available data is very less to capture all the required features

Training deep CNNs generally requires a GPU as a CPU is not powerful enough to perform the required calculations. Thus we won’t be able to run it on our system. We can probably rent an Amazom AWS instance.

To overcome these challenges, we can use pre-trained networks. These are nothing but networks like AlexNet which are pre-trained on many images and the weights for deep layers have been determined. The only challenge is to find a pre-trianed network which has been trained on images similar to the one we want to train. If the pre-trained network is not made on images of similar domain, then the features will not exactly make sense and classifier will not be of higher accuracy.

Before proceeding further, we need to convert these images into the size used in ImageNet which we’re using for classification. The GraphLab model is based on 256×256 size images. So we need to convert our images to that size. Lets do it using the following code:

#Convert pixels to graphlab image format gltrain['glimage'] = gl.SArray(gltrain['image']).pixel_array_to_image(32, 32, 3, allow_rounding = True) gltest['glimage'] = gl.SArray(gltest['image']).pixel_array_to_image(32, 32, 3, allow_rounding = True) #Remove the original column gltrain.remove_column('image') gltest.remove_column('image') gltrain.head()

Here we can see that a new column of type graphlab image has been created but the images are in 32×32 size. So we convert them to 256×256 using following code:

#Convert into 256x256 size gltrain['image'] = gl.image_analysis.resize(gltrain['glimage'], 256, 256, 3) gltest['image'] = gl.image_analysis.resize(gltest['glimage'], 256, 256, 3) #Remove old column: gltrain.remove_column('glimage') gltest.remove_column('glimage') gltrain.head()

Now we can see that the image has been converted into the desired size. Next, we will load the ImageNet pre-trained model in graphlab and use the features created in its last layer into a simple classifier and make predictions.

Lets start by loading the pre-trained model.

#Load the pre-trained model:

Now we have to use this model and extract features which will be passed into a classifier. Note that the following operations may take a lot of computing time. I use a Macbook Pro 15″ and I had to leave it for whole night!

gltrain['features'] = pretrained_model.extract_features(gltrain) gltest['features'] = pretrained_model.extract_features(gltest)

Lets have a look at the data to make sure we have the features:


Though, we have the features with us, notice here that lot of them are zeros. You can understand this as a result of smaller data set. ImageNet was created on 1.2Mn images. So there would be many features in those images that don’t make sense for this data, thus resulting in zero outcome.

simple_classifier = graphlab.classifier.create(gltrain, features = ['features'], target = 'label')

The various outputs are:

The final model selection is based on a validation set with 5% of the data. The results are:

So we can see that Boosted Trees Classifier has been chosen as the final model. Let’s look at the results on test data:


So we can see that the test accuracy is now ~50%. It’s a decent jump from 15% to 50% but there is still huge potential to do better. The idea here was to get you started and I will skip the next steps. Here are some things which you can try:

Remove the redundant features in the data

Perform hyper-parameter tuning in models

Search for pre-trained models which are trained on images similar to this dataset


Now, its time to take the plunge and actually play with some other real datasets. So are you ready to take on the challenge? Accelerate your deep learning journey with the following Practice Problems:

End Notes

In this article, we covered the basics of computer vision using deep Convolution Neural Networks (CNNs). We started by appreciating the challenges involved in designing artificial systems which mimic the eye. Then, we looked at some of the traditional techniques, prior to deep learning, and got some intuition into their drawbacks.

We moved on to understanding the some aspects of tuning a neural networks such as activation functions, weights initialization and data-preprocessing. Next, we got some intuition into why deep CNNs should work better than traditional approaches and we understood the different elements present in a general deep CNN.

Subsequently, we consolidated our understanding by analyzing the architecture of AlexNet, the winning solution of ImageNet 2012 challenge. Finally, we took the CIFAR-10 data and implemented a CNN on it using a pre-trained AlexNet deep network.

You can test your skills and knowledge. Check out Live Competitions and compete with best Data Scientists from all over the world.


A Beginner’s Guide To Boosting Your Holiday Sales Online

15. Create Urgency 14. Partner With a Non-Profit

Besides finding gifts for our friends and family, this is also the season of giving. Team up with a non-profit and donate a percentage of sales to that charity. It will make people feel better about purchasing something from your site since you’re giving back.

13. Bundle Products

Think of what Amazon does here. When you search for one product, Amazon will offer a couple of other suggestions that can be bundled together at a discount. You could also offer gift sets to stand out from other retailers. A nice move if you want to increase sales.

12. Email Previous Customers

11. Offer Holiday Bonuses/Coupons/Upsells

People enjoy being rewarded for being a customer. Offer a complimentary gift when you purchase a certain amount, coupons or something extra like free gift-wrapping to attract repeat customers.

10. Great Customer Service

People want to know that if there is a problem with their order or if they have a question regarding a product that they can reach a real, live person. Place your customer service number prominently throughout your site and have a live person on the other end of the phone. This is how Zappos became such a juggernaut.

9. Free Shipping

We all love freebies. And, does it get any better when you notice that a site offers free shipping? An essential move for people who are most-likely on a budget throughout the holidays.

8. Personal Suggestions/Experiences

Customers are constantly searching for gift recommendations or what the must-have item is this year. Offer consumers gift ideas through a list that was personally written by you or a ghost writer. Go the extra mile and share your own personal experiences with that product as well. Besides providing some personality, people may be more inclined to purchase a product if they know it’s been recommended by a real person.

7. Offer One Day Sale or Discounts

Pretty much every online retailer will be offering some sort of sale or discount during the holidays. So, this should be a no-brainer. If you don’t offer a one-day sale or end of year discounts, people will move on to a site that has the better deal.

6. Host Contests

Contests are a proven technique to gain attention. Whether it’s a wish list or caption contest, they are a simple and effective way to draw visitors to your site, which will hopefully result in more sales.

5. Use Social Media

There’s so much more you can do with social media than just informing customers that you have a website where they can buy stuff. Have a holiday flash sale via Facebook. Post hashtags on Twitter that link to discounts or coupons. Conduct a Pinterest holiday board contest. There a number of ways that you can use social media networks to bring people to your site for the final sale.

4. Make Sure You’re Smartphone Ready

The Google survey we mentioned earlier also discovered that three-quarters of smart phone owners will browse on their phones this season. Common sense, right here. You need to make sure that your site is compatible with smartphones. Whether that’s by having the correct size images or making sure that your checkout works properly, your site must be effective on these devices.

3. Offer Gift Cards/Certificates

Gift cards and gift certificates are big business, which is why your site must absolutely offer one. Be certain that your site has a section devoted to gift cards that is prominently displayed on the homepage. It also wouldn’t hurt for the card to be enclosed in a decorated box or envelope.

2. Feature Holiday Themes

Since everyone is in the holiday spirit, make sure that the graphics on your site are just as festive – just to remind people it is indeed the holidays! This could also be good chance to highlight some of your products, if you sell goods, right on the homepage. Also, make sure that all your social networks have holiday themed content. Tis’ the season.

1. Follow Social Media

Keep up with trends via social media. We’re not just talking about only Facebook and Twitter, but also Instagram, Tumblr, and Pinterest. By following trends on social networks you’ll be aware of what items shoppers are searching for this season so you know what to push.

Beginner’s Guide To Web Scraping In Python Using Beautifulsoup


Learn web scraping in Python using the BeautifulSoup library

Web Scraping is a useful technique to convert unstructured data on the web to structured data

BeautifulSoup is an efficient library available in Python to perform web scraping other than urllib

A basic knowledge of HTML and HTML tags is necessary to do web scraping in Python


The need and importance of extracting data from the web is becoming increasingly loud and clear. Every few weeks, I find myself in a situation where we need to extract data from the web to build a machine learning model.

For example, last week we were thinking of creating an index of hotness and sentiment about various data science courses available on the internet. This would not only require finding new courses, but also scraping the web for their reviews and then summarizing them in a few metrics!

This is one of the problems / products whose efficacy depends more on web scraping and information extraction (data collection) than the techniques used to summarize the data.

Note: We have also created a free course for this article – Introduction to Web Scraping using Python. This structured format will help you learn better.

Ways to extract information from web

There are several ways to extract information from the web. Use of APIs being probably the best way to extract data from a website. Almost all large websites like Twitter, Facebook, Google, Twitter, StackOverflow provide APIs to access their data in a more structured manner. If you can get what you need through an API, it is almost always preferred approach over web scraping. This is because if you are getting access to structured data from the provider, why would you want to create an engine to extract the same information.

Sadly, not all websites provide an API. Some do it because they do not want the readers to extract huge information in a structured way, while others don’t provide APIs due to lack of technical knowledge. What do you do in these cases? Well, we need to scrape the website to fetch the information.

There might be a few other ways like RSS feeds, but they are limited in their use and hence I am not including them in the discussion here.

What is Web Scraping?

You can perform web scraping in various ways, including use of Google Docs to almost every programming language. I would resort to Python because of its ease and rich ecosystem. It has a library known as ‘BeautifulSoup’ which assists this task. In this article, I’ll show you the easiest way to learn web scraping using python programming.

For those of you, who need a non-programming way to extract information out of web pages, you can also look at . It provides a GUI driven interface to perform all basic web scraping operations. The hackers can continue to read this article!

Libraries required for web scraping

As we know, Python is an open source programming language. You may find many libraries to perform one function. Hence, it is necessary to find the best to use library. I prefer BeautifulSoup (Python library), since it is easy and intuitive to work on. Precisely, I’ll use two Python modules for scraping data:

Urllib2: It is a Python module which can be used for fetching URLs. It defines functions and classes to help with URL actions (basic and digest authentication, redirections, cookies, etc). For more detail refer to the documentation page. Note: urllib2 is the name of the library included in Python 2. You can use the urllib.request library included with Python 3, instead. The urllib.request library works the same way urllib.request works in Python 2. Because it is already included you don’t need to install it.

BeautifulSoup: It is an incredible tool for pulling out information from a webpage. You can use it to extract tables, lists, paragraph and you can also put filters to extract information from web pages. In this article, we will use latest version BeautifulSoup 4. You can look at the installation instruction in its documentation page.

BeautifulSoup does not fetch the web page for us. That’s why, I use urllib2 in combination with the BeautifulSoup library.

Python has several other options for HTML scraping in addition to BeatifulSoup. Here are some others:

Basics – Get familiar with HTML (Tags)

While performing web scarping, we deal with html tags. Thus, we must have good understanding of them. If you already know basics of HTML, you can skip this section. Below is the basic syntax of HTML:This syntax has various tags as elaborated below:

Other useful HTML tags are:

If you are new to this HTML tags, I would also recommend you to refer HTML tutorial from W3schools. This will give you a clear understanding about HTML tags.

Scraping a web page using BeautifulSoup

Here, I am scraping data from a Wikipedia page. Our final goal is to extract list of state, union territory capitals in India. And some basic detail like establishment, former capital and others form this wikipedia page. Let’s learn with doing this project step wise step:

#import the library used to query a website import urllib2 #if you are using python3+ version, import urllib.request #specify the url #Query the website and return the html to the variable 'page' page = urllib2.urlopen(wiki) #For python 3 use urllib.request.urlopen(wiki) #import the Beautiful soup functions to parse the data returned from the website from bs4 import BeautifulSoup #Parse the html in the 'page' variable, and store it in Beautiful Soup format soup = BeautifulSoup(page) Above, you can see that structure of the HTML tags. This will help you to know about different available tags and how can you play with these to extract information.

Work with HTML tags


In [38]:






Out[38]:u'List of state and union territory capitals in India - Wikipedia, the free encyclopedia'

In [40]:





Above, it is showing all links including titles, links and other information.  Now to show only links, we need to iterate over each a tag and then return the link using attribute “href” with get.

Find the right table: As we are seeking a table to extract information about state capitals, we should identify the right table first. Let’s write the command to extract information within all table tags. all_tables=soup.find_all('table') right_table=soup.find('table', class_='wikitable sortable plainrowheaders') right_table Above, we are able to identify right table.

#Generate lists A=[] B=[] C=[] D=[] E=[] F=[] G=[] for row in right_table.findAll("tr"): cells = row.findAll('td') states=row.findAll('th') #To store second column data if len(cells)==6: #Only extract table body not heading A.append(cells[0].find(text=True)) B.append(states[0].find(text=True)) C.append(cells[1].find(text=True)) D.append(cells[2].find(text=True)) E.append(cells[3].find(text=True)) F.append(cells[4].find(text=True)) G.append(cells[5].find(text=True)) #import pandas to convert list to data frame import pandas as pd df=pd.DataFrame(A,columns=['Number']) df['State/UT']=B df['Admin_Capital']=C df['Legislative_Capital']=D df['Judiciary_Capital']=E df['Year_Capital']=F df['Former_Capital']=G df

Similarly, you can perform various other types of web scraping using “BeautifulSoup“. This will reduce your manual efforts to collect data from web pages. You can also look at the other attributes like .parent, .contents, .descendants and .next_sibling, .prev_sibling and various attributes to navigate using tag name. These will help you to scrap the web pages effectively.-

But, why can’t I just use Regular Expressions?

Now, if you know regular expressions, you might be thinking that you can write code using regular expression which can do the same thing for you. I definitely had this question. In my experience with BeautifulSoup and Regular expressions to do same thing I found out:

Code written in BeautifulSoup is usually more robust than the one written using regular expressions. Codes written with regular expressions need to be altered with any changes in pages. Even BeautifulSoup needs that in some cases, it is just that BeautifulSoup is relatively better.

Regular expressions are much faster than BeautifulSoup, usually by a factor of 100 in giving the same outcome.

So, it boils down to speed vs. robustness of the code and there is no universal winner here. If the information you are looking for can be extracted with simple regex statements, you should go ahead and use them. For almost any complex work, I usually recommend BeautifulSoup more than regex.

End Note

In this article, we looked at web scraping methods using “BeautifulSoup” and “urllib2” in Python. We also looked at the basics of HTML and perform the web scraping step by step while solving a challenge. I’d recommend you to practice this and use it for collecting data from web pages.

Note: We have also created a free course for this article – Introduction to Web Scraping using Python. This structured format will help you learn better.

If you like what you just read & want to continue your analytics learning, subscribe to our emails, follow us on twitter or like our facebook page.


Here Are 8 Powerful Sessions To Learn The Latest Computer Vision Techniques

Do you want to build your own smart city?

Picture it – self-driving cars strolling around, traffic lights optimised to maintain a smooth flow, everything working at the touch of your fingers. If this is the future you dream of, then you’ve come to the right place.

“If We Want Machines to Think, We Need to Teach Them to See.” – Fei-Fei Li

Now, I want you take five seconds (exactly five), and look around you. How many objects did you notice? We have a remarkably good sense of observation but it’s impossible to notice and remember everything.

The beauty about training our machines is that they notice even the most granular details – and they retain them until we want them to.

Think about it – from airport face detection applications to your local store’s bar scanner, computer vision use cases are all around us. Of course your smartphone is the most relatable example – we use it to unlock our phone. How does that happen? Face detection using computer vision!

Honestly, the use cases of computer vision are limitless. It is revolutionising sectors from agriculture to banking, from hospitality to security, and much more. In short, there is a lot of demand for computer vision experts – are you game to step up and fill the gap?

We’re thrilled to present you a chance to learn the latest computer vision libraries, frameworks and developments form leading data scientists and AI experts at DataHack Summit 2023! Want to learn how to build your own image tagging system? Or how to create and deploy your own yoga trainer? Or how about morphing images using the popular GAN models?

Well – what are you waiting for? Tickets are almost sold out so

Let’s take a spin around the various computer vision topics that’ll be covered at DataHack Summit 2023.

Hack Sessions and Power Talks on Computer Vision at DataHack Summit 2023

Morphing images using Deep Generative Models (GANs)

Image ATM (Automatic Tagging Machine) – Image Classification for Everyone

Deep Learning for Aesthetics: Training a Machine to See What’s Beautiful

Creating and Deploying a Pocket Yoga Trainer using Deep Learning

Content-Based Recommender System using Transfer Learning

Generating Synthetic Images from Textual Description using GANs

Haptic Learning – Inferring Anatomical Features using Deep Networks

Feature Engineering for Image Data

Hack sessions are one-hour hands-on coding sessions on the latest frameworks, architectures and libraries in machine learning, deep learning, reinforcement learning, NLP, and other domains.

Morphing Images using Deep Generative Models (GANs) by Xander Steenbrugge

GANs have seen amazing progress ever since Ian Goodfellow went mainstream with the concept in 2014. There have been several iterations since, including BigGAN and StyleGAN. We are at a point where humans are unable to differentiate between images generated by GANs and the original image.

But what do we do with these models? It seems like you can only use them to sample random images, right? Well, not entirely. It turns out that Deep Generative models learn a surprising amount of structure about the dataset they are trained on.

Our rockstar speaker, Xander Steenbrugge, will be taking a hands-on hack session on this topic at DataHack Summit 2023. Xander will explain how you can leverage this structure to deliberately manipulate image attributes by adjusting image representations in the latent space of a GAN.

This hack session will use GPU-powered Google Colab notebooks so you can reproduce all the results for yourself!

Here’s Xander elaborating on what you can expect to learn from this hack session:

I recommend checking out the two guides below if you are new to GANs:

Labeling our data is one of the most time consuming and mind numbing tasks a data scientist can do. Anyone who has worked with unlabelled images will understand the pain. So is there a way around this?

There sure is – you can automate the entire labelling process using deep learning! And who better to learn this process than a person who led the entire project?

Dat Tran, Head of AI at Axel Springer Ideas Engineering, will be taking a hands-on hack session on “Image ATM (Automatic Tagging Machine) – Image Classification for Everyone”.

With the help of transfer learning, Image ATM enables the user to train a Deep Learning model without knowledge or experience in the area of Machine Learning. All you need is data and spare couple of minutes!

In this hack session, he will discuss the state-of-art technologies available for image classification and present Image ATM in the context of these technologies.

It’s one of the most fascinating hack sessions on computer vision – I can’t wait to watch Dat unveil the code.

Here’s Dat with a quick explainer about what you can expect from this hack session:

I would recommend going through the below article before you join Dat for his session at DataHack Summit 2023:

I would recommend going through the below article before you join Dat for his session at DataHack Summit 2023:

Deep Learning for Aesthetics: Training a Machine to See What’s Beautiful by Dat Tran

Source: TechCrunch

There’s more from Dat! We know how much our community is looking forward to hearing from him, so we’ve pencilled him in for another session. And this one is as intriguing at the above Image ATM concept.

Have you ever reserved a hotel room online from a price comparison website? Do you know there are hundreds of images to choose from before any website posts hotels for listing? We see the nice images but there’s a lot of effort that goes on behind the scenes.

Imagine the pain of manually selecting images for each hotel listing. It’s a crazy task! But as you might have guessed already – deep learning takes away this pain in spectacular fashion.

In this Power Talk, Dat will present how his team solved this difficult problem. In particular, he will share his team’s training approaches and the peculiarities of the models. He will also show the “little tricks” that were key to solving this problem.

Here’s Dat again expanding on the key takeaways from this talk:

I recommend the below tutorial if you are new to Neural Networks:

Creating and Deploying a Pocket Yoga Trainer using Deep Learning by Mohsin Hasan and Apurva Gupta

This is one of my personal favourites. And I’m sure a lot of you will be able to relate this as well, especially if you’ve set yourself fitness goals and never done anything about it. 🙂

It is quite difficult to keep to a disciplined schedule when our weekdays are filled with work. Yes, you can work out at home but then are you doing it correctly? Is it even helping you achieve your objective?

Well – this intriguing hack session by Mohsin Hasan and Apurva Gupta might be the antidote to your problems! They will showcase how to build a model that teaches exercise with continuous visual feedback and keeps you engaged.

And they’ll be doing a live demo of their application as well!

Here are the key takeaways explained by both our marvelous speakers:

This is why you can’t miss being at DataHack Summit 2023!

Content-Based Recommender System using Transfer Learning by Sitaram Tadepalli

Recommendation engines are all the rage in the industry right now. Almost every B2C organisation is leaning heavily on recommendation engines to prop up their bottomline and drive them into a digital future.

All of us have interacted with these recommendation engines at some point. Amazon, Flipkart, Netflix, Hotstar, etc. – all of these platforms have recommendation engines at the heart of their business strategy.

As a data scientist, analyst, CxO, project manager or whatever level you’re at – you need to know how to harness the power of recommendation engines.

In this unique hack session by Sitaram Tadepalli, an experienced Data Scientist at TCS, you will learn how to build content-based recommender systems using image data.

Sitaram elaborates in the below video on what he plans to cover in this hack session:

Here are a few resources I recommend going through to brush up your Recommendation Engine skills:

Generating Synthetic Images from Textual Description using GANs by Shibsankar Das

Here’s another fascinating hack session on GANs!

Generating captions about an image is a useful application of computer vision. But how about the other way round? What if you could build a computer vision model that could generate images using a small string of text we provide?

It’s entirely possible thanks to GANs!

Synthetic image generation is actually gaining quite a lot of popularity in the medical field. Synthetic images have the potential to improve diagnostic reliability, allowing data augmentation in computer-assisted diagnosis. Likewise, this has a lot of possibilities across various domains.

In the hack session by Shibsankar Das, you will discover how GANs can be leveraged to generate a synthetic image given a textual demonstration about the image. The session will have tutorials on how to build a text-to-image model from scratch.

Key Takeaways from this Hack Session:

End to end understanding of GANs

Implement GANs from scratch

Understand how to use Adversarial training to solve Domain gap alignment

I would suggest you go through this article to gain a deeper understanding of GANs before attending the session:

Haptic Learning – Inferring Anatomical Features using Deep Networks by Akshay Bahadur

A machine learning model consists of an algorithm that draws some meaningful correlation between data without being tightly coupled to a specific set of rules. It’s crucial to explain the subtle nuances of the network and the use-case we are trying to solve.

The main question, however, is to discuss the need to eliminate an external haptic system and use something which feels natural and inherent to the user.

In this hack session, Akshay Bahadur will talk about the development of applications specifically aimed to localize and recognize human features which could then, in turn, be used to provide haptic feedback to the system.

These applications will range from recognizing digits and alphabets which the user can ‘draw’ at runtime; developing state of the art facial recognition systems; predicting hand emojis along with Google’s project of ‘Quick, Draw’ of hand doodles, and more.

Key Takeaways from this Hack Session:

Gain an understanding of building vision-based optimized models which can take feedback from anatomical features

Learn how to proceed while building such a computer vision model

Feature Engineering for Image Data by Aishwarya Singh and Pulkit Sharma

Feature engineering is an often used tool in a data scientist’s armoury. But that’s typically when we’re working with tabular numerical data, right? How does it work when we need to build a model using images?

There’s a strong belief that when it comes to working with unstructured image data, deep learning models are the way forward. Deep learning techniques undoubtedly perform extremely well, but is that the only way to work with images?

Not really! And that’s where the fun begins.

Our very own data scientists Aishwarya Singh and Pulkit Sharma will be presenting a very code-oriented hack session on how you can engineer features for image data.

Key Takeaways from this Hack Session:

Learn how to extract primary features from images, like edge features, HOG and SIFT features

Extracting image features using Convolutional Neural Networks (CNNs)

Building an Image classification model using Machine Learning

Performance comparison among primary and CNN features using Machine Learning Models

End Notes

I can’t wait to see these amazing hack sessions and power talks at DataHack Summit 2023. The future is coming quicker than most people imagine – and this is the perfect time to get on board and learn how to program it yourself.

If you haven’t yet booked your seat yet, then here is a great chance for you to do it right away! Hurry, as there are only a few seats remaining for India’s Largest Conference on Applied Artificial Intelligence & Machine Learning.

I am looking forward to networking with you there!


Using Slicers In Excel Pivot Table – A Beginner’s Guide

A Pivot Table Slicer enables you to filter the data when you select one or more than one options in the Slicer box (as shown below).

Let’s get started.

Suppose you have a dataset as shown below:

This is a dummy data set (US retail sales) and spans across 1000 rows. Using this data, we have created a Pivot Table that shows the total sales for the four regions.

Read More: How to Create a Pivot Table from Scratch.

Once you have the Pivot Table in place, you can insert Slicers.

One may ask – Why do I need Slicers? 

You may need slicers when you don’t want the entire Pivot Table, but only a part of it. For example, if you don’t want to see the sales for all the regions, but only for South, or South and West, then you can insert the slicer and quickly select the desired region(s) for which you want to get the sales data.

Slicers are a more visual way that allows you to filter the Pivot Table data based on the selection.

Here are the steps to insert a Slicer for this Pivot Table:

Select any cell in the Pivot Table.

In the Insert Slicers dialog box, select the dimension for which you the ability to filter the data. The Slicer Box would list all the available dimensions and you can select one or more than one dimensions at once. For example, if I only select Region, it will insert the Region Slicer box only, and if I select Region and Retailer Type both, then it’ll insert two Slicers.

Note that Slicer would automatically identify all the unique items of the selected dimension and list it in the slicer box.

You can also insert multiple slicers by selecting more than one dimension in the Insert Slicers dialog box.

To insert multiple slicers:

Select any cell in the Pivot Table.

In the Insert Slicers dialog box, select all the dimensions for which you want to get the Slicers.

This will insert all the selected Slicers in the worksheet.

Note that these slicers are linked to each other. For example, If I select ‘Mid West’ in the Region filter and ‘Multiline’ in the Retailer Type filter, then it will show the sales for all the Multiline retailers in Mid West region only.

Also, if I select Mid West, note that the Specialty option in the second filter gets a lighter shade of blue (as shown below). This indicates that there is no data for Specialty retailer in the Mid West region.

What’s the difference between Slicers and Report Filters?

Here are some key differences between Slicers and Report Filters:

Slicers don’t occupy a fixed cell in the worksheet. You can move these like any other object or shape. Report Filters are tied to a cell.

Report filters are linked to a specific Pivot Table. Slicers, on the other hand, can be linked to multiple Pivot Tables (as we will see later in this tutorial).

Since a report filter occupies a fixed cell, it’s easier to automate it via VBA. On the other hand, a slicer is an object and would need a more complex code.

A Slicer comes with a lot of flexibility when it comes to formatting.

Here are the things that you can customize in a slicer.

If you don’t like the default colors of a slicer, you can easily modify it.

Select the slicer.

If you don’t like the default styles, you can create you own. To do this, select the New Slicer Style option and specify your own formatting.

By default, a Slicer has one column and all the items of the selected dimension are listed in it. In case you have many items, Slicer shows a scroll bar that you can use to go through all the items.

You may want to have all the items visible without the hassle of scrolling. You can do that by creating multiple column Slicer.

To do this:

Select the Slicer.

Change the Columns value to 2.

This will instantly split the items in the Slicer into two column. However, you may get something looking as awful as shown below:

This looks cluttered and the full names are not displayed. To make it look better, you change the size of the slicer and even the buttons within it.

To do this:

Select the Slicer.

Change Height and Width of the Buttons and the Slicer. (Note that you can also change the size of the slicer by simply selecting it and using the mouse to adjust the edges. However, to change the button size, you need to make the changes in the Options only).

By default, a Slicer picks the field name from the data. For example, if I create a slicer for Regions, the header would automatically be ‘Region’.

You may want to change the header or completely remove it.

Here are the steps:

In the Slicer Settings dialog box, change the header caption to what you want.

This would change the header in the slicer.

If you don’t want to see the header, uncheck the Display Header option in the dialog box.

By default, the items in a Slicer are sorted in an ascending order in case of text and Older to Newer in the case of numbers/dates.

You can change the default setting and even use your own custom sort criteria.

Here is how to do this:

In the Slicer Settings dialog box, you can change the sorting criteria, or use your own custom sorting criteria.

Read More: How to create custom lists in Excel (to create your own sorting criteria)

It may happen that some of the items in the Pivot Table have no data in it. In such cases, you can make the Slicers hide that item.

In such cases, you can choose not display it at all.

Here are the steps to do this:

In the Slicer Settings dialog box, with the ‘Item Sorting and Filtering’ options, check the option ‘Hide items with no data’.

A slicer can be connected to multiple Pivot Tables. Once connected, you can use a single Slicer to filter all the connected Pivot Tables simultaneously.

Remember, to connect different Pivot Tables to a Slicer, the Pivot Tables need to share the same Pivot Cache. This means that these are either created using the same data, or one of the Pivot Table has been copied and pasted as a separate Pivot Table.

Read More: What is Pivot Table Cache and how to use it?

Below is an example of two different Pivot tables. Note that the Slicer in this case only works for the Pivot Table on the left (and has no effect on the one on the right).

To connect this Slicer to both the Pivot  Tables:

In the Report Connections dialog box, you will see all the Pivot Table names that share the same Pivot Cache. Select the ones you want to connect to the Slicer. In this case, I only have two Pivot Tables and I’ve connected both with the Slicer.

Now your Slicer is connected to both the Pivot Tables. When you make a selection in the Slicer, the filtering would happen in both the Pivot Tables (as shown below).

Just as you use a Slicer with a Pivot Table, you can also use it with Pivot Charts.

Something as shown below:

Here is how you can create this dynamic chart:

Make the fields selections (or drag and drop fields into the area section) to get the Pivot chart you want. In this example, we have the chart that shows sales by region for four quarters. (Read here on how to group dates as quarters).

Select the Slicer dimension you want with the Chart. In this case, I want the retailer types so I check that dimension.

Format the Chart and the Slicer and you’re done.

Note that you can connect multiple Slicers to the same Pivot Chart and you can also connect multiple charts to the same Slicer (the same way we connected multiple Pivot Tables to the same Slicer).

You May Also Like the Following Pivot Table Tutorials:

Update the detailed information about Beginner’s Guide To Object Detection For Computer Vision Project on the website. We hope the article's content will meet your needs, and we will regularly update the information to provide you with the fastest and most accurate information. Have a great day!