Implementing Artificial Neural Network (Classification) in Python From Scratch

Updated in December 2023.

This article was published as a part of the Data Science Blogathon

Neural networks are one of the booming technological breakthroughs of the 21st century.

Are you interested in creating your own neural network from scratch in Python? Well, you are at the right place. In this article, we will create an artificial neural network in Python that solves a classification problem. So stretch your fingers, and let’s get started.

Interesting Sidenote

Artificial Neural Networks (ANN) are part of supervised machine learning, where the dataset contains inputs together with their corresponding outputs. Our whole aim is to learn a mapping from each input to its output. ANNs can be used for both regression and classification problems.

In this blog, we will develop an ANN for a classification problem.

Pre-Requisites for Artificial Neural Network Implementation

We will need the following libraries and software to implement the ANN.

1. Python – 3.6 or later

2. Jupyter Notebook ( Google Colab can also be used )

3. Pandas

4. Numpy

5. TensorFlow 2.x

6. Scikit-Learn

Understanding the Problem Statement for Artificial Neural Network

Here we are dealing with a dataset from the finance domain. The dataset has 14 dimensions in total and 10,000 records. The dimensions are as follows:-

RowNumber:- Represents the row number of the record

CustomerId:- Represents customerId

Surname:- Represents surname of the customer

CreditScore:- Represents credit score of the customer

Geography:- Represents the country to which the customer belongs

Gender:- Represents Gender of the customer

Age:- Represents age of the customer

Tenure:- Represents tenure of the customer with a bank

Balance:- Represents the balance held by the customer

NumOfProducts:- Represents the number of bank services used by the customer

HasCrCard:- Represents if a customer has a credit card or not

IsActiveMember:- Represents if a customer is an active member or not

EstimatedSalary:- Represents estimated salary of the customer

Exited:- Represents if a customer is going to exit the bank or not.

Structure of dataset

As we can see from the above data dictionary, we are dealing with a total of 14 dimensions.

Here our main goal is to create an artificial neural network that will take into consideration all independent variables(first 13) and based on that will predict if our customer is going to exit the bank or not(Exited is dependent variable here).

Once we understand the steps for constructing neural networks, we can directly implement those same steps to other datasets as well.

One place where we can find such datasets is the UCI Machine Learning Repository. These datasets are classified into regression and classification problems. Since we are implementing this neural network to solve a classification problem, you can download any classification dataset from there and apply the same steps to it. How cool is that?

Importing Necessary Libraries for Artificial Neural Network

Let’s import all the necessary libraries here

#Importing necessary Libraries
import numpy as np
import pandas as pd
import tensorflow as tf

Importing Dataset

In this step, we are going to import our dataset. Since our dataset is in csv format, we are going to use the read_csv() method of pandas in order to load the dataset.

#Loading Dataset
data = pd.read_csv("Churn_Modelling.csv")

Generating Matrix of Features (X)

The basic principle while creating a machine learning model is to generate X, also called the matrix of features. X contains all our independent variables. Let’s create it here.

Python Code:

#Generating Matrix of Features
X = data.iloc[:,3:-1].values

Here I have used the iloc indexer of the pandas data frame, which lets us select rows and columns by position. We fetch all rows, and all columns from index 3 up to, but not including, the last column. The first 3 columns (RowNumber, CustomerId, and Surname) have nothing to do with deciding whether the customer is going to exit, so we start from index 3. The last column is our dependent variable, so we pass -1 as the end of the slice in iloc, which excludes it from the matrix of features X.
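To see what such a slice keeps, here is a minimal sketch on a toy NumPy array standing in for the dataset. The array `toy` and its six columns are hypothetical; pandas `iloc` follows the same positional slice rules.

```python
import numpy as np

# Hypothetical table: 2 rows, 6 columns (3 id-like columns, 2 features, 1 target).
toy = np.array([[1, 101, 7, 600, 40, 1],
                [2, 102, 8, 700, 35, 0]])

X_toy = toy[:, 3:-1]   # all rows; column index 3 up to, but not including, the last column
y_toy = toy[:, -1]     # all rows; last column only
```

The three id-like columns and the target are left out of `X_toy`, and the target alone ends up in `y_toy`.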

Generating Dependent Variable Vector(Y)

In the same fashion where we have created our matrix of features(X) for the independent variable, we also have to create a dependent variable vector(Y) which will only contain our dependent variable values.

#Generating Dependent Variable Vectors
Y = data.iloc[:,-1].values

Encoding Categorical Variable Gender

Now that we have defined our X and Y, we start one of the most time-consuming phases of any machine learning project: feature engineering. Put simply, feature engineering is the phase where we either generate new variables from existing ones or modify existing variables so that our machine learning model can use them.

In the above image depicting the structure of the dataset, we can see that most of the variables are numeric, with the exception of a few: Gender and Geography. Essentially, a machine learning model is a mathematical formula that only accepts numbers as input. If we try to create an ML model from a dataset that mixes numeric and string data, the model will simply fail during creation. Hence we need to convert those string values into numerical equivalents without losing their significance.

One of the most efficient ways of doing this is a technique called encoding. It converts strings or categories directly into numerical equivalents without losing significance.

Our Gender column has only 2 categories, male and female, so we are going to use label encoding. This type of encoding simply converts the column into one holding the values 0 and 1. To apply it, we use the LabelEncoder class from the sklearn library.

#Encoding Categorical Variable Gender
from sklearn.preprocessing import LabelEncoder
LE1 = LabelEncoder()
X[:,2] = np.array(LE1.fit_transform(X[:,2]))

Here we have applied label encoding on the Gender column of our dataset.
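The idea behind label encoding can be sketched with plain NumPy. This is an illustration of the technique, not sklearn's own code, and the `gender` values are hypothetical.

```python
import numpy as np

# Hypothetical Gender values, for illustration only.
gender = np.array(["Male", "Female", "Female", "Male"])

# np.unique returns the sorted distinct categories and, with return_inverse=True,
# the position of each original value in that sorted list.
categories, codes = np.unique(gender, return_inverse=True)
```

Because the categories are sorted alphabetically, "Female" maps to 0 and "Male" to 1 in this sketch.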

Encoding Categorical Variable Geography

Now let’s deal with another categorical column, Geography, which holds the customer’s country. This column has a cardinality of 3, meaning that it has 3 distinct categories: France, Germany, and Spain.

Here we have 2 options:-

1. We can use label encoding and directly convert those values into 0, 1, 2

2. We can use one-hot encoding, which converts each string into a binary vector. For example, France will be encoded as 100, Germany as 010, and Spain as 001.

The first approach is easier and faster to implement. However, once the values are encoded as 0, 1, 2, a model may read them as ordered quantities and assume that a higher number is more important, even though the countries have no natural order. One-hot encoding avoids this: every string value is converted into a binary vector of 0’s and 1’s, so the machine learning algorithm does not assume that higher numbers are more important.

#Encoding Categorical variable Geography
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(transformers=[('encoder',OneHotEncoder(),[1])],remainder="passthrough")
X = np.array(ct.fit_transform(X))

Here we have used OneHotEncoder class from sklearn in order to perform one-hot encoding. Now you might have a query here. What is the use of ColumnTransformer? Well, ColumnTransformer is another class in sklearn that will allow us to select a particular column from our dataset on which we can apply one-hot encoding.
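As a rough sketch of what one-hot encoding produces, the snippet below builds the binary vectors by hand with NumPy. The `geography` values are hypothetical, and the alphabetical category order is an assumption of this sketch.

```python
import numpy as np

geography = np.array(["Spain", "France", "Germany", "France"])  # hypothetical values
categories, codes = np.unique(geography, return_inverse=True)   # sorted: France, Germany, Spain

# Row i of the identity matrix is the one-hot vector for category index i.
one_hot = np.eye(len(categories), dtype=int)[codes]
```

Each row contains exactly one 1, so no country is numerically "larger" than another.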

Splitting Dataset into Training and Testing Dataset

In this step, we are going to split our dataset into training and testing datasets. This is one of the bedrocks of the entire machine learning process. The training dataset is the one on which our model is going to train while the testing dataset is the one on which we are going to test the performance of our model.

#Splitting dataset into training and testing dataset
from sklearn.model_selection import train_test_split
X_train,X_test,Y_train,Y_test = train_test_split(X,Y,test_size=0.2,random_state=0)

Here we have used the train_test_split function from the sklearn library. We have split our dataset in a configuration such that 80 percent of data will be there in the training phase and 20 percent of data will be in the testing phase.

Additionally, the best part about using the train_test_split function from sklearn is that, while splitting it will also be performing data shuffling in order to create a more generalized dataset.
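The shuffle-then-split idea can be sketched in a few lines of NumPy. The helper `shuffled_split` and the demo arrays are hypothetical; the random generator differs from sklearn's, so the exact rows selected will not match `train_test_split`.

```python
import numpy as np

def shuffled_split(X, y, test_size=0.2, seed=0):
    """Shuffle row indices, then carve off a test fraction (illustrative helper)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))                 # shuffled row order
    n_test = int(np.ceil(test_size * len(X)))     # number of test rows
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

X_demo = np.arange(20).reshape(10, 2)   # row i is [2*i, 2*i + 1]
y_demo = np.arange(10)
X_tr, X_te, y_tr, y_te = shuffled_split(X_demo, y_demo)
```

Features and labels are indexed with the same shuffled indices, so each row stays paired with its label.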

Performing Feature Scaling

The very last step in our feature engineering phase is feature scaling. It is a procedure that brings all the variables onto the same scale. Why, you might ask? In a dataset, certain variables often have very high values while others have very low values. During model creation, the variables with very high values can then dominate those with very low values, so the model may effectively neglect the low-valued variables. Feature scaling prevents this.

Now I am going to answer one of the most important questions asked in a machine learning interview: “When should we perform feature scaling, before the train-test split or after it?”

Well, the answer is after we split the dataset into training and testing datasets. The training dataset is what our model learns from, while the testing dataset is what our model is evaluated on. If we perform feature scaling before the train-test split, the scaling statistics are computed using the test rows too. This leaks information from the testing dataset into training and defeats the purpose of having a testing dataset, so we should always perform feature scaling after the train-test split.

Now, how are we going to perform feature scaling? There are many ways of doing it. The two most common techniques are:-

1. Standardization

2. Normalization

When standardization is performed, each variable is rescaled to have a mean of 0 and a standard deviation of 1, so most values typically end up between -3 and +3. In the case of normalization (min-max scaling), all values are mapped into a fixed range, usually 0 to +1.

There are a few rules of thumb on which technique to use and when. Normalization fits best when a variable has a known, bounded range and no extreme outliers, while standardization is a general-purpose technique that works reasonably for most datasets irrespective of the distribution. Here we are going to use Standardization.

#Performing Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

Here we have used the StandardScaler class from the sklearn library to perform standardization. Note that the scaler is fitted on the training data only and then applied to the test data.
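To make the two scaling techniques concrete, here is a small NumPy sketch on made-up numbers (the age and salary values are hypothetical). In both cases the statistics come from the training rows only and are then reused on the test row, which is the leakage-free order discussed above.

```python
import numpy as np

train = np.array([[20.0, 30000.0], [40.0, 50000.0], [60.0, 70000.0]])  # hypothetical age, salary
test = np.array([[50.0, 90000.0]])

# Standardization: mean and standard deviation are taken from the training set only.
mean, std = train.mean(axis=0), train.std(axis=0)
train_std = (train - mean) / std
test_std = (test - mean) / std

# Min-max normalization, again fitted on the training set only.
lo, hi = train.min(axis=0), train.max(axis=0)
train_mm = (train - lo) / (hi - lo)
test_mm = (test - lo) / (hi - lo)
```

The standardized training columns have mean 0 and standard deviation 1, and the min-max training columns span 0 to 1. The test salary lies above the training maximum, so its normalized value falls outside that range, which is expected when scaling unseen data.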

Now we have completed our feature engineering phase. We can now start with the creation of our artificial neural network from the next point onwards.

Initializing Artificial Neural Network

This is the very first step while creating ANN. Here we are going to create our ann object by using a certain class of Keras named Sequential.

#Initialising ANN
ann = tf.keras.models.Sequential()

As a part of tensorflow 2.0, Keras is now integrated with tensorflow and is now considered as a sub-library of tensorflow. The Sequential class is a part of the models module of Keras library which is a part of the tensorflow library now.

Creating Hidden Layers

Once we initialize our ann, we are now going to create layers for the same. Here we are going to create a network that will have 2 hidden layers, 1 input layer, and 1 output layer. So, let’s create our very first hidden layer

#Adding First Hidden Layer
ann.add(tf.keras.layers.Dense(units=6,activation="relu"))

Here we have created our first hidden layer by using the Dense class which is part of the layers module. This class accepts 2 inputs:-

1. units:- number of neurons that will be present in the respective layer

2. activation:- specify which activation function to be used

For the first input, I had tested with many values in the past and the optimal value that I had found is 6. Obviously, we can try with any other value as there is no hard rule about the number of neurons that should be present in the layer.

For the second input, we are always going to use “relu”[rectified linear unit] as an activation function for hidden layers. Since we are going to create two hidden layers, this same step we are going to repeat for the creation of the second hidden layer as well.

#Adding Second Hidden Layer
ann.add(tf.keras.layers.Dense(units=6,activation="relu"))

Creating Output Layer

In this step, we are going to create our output layer for ann. The output layer will be responsible for giving output.

#Adding Output Layer
ann.add(tf.keras.layers.Dense(units=1,activation="sigmoid"))

Here again, we are going to use the Dense class in order to create the output layer. Two important things to remember here:-

1. In a binary classification problem(like this one) where we will be having only two classes as output (1 and 0), we will be allocating only one neuron to output this result. For the multiclass classification problem, we have to use more than one neuron in the output layer. For example – if our output contains 4 categories then we need to create 4 different neurons[one for each category].

2. For the binary classification Problems, the activation function that should always be used is sigmoid. For a multiclass classification problem, the activation function that should be used is softmax.

Since we are dealing with binary classification here, we allocate only one neuron in the output layer, and the activation function used is sigmoid.
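What these three layers compute can be sketched as a plain NumPy forward pass. This is an illustration with random, untrained weights, not what Keras runs internally. The feature count of 12 is an assumption (10 selected columns, with Geography expanded into 3 one-hot columns), and all variable names are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_features = 12                        # assumed width of the scaled feature matrix
x = rng.normal(size=(4, n_features))   # 4 hypothetical customers

# Randomly initialised weights, for shape illustration only (no training here).
W1, b1 = rng.normal(size=(n_features, 6)), np.zeros(6)
W2, b2 = rng.normal(size=(6, 6)), np.zeros(6)
W3, b3 = rng.normal(size=(6, 1)), np.zeros(1)

h1 = relu(x @ W1 + b1)       # first hidden layer, 6 units
h2 = relu(h1 @ W2 + b2)      # second hidden layer, 6 units
p = sigmoid(h2 @ W3 + b3)    # output layer: one probability per customer
```

The single sigmoid unit yields one number between 0 and 1 per customer, which is why one output neuron is enough for a two-class problem.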

Compiling Artificial Neural Network

We have now created layers for our neural network. In this step, we are going to compile our ANN.

#Compiling ANN
ann.compile(optimizer="adam",loss="binary_crossentropy",metrics=['accuracy'])

We have used compile method of our ann object in order to compile our network. Compile method accepts the below inputs:-

1. optimizer:- specifies which optimizer to be used in order to perform stochastic gradient descent. I had experimented with various optimizers like RMSProp, adam and I have found that adam optimizer is a reliable one that can be used with any neural network.

2. loss:- specifies which loss function should be used. For binary classification, the value should be binary_crossentropy. For multiclass classification, it should be categorical_crossentropy.

3. metrics:- which performance metrics to be used in order to compute performance. Here we have used accuracy as a performance metric.
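To give a feel for the loss and metric named above, here is a minimal NumPy sketch of binary cross-entropy and accuracy. The function `binary_crossentropy` and the sample predictions are hypothetical and only mirror the standard formula.

```python
import numpy as np

def binary_crossentropy(y_true, p_pred, eps=1e-7):
    """Mean binary cross-entropy; predictions are clipped to avoid log(0)."""
    p = np.clip(p_pred, eps, 1.0 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))

y_true = np.array([1, 0, 1, 0])
confident = np.array([0.9, 0.1, 0.8, 0.2])   # correct and confident predictions
unsure = np.array([0.6, 0.4, 0.5, 0.5])      # hesitant predictions

loss_confident = binary_crossentropy(y_true, confident)
loss_unsure = binary_crossentropy(y_true, unsure)
accuracy = float(np.mean((confident > 0.5) == y_true))
```

The confident predictions get the lower loss, which is the signal the optimizer follows, while accuracy only counts how many thresholded predictions match the labels.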

Fitting Artificial Neural Network

This is the last step in our ann creation process. Here we are just going to train our ann on the training dataset.

#Fitting ANN
ann.fit(X_train,Y_train,batch_size=32,epochs = 100)

Here we have used the fit method in order to train our ann. The fit method is accepting 4 inputs in this case:-

1. X_train:- Matrix of features for the training dataset

2. Y_train:- Dependent variable vector for the training dataset

3. batch_size:- how many observations are processed in each batch before the weights are updated. The usual value for this parameter is 32, but we can experiment with other values as well.

4. epochs:- how many complete passes over the training dataset are made during training. Here the value that has worked well in my experience is 100.
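The relationship between batch size, epochs, and the number of weight updates can be worked out with a little arithmetic. The helper `gradient_updates` is hypothetical, and the training-set size of 8000 is an assumed figure used only for illustration.

```python
import math

def gradient_updates(n_train, batch_size, epochs):
    """Return (updates per epoch, total updates) for mini-batch training."""
    steps_per_epoch = math.ceil(n_train / batch_size)  # the last batch may be smaller
    return steps_per_epoch, steps_per_epoch * epochs

# n_train = 8000 is an assumed training-set size, used only for illustration.
steps, total = gradient_updates(n_train=8000, batch_size=32, epochs=100)
```

Under these assumptions each epoch performs 250 weight updates, so 100 epochs amount to 25,000 updates.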

Are you interested in seeing what the training process looks like? Well, here is a snapshot of it.

Training of Artificial Neural Network

Here we can see that with each epoch our loss decreases and our accuracy increases. Our final accuracy is 86.59 percent, which is pretty remarkable for a neural network this simple.

That’s it :). We have created our artificial neural network from scratch using Python.

As an additional bonus, I am attaching the code below that will allow us to perform single-point prediction for any custom values of input.

Predicting Result for Single Point Observation

#Predicting result for Single Observation



Here our neural network is trying to predict whether our customer is going to exit or not based on the values of independent variables
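The last step of such a prediction, turning the sigmoid output into a yes/no answer, can be sketched as follows. The probabilities are hypothetical stand-ins for what the model's predict call would return, and the 0.5 threshold is an assumption of this sketch. A new observation would also have to pass through the same encoding and the already-fitted scaler before prediction.

```python
import numpy as np

# Hypothetical sigmoid outputs for three customers, shaped like a Keras predict() result.
probabilities = np.array([[0.08], [0.51], [0.93]])

will_exit = probabilities > 0.5           # boolean decision per customer
labels = will_exit.astype(int).ravel()    # 1 = predicted to exit, 0 = predicted to stay
```

Raising or lowering the threshold trades missed churners against false alarms.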


Now we have created our model. Here is a pro tip on how you can save the neural network you created.

#Saving created neural network
ann.save("ANN.h5")

That’s it. This one line of code saves our ML model. You might have a question here.

What is the h5 file format? Well, h5 (HDF5, Hierarchical Data Format version 5) is a general-purpose binary file format that Keras can use to store a model. Using this format we can save our neural network, including its architecture and weights, as a single serialized file. It is similar to the pickle file format that we use for storing traditional machine learning models.

Well, that’s all about implementing neural networks from scratch in Python.



Hope you like this article.

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.



Implementing Audio Classification Project Using Deep Learning

This article was published as a part of the Data Science Blogathon.

In this article, we will explore audio and sound classification using machine learning and deep learning. It is amazing and interesting to see how machines are capable of understanding human language and responding in the same way. NLP (Natural Language Processing) is one of the most researched topics today; it makes machines capable of handling human language in the form of speech as well as text.


Table of Contents

Introduction to Audio Classification

Project Overview

Dataset Overview

Hands-on Implementing Audio Classification project

EDA On Audio Data

Data Preprocessing

Building ANN for Audio Classification

Testing some unknown Audio

End Notes

Introduction to Audio Classification

Audio classification means assigning sounds to categories, as in environmental sound classification and speech recognition. The task is the same as image classification of cats and dogs, or text classification of spam and ham. The only difference is the type of data: instead of images or text, we now have audio files of a certain length.

Why is audio classification considered more difficult than other types of classification?

There are many techniques to classify images, as we have different in-built neural networks under CNN designed especially for images. It is also easy to extract features from images because an image is a collection of pixels, and pixels are already numbers. When we have data as text, we use sequential encoder- and decoder-based techniques to find features. But when it comes to recognizing speech, the work is harder than with text because the signal is based on frequency and time, so you need to extract proper pitch and frequency features.

Audio classification is employed across different domains, in voice lock features, music genre identification, natural language classification, environmental sound classification, and capturing and identifying different types of sound. It is also used in chatbots to give them the next level of power.

Project Overview

Sound classification is a growing area of research that many people are trying to learn and apply in projects. The project we will build in this article is simple enough for a beginner to follow: the problem statement is to apply deep learning techniques to classify environmental sounds, specifically urban sounds.

Given an audio sample of some category with a certain duration in .wav format, we determine which of the target urban sounds it contains. The task falls under supervised machine learning, so we have a dataset as well as a target category.

Dataset Overview

The dataset we will use is the UrbanSound8K dataset. It contains 8732 sound files from the 10 classes listed below. Our task is to extract features from these files and classify each audio file into its category. You can download the dataset from the official website, and it is also available on Kaggle. The dataset is fairly large, so if downloading it is not possible, you can create a Kaggle Notebook and practice on Kaggle itself.

Air Conditioner

Car Horn

Children Playing

Dog Bark

Drilling Machine

Engine Idling

Gun Shot

Jackhammer

Siren

Street Music

Hands-On Practice of Audio Classification Project

Libraries Installation

Librosa is a very important library that supports audio and music analysis. It provides the building blocks required to construct an information retrieval model from music. Simply use the pip command to install it. The other library we will use, for deep learning modeling, is TensorFlow, which I hope everyone has already installed.

pip install librosa
pip install tensorflow

Exploratory Data Analysis of Audio data

We have 10 different folders under the urban dataset folder. Before applying any preprocessing, we will look at how to load audio files and how to visualize them as waveforms. If you want to load an audio file and listen to it, you can use the IPython library and give it the audio file path directly. We have taken the first audio file in the fold1 folder, which belongs to the dog bark category.

import IPython.display as ipd
filepath = "../input/urbansound8k/fold1/101415-3-0-2.wav"
ipd.Audio(filepath)


Now we will use Librosa to load audio data. When we load an audio file with Librosa, it gives us 2 things: the audio data as a NumPy array and the sample rate. Let us load the above audio file with Librosa and plot the waveform.

Audio array – In general, raw audio can be stored as a two-dimensional array, where the first axis holds the recorded amplitude samples and the second axis holds the channels. There are different types of channels: monophonic (audio that has one channel) and stereo (audio that has two channels). By default, Librosa mixes the signal down to mono, so the array it returns is one-dimensional.

import matplotlib.pyplot as plt
import librosa
import librosa.display
data, sample_rate = librosa.load(filepath)
plt.figure(figsize=(12, 5))
librosa.display.waveshow(data, sr=sample_rate)


If you print the sample rate, the output will be 22050, because when we load data with Librosa it resamples every file to a single default sample rate of 22,050 Hz. We can also read the file with the scipy Python library. It too gives us two pieces of information: the sample rate and the data.

from scipy.io import wavfile as wav
wave_sample_rate, wave_audio = wav.read(filepath)

import matplotlib.pyplot as plt
plt.figure(figsize=(12, 4))
plt.plot(wave_audio)


When you print the sample rate using scipy, it is different from the Librosa one, because scipy reports the rate stored in the file. One important difference between the two: the data retrieved from Librosa is normalized, while an audio file read using scipy keeps its raw sample values. Librosa is popular for audio signal processing for the following three reasons.

It converts the signal into mono (one channel).

It represents the audio signal between -1 and +1 (in normalized form), so a regular pattern is observed.

It resamples the audio, by default to 22 kHz, while other libraries return it at whatever sample rate the file was recorded with.
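The first two points can be illustrated with NumPy alone. The sketch below is not Librosa's actual loading code; it only shows, on a hypothetical 16-bit stereo clip, what scaling into the -1 to +1 range and mixing down to mono mean. Resampling is not shown.

```python
import numpy as np

# Hypothetical stereo clip: 4 samples, 2 channels, 16-bit integers as a WAV reader might return.
pcm = np.array([[16384, -16384],
                [32767, 32767],
                [-32768, 0],
                [0, 0]], dtype=np.int16)

scaled = pcm.astype(np.float32) / 32768.0   # map the int16 range into [-1, 1)
mono = scaled.mean(axis=1)                  # average the two channels into one
```

After these two steps the clip is a one-dimensional float array whose values stay within -1 and +1, regardless of the bit depth of the original file.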

Imbalance Dataset check

Now we know about the audio files and how to visualize them as waveforms. Moving forward to data exploration, we will load the CSV metadata file, which has one row per audio file, and check how many records we have for each class.

import pandas as pd
metadata = pd.read_csv('/urbansound8k/UrbanSound8K.csv')
metadata.head(10)


Each row gives a file name and the fold where the file is stored. For example, the 1st file is in fold 5 and belongs to the dog bark category. Now use the value_counts function to check the number of records of each class.

metadata['class'].value_counts()


From the output, the data is not heavily imbalanced: most of the classes have an approximately equal number of records. We can also visualize the count of records in each category using a bar plot or count plot.

import seaborn as sns
plt.figure(figsize=(10, 6))
sns.countplot(x=metadata['class'])
plt.title("Count of records in each class")
plt.xticks(rotation="vertical")


Data Preprocessing

Audio files are recorded at different sample rates, such as 44 kHz or 22 kHz. Loaded with Librosa, every file comes out at 22 kHz, with the data in a normalized form. Now our task is to extract the important information and arrange our data as independent features (extracted from the audio signal) and dependent features (class labels). We will use Mel-Frequency Cepstral Coefficients (MFCCs) to extract independent features from audio signals.

MFCCs – The MFCC summarizes the frequency distribution across the window size, so it is possible to analyze both the frequency and time characteristics of the sound. This representation converts audio into features, based on time and frequency characteristics, that help us do classification. To read more about MFCC, you can watch this video and read this research paper by Springer.

To demonstrate how we apply MFCC in practice, we will first apply it to the single audio file that we are already using.

mfccs = librosa.feature.mfcc(y=data, sr=sample_rate, n_mfcc=40)
print(mfccs.shape)
print(mfccs)

def features_extractor(file_name):
    #load the file (audio)
    audio, sample_rate = librosa.load(file_name, res_type='kaiser_fast')
    #we extract mfcc
    mfccs_features = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=40)
    #to get one fixed-length vector per clip, we take the mean over the time frames
    mfccs_scaled_features = np.mean(mfccs_features.T,axis=0)
    return mfccs_scaled_features
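The averaging step inside the extractor is easier to see on a stand-in matrix. The sketch below uses random numbers in place of real MFCCs, and the frame count of 173 is hypothetical; it only shows how a (coefficients x frames) matrix collapses to one vector of 40 values.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for an MFCC matrix: 40 coefficients x 173 frames (the frame count is hypothetical).
fake_mfccs = rng.normal(size=(40, 173))

# Transpose to (frames, coefficients), then average over the frames (axis 0).
clip_vector = np.mean(fake_mfccs.T, axis=0)
```

Clips of different lengths have different frame counts, but after averaging over time every clip is described by the same 40 numbers, which is what a fixed-size dense network input requires.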

👉 Now, to extract the features for every audio file, we loop over each row in the dataframe. We also use the tqdm Python library to track the progress. Inside the loop, we build the file path for each file, call the function to extract MFCC features, and append the features and corresponding label to a list that we later convert into a dataframe.

#Now we need to extract the features from all the audio files, so we use tqdm
import os
import numpy as np
from tqdm import tqdm

#Assumption: dataset root folder, matching the file paths used earlier in this article
audio_dataset_path = "../input/urbansound8k/"

### Now we iterate through every audio file and extract features
### using Mel-Frequency Cepstral Coefficients
extracted_features=[]
for index_num,row in tqdm(metadata.iterrows()):
    file_name = os.path.join(os.path.abspath(audio_dataset_path),'fold'+str(row["fold"])+'/',str(row["slice_file_name"]))
    final_class_labels=row["class"]
    data=features_extractor(file_name)
    extracted_features.append([data,final_class_labels])

The loop will take a little while to run because it iterates over all 8732 rows. After that, you can inspect the dataframe of extracted features as shown below.

### converting extracted_features to Pandas dataframe
extracted_features_df=pd.DataFrame(extracted_features,columns=['feature','class'])
extracted_features_df.head()


Train Test split

First, we separate the dependent and independent features. We have 10 classes, so we use integer label encoding, which maps the class names to the numbers 0 to 9, and then convert those integers into one-hot categorical vectors. After that, we split the data into train and test sets in an 80-20 ratio.

### Split the dataset into independent and dependent dataset
X=np.array(extracted_features_df['feature'].tolist())
y=np.array(extracted_features_df['class'].tolist())

from tensorflow.keras.utils import to_categorical
from sklearn.preprocessing import LabelEncoder
labelencoder=LabelEncoder()
y=to_categorical(labelencoder.fit_transform(y))

### Train Test Split
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.2,random_state=0)

Totally, we have 6985 records in the train set and 1747 samples in the test set. Let’s head over to Model creation.
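Both the label transformation and the split sizes can be checked with a short sketch. The label values below are hypothetical, the one-hot step only mimics the idea behind to_categorical, and the split arithmetic assumes the test size is rounded up to a whole number of rows.

```python
import math
import numpy as np

labels = np.array(["siren", "dog_bark", "siren", "car_horn"])   # hypothetical labels
classes, codes = np.unique(labels, return_inverse=True)          # integer codes start at 0
one_hot = np.eye(len(classes))[codes]                             # one row per clip, a single 1 per row

n_samples = 8732
n_test = math.ceil(0.2 * n_samples)   # assumed rounding: test size rounded up
n_train = n_samples - n_test
```

With 8732 files, a 20 percent test share gives 1747 test rows and 6985 training rows, matching the counts quoted above.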

Audio Classification Model Creation

We have extracted features from the audio samples and split them into train and test sets. Now we will implement an ANN model using the Keras Sequential API. The number of classes is 10, which is our output shape, and we will create an ANN with 3 dense hidden layers followed by an output layer. The architecture is explained below.

The first layer has 100 neurons. The input shape is 40, matching the number of features, with ReLU as the activation function. To reduce overfitting, we use a Dropout layer at a rate of 0.5.

The second layer has 200 neurons with ReLU activation and dropout at a rate of 0.5.

The third layer again has 100 neurons with ReLU activation and dropout at a rate of 0.5.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense,Dropout,Activation,Flatten
from tensorflow.keras.optimizers import Adam
from sklearn import metrics

### No of classes
num_labels=y.shape[1]

model=Sequential()
###first layer
model.add(Dense(100,input_shape=(40,)))
model.add(Activation('relu'))
model.add(Dropout(0.5))
###second layer
model.add(Dense(200))
model.add(Activation('relu'))
model.add(Dropout(0.5))
###third layer
model.add(Dense(100))
model.add(Activation('relu'))
model.add(Dropout(0.5))
###final layer
model.add(Dense(num_labels))
model.add(Activation('softmax'))

👉 You can inspect the model with the summary function, model.summary().
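The parameter counts that a model summary reports for dense layers can be reproduced by hand. The helper `dense_params` is hypothetical; it applies the usual rule of one weight per input-output pair plus one bias per output unit. Activation and Dropout layers add no trainable parameters.

```python
def dense_params(n_in, n_out):
    """Trainable parameters of a dense layer: weights plus one bias per output unit."""
    return n_in * n_out + n_out

layer_sizes = [40, 100, 200, 100, 10]   # input, three hidden layers, output
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total_params = sum(per_layer)
```

For this architecture the four dense layers hold 4,100, 20,200, 20,100, and 1,010 parameters, 45,410 in total.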


Compile the Model

To compile the model, we need to define the loss function, which is categorical cross-entropy; the metric, which is accuracy; and the optimizer, which is Adam.

model.compile(loss='categorical_crossentropy',metrics=['accuracy'],optimizer='adam')

Train the Model

We will train the model for 100 epochs with a batch size of 32 and save it in HDF5 format. We use a ModelCheckpoint callback to save the best model seen during training, and datetime to measure how long training took.

## Training my model
from tensorflow.keras.callbacks import ModelCheckpoint
from datetime import datetime

num_epochs = 100
num_batch_size = 32

checkpointer = ModelCheckpoint(filepath='./audio_classification.hdf5', verbose=1, save_best_only=True)
start = datetime.now()

model.fit(X_train, y_train, batch_size=num_batch_size, epochs=num_epochs, validation_data=(X_test, y_test), callbacks=[checkpointer], verbose=1)

duration = datetime.now() - start
print("Training completed in time: ", duration)

Check the Test Accuracy

Now we will evaluate the model on the test data. We got about 77 percent accuracy on the training dataset and 76 percent on the test data.

test_accuracy=model.evaluate(X_test,y_test,verbose=0)
print(test_accuracy[1])

If you are using a TensorFlow version below 2.6, you can use the predict_classes function to predict the class of each audio file. If you are using 2.6 or above, use predict together with NumPy’s argmax function instead.
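The argmax step simply picks, for each sample, the index of the class with the highest predicted probability. A minimal plain-Python sketch of that idea follows; the probabilities are made up, and argmax_rows is a hypothetical helper rather than a NumPy or Keras function.

```python
def argmax_rows(probabilities):
    """Return the index of the largest value in each row."""
    return [row.index(max(row)) for row in probabilities]

# Made-up softmax outputs for three clips and four classes.
predict_x = [
    [0.10, 0.70, 0.15, 0.05],
    [0.25, 0.25, 0.40, 0.10],
    [0.90, 0.05, 0.03, 0.02],
]
classes_x = argmax_rows(predict_x)
print(classes_x)  # [1, 2, 0]
```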

#model.predict_classes(X_test)
predict_x=model.predict(X_test)
classes_x=np.argmax(predict_x,axis=1)
print(classes_x)

Testing Some Test Audio Sample

Now it is time to test some random audio samples. Whenever we get new audio, we have to perform three steps to get the predicted label and class.

First, preprocess the audio file (load it using Librosa and extract MFCC features).

Predict the label to which the audio belongs.

Inverse-transform the predicted label to get the class name to which it belongs.
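The three steps above can be sketched without any audio libraries to show how the data flows. Everything in this example is illustrative: the frame values, the probabilities standing in for the model output, and the class names are assumptions, not values from the dataset.

```python
def mean_over_frames(frames):
    """Average a list of per-frame feature vectors into one vector."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]

# Hypothetical class names in encoder order (an assumption, not from the article).
class_names = ["air_conditioner", "car_horn", "dog_bark"]

# Two made-up frames with three coefficients each (the article uses 40 MFCCs).
frames = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
features = mean_over_frames(frames)  # step 1: one fixed-length vector

# Stand-in for model.predict: made-up probabilities for the single clip.
probabilities = [0.2, 0.1, 0.7]
predicted_label = probabilities.index(max(probabilities))  # step 2
prediction_class = class_names[predicted_label]            # step 3

print(features, predicted_label, prediction_class)
```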

filename="../input/urbansound8k/fold7/101848-9-0-0.wav"

# preprocess the audio file
audio, sample_rate = librosa.load(filename, res_type='kaiser_fast')
mfccs_features = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=40)
mfccs_scaled_features = np.mean(mfccs_features.T,axis=0)

# Reshape MFCC feature to 2-D array
mfccs_scaled_features=mfccs_scaled_features.reshape(1,-1)

#predicted_label=model.predict_classes(mfccs_scaled_features)
x_predict=model.predict(mfccs_scaled_features)
predicted_label=np.argmax(x_predict,axis=1)
print(predicted_label)

# map the predicted index back to its class name
prediction_class = labelencoder.inverse_transform(predicted_label)
print(prediction_class)

Congratulations! 👏 If you have followed the article this far and implemented it along the way, you have learned how to deal with audio data, use MFCC to extract important features from audio samples, and build a simple ANN model on top of them to classify audio into different classes. To practice, you can try implementing this project on the environmental sound classification dataset available on Kaggle. The key takeaways of this article are summarized in the points below.

Audio classification is a technique to classify sounds into different categories.

We can visualize any audio in the form of a waveform.

MFCC method is used to extract important features from audio files.

Scaling the audio samples to a common scale before feeding data to the model is important so that the model can learn from it better.

You can build a CNN model to classify audio, or try to build a much deeper ANN than the one we have built.


The media shown in this article is not owned by Analytics Vidhya and are used at the Author’s discretion. 


Food Delivery Time Prediction With Lstm Neural Network


More businesses are moving online these days, and consumers are ordering online instead of traveling to the store to buy. Zomato and Swiggy are popular online platforms for ordering food products. Other examples are Uber Eats, Food Panda, and Deliveroo, which also have similar services. They provide food delivery options. If the order is complete, a partner will pick up and deliver the meal to the given address via a delivery service. In online food-ordering businesses, delivery time is critical. As a result, estimated food delivery time prediction to reach the buyer’s location is critical. The LSTM neural network is one of the methods that may be implemented in this circumstance. Come on, let’s study the LSTM models in detail.

This article was published as a part of the Data Science Blogathon.

Objectives of Food Delivery Time Prediction

Make an accurate estimate of when the food will arrive, thus increasing customer confidence.

Plan delivery routes and driver schedules more efficiently by predicting how many orders will arrive so that delivery providers can use their resources better.

Make deliveries faster by looking at past delivery data and determining the attributes that affect them.

Grow business because of buyer satisfaction with the speed of delivery.

Based on these goals, we will use the LSTM Neural Network to develop a model that can estimate the delivery time of orders accurately based on the age of the delivery partner, the partner’s rating, and the distance between the restaurant and the buyer’s place. This article will guide you on predicting food delivery time using LSTM. Now, let’s make the prediction through the steps in the article.

Step 1: Import Library

import pandas as pd
import numpy as np
import plotly.express as px
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Dense, LSTM

Pandas and NumPy libraries are used together for data analysis. NumPy provides fast mathematical functions for multidimensional arrays, while Pandas makes it easier to analyze and manipulate data with more complex data structures like DataFrame and Series. Meanwhile, the Plotly Express library makes it easy for users to create interactive visualizations in Python. It can use minimal code to create various charts, such as scatter plots, line charts, bar charts, and maps. The Sequential class is a type of model in Keras that allows users to create a neural network by adding layers to it in sequential order. Then, Dense and LSTM are to create layers in the Keras model and also customize their configurations.

Step 2: Read the Data

For this particular case, the appropriate dataset is on my GitHub. The dataset given here is a cleaned version of the original dataset submitted by Gaurav Malik on Kaggle.

# reading dataset
data = pd.read_csv(url)
data.sample(5)

Let’s see detailed information about the dataset we use with the info() command.

dataset overview

Checking a dataset’s columns and null values is essential in any data analysis project. Let’s do it.


The dataset is complete with no null values, so let’s proceed!

Step 3: Haversine Formula

The Haversine formula is used to find the great-circle distance between two geographical locations. The formula is described on its Wikipedia page.
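For reference, the standard form of the Haversine formula can be written as below, where φ₁ and φ₂ are the latitudes and λ₁ and λ₂ the longitudes of the two points in radians, R is the earth's radius, and d is the resulting distance. The symbol names are a notational choice for this sketch and are not taken from the article.

```latex
\begin{aligned}
a &= \sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right)
   + \cos\varphi_1 \,\cos\varphi_2 \,\sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right) \\
c &= 2\,\operatorname{atan2}\!\left(\sqrt{a},\ \sqrt{1-a}\right) \\
d &= R\,c
\end{aligned}
```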

It takes the latitude and longitude of two points and converts the angles to radians to perform the necessary calculations. We use this formula because the dataset doesn’t provide the distance between the restaurant and the delivery location. There are only latitude and longitude. So, let’s calculate it and then create a distance column in the dataset.

R = 6371  # The earth's radius (in km)

def deg_to_rad(degrees):
    return degrees * (np.pi/180)

# The haversine formula
def distcalculate(lat1, lon1, lat2, lon2):
    d_lat = deg_to_rad(lat2-lat1)
    d_lon = deg_to_rad(lon2-lon1)
    a1 = np.sin(d_lat/2)**2
    a2 = np.cos(deg_to_rad(lat1)) * np.cos(deg_to_rad(lat2)) * np.sin(d_lon/2)**2
    a = a1 + a2
    c = 2 * np.arctan2(np.sqrt(a), np.sqrt(1-a))
    return R * c

# Create distance column & calculate the distance
data['distance'] = np.nan
for i in range(len(data)):
    data.loc[i, 'distance'] = distcalculate(data.loc[i, 'Restaurant_latitude'],
                                            data.loc[i, 'Restaurant_longitude'],
                                            data.loc[i, 'Delivery_location_latitude'],
                                            data.loc[i, 'Delivery_location_longitude'])

The parameters “lat” and “lon” stand for latitude and longitude. The deg_to_rad function converts degrees to radians. The variables a1 and a2 hold the two terms of the Haversine formula: a1 is the latitude term, and a2 is the longitude term scaled by the cosines of both latitudes. The variable a stores the sum of a1 and a2, while the c variable stores the central angle between the two location points. Multiplying c by the earth’s radius R gives the distance between the two points in kilometers.
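As a sanity check on the Haversine calculation, the following dependency-free sketch uses Python's math module and a case where the answer is known in advance: one degree of longitude along the equator should come out to R·π/180 km. The function name haversine_km is chosen for this example only.

```python
import math

R = 6371  # mean earth radius in km, as in the article

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lam = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lam / 2) ** 2
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    return R * c

# One degree of longitude along the equator is R * pi / 180 km.
one_degree = haversine_km(0.0, 0.0, 0.0, 1.0)
print(round(one_degree, 2))  # 111.19
```

A check like this is a cheap way to catch a misplaced operator in the formula before the distance column is used for modeling.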

We have added a distance column to the dataset. Now, we will analyze the effect of distance and delivery time.

figure = px.scatter(data_frame = data, x="distance", y="Time_taken(min)", size="Time_taken(min)", trendline="ols", title = "Relationship Between Time Taken and Distance")

The graph shows that there is a consistent relationship between the time taken and the distance traveled for food delivery. This means that the majority of delivery partners deliver food within a range of 25–30 minutes, regardless of the distance.

Next, we will explore whether the delivery partner’s age affects delivery time or not.

figure = px.scatter(data_frame = data, x="Delivery_person_Age", y="Time_taken(min)", size="Time_taken(min)", color = "distance", trendline="ols", title = "Relationship Between Delivery Partner Age and Time Taken")

The graph shows that younger partners deliver food faster than their older counterparts. Now let’s explore the correlation between delivery time and delivery partner ratings.

figure = px.scatter(data_frame = data, x="Delivery_person_Ratings", y="Time_taken(min)", size="Time_taken(min)", color = "distance", trendline="ols", title = "Relationship Between Delivery Partner Ratings and Time Taken")

The graph shows an inverse linear relationship. The higher a partner’s rating, the less time is needed to deliver food, and vice versa.

The next step will be to see whether the delivery partner’s vehicle affects the delivery time or not.

fig =, x="Type_of_vehicle", y="Time_taken(min)", color="Type_of_order", title = "Relationship Between Type of Vehicle and Type of Order")

The graph shows that the type of delivery partner’s vehicle and the type of food delivered do not significantly affect delivery time.

Through the analysis above, we can determine that the delivery partner’s age, the delivery partner’s rating, and the distance between the restaurant and the delivery location are the features that have the most significant impact on food delivery time.

Step 4: Build an LSTM Model and Make Predictions

Previously, we have determined three features that significantly affect the time taken, namely the delivery partner’s age, the delivery partner’s rating, and distance. So the three features will become independent variables (x), while the time taken will become the dependent variable (y).

x = np.array(data[["Delivery_person_Age", "Delivery_person_Ratings", "distance"]])
y = np.array(data[["Time_taken(min)"]])
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.20, random_state=33)

Now, we need to train an LSTM neural network to predict food delivery time. The aim is to create a precise model that uses features like distance, delivery partner age, and rating to estimate food delivery time. The trained model can then be used to predict new data points or unseen scenarios.

model = Sequential()
model.add(LSTM(128, return_sequences=True, input_shape= (xtrain.shape[1], 1)))
model.add(LSTM(64, return_sequences=False))
model.add(Dense(25))
model.add(Dense(1))
model.summary()

The code block above does the following:

The first line starts building the model architecture by creating an instance of the Sequential class. The following four lines define the layers of the model. The first layer is an LSTM layer with 128 units, which returns sequences and takes input of shape (xtrain.shape[1], 1). Here, xtrain is the input training data, and shape[1] is the number of features in the input data, so each feature is treated as one time step holding a single value. The return_sequences parameter is set to True because another LSTM layer follows this one. The second layer is also an LSTM layer, but with 64 units and return_sequences set to False, indicating that it is the last LSTM layer. The next line adds a dense layer with 25 units, which reduces the output of the LSTM layers to a more manageable size. Finally, the last add call adds a dense layer with one unit, which is the output layer of the model. The call to model.summary() prints the resulting architecture.
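To get a feel for the size of this architecture, the parameter counts can be worked out by hand. The sketch below assumes the standard LSTM formulation with four gates and a bias term and an input of three time steps with one value each; the helper names are made up for this example.

```python
def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights, and a bias.
    return 4 * (units * (input_dim + units) + units)

def dense_params(n_in, n_out):
    return n_in * n_out + n_out

counts = {
    "lstm_128": lstm_params(1, 128),   # one value per time step
    "lstm_64": lstm_params(128, 64),   # fed by the 128-unit layer
    "dense_25": dense_params(64, 25),
    "dense_1": dense_params(25, 1),
}
total = sum(counts.values())
print(counts)
print(total)  # 117619
```

Note that the count does not depend on the number of time steps, because an LSTM reuses the same weights at every step.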

Now let’s train the previously created model.

model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(xtrain, ytrain, batch_size=1, epochs=9)

The ‘adam’ optimizer is a popular optimization algorithm for deep learning models, and ‘mean_squared_error’ is a common loss function for regression problems. Setting batch_size = 1 means that the model updates its weights after each sample is processed during training. The epochs parameter is set to 9, meaning the model makes nine full passes over the training data.
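To make these two settings concrete, here is a small plain-Python sketch of the mean squared error and of how batch size translates into the number of weight updates. The delivery times and the sample count are made up for illustration.

```python
def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up delivery times in minutes.
actual = [30.0, 25.0, 40.0]
predicted = [28.0, 27.0, 37.0]
mse = mean_squared_error(actual, predicted)  # (4 + 4 + 9) / 3

# With batch_size=1 there is one weight update per training sample.
n_samples, batch_size, epochs = 1000, 1, 9  # n_samples is hypothetical
updates = -(-n_samples // batch_size) * epochs  # ceiling division
print(mse, updates)
```

A batch size of 1 gives the most frequent updates but also the slowest epochs, so a larger batch size is a common first change when training takes too long.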

Finally, let’s test the model’s performance for predicting food delivery times given three input parameters (delivery partner age, delivery rating, and distance).

print("Food Delivery Time Prediction using LSTM")
a = int(input("Delivery Partner Age: "))
b = float(input("Previous Delivery Ratings: "))
c = int(input("Total Distance: "))

features = np.array([[a, b, c]])
print("Delivery Time Prediction in Minutes = ", model.predict(features))

The given result is a prediction of the delivery time for a hypothetical food delivery order based on the trained LSTM neural network model using the following input features:

Delivery Partner’s Age: 33

Previous Delivery Ratings: 4.0

Total distance: 7

The output of the prediction is shown as “Delivery Time Prediction in Minutes = [[36.913715]],” which means that the model has estimated that the food delivery will take approximately 36.91 minutes to reach the destination.


This article starts by calculating the distance between the restaurant and the delivery location. Then, it analyzes previous delivery times for the same distance before predicting food delivery times in real-time using LSTM. Broadly speaking, in this post, we have discussed the following:

How to calculate the distance using the haversine formula?

How to find the features that affect the food delivery time prediction?

How to use LSTM neural network model to predict the food delivery time?

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.


Clarifying Image Recognition Vs. Classification In 2023

In this article, we’ll delve deep into image recognition and image classification, highlighting their differences and how they relate to each other. By understanding these concepts, you’ll be better equipped to leverage their potential in various areas of your business.

Image recognition: Turning pixels into meaningful information

What is image recognition?

Image recognition is the process of analyzing images or video clips to identify and detect visual features such as objects, people, and places. This is achieved by using sophisticated algorithms and models that analyze and compare the visual data against a database of pre-existing patterns and features.

 Image recognition is a complex and multi-disciplinary field that combines computer vision, artificial intelligence, and machine learning techniques to perform tasks such as facial recognition, object detection, and scene analysis.

Image recognition can be used in the field of security to identify individuals from a database of known faces in real time, allowing for enhanced surveillance and monitoring. It can also be used in the field of healthcare to detect early signs of diseases from medical images, such as CT scans or MRIs, and assist doctors in making a more accurate diagnosis.

Source: GitHub

Image classification: Sorting images into categories

What is image classification?

Image classification is a subfield of image recognition that involves categorizing images into pre-defined classes or categories. In other words, it is the process of assigning labels or tags to images based on their content. Image classification is a fundamental task in computer vision, and it is often used in applications such as object recognition, image search, and content-based image retrieval.

A sector where image classification is commonly used is e-commerce. It’s used to classify product images into different categories, such as clothing, electronics, and home appliances, making it easier for customers to find what they are looking for. It can also be used in the field of self-driving cars to identify and classify different types of objects, such as pedestrians, traffic signs, and other vehicles.

Image recognition vs. Image classification: Main differences

While image recognition and image classification are related, they have notable differences that make them suitable for distinct applications.

1. Object detection vs. categorization

Image recognition focuses on identifying and locating specific objects or patterns within an image, whereas image classification assigns an image to a category based on its content. In essence, image recognition is about detecting objects, while image classification is about categorizing images.

2. Use cases and applications

Image recognition is ideal for applications requiring the identification and localization of objects, such as autonomous vehicles, security systems, and facial recognition. Image classification, however, is more suitable for tasks that involve sorting images into categories, like organizing photos, diagnosing medical conditions from images, or analyzing satellite images.

3. Complexity and processing time

Image recognition is generally more complex than image classification, as it involves detecting multiple objects and their locations within an image. This can lead to increased processing time and computational requirements. Image classification, on the other hand, focuses solely on assigning images to categories, making it a simpler and often faster process.

Table 1. Main difference

How are image recognition and image classification related?

Despite their differences, image recognition and image classification share some common ground:

1. Techniques and technologies

Both processes use similar techniques and technologies, such as machine learning algorithms and deep learning models like convolutional neural networks (CNNs). These methods can be adapted for either image recognition or classification tasks, depending on the specific application.

2. Interdependence in applications

In some applications, image recognition and image classification are combined to achieve more sophisticated results.

For instance, an autonomous vehicle may use image recognition to detect and locate pedestrians, traffic signs, and other vehicles and then use image classification to categorize these detected objects. This combination of techniques allows for a more comprehensive understanding of the vehicle’s surroundings, enhancing its ability to navigate safely.

3. Feature extraction and analysis

Both image recognition and image classification involve the extraction and analysis of image features. These features, such as edges, textures, and colors, help the algorithms differentiate between objects and categories.

In both cases, the quality of the images and the relevance of the features extracted are crucial for accurate results.

Real-world applications of image recognition and classification

To further clarify the differences and relationships between image recognition and image classification, let’s explore some real-world applications.

1. Medical imaging

Medical imaging is a popular field where both image recognition and classification have significant applications. Image recognition is used to detect and localize specific structures, abnormalities, or features within medical images, such as X-rays, MRIs, or CT scans. This helps medical professionals diagnose and treat various conditions.

Image classification, on the other hand, can be used to categorize medical images based on the presence or absence of specific features or conditions, aiding in the screening and diagnosis process. For instance, an automated image classification system can separate medical images with cancerous matter from ones without any.


2. Security

Image recognition and classification are critical tools in the security industry that enable the detection and tracking of potential threats. Automated image recognition solutions match real-time surveillance images with pre-existing data to identify individuals of interest, while image classification solutions categorize and tag objects in surveillance footage. 

Facial recognition is a specific form of image recognition that helps identify individuals in public areas and secure areas. These tools provide improved situational awareness and enable fast responses to security incidents.


3.  Environmental monitoring

Environmental monitoring and analysis often involve the use of satellite imagery, where both image recognition and classification can provide valuable insights. Image recognition can be used to detect and locate specific features, such as deforestation, water bodies, or urban development. 

Image classification, meanwhile, can be employed to categorize land cover types or identify areas affected by natural disasters or climate change. This information is crucial for decision-making, resource management, and environmental conservation efforts.



While image recognition and image classification are related and often use similar techniques, they serve different purposes and have distinct applications. Understanding the differences between these two processes is essential for harnessing their potential in various areas. By leveraging the capabilities of image recognition and classification, businesses and organizations can gain valuable insights, improve efficiency, and make more informed decisions.


Shehmir Javaid

Shehmir Javaid is an industry analyst at AIMultiple. He has a background in logistics and supply chain management research and loves learning about innovative technology and sustainability. He completed his MSc in logistics and operations management from Cardiff University UK and Bachelor’s in international business administration From Cardiff Metropolitan University UK.





How To Remove An Object From A List In Python?

To remove items (elements) from a list in Python, use the list methods clear(), pop(), and remove(). You can also delete items with the del statement by specifying a position or range with an index or slice.

In this article, we will show you how to remove an object/element from a list using python. The following are the 4 different methods to accomplish this task −

Using remove() method

Using del keyword

Using pop() method

Using clear() method

Assume we have taken a list containing some elements. We will return a resultant list after removing the given item from an input list using different methods as specified above.

Method 1: Using remove() method

The remove() method deletes the first occurrence of the item passed to it as an argument.

Syntax

list.remove(element)

Return Value: The remove() method does not return any value (it returns None).

Algorithm (Steps)

Following are the Algorithm/steps to be followed to perform the desired task

Create a variable to store the input list.

Use the remove() method to remove the particular item from the input list by passing the list item to be deleted as an argument to it and applying it to the input list.

Print the result list after removing the specified item from the input list


The following program removes the specified item from an input list using the remove() method and prints the resultant list −
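A minimal sketch of such a program is given here. The input list is an assumption, chosen so that the result matches the output shown in this section.

```python
# Input list (assumed; chosen to match the output shown in this section)
input_list = ["TutorialsPoint", "Python", "Codes", "hello", "everyone"]

# remove() deletes the first matching item in place and returns None
input_list.remove("TutorialsPoint")

print("Input List after removing {TutorialsPoint}:\n", input_list)
```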






















On executing, the above program will generate the following output −

Input List after removing {TutorialsPoint}: ['Python', 'Codes', 'hello', 'everyone']

As an input to the code, we were given a sample list with some random values, as well as an object/element to remove from the list. The element was then passed to the remove() method, which deleted it from the list. If the object/element is not found in the list, a ValueError is raised.
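When the item may be missing, the error can be handled explicitly. The following sketch shows two common options, catching the exception or testing membership first; the list contents are made up for illustration.

```python
input_list = ["Python", "Codes", "hello"]

try:
    input_list.remove("Java")  # "Java" is not in the list
except ValueError as err:
    message = str(err)
    print("Could not remove item:", message)

# Alternative: test membership first, so no exception is raised
if "Java" in input_list:
    input_list.remove("Java")

print(input_list)
```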

Method 2: Using del Keyword

The del statement is not a list method. It can be used to delete items from a list by specifying the index of the item (element) to be deleted.

Algorithm (Steps)

Following are the Algorithm/steps to be followed to perform the desired task −

Create a variable to store the input list.

Use the del keyword to remove the item at the specified index (here index 2, "Codes") from the list.

Print the resultant list, i.e., the list after removing the specified item.


The following program removes the specified item from an input list using the del keyword and prints the resultant list −
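A minimal sketch of such a program follows. The input list is an assumption chosen to match the output shown in this section, and the slice example at the end is an extra illustration of the range form mentioned in the introduction.

```python
input_list = ["TutorialsPoint", "Python", "Codes", "hello", "everyone"]

# Delete the item at index 2 ("Codes")
del input_list[2]
print("Input List after removing the item present at the 2nd index{Codes}:\n", input_list)

# del also accepts a slice; here it removes the items at indices 1 and 2
other = ["a", "b", "c", "d"]
del other[1:3]
print(other)  # ['a', 'd']
```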





















On executing, the above program will generate the following output −

Input List after removing the item present at the 2nd index{Codes}: ['TutorialsPoint', 'Python', 'hello', 'everyone']

Method 3: Using pop() method

With pop() method, you can delete the element at the specified position and retrieve its value.

The starting value of the index is 0 (zero-based indexing).

If no index is specified, the pop() method removes the last element from the list.

Negative values can be used to represent the position from the end. The last index is -1.

Algorithm (Steps)

Following are the Algorithm/steps to be followed to perform the desired task −

Create a variable to store the input list.

Use the pop() method, passing the index of a list item as an argument, to remove the item at the given index.

Pass a negative index to the pop() method to remove list items counting from the end. Here, to remove the last item from the list, we passed -1 as the argument to the pop() method.

Print the resultant list, i.e., the list after removing the specified items.


The following program removes the specified item from an input list using the pop() method and prints the resultant list −
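A minimal sketch of such a program is shown here. The input list is an assumption chosen to match the output shown in this section.

```python
input_list = ["TutorialsPoint", "Python", "Codes", "hello", "everyone"]

first = input_list.pop(0)   # removes and returns "TutorialsPoint"
last = input_list.pop(-1)   # removes and returns "everyone"

print("Removed:", first, last)
print("Input List after removing {TutorialsPoint}, {everyone}:\n", input_list)
```

Unlike remove() and del, pop() hands back the removed value, which is useful when the item is still needed after it leaves the list.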
































Output

Input List after removing {TutorialsPoint}, {everyone}: ['Python', 'Codes', 'hello']

Method 4: Using clear() method

The clear() method removes all items from the list. The list is still there, but it is empty.

Algorithm (Steps)

Following are the Algorithm/steps to be followed to perform the desired task −

Create a variable to store the input list.

Use the clear() method, to remove all the items from the list.

Print the resultant list, i.e., the list after removing all the items.


The following program clears or empties the complete list using the clear() function −
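A minimal sketch of such a program follows. The input list is an assumption, and the alias variable is added only to show that clear() empties the existing list object rather than creating a new one.

```python
input_list = ["TutorialsPoint", "Python", "Codes", "hello", "everyone"]
alias = input_list  # second name for the same list object

input_list.clear()
print("Input List after removing all the items:", input_list)
print(alias)  # [] as well, since clear() empties the list in place
```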





















Output

Input List after removing all the items: []

Conclusion

We learned how to remove an object/element from a list using four different methods in this article: remove(), pop(), clear(), and the del keyword. We also learned about the errors that those methods generate if the element is not in the list.

An Introduction To Graph Theory And Network Analysis (With Python Codes)


“A picture speaks a thousand words” is one of the most commonly used phrases. But a graph speaks so much more than that. A visual representation of data, in the form of graphs, helps us gain actionable insights and make better data driven decisions based on them.

But to truly understand what graphs are and why they are used, we will need to understand a concept known as Graph Theory. Understanding this concept makes us better programmers (and better data science professionals!).

But if you have tried to understand this concept before, you’ll have come across tons of formulae and dry theoretical concepts. That is why we decided to write this blog post. We have explained the concepts and then provided illustrations so you can follow along and intuitively understand how the functions are performing. This is a detailed post, because we believe that providing a proper explanation of this concept is a much preferred option over succinct definitions.

In this article, we will look at what graphs are, their applications and a bit of history about them. We’ll also cover some Graph Theory concepts and then take up a case study using python to cement our understanding.

Ready? Let’s dive into it.

Table of Contents

Graphs and their applications

History and why graphs?

Terminologies you need to know

Graph Theory Concepts

Getting familiar with Graphs in python

Analysis on a dataset

Graphs and their applications

Let us look at a simple graph to understand the concept. Look at the image below –

Consider that this graph represents the places in a city that people generally visit, and the path that was followed by a visitor of that city. Let us consider V as the places and E as the path to travel from one place to another.

V = {v1, v2, v3, v4, v5} E = {(v1,v2), (v2,v5), (v5, v5), (v4,v5), (v4,v4)}

The edge (u,v) is the same as the edge (v,u) – They are unordered pairs.

Concretely – Graphs are mathematical structures used to study pairwise relationships between objects and entities. Graph Theory is a branch of Discrete Mathematics and has found multiple applications in Computer Science, Chemistry, Linguistics, Operations Research, Sociology etc.

The Data Science and Analytics field has also used Graphs to model various structures and problems. As a Data Scientist, you should be able to solve problems in an efficient manner and Graphs provide a mechanism to do that in cases where the data is arranged in a specific way.


A Graph is a pair of sets. G = (V,E). V is the set of vertices. E is a set of edges. E is made up of pairs of elements from V (unordered pair)

A DiGraph is also a pair of sets. D = (V,A). V is the set of vertices. A is the set of arcs. A is made up of pairs of elements from V (ordered pair)

In the case of digraphs, there is a distinction between `(u,v)` and `(v,u)`. Usually the edges are called arcs in such cases to indicate a notion of direction.
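The difference between unordered edges and ordered arcs can be sketched with plain Python sets. This is one possible representation chosen for illustration, using frozenset for edges and tuple for arcs, and it is not how the Networkx package stores graphs internally.

```python
# Undirected edges as frozensets (unordered), arcs as tuples (ordered).
vertices = {"v1", "v2", "v3"}

edges = {frozenset(("v1", "v2")), frozenset(("v2", "v3"))}
arcs = {("v1", "v2"), ("v2", "v3")}

def has_edge(u, v):
    return frozenset((u, v)) in edges

def has_arc(u, v):
    return (u, v) in arcs

print(has_edge("v1", "v2"), has_edge("v2", "v1"))  # True True
print(has_arc("v1", "v2"), has_arc("v2", "v1"))    # True False
```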

There are packages that exist in R and Python to analyze data using Graph theory concepts. In this article we will be briefly looking at some of the concepts and analyze a dataset using Networkx Python package.

from IPython.display import Image
Image('images/network.PNG')


From the above examples it is clear that the applications of Graphs in Data Analytics are numerous and vast. Let us look at a few use cases:

Marketing Analytics – Graphs can be used to figure out the most influential people in a Social Network. Advertisers and Marketers can estimate the biggest bang for the marketing buck by routing their message through the most influential people in a Social Network

Banking Transactions – Graphs can be used to find unusual patterns helping in mitigating Fraudulent transactions. There have been examples where Terrorist activity has been detected by analyzing the flow of money across interconnected Banking networks

Supply Chain – Graphs help in identifying optimum routes for your delivery trucks and in identifying locations for warehouses and delivery centres

Pharma – Pharma companies can optimize the routes of the salesman using Graph theory. This helps in cutting costs and reducing the travel time for salesman

Telecom – Telecom companies typically use Graphs (Voronoi diagrams) to understand the quantity and location of Cell towers to ensure maximum coverage

History and Why Graphs?

History of Graphs

If you want to know more about how the ideas of graph theory were formulated – read on!

The origin of the theory can be traced back to the Konigsberg bridge problem (circa 1730s). The problem asks if the seven bridges in the city of Konigsberg can be traversed under the following constraints

no doubling back

you end at the same place you started

This is the same as asking if the multigraph of 4 nodes and 7 edges has an Eulerian cycle (An Eulerian cycle is an Eulerian path that starts and ends on the same Vertex. And an Eulerian path is a path in a Graph that traverses each edge exactly once. More Terminology is given below). This problem led to the concept of Eulerian Graph. In the case of the Konigsberg bridge problem the answer is no and it was first answered by (you guessed it) Euler.
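Euler's answer can be checked with a few lines of Python. A connected multigraph has an Eulerian cycle only if every vertex has even degree, so it is enough to count how many bridges touch each land mass. The labels A to D are illustrative names for the two river banks (A, B) and the two islands (C, D).

```python
from collections import Counter

# The seven bridges as a multigraph on four land masses.
# Labels are illustrative: A and B are the river banks, C and D the islands.
bridges = [
    ("A", "C"), ("A", "C"),
    ("B", "C"), ("B", "C"),
    ("A", "D"), ("B", "D"), ("C", "D"),
]

degree = Counter()
for u, v in bridges:
    degree[u] += 1
    degree[v] += 1

odd_vertices = sorted(v for v, d in degree.items() if d % 2 == 1)

# A connected multigraph has an Eulerian cycle only if every degree is even.
has_eulerian_cycle = len(odd_vertices) == 0
print(dict(degree))
print(odd_vertices, has_eulerian_cycle)
```

All four land masses have odd degree, so no walk can cross every bridge exactly once and return to its starting point.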

In 1840, A.F. Mobius gave the idea of the complete graph and the bipartite graph, and Kuratowski later proved that the complete graph on five vertices and the complete bipartite graph on two sets of three vertices are not planar. The concept of a tree (a connected graph without cycles) was introduced by Gustav Kirchhoff in 1845, and he employed graph theoretical ideas in the calculation of currents in electrical networks or circuits.

In 1852, Francis Guthrie posed the famous four color problem. Then in 1856, Thomas P. Kirkman and William R. Hamilton studied cycles on polyhedra and invented the concept called Hamiltonian graph by studying trips that visited certain sites exactly once. In 1913, H. Dudeney mentioned a puzzle problem. Even though the four color problem was posed in 1852, it was solved only after more than a century, by Kenneth Appel and Wolfgang Haken. This time is considered as the birth of Graph Theory.

Cayley studied particular analytical forms from differential calculus to study trees. This had many implications in theoretical chemistry and led to the invention of enumerative graph theory. The term “Graph” was introduced by Sylvester in 1878, when he drew an analogy between “Quantic invariants” and covariants of algebra and molecular diagrams.

In 1941, Ramsey worked on colorations, which led to the identification of another branch of graph theory called extremal graph theory. In 1969, Heinrich Heesch published a method for attacking the four color problem using computers. The study of asymptotic graph connectivity gave rise to random graph theory. The histories of Graph Theory and Topology are also closely related. They share many common concepts and theorems.

Image('images/Konigsberg.PNG', width = 800)

Why Graphs?

Here are a few points to motivate you to use graphs in your day-to-day data science problems –

Graphs provide a better way of dealing with abstract concepts like relationships and interactions. They also offer an intuitively visual way of thinking about these concepts. Graphs also form a natural basis for analyzing relationships in a Social context

Graph Databases have become common computational tools and alternatives to SQL and NoSQL databases

Graphs are used to model analytics workflows in the form of DAGs (Directed acyclic graphs)

Some Neural Network Frameworks also use DAGs to model the various operations in different layers

Graph Theory concepts are used to study and model Social Networks, Fraud patterns, Power consumption patterns, Virality and Influence in Social Media. Social Network Analysis (SNA) is probably the best known application of Graph Theory for Data Science

It is used in Clustering algorithms – Specifically K-Means

System Dynamics also uses some Graph Theory concepts – Specifically loops

Path Optimization is a subset of the Optimization problem that also uses Graph concepts

From a Computer Science perspective – Graphs offer computational efficiency. The Big O complexity for some algorithms is better for data arranged in the form of Graphs (compared to tabular data)

Terminology you should know

Before you go any further into the article, it is recommended that you get familiar with the following terminology.

The vertices u and v are called the end vertices of the edge (u,v)

If two edges have the same end vertices they are Parallel

An edge of the form (v,v) is a loop

A Graph is simple if it has no parallel edges and loops

A Graph is said to be Empty if it has no edges. Meaning E is empty

A Graph is a Null Graph if it has no vertices. Meaning V and E are empty

A Graph with only 1 Vertex is a Trivial graph

Edges are Adjacent if they have a common vertex. Vertices are Adjacent if they have a common edge

The degree of the vertex v, written as d(v), is the number of edges with v as an end vertex. By convention, we count a loop twice and parallel edges contribute separately

Isolated Vertices are vertices with degree 0. A vertex v with d(v) = 0 is isolated

A Graph is Complete if its edge set contains every possible edge between ALL of the vertices

A Walk in a Graph G = (V,E) is a finite, alternating sequence of the form $v_0, e_1, v_1, e_2, \ldots, e_k, v_k$ consisting of vertices and edges of the graph G, where each edge $e_i$ joins $v_{i-1}$ and $v_i$

A Walk is Open if the initial and final vertices are different. A Walk is Closed if the initial and final vertices are the same

A Walk is a Trail if every edge is traversed at most once

A Trail is a Path if every vertex is traversed at most once (except that a closed walk starts and ends on the same vertex)

A Closed Path is a Circuit – Analogous to electrical circuits
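The walk, trail and path definitions are easiest to tell apart by testing a few vertex sequences. The helper below is a hypothetical sketch (the name `classify_walk` is not from any library). It assumes a simple undirected graph, so that each consecutive pair of vertices identifies exactly one edge.

```python
def classify_walk(vertices):
    """Classify a walk, given as its vertex sequence, as 'path', 'trail' or 'walk'.

    The most specific label is returned. Assumes a simple undirected graph.
    """
    # frozenset makes (u, v) and (v, u) the same undirected edge.
    edges = [frozenset(pair) for pair in zip(vertices, vertices[1:])]
    if len(set(edges)) < len(edges):
        return "walk"   # some edge is traversed more than once
    # For a closed walk the shared start/end vertex is not counted as a repeat.
    closed = len(vertices) > 1 and vertices[0] == vertices[-1]
    inner = vertices[:-1] if closed else vertices
    if len(set(inner)) == len(inner):
        return "path"   # no vertex is repeated
    return "trail"      # edges are distinct, but some vertex is repeated


print(classify_walk([1, 2, 3]))        # path
print(classify_walk([1, 2, 3, 1]))     # path (closed, i.e. a circuit)
print(classify_walk([1, 2, 3, 1, 4]))  # trail
print(classify_walk([1, 2, 1]))        # walk
```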

Graph Theory concepts

In this section, we’ll look at some of the concepts useful for Data Analysis (in no particular order). Please note that there are a lot more concepts that require a depth which is out of scope of this article. So let’s get into it.

Average Path Length

The average of the shortest path lengths for all possible node pairs. Gives a measure of ‘tightness’ of the Graph and can be used to understand how quickly/easily something flows in this Network.
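As a rough illustration, the average path length of a small unweighted graph can be computed with one breadth-first search per node. This is a minimal sketch that assumes a connected, undirected graph stored as an adjacency dict. The 4-node chain is a made-up example.

```python
from collections import deque


def shortest_path_lengths(adjacency, source):
    """Hop counts from source to every reachable node (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def average_path_length(adjacency):
    """Mean shortest-path length over all ordered pairs of distinct nodes."""
    n = len(adjacency)
    total = sum(sum(shortest_path_lengths(adjacency, s).values()) for s in adjacency)
    return total / (n * (n - 1))


# Hypothetical chain: a - b - c - d
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(average_path_length(chain))
```

For the chain, the six node pairs are 1, 2, 3, 1, 2 and 1 hops apart, so the average is 10/6.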


Breadth first search and Depth first search are two different algorithms used to search for Nodes in a Graph. They are typically used to figure out if we can reach a Node from a given Node. This is also known as Graph Traversal

The aim of the BFS is to traverse the Graph as close as possible to the root Node, while the DFS algorithm aims to move as far as possible away from the root node.
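The difference in visiting order is easy to see on a small tree. The sketch below is one plain-Python way to write both traversals. The tree is hypothetical and stored as a parent-to-children mapping.

```python
from collections import deque


def bfs_order(adjacency, root):
    """Visit nodes level by level, nearest to the root first."""
    seen = [root]
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.append(neighbour)
                queue.append(neighbour)
    return seen


def dfs_order(adjacency, root):
    """Follow one branch as deep as possible before backtracking."""
    seen = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.append(node)
        # Reversed so that the first listed neighbour is explored first.
        stack.extend(reversed(adjacency[node]))
    return seen


tree = {1: [2, 3], 2: [4, 5], 3: [6], 4: [], 5: [], 6: []}
print(bfs_order(tree, 1))  # [1, 2, 3, 4, 5, 6]
print(dfs_order(tree, 1))  # [1, 2, 4, 5, 3, 6]
```

Either function also answers the reachability question: a node can be reached from the root exactly when it appears in the returned list.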


Centrality is one of the most widely used and important conceptual tools for analysing networks. It aims to find the most important nodes in a network. There may be different notions of “important” and hence there are many centrality measures. Centrality measures themselves have a form of classification (or Types of centrality measures). There are measures that are characterized by flow along the edges and those that are characterized by Walk Structure.

Some of the most commonly used ones are:

Degree Centrality – The first and conceptually the simplest Centrality definition. This is the number of edges connected to a node. In the case of a directed graph, we can have 2 degree centrality measures. Inflow and Outflow Centrality

Closeness Centrality – The reciprocal of the average length of the shortest path from the node to all other nodes, so nodes that are close to everyone else score highest

Betweenness Centrality – Number of times a node is present in the shortest path between 2 other nodes

These centrality measures have variants and the definitions can be implemented using various algorithms. All in all, this means a large number of definitions and algorithms.
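To get a feel for the three measures, they can be compared on a tiny star-shaped network using the networkx package that is used later in this article. The star graph is a made-up example. Note that networkx normalises all three measures and reports closeness as the reciprocal of the average distance.

```python
import networkx as nx

# Hypothetical star network: node 0 is a hub connected to nodes 1-4.
star = nx.star_graph(4)

degree = nx.degree_centrality(star)
closeness = nx.closeness_centrality(star)
betweenness = nx.betweenness_centrality(star)

for node in star.nodes():
    print(node, round(degree[node], 3), round(closeness[node], 3), round(betweenness[node], 3))
```

The hub scores 1.0 on all three measures, while each outer node has a betweenness of 0 because no shortest path between two other nodes passes through it.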

Network Density

A measure of how many edges a Graph has. The actual definition will vary depending on type of Graph and the context in which the question is asked. For a complete undirected Graph the Density is 1, while it is 0 for an empty Graph. Graph Density can be greater than 1 in some situations (involving loops).
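One common definition, for a graph without loops or parallel edges, divides the number of edges by the number of possible node pairs. The function below is a small sketch of that definition. Other conventions exist, as noted above.

```python
def density(n_nodes, n_edges, directed=False):
    """Fraction of possible edges that are present (no loops, no parallel edges)."""
    if n_nodes < 2:
        return 0.0
    possible = n_nodes * (n_nodes - 1)  # ordered pairs of distinct nodes
    if not directed:
        possible /= 2                   # unordered pairs
    return n_edges / possible


print(density(4, 6))    # complete undirected graph on 4 nodes: 1.0
print(density(4, 0))    # empty graph: 0.0
print(density(36, 57))  # a graph with 36 nodes and 57 edges
```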

Graph Randomizations

While some Graph metrics may be easy to calculate, it is not easy to understand their relative importance. We use Network/Graph Randomizations in such cases. We calculate the metric for the Graph at hand and for another similar Graph that is randomly generated. This similarity can, for example, be the same density and number of nodes. Typically we generate 1000 similar random graphs, calculate the Graph metric for each of them, and then compare the results with the same metric for the Graph at hand to arrive at some notion of a benchmark.

In Data Science when trying to make a claim about a Graph it helps if it is contrasted with some randomly generated Graphs.
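A randomization benchmark can be sketched with the standard library alone. The example below is illustrative only: the observed graph, the choice of metric (maximum degree) and the seed are all assumptions. The random graphs share the observed graph's number of nodes and edges.

```python
import random
from itertools import combinations


def random_graph_edges(nodes, n_edges, rng):
    """Pick n_edges distinct node pairs uniformly at random."""
    return rng.sample(list(combinations(nodes, 2)), n_edges)


def max_degree(nodes, edges):
    """Example metric: the largest number of edges meeting at one node."""
    degree = dict.fromkeys(nodes, 0)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return max(degree.values())


# Hypothetical observed graph: a hub (node 0) linked to nine other nodes.
nodes = list(range(10))
observed_edges = [(0, i) for i in range(1, 10)]
observed = max_degree(nodes, observed_edges)

rng = random.Random(42)  # fixed seed so the benchmark is repeatable
benchmark = [
    max_degree(nodes, random_graph_edges(nodes, len(observed_edges), rng))
    for _ in range(1000)
]

# Share of random graphs whose metric is at least as large as the observed one
share = sum(m >= observed for m in benchmark) / len(benchmark)
print(observed, share)
```

A small share suggests that the observed value is unlikely to arise by chance in graphs of the same size.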

Getting Familiar with Graphs in python

We will be using the networkx package in Python. It can be installed in the Root environment of Anaconda (if you are using the Anaconda distribution of Python). You can also pip install it.

Let us look at some common things that can be done with the Networkx package. These include importing and creating a Graph and ways to visualize it.

Graph Creation

import networkx as nx

# Creating a Graph
G = nx.Graph()  # Right now G is empty

# Add a node
G.add_node(1)
G.add_nodes_from([2, 3])  # You can also add a list of nodes by passing a list argument

# Add edges
G.add_edge(1, 2)
e = (2, 3)
G.add_edge(*e)  # * unpacks the tuple
G.add_edges_from([(1, 2), (1, 3)])  # Just like nodes we can add edges from a list

Node and Edge attributes can be added along with the creation of Nodes and Edges by passing a tuple containing node and attribute dict.
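As a short illustration, attributes can be passed either as keyword arguments or as dicts inside tuples. The attribute names and values below are made up for the example.

```python
import networkx as nx

H = nx.Graph()

# Attributes as keyword arguments
H.add_node(1, city="Newark")
H.add_edge(1, 2, distance=946)

# Attributes as (node, dict) and (u, v, dict) tuples
H.add_nodes_from([(3, {"city": "Boston"})])
H.add_edges_from([(2, 3, {"distance": 200})])

print(H.nodes[3])
print(H.edges[2, 3])
```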

In addition to constructing graphs node-by-node or edge-by-edge, they can also be generated by applying classic graph operations, such as:

subgraph(G, nbunch) - induced subgraph view of G on nodes in nbunch
union(G1,G2) - graph union
disjoint_union(G1,G2) - graph union assuming all nodes are different
cartesian_product(G1,G2) - return Cartesian product graph
compose(G1,G2) - combine graphs identifying nodes common to both
complement(G) - graph complement
create_empty_copy(G) - return an empty copy of the same graph class
to_undirected(G) - return an undirected representation of G
to_directed(G) - return a directed representation of G

Separate classes exist for different types of Graphs. For example the nx.DiGraph() class allows you to create a Directed Graph. Specific graphs containing paths can be created directly using a single method. For a full list of Graph creation methods please refer to the full documentation. Link is given at the end of the article.
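A few of these options can be tried side by side. The snippet below is a small sketch using generator and operator functions from networkx. The graphs themselves are arbitrary examples.

```python
import networkx as nx

# A path graph 0 - 1 - 2 - 3 created with a single generator call
P = nx.path_graph(4)

# A directed graph: the edge (1, 2) does not imply (2, 1)
D = nx.DiGraph()
D.add_edge(1, 2)

# Classic operations return new graphs
C = nx.complement(P)                   # the node pairs that are NOT edges of P
U = nx.compose(P, nx.Graph([(3, 4)]))  # merge two graphs on their shared node 3

print(list(P.edges()))
print(D.has_edge(1, 2), D.has_edge(2, 1))
print(C.number_of_edges())
print(sorted(U.nodes()))
```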

Image('images/graphclasses.PNG', width = 400)

Accessing edges and nodes

Nodes and Edges can be accessed together using the G.nodes() and G.edges() methods. Individual nodes and edges can be accessed using the bracket/subscript notation.


G.nodes()

NodeView((1, 2, 3))

G.edges()

EdgeView([(1, 2), (1, 3), (2, 3)])

G[1] # same as G.adj[1]

AtlasView({2: {}, 3: {}})



G.edges[1, 2]


Graph Visualization

Networkx provides basic functionality for visualizing graphs, but its main goal is to enable graph analysis rather than perform graph visualization. Graph visualization is hard and we will have to use specific tools dedicated for this task. Matplotlib offers some convenience functions. But GraphViz is probably the best tool for us as it offers a Python interface in the form of PyGraphViz (link to documentation below).

%matplotlib inline
import matplotlib.pyplot as plt
nx.draw(G)

import pygraphviz as pgv

d = {'1': {'2': None}, '2': {'1': None, '3': None}, '3': {'1': None}}
A = pgv.AGraph(data=d)
print(A)  # This is the 'string' or simple representation of the Graph

Output:

strict graph "" {
    1 -- 2;
    2 -- 3;
    3 -- 1;
}

PyGraphviz provides great control over the individual attributes of the edges and nodes. We can get very beautiful visualizations using it.

# Let us create another Graph where we can individually control the colour of each node
B = pgv.AGraph()

# Setting node attributes that are common for all nodes
B.node_attr['style'] = 'filled'
B.node_attr['shape'] = 'circle'
B.node_attr['fixedsize'] = 'true'
B.node_attr['fontcolor'] = '#FFFFFF'

# Creating and setting node attributes that vary for each node (using a for loop)
for i in range(16):
    B.add_edge(0, i)
    n = B.get_node(i)
    n.attr['fillcolor'] = "#%02x0000" % (i * 16)  # zero-padded so every value is a valid 6-digit hex colour
    n.attr['height'] = "%s" % (i / 16.0 + 0.5)
    n.attr['width'] = "%s" % (i / 16.0 + 0.5)

B.draw('star.png', prog="circo")  # This creates a .png file in the local directory. Displayed below.

Image('images/star.png', width=650)  # The Graph visualization we created above.

Usually, visualization is thought of as a separate task from Graph analysis. A graph once analyzed is exported as a Dotfile. This Dotfile is then visualized separately to illustrate a specific point we are trying to make.

Analysis on a Dataset

We will take a generic dataset (not one that is specifically intended to be used for Graphs) and do some manipulation (in pandas) so that it can be ingested into a Graph in the form of an edgelist. An edgelist is a list of tuples that contain the vertices defining every edge
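To make the idea of an edgelist concrete, here is a minimal sketch in plain Python. The three records and their values are invented for illustration. Only the idea of an origin and a destination per row is taken from the dataset described in this section.

```python
# Hypothetical journey records, one dict per row
rows = [
    {"origin": "EWR", "dest": "MEM", "distance": 946},
    {"origin": "LGA", "dest": "FLL", "distance": 1076},
    {"origin": "EWR", "dest": "SEA", "distance": 2402},
]

# An edgelist: one (source, target) tuple per edge
edgelist = [(row["origin"], row["dest"]) for row in rows]

# The same edges as an adjacency mapping for an undirected graph
adjacency = {}
for u, v in edgelist:
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)

print(edgelist)
print(sorted(adjacency["EWR"]))  # ['MEM', 'SEA']
```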

The dataset we will be looking at comes from the Airlines Industry. It has some basic information on the Airline routes. There is a Source of a journey and a destination. There are also a few columns indicating arrival and departure times for each journey. As you can imagine this dataset lends itself beautifully to be analysed as a Graph. Imagine a few cities (nodes) connected by airline routes (edges). If you are an airline carrier, you can then proceed to ask a few questions like

What is the shortest way to get from A to B? In terms of distance and in terms of time

Is there a way to go from C to D?

Which airports have the heaviest traffic?

Which airport is “in between” most other airports, so that it can be converted into a local hub?

import pandas as pd
import numpy as np

data = pd.read_csv('data/Airlines.csv')
data.shape

(100, 16)

data.dtypes

year                int64
month               int64
day                 int64
dep_time          float64
sched_dep_time      int64
dep_delay         float64
arr_time          float64
sched_arr_time      int64
arr_delay         float64
carrier            object
flight              int64
tailnum            object
origin             object
dest               object
air_time          float64
distance            int64
dtype: object

We notice that origin and destination look like good choices for Nodes. Everything can then be imagined as either node or edge attributes. A single edge can be thought of as a journey. And such a journey will have various times, a flight number, an airplane tail number etc associated with it

We notice that the year, month, day and time information is spread over many columns. We want to create one datetime column containing all of this information. We also need to keep scheduled and actual time of arrival and departure separate. So we should finally have 4 datetime columns (Scheduled and actual times of arrival and departure)

Additionally, the time columns are not in a proper format. 4:30 pm is represented as 1630 instead of 16:30. There is no delimiter to split that column. One approach is to use pandas string methods and regular expressions

We should also note that sched_dep_time and sched_arr_time are int64 dtype and dep_time and arr_time are float64 dtype

An additional complication is NaN values

# converting sched_dep_time to 'std' - Scheduled time of departure
# (the pattern \d{2}$ matches the last two digits, i.e. the minutes)
data['std'] = (data.sched_dep_time.astype(str).str.replace(r'(\d{2}$)', '', regex=True) + ':'
               + data.sched_dep_time.astype(str).str.extract(r'(\d{2}$)', expand=False) + ':00')

# converting sched_arr_time to 'sta' - Scheduled time of arrival
data['sta'] = (data.sched_arr_time.astype(str).str.replace(r'(\d{2}$)', '', regex=True) + ':'
               + data.sched_arr_time.astype(str).str.extract(r'(\d{2}$)', expand=False) + ':00')

# converting dep_time to 'atd' - Actual time of departure
data['atd'] = (data.dep_time.fillna(0).astype(np.int64).astype(str).str.replace(r'(\d{2}$)', '', regex=True) + ':'
               + data.dep_time.fillna(0).astype(np.int64).astype(str).str.extract(r'(\d{2}$)', expand=False) + ':00')

# converting arr_time to 'ata' - Actual time of arrival
data['ata'] = (data.arr_time.fillna(0).astype(np.int64).astype(str).str.replace(r'(\d{2}$)', '', regex=True) + ':'
               + data.arr_time.fillna(0).astype(np.int64).astype(str).str.extract(r'(\d{2}$)', expand=False) + ':00')
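An alternative to the string and regular-expression approach is to treat each time as a number and split it arithmetically. The helper below is a hypothetical sketch (`hhmm_to_time` is not part of pandas). It returns None for missing values rather than filling them with 0.

```python
import math


def hhmm_to_time(value):
    """Turn a clock time stored as a number (1630, 530.0) into 'HH:MM:00'.

    Missing values (None or NaN) give None instead of a made-up time.
    """
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return None
    hours, minutes = divmod(int(value), 100)
    return f"{hours:02d}:{minutes:02d}:00"


print(hhmm_to_time(1630))          # 16:30:00
print(hhmm_to_time(530.0))         # 05:30:00
print(hhmm_to_time(float("nan")))  # None
```

With pandas, such a function could be applied to a column with something like data.sched_dep_time.map(hhmm_to_time).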

We now have time columns in the format we wanted. Finally we may want to combine the year, month and day columns into a date column. This is not an absolutely necessary step. But we can easily obtain the year, month and day (and other) information once it is converted into datetime format.

data['date'] = pd.to_datetime(data[['year', 'month', 'day']])

# finally we drop the columns we don't need
data = data.drop(columns=['year', 'month', 'day'])

Now import the dataset using the networkx function that ingests a pandas dataframe directly. Just like Graph creation there are multiple ways Data can be ingested into a Graph from multiple formats.

import networkx as nx

FG = nx.from_pandas_edgelist(data, source='origin', target='dest', edge_attr=True)
FG.nodes()


NodeView(('EWR', 'MEM', 'LGA', 'FLL', 'SEA', 'JFK', 'DEN', 'ORD', 'MIA', 'PBI', 'MCO', 'CMH', 'MSP', 'IAD', 'CLT', 'TPA', 'DCA', 'SJU', 'ATL', 'BHM', 'SRQ', 'MSY', 'DTW', 'LAX', 'JAX', 'RDU', 'MDW', 'DFW', 'IAH', 'SFO', 'STL', 'CVG', 'IND', 'RSW', 'BOS', 'CLE'))

FG.edges()


EdgeView([('EWR', 'MEM'), ('EWR', 'SEA'), ('EWR', 'MIA'), ('EWR', 'ORD'), ('EWR', 'MSP'), ('EWR', 'TPA'), ('EWR', 'MSY'), ('EWR', 'DFW'), ('EWR', 'IAH'), ('EWR', 'SFO'), ('EWR', 'CVG'), ('EWR', 'IND'), ('EWR', 'RDU'), ('EWR', 'IAD'), ('EWR', 'RSW'), ('EWR', 'BOS'), ('EWR', 'PBI'), ('EWR', 'LAX'), ('EWR', 'MCO'), ('EWR', 'SJU'), ('LGA', 'FLL'), ('LGA', 'ORD'), ('LGA', 'PBI'), ('LGA', 'CMH'), ('LGA', 'IAD'), ('LGA', 'CLT'), ('LGA', 'MIA'), ('LGA', 'DCA'), ('LGA', 'BHM'), ('LGA', 'RDU'), ('LGA', 'ATL'), ('LGA', 'TPA'), ('LGA', 'MDW'), ('LGA', 'DEN'), ('LGA', 'MSP'), ('LGA', 'DTW'), ('LGA', 'STL'), ('LGA', 'MCO'), ('LGA', 'CVG'), ('LGA', 'IAH'), ('FLL', 'JFK'), ('SEA', 'JFK'), ('JFK', 'DEN'), ('JFK', 'MCO'), ('JFK', 'TPA'), ('JFK', 'SJU'), ('JFK', 'ATL'), ('JFK', 'SRQ'), ('JFK', 'DCA'), ('JFK', 'DTW'), ('JFK', 'LAX'), ('JFK', 'JAX'), ('JFK', 'CLT'), ('JFK', 'PBI'), ('JFK', 'CLE'), ('JFK', 'IAD'), ('JFK', 'BOS')])

nx.draw_networkx(FG, with_labels=True)  # Quick view of the Graph. As expected we see 3 very busy airports

nx.algorithms.degree_centrality(FG)  # Notice the 3 airports from which all of our 100 rows of data originates

nx.density(FG)  # Average edge density of the Graphs


0.09047619047619047

nx.average_shortest_path_length(FG)  # Average shortest path length for ALL paths in the Graph


2.36984126984127

nx.average_degree_connectivity(FG)  # For a node of degree k - What is the average of its neighbours' degree?


{1: 19.307692307692307, 2: 19.0625, 3: 19.0, 17: 2.0588235294117645, 20: 1.95}

As is obvious from looking at the Graph visualization (way above) – There are multiple paths from some airports to others. Let us say we want to calculate the shortest possible route between 2 such airports. Right off the bat we can think of a couple of ways of doing it

There is the shortest path by distance

There is the shortest path by flight time

What we can do is run a shortest path algorithm with the edges weighted by either the distance or the airtime. Please note that this is an approximate solution – The actual problem to solve is to calculate the shortest path factoring in the availability of a flight when you reach your transfer airport + wait time for the transfer. This is a more complete approach and this is how humans normally plan their travel. For the purposes of this article we will just assume that a flight is readily available when you reach an airport and calculate the shortest path using the airtime as the weight
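To see why the choice of weight matters, here is a minimal plain-Python sketch of Dijkstra's algorithm on a made-up network. The airport labels A to D and the air times are hypothetical and are not taken from the dataset.

```python
import heapq


def dijkstra_path(graph, source, target):
    """Cheapest path in a weighted graph given as {node: {neighbour: weight}}."""
    queue = [(0, source, [source])]  # (cost so far, node, path taken)
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for neighbour, weight in graph[node].items():
            if neighbour not in settled:
                heapq.heappush(queue, (cost + weight, neighbour, path + [neighbour]))
    return None  # target cannot be reached


# Hypothetical air times in minutes (symmetric)
air_time = {
    "A": {"B": 120, "C": 60, "D": 400},
    "B": {"A": 120, "D": 230},
    "C": {"A": 60, "D": 130},
    "D": {"A": 400, "B": 230, "C": 130},
}
print(dijkstra_path(air_time, "A", "D"))  # (190, ['A', 'C', 'D'])
```

The direct edge from A to D has the fewest hops, but the route through C has the lowest total air time, so the answer depends on what is being minimised.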

Let us take the example of JAX and DFW airports:

# Let us find all the paths available
for path in nx.all_simple_paths(FG, source='JAX', target='DFW'):
    print(path)

# Let us find the dijkstra path from JAX to DFW.
dijpath = nx.dijkstra_path(FG, source='JAX', target='DFW')
dijpath


['JAX', 'JFK', 'SEA', 'EWR', 'DFW']

# Let us try to find the dijkstra path weighted by airtime (approximate case)
shortpath = nx.dijkstra_path(FG, source='JAX', target='DFW', weight='air_time')
shortpath


['JAX', 'JFK', 'BOS', 'EWR', 'DFW']

Conclusion

This article has at best only managed a superficial introduction to the very interesting field of Graph Theory and Network analysis. Knowledge of the theory and the Python packages will add a valuable toolset to any Data Scientist’s arsenal. For the dataset used above, a series of other questions can be asked like:

Find the shortest path between two airports given Cost, Airtime and Availability?

You are an airline carrier and you have a fleet of airplanes. You have an idea of the demand available for your flights. Given that you have permission to operate 2 more airplanes (or add 2 airplanes to your fleet) which routes will you operate them on to maximize profitability?

Can you rearrange the flights and schedules to optimize a certain parameter (like Timeliness or Profitability etc)

About the Author

Srivatsa currently works for TheMathCompany and has over 7.5 years of experience in Decision Sciences and Analytics. He has grown, led & scaled global teams across functions, industries & geographies. He has led India Delivery for a cross industry portfolio totalling $10M in revenues. He has also conducted several client workshops and training sessions to help level up technical and business domain knowledge.

During his career span, he has led premium client engagements with Industry leaders in Technology, e-commerce and retail. He helped set up the Analytics Center of Excellence for one of the world’s largest Insurance companies.

