Implementing Audio Classification Project Using Deep Learning


This article was published as a part of the Data Science Blogathon.

Welcome to this article, where we will explore audio and sound classification using machine learning and deep learning. It is amazing and interesting to know how machines are capable of understanding human language and responding in the same way. NLP (Natural Language Processing) is one of the most researched and studied topics of today’s generation; it helps make machines capable of handling human language in the form of speech as well as text.


Table of Contents

Introduction to Audio Classification

Project Overview

Dataset Overview

Hands-on Implementing Audio Classification project

EDA On Audio Data

Data Preprocessing

Building ANN for Audio Classification

Testing some unknown Audio

End Notes

Introduction to Audio Classification

Audio classification means categorizing certain sounds into categories, such as environmental sound classification and speech recognition. The task is the same as image classification of cats and dogs or text classification of spam and ham; the same idea applies to audio. The only difference is the type of data: instead of images or text, we now have audio files of a certain length.

Why is audio classification considered more difficult than other types of classification?

There are many techniques to classify images, as we have various in-built neural network architectures under CNN designed especially to deal with images. And it is easy to extract features from images because images already come in the form of numbers: an image is a collection of pixels, and pixels are numbers. When we have text data, we use sequential encoder- and decoder-based techniques to find features. But when it comes to recognizing speech, it is harder than text because speech is based on frequency and time, so you need to extract the pitch and frequency properly.

Audio classification is employed in industries across different domains, such as voice-lock features, music genre identification, natural language classification, environmental sound classification, and capturing and identifying different types of sound. It is also used to give chatbots the next level of power.

Project Overview

Sound classification is a growing area of research that everyone is trying to learn and implement in their projects. The project we will build in this article is simple enough for a beginner to follow: the problem statement is to apply deep learning techniques to classify environmental sounds, specifically focusing on identifying urban sounds.

Given an audio sample of some category with a certain duration in .wav format, we have to determine whether it contains one of the target urban sounds. This falls under the supervised machine learning category, so we have a dataset as well as target categories.

Dataset Overview

The dataset we will use is called the UrbanSound8K dataset. It contains 8732 sound files of 10 different classes, listed below. Our task is to extract features from these files and classify the corresponding audio files into their respective categories. You can download the dataset from the official website here, and it is also available on Kaggle. The dataset is fairly large, so if downloading is not feasible, you can create a Kaggle Notebook and practice on Kaggle itself.

Air Conditioner

Car Horn

Children Playing

Dog Bark

Drilling Machine

Engine Idling

Gun Shot

Jackhammer

Siren

Street Music

Hands-On Practice of Audio Classification Project

Libraries Installation

A very important library that supports audio and music analysis is Librosa; simply use the pip command to install it. It provides the building blocks required to construct a music information retrieval system. Another great library we will use, for deep learning modeling purposes, is TensorFlow, and I hope everyone has already installed it.

pip install librosa
pip install tensorflow

Exploratory Data Analysis of Audio Data

We have 10 different folders under the UrbanSound8K dataset folder. Before applying any preprocessing, we will see how to load audio files and how to visualize them in the form of a waveform. If you want to load an audio file and listen to it, you can use the IPython library and directly give it the audio file path. We have taken the first audio file in the fold1 folder, which belongs to the dog bark category.

import IPython.display as ipd
filepath = "../input/urbansound8k/fold1/101415-3-0-2.wav"
ipd.Audio(filepath)


Now we will use Librosa to load audio data. When we load an audio file with Librosa, it gives us two things: the sample rate and an array of amplitude values. By default, Librosa downmixes to one channel and returns a one-dimensional array; if you load with mono=False to keep stereo, you get a two-dimensional array. Let us load the above audio file with Librosa and plot the waveform.

2-D array – One axis represents the recorded samples of amplitude, and the other represents the number of channels. There are different types of channels: monophonic (audio that has one channel) and stereo (audio that has two channels).

import librosa
import librosa.display
import matplotlib.pyplot as plt

data, sample_rate = librosa.load(filepath)
plt.figure(figsize=(12, 5))
librosa.display.waveshow(data, sr=sample_rate)


As mentioned, if you print the sample rate, the output will be 22050, because when we load data with librosa, it resamples everything to a single default sample rate. We can achieve the same thing using the scipy Python library; it will also give us two pieces of information: the sample rate and the data.

from scipy.io import wavfile as wav

wave_sample_rate, wave_audio = wav.read(filepath)

print(wave_sample_rate)

print(wave_audio)

import matplotlib.pyplot as plt

plt.figure(figsize=(12, 4))

plt.plot(wave_audio)

When you print the sample rate using scipy, it is different from librosa’s. Now let us visualize the wave audio data. One important difference to understand: the data retrieved from librosa is normalized, while the data read using scipy is not. Librosa is becoming popular for audio signal processing for the following three reasons.

It tries to converge the signal into mono(one channel).

It can represent the audio signal between -1 to +1(in normalized form), so a regular pattern is observed.

It reads the sample rate and, by default, resamples the audio to 22050 Hz, while other libraries return it according to the file’s native value, as the quick check below shows.
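A quick way to verify this is to load the same file twice, once with the default settings and once with sr=None (which keeps the file’s native rate):

data_default, sr_default = librosa.load(filepath)          # resampled to 22050 Hz, amplitudes in [-1, +1]
data_native, sr_native = librosa.load(filepath, sr=None)   # keeps the file's native sample rate
print(sr_default, sr_native)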

Imbalanced Dataset Check

Now we know about the audio files and how to visualize them in audio format. Moving forward to data exploration, we will load the CSV metadata file provided for the audio files and check how many records we have for each class.

import pandas as pd
metadata = pd.read_csv('/urbansound8k/UrbanSound8K.csv')
metadata.head(10)


The metadata contains each file’s name and where it is located. If we explore the first file, it is present in fold 5 with the category dog bark. Now use the value_counts function to check the number of records for each class.

metadata['class'].value_counts()

From the output, we can see the data is not imbalanced; most of the classes have an approximately equal number of records. We can also visualize the count of records in each category using a bar plot or count plot.

import seaborn as sns
plt.figure(figsize=(10, 6))
sns.countplot(metadata['class'])
plt.title("Count of records in each class")
plt.xticks(rotation="vertical")
plt.show()


Data Preprocessing

Some audio is recorded at different rates, like 44 kHz or 22 kHz. Using librosa, everything will be at 22050 Hz, and we can see the data in a normalized pattern. Now our task is to extract some important information and keep our data in the form of independent features (extracted from the audio signal) and dependent features (class labels). We will use Mel-Frequency Cepstral Coefficients (MFCC) to extract the independent features from the audio signals.

MFCCs – The MFCC summarizes the frequency distribution across the window size, which makes it possible to analyze both the frequency and time characteristics of the sound. This audio representation allows us to identify features for classification: it converts audio into a set of features based on its time and frequency characteristics. To read more about MFCC, you can watch this video and also read this research paper by Springer.

To demonstrate how we apply MFCC in practice, first, we will apply it on a single audio file that we are already using.

mfccs = librosa.feature.mfcc(y=data, sr=sample_rate, n_mfcc=40)
print(mfccs.shape)
print(mfccs)

import numpy as np

def features_extractor(file_name):
    #load the audio file
    audio, sample_rate = librosa.load(file_name, res_type='kaiser_fast')
    #extract the mfcc features
    mfccs_features = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=40)
    #to get a scaled feature vector, take the mean of the transposed values
    mfccs_scaled_features = np.mean(mfccs_features.T, axis=0)
    return mfccs_scaled_features

👉 Now, to extract all the features for each audio file, we have to use a loop over each row in the dataframe. We also use the TQDM python library to track the progress. Inside the loop, we’ll prepare a customized file path for each file and call the function to extract MFCC features and append features and corresponding labels in a newly formed dataframe.

#Now we need to extract the features from all the audio files, so we use tqdm
import os
from tqdm import tqdm

### Now we iterate through every audio file and extract features
### using Mel-Frequency Cepstral Coefficients
audio_dataset_path = '../input/urbansound8k/'   #base folder that contains fold1 ... fold10
extracted_features = []
for index_num, row in tqdm(metadata.iterrows()):
    file_name = os.path.join(os.path.abspath(audio_dataset_path), 'fold'+str(row["fold"])+'/', str(row["slice_file_name"]))
    final_class_labels = row["class"]
    data = features_extractor(file_name)
    extracted_features.append([data, final_class_labels])

The loop will take a little time to run because it iterates over all 8732 rows; after that, you can observe the dataframe of extracted features as shown below.

### converting extracted_features to a Pandas dataframe
extracted_features_df = pd.DataFrame(extracted_features, columns=['feature', 'class'])
extracted_features_df.head()


Train Test split

First, we separate the dependent and independent features. Since we have 10 classes, we apply integer label encoding (0 to 9) and then convert the labels to categorical (one-hot) form. After that, we split the data into train and test sets in an 80:20 ratio.

### Split the dataset into independent and dependent datasets
X = np.array(extracted_features_df['feature'].tolist())
y = np.array(extracted_features_df['class'].tolist())

from tensorflow.keras.utils import to_categorical
from sklearn.preprocessing import LabelEncoder
labelencoder = LabelEncoder()
y = to_categorical(labelencoder.fit_transform(y))

### Train Test Split
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

In total, we have 6985 records in the train set and 1747 samples in the test set. Let’s head over to model creation.

Audio Classification Model Creation

We have extracted features from the audio samples and split them into train and test sets. Now we will implement an ANN model using the Keras Sequential API. The number of classes is 10, which is our output shape, and we will create an ANN with 3 dense layers; the architecture is explained below.

The first layer has 100 neurons. The input shape is 40, according to the number of features, with ReLU as the activation function; to avoid overfitting, we use a Dropout layer at a rate of 0.5.

The second layer has 200 neurons with ReLU activation and dropout at a rate of 0.5.

The third layer again has 100 neurons with ReLU activation and dropout at a rate of 0.5.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation, Flatten
from tensorflow.keras.optimizers import Adam
from sklearn import metrics

### No of classes
num_labels = y.shape[1]

model = Sequential()
###first layer
model.add(Dense(100, input_shape=(40,)))
model.add(Activation('relu'))
model.add(Dropout(0.5))
###second layer
model.add(Dense(200))
model.add(Activation('relu'))
model.add(Dropout(0.5))
###third layer
model.add(Dense(100))
model.add(Activation('relu'))
model.add(Dropout(0.5))
###final layer
model.add(Dense(num_labels))
model.add(Activation('softmax'))

👉 You can observe the model summary using the summary function.


Compile the Model

To compile the model, we need to define the loss function, which is categorical cross-entropy; the performance metric, which is accuracy; and the optimizer, which is Adam.
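In Keras, this is a single call on the model object; a minimal sketch matching the description above:

model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='adam')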

Train the Model

We will train the model for 100 epochs with a batch size of 32 and save it in HDF5 format. We’ll use a ModelCheckpoint callback to save the best model, and datetime to measure how long training takes.

## Training my model
from tensorflow.keras.callbacks import ModelCheckpoint
from datetime import datetime

num_epochs = 100
num_batch_size = 32

checkpointer = ModelCheckpoint(filepath='./audio_classification.hdf5', verbose=1, save_best_only=True)
start = datetime.now()

model.fit(X_train, y_train, batch_size=num_batch_size, epochs=num_epochs, validation_data=(X_test, y_test), callbacks=[checkpointer], verbose=1)

duration = datetime.now() - start
print("Training completed in time: ", duration)

Check the Test Accuracy

Now we will evaluate the model on the test data. We got about 77 percent accuracy on the training dataset and 76 percent on the test data.

test_accuracy = model.evaluate(X_test, y_test, verbose=0)
print(test_accuracy[1])

If you are using a TensorFlow version below 2.6, you can use the predict_classes function to predict the corresponding class for each audio file. If you are using 2.6 or above, use predict together with NumPy’s argmax function.

#model.predict_classes(X_test)
predict_x = model.predict(X_test)
classes_x = np.argmax(predict_x, axis=1)
print(classes_x)

Testing Some Unknown Audio Samples

Now it is time to test some random audio samples. Whenever we get new audio, we have to perform three steps to obtain the predicted label and class.

First, preprocess the audio file (load it using Librosa and extract MFCC features)

Predict the label to which audio belongs.

Inverse-transform the predicted label to get the name of the class it belongs to.

filename = "../input/urbansound8k/fold7/101848-9-0-0.wav"

#preprocess the audio file
audio, sample_rate = librosa.load(filename, res_type='kaiser_fast')
mfccs_features = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=40)
mfccs_scaled_features = np.mean(mfccs_features.T, axis=0)

#Reshape MFCC feature to 2-D array
mfccs_scaled_features = mfccs_scaled_features.reshape(1, -1)

#predict the label
#predicted_label = model.predict_classes(mfccs_scaled_features)
x_predict = model.predict(mfccs_scaled_features)
predicted_label = np.argmax(x_predict, axis=1)
print(predicted_label)

#inverse transform the label to get the class name
prediction_class = labelencoder.inverse_transform(predicted_label)
print(prediction_class)

Congratulations! 👏 If you have followed the article till here and tried to implement it along with the reading, you have learned how to deal with audio data, use MFCC to extract important features from audio samples, and build a simple ANN model on top of them to classify audio into different classes. To practice, try the environmental sound classification dataset available on Kaggle and implement this project on it. The learnings of this article can be summarized in the bullet points below.

Audio classification is a technique to classify sounds into different categories.

We can visualize any audio in the form of a waveform.

MFCC method is used to extract important features from audio files.

Scaling the audio features to a common scale before feeding the data to the model is important so the model can learn better.

You can build a CNN model to classify audio, and you can also try building a much deeper ANN than the one we built.

👉 The complete Notebook implementation is available here.

👉 Connect with me on Linkedin.

👉 Check out my other articles on Analytics Vidhya and crazy-techie

Thanks for giving your time!


Related


Fake News Classification Using Deep Learning

This article was published as a part of the Data Science Blogathon.

Introduction

Here’s a quick puzzle for you. I’ll give you two titles, and you’ll have to tell me which is fake. Ready? Let’s get started:

“Wipro is planning to buy an EV-based startup.”

Well, it turns out that both of those headlines were fake news. In this article, you will learn the fake news classification using deep learning.

Image – 1

The grim reality is that there is a lot of misinformation and disinformation on the internet. Ninety per cent of Canadians have fallen for false news, according to 2023 research done by Ipsos Public Affairs for Canada’s Centre for International Governance Innovation.

It got me thinking: is it feasible to build an algorithm that can tell whether an article’s title is fake news? Well, it appears to be the case!

In this post, we explore classification models built with BERT and LSTMs to identify fake news.

Go through this Github link to view the complete code.

Dataset for Fake News Classification

We use a dataset from Kaggle. It consists of the details of 2095 articles, including author, title, and other information. Follow the link to get the dataset.

EDA

Let us start analyzing our data to get better insights from it. The dataset looks clean, and now we map the values of our classes, Real and Fake, to 0 and 1.

data = pd.read_csv('/content/news_articles.csv')
data = data[['title', 'label']]
data['label'] = data['label'].map({'Real': 0, 'Fake': 1})
data.head()


Since we have 1294 samples of real news and 801 samples of fake news, there is an approximately 62:38 news ratio. It means that our dataset is relatively biased. For our project, we consider the title and class columns.

Now, we can analyze the trends present in our dataset. To get an idea of dataset size, we get the mean, min, and max character lengths of titles. We use a histogram to visualize the data.

# Character Length of Titles - Min, Mean, Max
print('Mean Length', data['title'].apply(len).mean())
print('Min Length', data['title'].apply(len).min())
print('Max Length', data['title'].apply(len).max())
x = data['title'].apply(len).plot.hist()


We can observe that title lengths range from 2 to 443 characters, and that most samples are between 0 and 100 characters long. The mean length is around 61.

Preprocessing Data

Now we will use the NLTK library to preprocess our dataset, which includes:

Tokenization:

It is the process of dividing a text into smaller units (each word becomes an element in an array).

Lemmatization:

It reduces a word to its root form by removing endings; for example, it reduces children to child.

Stop words Removal:

Words like “the” and “for” will be eliminated from our dataset because they take up room without adding much meaning.

#Import nltk preprocessing library to convert text into a readable format
import nltk
from nltk.tokenize import sent_tokenize
from nltk.stem import WordNetLemmatizer
from nltk.corpus import stopwords
nltk.download('punkt')
nltk.download('wordnet')
nltk.download('stopwords')

data['title'] = data.apply(lambda row: nltk.word_tokenize(row['title']), axis=1)

#Define text lemmatization model (eg: walks will be changed to walk)
lemmatizer = WordNetLemmatizer()

#Loop through title dataframe and lemmatize each word
def lemma(data):
    return [lemmatizer.lemmatize(w) for w in data]

#Apply to dataframe
data['title'] = data['title'].apply(lemma)

#Define all stopwords in the English language (it, was, for, etc.)
stop = stopwords.words('english')

#Remove them from our dataframe
data['title'] = data['title'].apply(lambda x: [i for i in x if i not in stop])
data.head()


We create two models using this data for text classification:

An LSTM model (Tensorflow’s wiki-words-250 embeddings)

A BERT model.

LSTM Model for Fake News Classification

We split our data into a 70:30 ratio of train and test.

#Split data into training and testing dataset
title_train, title_test, y_train, y_test = train_test_split(titles, labels, test_size=0.3, random_state=1000)

To get predictions from our model based on the text, we need to encode it in vector format so the machine can process it.

TensorFlow’s wiki-words-250 uses the Word2Vec Skip-Gram architecture. Skip-Gram is trained to predict the surrounding context given an input word.

Consider this sentence as an example:

I am going on a voyage in my car.

The word voyage is passed as input, with one as the window size. The window size means how many words before and after the target word to predict. In our case, those words are go and car (excluding stopwords; go is the lemmatized form of going).

We one-hot-encode our word, resulting in an input vector of size 1 × V, where V is the vocabulary size. This representation is multiplied by a weight matrix with V rows (one for each word in our vocabulary) and E columns, where E is a hyperparameter indicating the size of each embedding. Because the input vector is one-hot encoded, all its values are zero except one (representing the word we are inputting). Multiplying the input by the weight matrix therefore yields a 1 × E vector that denotes the embedding for that word.
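To make these shapes concrete, here is a toy example with a made-up vocabulary size V = 5 and embedding size E = 3 (not the real model’s values):

import numpy as np
V, E = 5, 3
one_hot = np.zeros((1, V))
one_hot[0, 2] = 1                #one-hot vector for the word at index 2
W = np.random.rand(V, E)         #weight (embedding) matrix
embedding = one_hot @ W          #1 x E vector; equals row 2 of W
print(embedding.shape)           #(1, 3)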

The output layer, which consists of a softmax regression classifier, receives the 1 × E vector. It is built of V neurons (corresponding to the vocabulary’s one-hot encoding) that each produce a value between 0 and 1, indicating the likelihood of that word appearing within the window.

Word embeddings with a size E of 250 are provided by TensorFlow’s wiki-words-250. Embeddings are applied by looping through all of the words and computing the embedding for each one. We’ll need to utilize the pad_sequences function to account for samples of variable length.

#Convert each series of words to a word2vec embedding
indiv = []
for i in title_train:
    temp = np.array(embed(i))   #embed is the wiki-words-250 model loaded from TensorFlow Hub
    indiv.append(temp)

#Accounts for different lengths of titles
indiv = tf.keras.preprocessing.sequence.pad_sequences(indiv, dtype='float')
indiv.shape

Therefore, there are 1466 samples in the training data, the longest title is 46 words, and each word has 250 features.

Now, we build our model. It consists of:

1 LSTM layer with 50 units

2 Dense layers (first 20 neurons, the second 5) with an activation function ReLU.

1 Dense output layer with activation function sigmoid.

We will use the Adam optimizer, a binary cross-entropy loss, and a performance metric of accuracy. The model will be trained over 10 epochs. Feel free to further adjust these hyperparameters.

#Sequential model has a 50 cell LSTM layer before Dense layers
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.LSTM(50))
model.add(tf.keras.layers.Dense(20, activation='relu'))
model.add(tf.keras.layers.Dense(5, activation='relu'))
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

#Compile model with binary_crossentropy loss, Adam optimizer, and accuracy metrics
model.compile(optimizer='adam', loss="binary_crossentropy", metrics=['accuracy'])

#Embed and pad the test titles the same way as the training titles
test = tf.keras.preprocessing.sequence.pad_sequences(
    [np.array(embed(i)) for i in title_test], dtype='float', maxlen=indiv.shape[1])

#Train model on 10 epochs
model.fit(indiv, y_train, validation_data=(test, y_test), epochs=10)

We get an accuracy of 59.4% on test data.

Using BERT for Fake News Classification

What would you reply if I asked you to name the English term with the most definitions?

That word is “set,” according to the Oxford English Dictionary’s Second Edition.

If you think about it, we could make a lot of different statements using that term in various settings. Consider the following scenario:

I set the table for lunch

The problem with Word2Vec is that it generates the same embedding no matter how the word is used. To combat this, we use BERT, which can build contextualized embeddings.

BERT is known as “Bidirectional Encoder Representations from Transformers.” It employs a transformer model to generate contextualized embeddings by utilizing attention mechanisms.

A transformer model uses an encoder-decoder design. The encoder layer creates a continuous representation based on what it has learned from the input. The decoder layer receives the preceding output and generates the next output. Because BERT’s purpose is to build a vector representation from text, it employs only an encoder.

Pre-Training & Fine-Tuning

BERT is trained in two ways. The first method is known as masked language modelling: before sequences are passed in, a [MASK] token replaces 15% of the words, and the model predicts the masked words using the context supplied by the unmasked words.

It is accomplished by

Applying a classification layer, using the embedding matrix, to the encoder output, so that the result is the same size as the vocabulary.

Using the softmax function to calculate the likelihood of each word.

The second strategy is next sentence prediction. The model is given two sentences as input and predicts whether the second sentence follows the first. During training, half of the inputs are sequential pairs, while the other half pair a sentence with a random sentence from the corpus. To distinguish between the two statements:

A [CLS] token is added at the start of the first sentence and a [SEP] token at the end of each sentence.

Each token (word) gets a positional embedding, which allows information to be extracted from its location in the text. Because there is no recurrence in a transformer model, there is no inherent understanding of a word’s position.

Each token is given a sentence embedding (further differentiating between the sentences).

For Next Sentence Prediction, the output of the [CLS] embedding, which stands for “aggregate sequence representation for sentence classification,” is passed through a classification layer with softmax to return the probability of the two sentences being sequential.


Implementation of BERT

We use the BERT preprocessor and encoder from TensorFlow Hub. Do not run the text through the earlier-mentioned pipeline (which removes capitalization, applies lemmatization, etc.); the BERT preprocessor abstracts all of that away.

We split our data for training and testing in the ratio of 80:20.

from sklearn.model_selection import train_test_split #Split data into training and testing dataset title_train, title_test, y_train, y_test = train_test_split(titles, labels, test_size=0.2, random_state=1000)

Now, load the BERT preprocessor and encoder.

# Use the bert preprocesser and bert encoder from tensorflow_hub
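#(a typical way to load them with tensorflow_hub; these exact handle URLs
# are a common choice and an assumption here, not necessarily the originals)
import tensorflow_hub as hub
bert_preprocess = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3")
bert_encoder = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4")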

We can now work on our neural network. It must be a functional model, with each layer’s output serving as an argument to the next.

1 Input layer: Used to pass sentences into the model.

The bert_preprocess layer: Preprocess the input text.

The bert_encoder layer: Pass the preprocessed tokens into the BERT encoder.

1 Dropout layer with 0.2. The BERT encoder pooled_output is passed into it.

2 Dense layers with 10 and 1 neurons. The first uses a ReLU activation function, and the second is sigmoid.

import tensorflow as tf

# Input Layers
input_layer = tf.keras.layers.Input(shape=(), dtype=tf.string, name='news')

# BERT layers
processed = bert_preprocess(input_layer)
output = bert_encoder(processed)

# Fully Connected Layers
layer = tf.keras.layers.Dropout(0.2, name='dropout')(output['pooled_output'])
layer = tf.keras.layers.Dense(10, activation='relu', name='hidden')(layer)
layer = tf.keras.layers.Dense(1, activation='sigmoid', name='output')(layer)

model = tf.keras.Model(inputs=[input_layer], outputs=[layer])

As you can see, the pooled_output is passed into the dropout layer. This value represents the text’s overall sequence representation; as previously said, it is the output representation of the [CLS] token.

We use the Adam optimizer, binary cross-entropy loss, and accuracy as the performance metric. The model is trained for five epochs. Feel free to tweak these hyperparameters even more.

#Compile model on adam optimizer, binary_crossentropy loss, and accuracy metrics
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

#Train model on 5 epochs
model.fit(title_train, y_train, epochs=5)

#Evaluate model on test data
model.evaluate(title_test, y_test)


Above, you can see that our model achieved an accuracy of 61.33%.

Conclusion

To improve the model performance:

Train the models on a large dataset.

Tweak hyperparameters of the model.

I hope you found this post insightful and that you now have a better understanding of NLP techniques for fake news classification.

References

Image – 1: Photo by Roman Kraft on Unsplash


Related

Implementing Artificial Neural Network(Classification) In Python From Scratch

This article was published as a part of the Data Science Blogathon

Neural networks. One of the booming technological breakthroughs in the 21st century.

Are you interested in creating your own neural network from scratch in Python? Well, you are at the right place. In this article, we will be creating an artificial neural network from scratch in Python. The artificial neural network that we are going to develop here is one that solves a classification problem. So stretch your fingers, and let’s get started.

Interesting Sidenote

Artificial Neural Networks(ANN) are part of supervised machine learning where we will be having input as well as corresponding output present in our dataset. Our whole aim is to figure out a way of mapping this input to the respective output. ANN can be used for solving both regression and classification problems.

From the perspective of this blog, we will be developing an ANN for solving the classification class of problems.

Pre-Requisites for Artificial Neural Network Implementation

Following will be the libraries and software that we will be needing in order to implement ANN.

1. Python – 3.6 or later

2. Jupyter Notebook ( Google Colab can also be used )

3. Pandas

4. Numpy

5. TensorFlow 2.x

6. Scikit-Learn

Understanding the Problem Statement for Artificial Neural Network

Here we are dealing with a dataset from the finance domain. We have a dataset with 14 dimensions in total and 10,000 records. The dimensions that we will be dealing with are as follows:-

RowNumber:- Represents the number of rows

CustomerId:- Represents customerId

Surname:- Represents surname of the customer

CreditScore:- Represents credit score of the customer

Geography:- Represents the city to which the customer belongs

Gender:- Represents Gender of the customer

Age:- Represents age of the customer

Tenure:- Represents tenure of the customer with a bank

Balance:- Represents balance hold by the customer

NumOfProducts:- Represents the number of bank services used by the customer

HasCrCard:- Represents if a customer has a credit card or not

IsActiveMember:- Represents if a customer is an active member or not

EstimatedSalary:- Represents estimated salary of the customer

Exited:- Represents if a customer is going to exit the bank or not.

Structure of dataset

As we can see from the above data dictionary, we are dealing with a total of 14 dimensions.

Here our main goal is to create an artificial neural network that will take into consideration all independent variables(first 13) and based on that will predict if our customer is going to exit the bank or not(Exited is dependent variable here).

Once we understand the steps for constructing neural networks, we can directly implement those same steps to other datasets as well.

One of the places where we can find such datasets is the UCI machine learning repository. These datasets are classified into regression and classification problems. Since we are implementing this neural network to solve classification problems, you can download any classification dataset from there and apply the same steps on a dataset of your choice! How cool is that?

Importing Necessary Libraries for Artificial Neural Network

Let’s import all the necessary libraries here

#Importing necessary Libraries
import numpy as np
import pandas as pd
import tensorflow as tf

Importing Dataset

In this step, we are going to import our dataset. Since our dataset is in csv format, we are going to use the read_csv() method of pandas in order to load the dataset.

#Loading Dataset
data = pd.read_csv("Churn_Modelling.csv")

Generating Matrix of Features (X)

The basic principle while creating a machine learning model is to generate X also called as Matrix of Features. This X basically contains all our independent variables. Let’s create the same here.

Python Code:
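Based on the explanation that follows, a minimal version of this step looks like this:

#Generating Matrix of Features
X = data.iloc[:, 3:-1].values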



Here I have used the iloc method of the Pandas data frame, which allows us to fetch the desired values from the desired columns within the dataset. As we can see, we are fetching all the data from the 3rd column till the last-minus-one column. The reason is that the first 3 columns, i.e., RowNumber, CustomerId, and Surname, have nothing to do with deciding whether the customer is going to exit or not; hence we start fetching values from the 3rd column onwards. Lastly, since our last column is the dependent variable, we mention -1 in the iloc method to exclude it from our matrix of features X.

Generating Dependent Variable Vector(Y)

In the same fashion where we have created our matrix of features(X) for the independent variable, we also have to create a dependent variable vector(Y) which will only contain our dependent variable values.

#Generating Dependent Variable Vectors
Y = data.iloc[:,-1].values

Encoding Categorical Variable Gender

Now we have defined our X and Y, from this point on we are going to start with one of the highly time-consuming phases in any machine learning problem-solving. This phase is known as feature engineering. To define it in a simple manner, feature engineering is a phase where we either generate new variables from existing ones or modify existing variables so as to use them in our machine learning model.

In the above image depicting the structure of the dataset, we can see that most of the variables are numeric in nature, with the exception of a few: Gender and Geography. Essentially, a machine learning model is a mathematical formula that only accepts digits as input. So if we try to create an ML model using this dataset, which contains a mix of data (numeric + string), our model will simply fail during the creation process itself. Hence we need to convert those string values into their numerical equivalent without losing their significance.

One of the most efficient ways of doing this is by using a technique called encoding. It is a process that will convert strings or categories directly into their numerical equivalent without losing significance.

Here our gender column has only 2 categories which are male and female, we are going to use LabelEncoding. This type of encoding will simply convert this column into a column having values of 0 and 1. In order to use Label Encoding, we are going to use LabelEncoder class from sklearn library.

#Encoding Categorical Variable Gender
from sklearn.preprocessing import LabelEncoder
LE1 = LabelEncoder()
X[:,2] = np.array(LE1.fit_transform(X[:,2]))

Here we have applied label encoding on the Gender column of our dataset.

Encoding Categorical Variable Country

Now let’s deal with another categorical column named country. This column has a cardinality of 3 meaning that it has 3 distinct categories present i.e France, Germany, Spain.

Here we have 2 options:-

1. We can use Label Encoding here and directly convert those values into 0,1,2 like that

2. We can use One Hot Encoding here which will convert those strings into a binary vector stream. For example – Spain will be encoded as 001, France will be 010, etc.

The first approach is easy and faster to implement, but the encoded values 0, 1, 2 impose an artificial ordering. The other method, one-hot encoding, converts all the string values into binary streams of 0’s and 1’s. One-hot encoding ensures that the machine learning algorithm does not assume that higher numbers are more important.

#Encoding Categorical variable Geography
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [1])], remainder="passthrough")
X = np.array(ct.fit_transform(X))

Here we have used OneHotEncoder class from sklearn in order to perform one-hot encoding. Now you might have a query here. What is the use of ColumnTransformer? Well, ColumnTransformer is another class in sklearn that will allow us to select a particular column from our dataset on which we can apply one-hot encoding.

Splitting Dataset into Training and Testing Dataset

In this step, we are going to split our dataset into training and testing datasets. This is one of the bedrocks of the entire machine learning process. The training dataset is the one on which our model is going to train while the testing dataset is the one on which we are going to test the performance of our model.

#Splitting dataset into training and testing dataset
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)

Here we have used the train_test_split function from the sklearn library. We have split our dataset in a configuration such that 80 percent of data will be there in the training phase and 20 percent of data will be in the testing phase.

Additionally, the best part about using the train_test_split function from sklearn is that, while splitting it will also be performing data shuffling in order to create a more generalized dataset.

Performing Feature Scaling

The very last step in our feature engineering phase is feature scaling. It is a procedure where all the variables are converted into the same scale. Why, you might ask? Sometimes in our dataset, certain variables have very high values while certain variables have very low values, so there is a chance that during model creation the variables with extremely high values dominate the variables with extremely low values. Because of this, the low-value variables might be neglected by our model, and hence feature scaling is necessary.

Now here I am going to answer one of the most important questions asked in a machine learning interview. ” When to perform feature scaling – before the train-test split or after the train-test split?”.

Well, the answer is: after we split the dataset into training and testing datasets. The reason is that the training dataset is something on which our model trains, or learns, itself, while the testing dataset is something on which our model is evaluated. If we perform feature scaling before the train-test split, it will cause information leakage into the testing dataset, which defeats the purpose of having a testing dataset; hence we should always perform feature scaling after the train-test split.

Now how we are going to perform feature scaling? Well, there are many ways of performing feature scaling. The two most efficient techniques in the context are:-

1. Standardization

2. Normalization

Whenever standardization is performed, all values in the dataset will be converted into values ranging roughly between -3 and +3, while in the case of normalization, all values will be converted into a range between -1 and +1.

There are few conditions on which technique to use and when. Usually, Normalization is used only when our dataset follows a normal distribution while standardization is a universal technique that can be used for any dataset irrespective of the distribution. Here we are going to use Standardization.

#Performing Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

Here we have used the StandardScaler class from the sklearn library in order to perform standardization.

Now we have completed our feature engineering phase. We can now start with the creation of our artificial neural network from the next point onwards.

Initializing Artificial Neural Network

This is the very first step while creating ANN. Here we are going to create our ann object by using a certain class of Keras named Sequential.

#Initialising ANN
ann = tf.keras.models.Sequential()

As a part of tensorflow 2.0, Keras is now integrated with tensorflow and is now considered as a sub-library of tensorflow. The Sequential class is a part of the models module of Keras library which is a part of the tensorflow library now.

Creating Hidden Layers

Once we initialize our ann, we are now going to create layers for the same. Here we are going to create a network that will have 2 hidden layers, 1 input layer, and 1 output layer. So, let’s create our very first hidden layer

#Adding First Hidden Layer
ann.add(tf.keras.layers.Dense(units=6, activation="relu"))

Here we have created our first hidden layer by using the Dense class which is part of the layers module. This class accepts 2 inputs:-

1. units:- number of neurons that will be present in the respective layer

2. activation:- specify which activation function to be used

For the first input, I had tested with many values in the past and the optimal value that I had found is 6. Obviously, we can try with any other value as there is no hard rule about the number of neurons that should be present in the layer.

For the second input, we are always going to use “relu”[rectified linear unit] as an activation function for hidden layers. Since we are going to create two hidden layers, this same step we are going to repeat for the creation of the second hidden layer as well.

#Adding Second Hidden Layer
ann.add(tf.keras.layers.Dense(units=6, activation="relu"))

Creating Output Layer

In this step, we are going to create our output layer for ann. The output layer will be responsible for giving output.

#Adding Output Layer
ann.add(tf.keras.layers.Dense(units=1, activation="sigmoid"))

Here again, we are going to use the Dense class in order to create the output layer. Two important things to remember here:-

1. In a binary classification problem(like this one) where we will be having only two classes as output (1 and 0), we will be allocating only one neuron to output this result. For the multiclass classification problem, we have to use more than one neuron in the output layer. For example – if our output contains 4 categories then we need to create 4 different neurons[one for each category].

2. For the binary classification Problems, the activation function that should always be used is sigmoid. For a multiclass classification problem, the activation function that should be used is softmax.

Here, since we are dealing with binary classification, we allocate only one neuron in the output layer, and the activation function used is sigmoid.

Compiling Artificial Neural Network

We have now created layers for our neural network. In this step, we are going to compile our ANN.

#Compiling ANN
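#(the compile call itself, matching the optimizer, loss, and metrics described below)
ann.compile(optimizer="adam", loss="binary_crossentropy", metrics=['accuracy'])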

We have used compile method of our ann object in order to compile our network. Compile method accepts the below inputs:-

1. optimizer:- specifies which optimizer to be used in order to perform stochastic gradient descent. I had experimented with various optimizers like RMSProp, adam and I have found that adam optimizer is a reliable one that can be used with any neural network.

2. loss:- specifies which loss function should be used. For binary classification, the value should be binary_crossentropy. For multiclass classification, it should be categorical_crossentropy.

3. metrics:- which performance metrics to be used in order to compute performance. Here we have used accuracy as a performance metric.

Fitting Artificial Neural Network

This is the last step in our ann creation process. Here we are just going to train our ann on the training dataset.

#Fitting ANN
ann.fit(X_train, Y_train, batch_size=32, epochs=100)

Here we have used the fit method in order to train our ann. The fit method is accepting 4 inputs in this case:-

1.X_train:- Matrix of features for the training dataset

2.Y_train:- Dependent variable vectors for the training dataset

3.batch_size: how many observations should be there in the batch. Usually, the value for this parameter is 32 but we can experiment with any other value as well.

4. epochs: how many complete passes the neural network makes over the training data. Here the optimal value that I have found from my experience is 100.

Are you interested to see what the training process looks like? Well, here is a snapshot of the same.

Training of Artificial Neural Network

Here we can see that in each epoch our loss is decreasing and our accuracy is increasing. As we can see here that our final accuracy is 86.59 which is pretty remarkable for a neural network with this simplicity.

That’s it :). We have created our artificial neural network from scratch using Python.

As an additional bonus, I am attaching the code below that will allow us to perform single-point prediction for any custom values of input.

Predicting Result for Single Point Observation

#Predicting result for Single Observation
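#(illustrative input values - an assumption for demonstration, not the original
# observation: the 3 one-hot geography columns come first because the
# ColumnTransformer moved them to the front, followed by CreditScore, Gender,
# Age, Tenure, Balance, NumOfProducts, HasCrCard, IsActiveMember, and
# EstimatedSalary; the values are scaled with the same StandardScaler)
print(ann.predict(sc.transform([[1, 0, 0, 600, 1, 40, 3, 60000, 2, 1, 1, 50000]])) > 0.5)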

Output:

[[False]]

Here our neural network is trying to predict whether our customer is going to exit or not, based on the values of the independent variables.

PRO Tip

Now we have created our model. I am giving you a pro tip now on how can you save your created neural network.

#Saving created neural network
ann.save("ANN.h5")

That’s it. This one line of code allows us to save our ML model. You might have a query here:

What is the h5 file format? Well, h5 is a specific file format used by neural networks. Using this format we can directly save our neural network as a serialized object. It is similar to the pickle file format implementation that we use for storing traditional machine learning models.
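Loading the saved network back later is just as simple; a small sketch using the file name from the save call above:

from tensorflow.keras.models import load_model

#Load the serialized neural network back from disk
ann = load_model("ANN.h5")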

Well, that’s all about implementing neural networks from scratch in Python.

If you’re an enthusiast who is looking forward to unravel the world of Generative AI. Then, please register for our upcoming event, DataHack Summit 2023.

Conclusion

Hope you like this article.


Related

AI vs. Machine Learning vs. Deep Learning

Since before the dawn of the computer age, scientists have been captivated by the idea of creating machines that could behave like humans. But only in the last decade has technology enabled some forms of artificial intelligence (AI) to become a reality.

Interest in putting AI to work has skyrocketed, with a burgeoning array of AI use cases. Many surveys have found upwards of 90 percent of enterprises are either already using AI in their operations today or plan to in the near future.

Eager to capitalize on this trend, software vendors – both established AI companies and AI startups – have rushed to bring AI capabilities to market. Among vendors selling big data analytics and data science tools, two types of artificial intelligence have become particularly popular: machine learning and deep learning.

While many solutions carry the “AI,” “machine learning,” and/or “deep learning” labels, confusion about what these terms really mean persists in the marketplace. The diagram below provides a visual representation of the relationships among these different technologies:

As the graphic makes clear, machine learning is a subset of artificial intelligence. In other words, all machine learning is AI, but not all AI is machine learning.

Similarly, deep learning is a subset of machine learning. And again, all deep learning is machine learning, but not all machine learning is deep learning.

Also see: Top Machine Learning Companies

AI, machine learning and deep learning are each interrelated, with deep learning nested within ML, which in turn is part of the larger discipline of AI.

Computers excel at mathematics and logical reasoning, but they struggle to master other tasks that humans can perform quite naturally.

For example, human babies learn to recognize and name objects when they are only a few months old, but until recently, machines have found it very difficult to identify items in pictures. While any toddler can easily tell a cat from a dog from a goat, computers find that task much more difficult. In fact, captcha services sometimes use exactly that type of question to make sure that a particular user is a human and not a bot.

In the 1950s, scientists began discussing ways to give machines the ability to “think” like humans. The phrase “artificial intelligence” entered the lexicon in 1956, when John McCarthy organized a conference on the topic. Those who attended called for more study of “the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it.”

Critics rightly point out that there is a big difference between an AI system that can tell the difference between cats and dogs and a computer that is truly intelligent in the same way as a human being. Most researchers believe that we are years or even decades away from creating an artificial general intelligence (also called strong AI) that seems to be conscious in the same way that human beings are, if it will ever be possible to create such a system at all.

If artificial general intelligence does one day become a reality, it seems certain that machine learning will play a major role in the system’s capabilities.

Machine learning is the particular branch of AI concerned with teaching computers to “improve themselves,” as the attendees at that first artificial intelligence conference put it. Another 1950s computer scientist named Arthur Samuel defined machine learning as “the ability to learn without being explicitly programmed.”

In traditional computer programming, a developer tells a computer exactly what to do. Given a set of inputs, the system will return a set of outputs — just as its human programmers told it to.

Machine learning is different because no one tells the machine exactly what to do. Instead, they feed the machine data and allow it to learn on its own.

In general, machine learning takes three different forms: 

Reinforcement learning is one of the oldest types of machine learning, and it is very useful in teaching a computer how to play a game.

For example, Arthur Samuel created one of the first programs that used reinforcement learning. It played checkers against human opponents and learned from its successes and mistakes. Over time, the software became much better at playing checkers.

Reinforcement learning is also useful for applications like autonomous vehicles, where the system can receive feedback about whether it has performed well or poorly and use that data to improve over time.

Supervised learning is particularly useful in classification applications such as teaching a system to tell the difference between pictures of dogs and pictures of cats.

In this case, you would feed the application a whole lot of images that had been previously tagged as either dogs or cats. From that training data, the computer would draw its own conclusions about what distinguishes the two types of animals, and it would be able to apply what it learned to new pictures.

By contrast, unsupervised learning does not rely on human beings to label training data for the system. Instead, the computer uses clustering algorithms or other mathematical techniques to find similarities among groups of data.

Unsupervised machine learning is particularly useful for the type of big data analytics that interests many enterprise leaders. For example, you could use unsupervised learning to spot similarities among groups of customers and better target your marketing or tailor your pricing.

Some recommendation engines rely on unsupervised learning to tell people who like one movie or book what other movies or books they might enjoy. Unsupervised learning can also help identify characteristics that might indicate a person’s credit worthiness or likelihood of filing an insurance claim.

Various AI applications, such as computer vision, natural language processing, facial recognition, text-to-speech, speech-to-text, knowledge engines, emotion recognition, and other types of systems, often make use of machine learning capabilities. Some combine two or more of the main types of machine learning, and in some cases, are said to be “semi-supervised” because they incorporate some of the techniques of supervised learning and some of the techniques of unsupervised learning. And some machine learning techniques — such as deep learning — can be supervised, unsupervised, or both.

The phrase “deep learning” first came into use in the 1980s, making it a much newer idea than either machine learning or artificial intelligence.

Deep learning describes a particular type of architecture that both supervised and unsupervised machine learning systems sometimes use. Specifically, it is a layered architecture where one layer takes an input and generates an output. It then passes that output on to the next layer in the architecture, which uses it to create another output. That output can then become the input for the next layer in the system, and so on. The architecture is said to be “deep” because it has many layers.

To create these layered systems, many researchers have designed computing systems modeled after the human brain. In broad terms, they call these deep learning systems artificial neural networks (ANNs). ANNs come in several different varieties, including deep neural networks, convolutional neural networks, recurrent neural networks and others. These neural networks use nodes that are similar to the neurons in a human brain.

Graphics processing units (GPUs), originally designed for gaming and other graphics-heavy workloads, also excel at the type of calculations necessary for deep learning. As GPU performance has improved and costs have decreased, people have been able to create high-performance systems that can complete deep learning tasks in much less time and for much less cost than would have been the case in the past.

Today, anyone can easily access deep learning capabilities through cloud services like Amazon Web Services, Microsoft Azure, Google Cloud and IBM Cloud.

If you are interested in learning more about AI vs machine learning vs deep learning, Datamation has several resources that can help, including the following:

Deep Learning for Image Super-Resolution

This article was published as a part of the Data Science Blogathon

Introduction

Super-resolution (SR) is the process of recovering high-resolution (HR) images from low-resolution (LR) images. It is an important class of image processing techniques in computer vision and image processing and enjoys a wide range of real-world applications, such as medical imaging, satellite imaging, surveillance and security, and astronomical imaging, amongst others.

Problem

The image super-resolution (SR) problem, particularly single image super-resolution (SISR), has gained a lot of attention in the research community. SISR aims to reconstruct a high-resolution image ISR from a single low-resolution image ILR. Generally, the relationship between ILR and the original high-resolution image IHR can vary depending on the situation. Many studies assume that ILR is a bicubic downsampled version of IHR, but other degrading factors such as blur, decimation, or noise can also be considered for practical applications.

In this article, we will focus on supervised learning methods for super-resolution tasks. By using HR images as targets and LR images as inputs, we can treat this problem as a supervised learning problem.

Exhaustive table of topics in Supervised Image Super-Resolution

Upsampling Methods

Before getting into the rest of the theory behind super-resolution, we need to understand upsampling (increasing the spatial resolution of an image, i.e., increasing the number of pixel rows and/or columns) and its various methods.

1. Interpolation-based methods – Image interpolation (image scaling) refers to resizing digital images and is widely used by image-related applications. Traditional methods include nearest-neighbor, bilinear, and bicubic interpolation.

Nearest-neighbor interpolation with a scale factor of 2

Nearest-neighbor Interpolation – The nearest-neighbor interpolation is a simple and intuitive algorithm. It selects the value of the nearest pixel for each position to be interpolated regardless of any other pixels.

Bilinear Interpolation – Bilinear interpolation (BLI) first performs linear interpolation along one axis of the image and then along the other. Since it amounts to a quadratic interpolation with a 2 × 2 receptive field, it performs much better than nearest-neighbor interpolation while remaining relatively fast.

Bicubic Interpolation – Similarly, bicubic interpolation (BCI) performs cubic interpolation on each of the two axes. Compared to BLI, BCI takes 4 × 4 pixels into account and produces smoother results with fewer artifacts, but at a much lower speed.

Shortcomings – Interpolation-based methods often introduce side effects such as computational complexity, noise amplification, and blurring.
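As a quick illustration of these classical methods, here is a small sketch using OpenCV’s cv2.resize (the toy image and the scale factor are placeholders; in practice you would load a real image):

```python
import cv2
import numpy as np

# A toy low-resolution "image"; in practice this would come from cv2.imread.
lr = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)

scale = 2
size = (lr.shape[1] * scale, lr.shape[0] * scale)  # (width, height)

nearest  = cv2.resize(lr, size, interpolation=cv2.INTER_NEAREST)  # blocky, fastest
bilinear = cv2.resize(lr, size, interpolation=cv2.INTER_LINEAR)   # 2x2 neighborhood
bicubic  = cv2.resize(lr, size, interpolation=cv2.INTER_CUBIC)    # 4x4 neighborhood, smoother but slower

print(nearest.shape, bilinear.shape, bicubic.shape)  # all (128, 128, 3)
```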

2. Learning-based upsampling – To overcome the shortcomings of interpolation-based methods and learn upsampling in an end-to-end manner, the transposed convolution layer and the sub-pixel layer have been introduced into the SR field.

(Figure: the blue boxes denote the input, and the green boxes indicate the kernel and the convolution output.)

Transposed convolution layer – a.k.a. the deconvolution layer, it tries to perform a transformation opposite to that of a normal convolution, i.e., predicting a possible input from feature maps sized like a convolution’s output. Specifically, it increases the image resolution by expanding the image (inserting zeros between pixels) and then performing convolution.

Sub-pixel layer – performs a convolution that produces s² times as many channels, where s is the scaling factor (in the corresponding figure, the blue boxes denote the input and the boxes with other colors indicate different convolution operations and output feature maps). Assuming the input size is h × w × c, the convolution output size will be h × w × s²c. A reshaping (pixel shuffle) operation is then performed to produce an output of size sh × sw × c.
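The two learnable upsampling layers can be sketched in PyTorch as follows (channel counts and kernel sizes are illustrative). Note how the sub-pixel path matches the h × w × s²c → sh × sw × c reshaping just described:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 32, 32)  # batch x channels x height x width

# Transposed convolution: stride 2 roughly doubles the spatial resolution.
deconv = nn.ConvTranspose2d(64, 64, kernel_size=4, stride=2, padding=1)
print(deconv(x).shape)  # torch.Size([1, 64, 64, 64])

# Sub-pixel upsampling: a convolution produces s^2 * c channels, then
# PixelShuffle rearranges them into an s-times-larger feature map.
s = 2
conv = nn.Conv2d(64, 64 * s ** 2, kernel_size=3, padding=1)  # h x w x (s^2)c
shuffle = nn.PixelShuffle(s)                                 # -> sh x sw x c
print(shuffle(conv(x)).shape)  # torch.Size([1, 64, 64, 64])
```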

Super-resolution Frameworks

Since image super-resolution is an ill-posed problem, how to perform upsampling (i.e., generating HR output from LR input) is the key problem. There are mainly four model frameworks based on the employed upsampling operations and their locations in the model (refer to the table above).

1. Pre-upsampling Super-resolution –

Directly mapping LR images to HR images is considered a difficult task, so a straightforward solution is to first obtain higher-resolution images with traditional upsampling algorithms and then refine them using deep neural networks. For example, LR images are upsampled to coarse HR images of the desired size using bicubic interpolation, and deep CNNs are then applied to these images to reconstruct high-quality details.

2. Post-upsampling Super-resolution –

To improve computational efficiency and make full use of deep learning to increase resolution automatically, researchers propose performing most of the computation in low-dimensional space, replacing the predefined upsampling with end-to-end learnable layers integrated at the end of the model. In the pioneering works of this framework, namely post-upsampling SR, the LR input images are fed into deep CNNs without increasing resolution, and end-to-end learnable upsampling layers are applied at the end of the network.
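A minimal post-upsampling model might look like the following sketch (in the spirit of ESPCN-style networks; the layer sizes are illustrative and this is not any paper’s exact architecture). All convolutions operate at LR resolution, and a single learnable sub-pixel layer upscales at the very end:

```python
import torch
import torch.nn as nn

class TinyPostUpsamplingSR(nn.Module):
    """Toy post-upsampling SR net: the body runs entirely at LR resolution,
    and one sub-pixel layer performs the learnable upsampling at the end."""
    def __init__(self, scale=2, channels=3):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, 64, 5, padding=2), nn.ReLU(inplace=True),
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.tail = nn.Sequential(
            nn.Conv2d(32, channels * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),  # end-to-end learnable upsampling
        )

    def forward(self, lr):
        return self.tail(self.body(lr))

lr = torch.randn(1, 3, 48, 48)
print(TinyPostUpsamplingSR(scale=2)(lr).shape)  # torch.Size([1, 3, 96, 96])
```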

Learning Strategies

In the super-resolution field, learning strategies mainly revolve around the choice of loss function, which trades off between minimizing pixelwise reconstruction error and producing more realistic and higher-quality results. Commonly used losses include:

Pixelwise L1 loss – the mean absolute difference between the pixels of the ground truth HR image and the generated one.

Pixelwise L2 loss – the mean squared difference between the pixels of the ground truth HR image and the generated one.

Content loss – defined as the Euclidean distance between high-level representations of the output image and the target image, where the high-level features are obtained by passing the images through pre-trained CNNs like VGG and ResNet.

Adversarial loss – based on GANs, where we treat the SR model as a generator and define an extra discriminator to judge whether an input image is real or generated.

PSNR – Peak Signal-to-Noise Ratio (PSNR) is a commonly used objective metric to measure the reconstruction quality of a lossy transformation. PSNR is inversely related to the logarithm of the Mean Squared Error (MSE) between the ground truth image and the generated image.
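For reference, the standard definitions that the symbols below refer to are (reconstructed here; the original equations were not rendered):

```latex
\mathrm{MSE} = \frac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \left[ I(i,j) - K(i,j) \right]^2
\qquad
\mathrm{PSNR} = 10 \cdot \log_{10} \left( \frac{\mathrm{MAX}_I^2}{\mathrm{MSE}} \right)
```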

In MSE, I is a noise-free m×n monochrome image (ground truth) and K is the generated image (noisy approximation). In PSNR, MAX_I represents the maximum possible pixel value of the image (255 for 8-bit images).

Network Design

Various network designs in super-resolution architecture

Enough of the basics! Let’s discuss some of the state-of-the-art super-resolution methods –

Super-Resolution methods

Super-Resolution Generative Adversarial Network (SRGAN) – uses the idea of a GAN for the super-resolution task: the generator tries to produce a realistic SR image (from the LR input, rather than from random noise as in a classic GAN), which the discriminator judges against real HR images. Both keep training so that the generator can produce images that match the true training data distribution.

Architecture of Generative Adversarial Network

There are various approaches to super-resolution, but there is a problem – how can we recover finer texture details from a low-resolution image without distorting the image?

Results with high PSNR are nominally high quality, but they often lack high-frequency details.

Check the original papers for detailed information.

Steps –

1. We downsample the HR (high-resolution) images to get LR images. Now we have HR and LR image pairs for the training dataset.

2. We pass the LR images through the generator, which upsamples them and produces SR images.

3. We use the discriminator to distinguish real HR images from generated SR images and backpropagate the GAN loss to train both the discriminator and the generator.

Network architecture of SRGAN

Key features of the method (a minimal sketch combining them follows this list) – 

Post upsampling type of framework

Subpixel layer for upsampling

Contains residual blocks

Uses Perceptual loss
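Putting those features together, here is a toy, heavily simplified generator sketch (layer sizes are illustrative; it omits the original’s long skip connection and is not the authors’ code):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """SRGAN-style residual block: conv-BN-PReLU-conv-BN plus a skip connection."""
    def __init__(self, ch=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.PReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch),
        )

    def forward(self, x):
        return x + self.body(x)

class TinyGenerator(nn.Module):
    """Toy SRGAN-like generator: residual trunk at LR resolution,
    sub-pixel (PixelShuffle) upsampling at the end (post-upsampling)."""
    def __init__(self, scale=4, n_blocks=4):
        super().__init__()
        self.head = nn.Sequential(nn.Conv2d(3, 64, 9, padding=4), nn.PReLU())
        self.trunk = nn.Sequential(*[ResidualBlock() for _ in range(n_blocks)])
        ups = []
        for _ in range(scale // 2):  # stack x2 sub-pixel stages to reach the scale
            ups += [nn.Conv2d(64, 64 * 4, 3, padding=1), nn.PixelShuffle(2), nn.PReLU()]
        self.tail = nn.Sequential(*ups, nn.Conv2d(64, 3, 9, padding=4))

    def forward(self, lr):
        return self.tail(self.trunk(self.head(lr)))

print(TinyGenerator()(torch.randn(1, 3, 24, 24)).shape)  # torch.Size([1, 3, 96, 96])
```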

Original code of SRGAN

Enhanced Deep Super-Resolution (EDSR) – improves performance by analyzing and removing unnecessary modules from conventional residual networks.

Check the original papers for detailed information.

Some of the key features of the methods – 

Residual blocks – SRGAN successfully applied the ResNet architecture to the super-resolution problem with SRResNet; EDSR further improves the performance by employing a better ResNet structure. In the proposed architecture –

Comparison of the residual blocks

They removed the batch normalization layers used in SRResNet. Since batch normalization normalizes the features, it takes away the network’s range flexibility, so it is better to remove these layers.
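For illustration, an EDSR-style residual block without batch normalization might be sketched as follows (the 0.1 residual scaling factor follows the paper’s choice for its larger models; other details are illustrative):

```python
import torch
import torch.nn as nn

class EDSRBlock(nn.Module):
    """EDSR-style residual block: no batch normalization; the residual branch
    is scaled before being added back to the skip connection."""
    def __init__(self, ch=64, res_scale=0.1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),
        )
        self.res_scale = res_scale

    def forward(self, x):
        return x + self.res_scale * self.body(x)

print(EDSRBlock()(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```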

The architecture of EDSR, MDSR

In MDSR, they proposed a multiscale architecture that shares most of the parameters on different scales. The proposed multiscale model uses significantly fewer parameters than multiple single-scale models but shows comparable performance.

Original code of the methods

So now we have come to the end of the blog! To learn about super-resolution, refer to these survey papers.

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.


Top 10 Essential Prerequisites For Deep Learning Projects

Deep learning projects are used across industries ranging from medicine to e-commerce

Deep learning is clearly a technology of the future and one of the most sought-after innovations of our day. If you’re interested in learning it, you should be aware of its requirements; knowing the prerequisites for deep learning projects can help you choose a better career path.

Deep learning is an interdisciplinary area of computer science and mathematics with the goal of teaching computers to carry out cognitive tasks in a manner similar to that of humans. Deep learning systems collect input data and study or analyze it, using different methods to automatically identify patterns in datasets that may contain structured, quantitative, textual, or visual data. We’ll talk about the top requirements for deep learning projects in this section to help you prepare for learning its more complex ideas.

1. Programming

Deep learning requires programming as a core component and demands the use of a programming language. Python and R are the languages of choice for deep learning practitioners due to their functionality and efficiency. You must study programming and become proficient in one of these two well-known languages before you can study the numerous deep learning topics.

2. Statistics

Statistics is the study of collecting, analyzing, and visualizing data, and it helps you extract information from raw data. Data science and the related fields depend heavily on statistics; as a deep learning specialist, you will need to apply statistics to acquire insights from data.

3. Calculus

The foundation of many machine learning algorithms is calculus, so studying it is a requirement for deep learning. In deep learning you will create models based on the features found in your data, and calculus (for example, through gradient-based optimization) helps you use those features to build and train the model as necessary.

4. Linear Algebra

Linear algebra is most likely one of the most crucial requirements for deep learning. It covers matrices, vectors, and linear equations and focuses on how linear equations are represented in vector spaces. Linear algebra helps you design many kinds of models (classification, regression, etc.) and is a fundamental building block for many deep learning ideas.

5. Probability

Probability is the field of mathematics that uses numerical data to express how likely an event is to occur. Any event’s probability can range from 0 to 1, with 0 denoting impossibility and 1 denoting complete certainty.

6. Data Science

The field of data science focuses on analyzing and using data. As a deep learning specialist, you must be familiar with a variety of data science principles to construct models that manage data. Deep learning will help you use data to achieve the desired results, but mastering data science fundamentals is a prerequisite for applying it.

7. Work on Projects

While mastering these topics will aid in the development of a solid foundation, you will also need to work on deep learning projects to ensure that you fully comprehend everything. You can apply what you’ve learned and identify your weak areas with the aid of projects. You can easily find a project that interests you because deep learning has applications in many different fields.

8. Neural Networks

The word “neural” originates from “neuron,” the term for a single nerve cell. A neural network is essentially a network of such neurons that carry out routine tasks for us.

A significant portion of the problems we encounter daily relates to pattern recognition, object detection, and intelligence. The reality is that these responses are challenging to automate, even though we carry them out so effortlessly that we don’t even notice.

9. Clustering Algorithms

K-means, the most straightforward unsupervised learning approach, resolves the clustering problem. The method divides n observations into k clusters, with each observation belonging to the cluster with the nearest mean.
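A minimal sketch using scikit-learn’s KMeans (the toy data and parameters are illustrative):

```python
from sklearn.cluster import KMeans
import numpy as np

# Toy data: two obvious groups of 2-D points.
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # cluster index assigned to each observation
print(kmeans.cluster_centers_)  # the two means the observations are assigned to
```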

10. Regression

Update the detailed information about Implementing Audio Classification Project Using Deep Learning on the Daihoichemgio.com website. We hope the article's content will meet your needs, and we will regularly update the information to provide you with the fastest and most accurate information. Have a great day!