Deep Learning For Image Super-Resolution


This article was published as a part of the Data Science Blogathon

Introduction

Super-resolution (SR) is the process of recovering high-resolution (HR) images from low-resolution (LR) images. It is an important class of image processing techniques in computer vision and enjoys a wide range of real-world applications, such as medical imaging, satellite imaging, surveillance and security, and astronomical imaging, among others.

Problem

The image super-resolution (SR) problem, particularly single image super-resolution (SISR), has gained a lot of attention in the research community. SISR aims to reconstruct a high-resolution image I_SR from a single low-resolution image I_LR. Generally, the relationship between I_LR and the original high-resolution image I_HR can vary depending on the situation. Many studies assume that I_LR is a bicubic downsampled version of I_HR, but other degrading factors such as blur, decimation, or noise can also be considered for practical applications.

In this article, we focus on supervised learning methods for super-resolution tasks. By using HR images as targets and LR images as inputs, we can treat this problem as a supervised learning problem.

Exhaustive table of topics in Supervised Image Super-Resolution

Upsampling Methods

Before understanding the rest of the theory behind the super-resolution, we need to understand upsampling (Increasing the spatial resolution of images or simply increasing the number of pixel rows/columns or both in the image) and its various methods.

1. Interpolation-based methods – Image interpolation (image scaling), refers to resizing digital images and is widely used by image-related applications. The traditional methods include nearest-neighbor interpolation, linear, bilinear, bicubic interpolation, etc.

Nearest-neighbor interpolation with the scale of 2

Nearest-neighbor Interpolation – The nearest-neighbor interpolation is a simple and intuitive algorithm. It selects the value of the nearest pixel for each position to be interpolated regardless of any other pixels.

Bilinear Interpolation – The bilinear interpolation (BLI) first performs linear interpolation on one axis of the image and then performs it on the other axis. Since it results in a quadratic interpolation with a receptive field of size 2 × 2, it shows much better performance than nearest-neighbor interpolation while keeping a relatively fast speed.

Bicubic Interpolation – Similarly, the bicubic interpolation (BCI) performs cubic interpolation on each of the two axes. Compared to BLI, BCI takes 4 × 4 pixels into account and produces smoother results with fewer artifacts, but at a much lower speed. Refer to this for a detailed discussion.
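To make these interpolation methods concrete, here is a minimal TensorFlow sketch (the image is random dummy data and the 2x scale is an arbitrary choice, not from the original article):

import tensorflow as tf

# A random "low-resolution" image stands in for real data: (batch, height, width, channels).
lr = tf.random.uniform((1, 32, 32, 3))
new_size = (64, 64)  # 2x upscaling

nearest = tf.image.resize(lr, new_size, method='nearest')    # copies the closest pixel value
bilinear = tf.image.resize(lr, new_size, method='bilinear')  # linear interpolation over a 2x2 neighbourhood
bicubic = tf.image.resize(lr, new_size, method='bicubic')    # cubic interpolation over a 4x4 neighbourhood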

Shortcomings – Interpolation-based methods often introduce some side effects such as computational complexity, noise amplification, blurring results, etc.

2. Learning-based upsampling – To overcome the shortcomings of interpolation-based methods and learn upsampling in an end-to-end manner, transposed convolution layer and sub-pixel layer are introduced into the SR field.

The blue boxes denote the input, and the green boxes indicate the kernel and the convolution output.

Transposed convolution layer – also known as the deconvolution layer, it tries to perform a transformation opposite to a normal convolution, i.e., predicting the possible input based on feature maps sized like the convolution output. Specifically, it increases the image resolution by expanding the image (inserting zeros between pixels) and then performing convolution.
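As a rough illustration (not from the original article), a Keras transposed convolution that doubles the spatial resolution might look like this; the filter count and kernel size are arbitrary:

import tensorflow as tf

# Transposed convolution with stride 2 doubles height and width.
upsample = tf.keras.layers.Conv2DTranspose(filters=64, kernel_size=3, strides=2, padding='same')

x = tf.random.normal((1, 64, 64, 3))   # a dummy low-resolution feature map
y = upsample(x)
print(y.shape)                         # (1, 128, 128, 64)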

Sub-pixel layer – The sub-pixel layer performs a convolution that produces s² times the number of channels, where s is the scaling factor. Assuming the input size is h × w × c, the convolution output size will be h × w × s²c. After that, a reshaping (pixel shuffle) operation is performed to produce outputs of size sh × sw × c. In the figure, the blue boxes denote the input and the boxes with other colors indicate different convolution operations and different output feature maps.
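A minimal sketch of the sub-pixel operation using TensorFlow's depth_to_space (the shapes and scaling factor below are illustrative assumptions):

import tensorflow as tf

s = 2                      # scaling factor
h, w, c = 32, 32, 3

x = tf.random.normal((1, h, w, c))
# Convolution produces s^2 * c channels on the low-resolution grid ...
conv = tf.keras.layers.Conv2D(filters=(s ** 2) * c, kernel_size=3, padding='same')
features = conv(x)         # shape (1, h, w, s^2 * c)

# ... and the reshaping (pixel shuffle) rearranges them into a larger image.
sr = tf.nn.depth_to_space(features, block_size=s)
print(sr.shape)            # (1, s*h, s*w, c) -> (1, 64, 64, 3)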

Super-resolution Frameworks

Since image super-resolution is an ill-posed problem, how to perform upsampling (i.e., generating HR output from LR input) is the key problem. There are mainly four model frameworks based on the employed upsampling operations and their locations in the model (refer to the table above).

1. Pre-upsampling Super-resolution –

A direct mapping from LR images to HR images is considered a difficult task, so a straightforward solution is to use traditional upsampling algorithms to obtain higher-resolution images and then refine them with deep neural networks. For example, LR images are first upsampled to coarse HR images of the desired size using bicubic interpolation, and deep CNNs are then applied to these images to reconstruct high-quality outputs.
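A minimal pre-upsampling sketch in the spirit of SRCNN (the layer sizes and kernel widths here are illustrative assumptions, not the exact published architecture):

import tensorflow as tf

scale = 2

# Step 1: classical (bicubic) upsampling to the target size.
def bicubic_upsample(lr_image):
    hr_size = (lr_image.shape[0] * scale, lr_image.shape[1] * scale)
    return tf.image.resize(lr_image, hr_size, method='bicubic')

# Step 2: a small CNN refines the coarse high-resolution image.
refiner = tf.keras.Sequential([
    tf.keras.layers.Conv2D(64, 9, padding='same', activation='relu'),
    tf.keras.layers.Conv2D(32, 1, padding='same', activation='relu'),
    tf.keras.layers.Conv2D(3, 5, padding='same'),  # 3-channel refined output
])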

2. Post-upsampling Super-resolution –

To improve computational efficiency and make full use of deep learning to increase resolution automatically, researchers propose performing most of the computation in low-dimensional space by replacing the predefined upsampling with end-to-end learnable layers integrated at the end of the models. In the pioneering works of this framework, namely post-upsampling SR, the LR input images are fed into deep CNNs without increasing resolution, and end-to-end learnable upsampling layers are applied at the end of the network.
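For contrast, a minimal post-upsampling sketch, where all convolutions run on the low-resolution grid and a learnable sub-pixel layer at the very end produces the HR output (layer sizes are again illustrative assumptions):

import tensorflow as tf

scale = 2

post_upsampling_sr = tf.keras.Sequential([
    # Feature extraction entirely in low-resolution space (computationally cheap).
    tf.keras.layers.Conv2D(64, 3, padding='same', activation='relu'),
    tf.keras.layers.Conv2D(64, 3, padding='same', activation='relu'),
    # Learnable upsampling at the end: s^2 * 3 channels, then pixel shuffle.
    tf.keras.layers.Conv2D(3 * scale ** 2, 3, padding='same'),
    tf.keras.layers.Lambda(lambda t: tf.nn.depth_to_space(t, scale)),
])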

Learning Strategies

Loss functions are used to measure the reconstruction error and guide the model toward producing more realistic and higher-quality results. Commonly used losses include:

Pixelwise L1 loss – Absolute difference between pixels of ground truth HR image and the generated one.

Pixelwise L2 loss – Mean squared difference between pixels of ground truth HR image and the generated one.

Content loss – the Euclidean distance between high-level representations of the output image and the target image. High-level features are obtained by passing the images through pre-trained CNNs like VGG and ResNet.

Adversarial loss – Based on GAN where we treat the SR model as a generator, and define an extra discriminator to judge whether the input image is generated or not.
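A minimal sketch of these losses in TensorFlow (the choice of VGG19 and the 'block5_conv4' feature layer for the content loss is an assumption; implementations vary):

import tensorflow as tf

def pixelwise_l1(hr, sr):
    return tf.reduce_mean(tf.abs(hr - sr))       # mean absolute difference

def pixelwise_l2(hr, sr):
    return tf.reduce_mean(tf.square(hr - sr))    # mean squared difference

# Content loss: distance between high-level features of a pre-trained CNN.
vgg = tf.keras.applications.VGG19(include_top=False, weights='imagenet')
feature_extractor = tf.keras.Model(vgg.input, vgg.get_layer('block5_conv4').output)

def content_loss(hr, sr):
    hr_feat = feature_extractor(tf.keras.applications.vgg19.preprocess_input(hr))
    sr_feat = feature_extractor(tf.keras.applications.vgg19.preprocess_input(sr))
    return tf.reduce_mean(tf.square(hr_feat - sr_feat))

# Adversarial loss: the generator is rewarded when the discriminator labels its output as real.
bce = tf.keras.losses.BinaryCrossentropy()
def adversarial_loss(disc_pred_on_sr):
    return bce(tf.ones_like(disc_pred_on_sr), disc_pred_on_sr)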

PSNR – Peak Signal-to-Noise Ratio (PSNR) is a commonly used objective metric to measure the reconstruction quality of a lossy transformation. PSNR decreases logarithmically with the Mean Squared Error (MSE) between the ground truth image and the generated image: the lower the MSE, the higher the PSNR.

In the MSE formula, I is a noise-free m×n monochrome image (ground truth) and K is the generated image (noisy approximation). In the PSNR formula, MAX_I represents the maximum possible pixel value of the image (255 for 8-bit images).
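A small sketch computing MSE and PSNR with NumPy, following the standard definition PSNR = 10 · log10(MAX_I² / MSE):

import numpy as np

def psnr(ground_truth, generated, max_pixel=255.0):
    # MSE: mean squared difference between the m x n images I and K.
    mse = np.mean((ground_truth.astype(np.float64) - generated.astype(np.float64)) ** 2)
    if mse == 0:
        return float('inf')  # identical images
    # Higher PSNR (in decibels) means the reconstruction is closer to the ground truth.
    return 10 * np.log10((max_pixel ** 2) / mse)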

Network Design

Various network designs in super-resolution architecture

Enough of the basics! Let’s discuss some of the state-of-the-art super-resolution methods –

Super-Resolution methods

Super-Resolution Generative Adversarial Network (SRGAN) – Uses the idea of GANs for the super-resolution task: the generator tries to produce an image which is judged by the discriminator. Both keep training so that the generator can generate images that match the true training data distribution.

Architecture of Generative Adversarial Network

There are various ways for super-resolution but there is a problem – how can we recover finer texture details from a low-resolution image so that the image is not distorted?

Results with high PSNR are nominally high quality, but they often lack high-frequency details.

Check the original papers for detailed information.

Steps –

1. We process the HR (high-resolution images) to get downsampled LR images. Now we have HR and LR images for the training dataset.

2. We pass LR images through a generator that upsamples and gives SR images.

3. We use the discriminator to distinguish HR images from generated SR images and backpropagate the GAN loss to train both the discriminator and the generator.
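A heavily simplified sketch of one such training step, assuming `generator` and `discriminator` Keras models already exist; the pixel-wise content term and the loss weighting below are simplifications, not the authors' exact recipe:

import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()
gen_opt = tf.keras.optimizers.Adam(1e-4)
disc_opt = tf.keras.optimizers.Adam(1e-4)

def train_step(lr_batch, hr_batch, generator, discriminator):
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        sr_batch = generator(lr_batch, training=True)        # step 2: upsample LR -> SR

        real_pred = discriminator(hr_batch, training=True)   # step 3: judge real HR ...
        fake_pred = discriminator(sr_batch, training=True)   # ... and generated SR

        d_loss = bce(tf.ones_like(real_pred), real_pred) + \
                 bce(tf.zeros_like(fake_pred), fake_pred)
        # Generator loss: a content term (here plain MSE) plus a weighted adversarial term.
        g_loss = tf.reduce_mean(tf.square(hr_batch - sr_batch)) + \
                 1e-3 * bce(tf.ones_like(fake_pred), fake_pred)

    disc_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                                 discriminator.trainable_variables))
    gen_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                                generator.trainable_variables))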

Network architecture of SRGAN

 

Key features of the method – 

Post upsampling type of framework

Subpixel layer for upsampling

Contains residual blocks

Uses Perceptual loss

Original code of SRGAN

Enhanced Deep Super-Resolution (EDSR) and Multi-scale Deep Super-Resolution (MDSR) – These methods improve performance by optimizing the architecture and removing unnecessary modules from conventional residual networks.

Check the original papers for detailed information.

Some of the key features of the methods – 

Residual blocks – SRGAN successfully applied the ResNet architecture to the super-resolution problem with SRResNet; EDSR further improves performance by employing a better ResNet structure. In the proposed architecture –

Comparison of the residual blocks

They removed the batch normalization layers used in SRResNet. Since batch normalization layers normalize the features, they remove the range flexibility of the network, so it is better to remove them.
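A minimal sketch of such a residual block without batch normalization (the filter count and residual scaling factor are illustrative assumptions):

import tensorflow as tf

def residual_block(x, filters=64, scaling=0.1):
    # Conv -> ReLU -> Conv, with no batch normalization layers.
    res = tf.keras.layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    res = tf.keras.layers.Conv2D(filters, 3, padding='same')(res)
    # Residual scaling helps stabilize training of very wide networks.
    res = tf.keras.layers.Lambda(lambda t: t * scaling)(res)
    return tf.keras.layers.Add()([x, res])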

The architecture of EDSR, MDSR

In MDSR, they proposed a multiscale architecture that shares most of the parameters on different scales. The proposed multiscale model uses significantly fewer parameters than multiple single-scale models but shows comparable performance.

Original code of the methods

So now we have come to the end of the blog! To learn about super-resolution, refer to these survey papers.

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.


AI vs. Machine Learning vs. Deep Learning

Since before the dawn of the computer age, scientists have been captivated by the idea of creating machines that could behave like humans. But only in the last decade has technology enabled some forms of artificial intelligence (AI) to become a reality.

Interest in putting AI to work has skyrocketed, with a burgeoning array of AI use cases. Many surveys have found upwards of 90 percent of enterprises are either already using AI in their operations today or plan to in the near future.

Eager to capitalize on this trend, software vendors – both established AI companies and AI startups – have rushed to bring AI capabilities to market. Among vendors selling big data analytics and data science tools, two types of artificial intelligence have become particularly popular: machine learning and deep learning.

While many solutions carry the “AI,” “machine learning,” and/or “deep learning” labels, confusion about what these terms really mean persists in the marketplace. The diagram below provides a visual representation of the relationships among these different technologies:

As the graphic makes clear, machine learning is a subset of artificial intelligence. In other words, all machine learning is AI, but not all AI is machine learning.

Similarly, deep learning is a subset of machine learning. And again, all deep learning is machine learning, but not all machine learning is deep learning.

Also see: Top Machine Learning Companies

AI, machine learning and deep learning are each interrelated, with deep learning nested within ML, which in turn is part of the larger discipline of AI.

Computers excel at mathematics and logical reasoning, but they struggle to master other tasks that humans can perform quite naturally.

For example, human babies learn to recognize and name objects when they are only a few months old, but until recently, machines have found it very difficult to identify items in pictures. While any toddler can easily tell a cat from a dog from a goat, computers find that task much more difficult. In fact, captcha services sometimes use exactly that type of question to make sure that a particular user is a human and not a bot.

In the 1950s, scientists began discussing ways to give machines the ability to “think” like humans. The phrase “artificial intelligence” entered the lexicon in 1956, when John McCarthy organized a conference on the topic. Those who attended called for more study of “the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it.”

Critics rightly point out that there is a big difference between an AI system that can tell the difference between cats and dogs and a computer that is truly intelligent in the same way as a human being. Most researchers believe that we are years or even decades away from creating an artificial general intelligence (also called strong AI) that seems to be conscious in the same way that human beings are — if it will ever be possible to create such a system at all.

If artificial general intelligence does one day become a reality, it seems certain that machine learning will play a major role in the system’s capabilities.

Machine learning is the particular branch of AI concerned with teaching computers to “improve themselves,” as the attendees at that first artificial intelligence conference put it. Another 1950s computer scientist named Arthur Samuel defined machine learning as “the ability to learn without being explicitly programmed.”

In traditional computer programming, a developer tells a computer exactly what to do. Given a set of inputs, the system will return a set of outputs — just as its human programmers told it to.

Machine learning is different because no one tells the machine exactly what to do. Instead, they feed the machine data and allow it to learn on its own.

In general, machine learning takes three different forms: 

Reinforcement learning is one of the oldest types of machine learning, and it is very useful in teaching a computer how to play a game.

For example, Arthur Samuel created one of the first programs that used reinforcement learning. It played checkers against human opponents and learned from its successes and mistakes. Over time, the software became much better at playing checkers.

Reinforcement learning is also useful for applications like autonomous vehicles, where the system can receive feedback about whether it has performed well or poorly and use that data to improve over time.

Supervised learning is particularly useful in classification applications such as teaching a system to tell the difference between pictures of dogs and pictures of cats.

In this case, you would feed the application a whole lot of images that had been previously tagged as either dogs or cats. From that training data, the computer would draw its own conclusions about what distinguishes the two types of animals, and it would be able to apply what it learned to new pictures.

By contrast, unsupervised learning does not rely on human beings to label training data for the system. Instead, the computer uses clustering algorithms or other mathematical techniques to find similarities among groups of data.

Unsupervised machine learning is particularly useful for the type of big data analytics that interests many enterprise leaders. For example, you could use unsupervised learning to spot similarities among groups of customers and better target your marketing or tailor your pricing.

Some recommendation engines rely on unsupervised learning to tell people who like one movie or book what other movies or books they might enjoy. Unsupervised learning can also help identify characteristics that might indicate a person’s credit worthiness or likelihood of filing an insurance claim.

Various AI applications, such as computer vision, natural language processing, facial recognition, text-to-speech, speech-to-text, knowledge engines, emotion recognition, and other types of systems, often make use of machine learning capabilities. Some combine two or more of the main types of machine learning, and in some cases, are said to be “semi-supervised” because they incorporate some of the techniques of supervised learning and some of the techniques of unsupervised learning. And some machine learning techniques — such as deep learning — can be supervised, unsupervised, or both.

The phrase “deep learning” first came into use in the 1980s, making it a much newer idea than either machine learning or artificial intelligence.

Deep learning describes a particular type of architecture that both supervised and unsupervised machine learning systems sometimes use. Specifically, it is a layered architecture where one layer takes an input and generates an output. It then passes that output on to the next layer in the architecture, which uses it to create another output. That output can then become the input for the next layer in the system, and so on. The architecture is said to be “deep” because it has many layers.

To create these layered systems, many researchers have designed computing systems modeled after the human brain. In broad terms, they call these deep learning systems artificial neural networks (ANNs). ANNs come in several different varieties, including deep neural networks, convolutional neural networks, recurrent neural networks and others. These neural networks use nodes that are similar to the neurons in a human brain.
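As a toy illustration of that layered idea, here is a tiny Keras network in which each layer's output feeds the next layer; the layer sizes are arbitrary and purely for illustration:

import tensorflow as tf

# A small "deep" feed-forward network: each Dense layer consumes the previous
# layer's output and passes its own output onward to the next layer.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(20,)),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),  # final output layer
])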

GPUs were originally designed for graphics-heavy workloads such as gaming, but they also excel at the type of calculations necessary for deep learning. As GPU performance has improved and costs have decreased, people have been able to create high-performance systems that can complete deep learning tasks in much less time and for much less cost than would have been the case in the past.

Today, anyone can easily access deep learning capabilities through cloud services like Amazon Web Services, Microsoft Azure, Google Cloud and IBM Cloud.

If you are interested in learning more about AI vs machine learning vs deep learning, Datamation has several resources that can help, including the following:

Top 10 Essential Prerequisites For Deep Learning Projects

The deep learning projects are used across industries ranging from medical to e-commerce

Deep learning is clearly the technology of the future and is one of the most sought-after innovations of our day. If you’re interested in learning it, you should be aware of the requirements for DL. Knowing the prerequisites for deep learning projects will help you choose a better career path.

Deep learning is an interdisciplinary area of computer science and mathematics with the goal of teaching computers to carry out cognitive tasks in a manner similar to that of humans. A deep learning system collects input data and studies or analyzes it, using different methods to automatically identify patterns in datasets that may contain structured data, quantitative data, textual data, visual data, etc. We’ll talk about the top requirements for deep learning projects in this section to help you prepare for learning its more complex ideas.

1. Programming

Deep learning requires programming as a core component. Deep learning demands the use of a programming language. Python or R are the programming languages of choice for deep learning experts due to their functionality and efficiency. You must study programming and become proficient in one of these two well-known programming languages before you can study the numerous deep learning topics.

2. Statistics

The study of utilizing data and its visualization is known as statistics. It aids in extracting information from your raw data. Data science and the related sciences depend heavily on statistics. You would need to apply statistics to acquire insights from data as a deep learning specialist.

3. Calculus

The foundation of many machine learning algorithms is calculus. Therefore, studying calculus is a requirement for deep learning. You will create models using deep learning based on the features found in your data. You can use such properties and create the model as necessary with the aid of calculus.

4. Linear Algebra

Linear algebra is most likely one of the most crucial requirements for deep learning. Matrix, vector, and linear equations are all topics covered by linear algebra. It focuses on how linear equations are represented in vector spaces. You may design many models (classification, regression, etc.) with the aid of linear algebra, which is also a fundamental building block for many deep-learning ideas.

5. Probability

Mathematics’ field of probability focuses on using numerical data to express how likely or valid an occurrence is to occur. Any event’s probability can range from 0 to 1, with 0 denoting impossibility and 1 denoting complete certainty.

6. Data Science

Data analysis and use are the focus of the field of data science. You must be knowledgeable with a variety of data science principles to construct models that manage data as a deep learning specialist. Understanding deep learning will assist you in using data to achieve the desired results, but mastering data science is a prerequisite for applying deep learning.

7. Work on Projects

While mastering these topics will aid in the development of a solid foundation, you will also need to work on deep learning projects to ensure that you fully comprehend everything. You can apply what you’ve learned and identify your weak areas with the aid of projects. You can easily find a project that interests you because deep learning has applications in many different fields.

8. Neural Networks

The word “neuron,” which is used to describe a single nerve cell, is where the word “neural” originates. That’s correct; a neural network is essentially a network of neurons that carry out routine tasks for us.

A significant portion of the issues we encounter daily is related to pattern recognition, object detection, and intelligence. The reality is that these reactions are challenging to automate even if they are carried out with such simplicity that we don’t even notice it.

9. Clustering Algorithms

The clustering problem is resolved with the most straightforward unsupervised learning approach. The K-means method divides n observations into k clusters, with each observation belonging to the cluster represented by the nearest mean.
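A minimal scikit-learn sketch of K-means clustering (the random data and the choice of k = 3 are made up for illustration):

import numpy as np
from sklearn.cluster import KMeans

# 100 random 2-D observations, grouped into k = 3 clusters.
observations = np.random.rand(100, 2)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(observations)

print(kmeans.labels_[:10])      # cluster assignment of the first 10 observations
print(kmeans.cluster_centers_)  # the 3 cluster means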

10. Regression

Fake News Classification Using Deep Learning

This article was published as a part of the Data Science Blogathon.

Introduction

Here’s a quick puzzle for you. I’ll give you two titles, and you’ll have to tell me which is fake. Ready? Let’s get started:

“Wipro is planning to buy an EV-based startup.”

Well, it turns out that both of those headlines were fake news. In this article, you will learn the fake news classification using deep learning.

Image – 1

The grim reality is that there is a lot of misinformation and disinformation on the internet. Ninety percent of Canadians have fallen for false news, according to a survey conducted by Ipsos Public Affairs for Canada’s Centre for International Governance Innovation.

It got me thinking: is it feasible to build an algorithm that can tell whether an article’s title is fake news? Well, it appears to be the case!

In this post, we explore classification models built with BERT and LSTMs for fake news classification.

Go through this Github link to view the complete code.

Dataset for Fake News Classification

We use the dataset from Kaggle. It consists of 2095 article details that include author, title, and other information. Go through the link to get the dataset.

EDA

Let us start analyzing our data to get better insights from it. The dataset looks clean, and now we map the label values Real and Fake to the classes 0 and 1.

data = pd.read_csv('/content/news_articles.csv')
data = data[['title', 'label']]
data['label'] = data['label'].map({'Real': 0, 'Fake': 1})
data.head()

Image by Author

Since we have 1294 samples of real news and 801 samples of fake news, there is an approximately 62:38 class ratio, which means that our dataset is somewhat imbalanced. For our project, we consider the title and class columns.

Now, we can analyze the trends present in our dataset. To get an idea of dataset size, we get the mean, min, and max character lengths of titles. We use a histogram to visualize the data.

# Character Length of Titles - Min, Mean, Max
print('Mean Length', data['title'].apply(len).mean())
print('Min Length', data['title'].apply(len).min())
print('Max Length', data['title'].apply(len).max())
x = data['title'].apply(len).plot.hist()

Image by Author

We can observe that the character length of titles ranges from 2 to 443. We can also see that most samples have a length of 0–100 characters. The mean title length is around 61.

Preprocessing Data

Now we will use the NLTK library to preprocess our dataset, which includes:

Tokenization:

It is the process of dividing a text into smaller units (each word will be an index in an array)

Lemmatization:

It reduces each word to its root form; for example, it reduces the word children to child.

Stop words Removal:

Words like the and for will be eliminated from our dataset because they take up space without adding much meaning.

#Import nltk preprocessing library to convert text into a readable format
import nltk
from nltk.tokenize import sent_tokenize
from nltk.stem import WordNetLemmatizer
from nltk.corpus import stopwords
nltk.download('punkt')
nltk.download('wordnet')
nltk.download('stopwords')
data['title'] = data.apply(lambda row: nltk.word_tokenize(row['title']), axis=1)
#Define text lemmatization model (eg: walks will be changed to walk)
lemmatizer = WordNetLemmatizer()
#Loop through title dataframe and lemmatize each word
def lemma(data):
    return [lemmatizer.lemmatize(w) for w in data]
#Apply to dataframe
data['title'] = data['title'].apply(lemma)
#Define all stopwords in the English language (it, was, for, etc.)
stop = stopwords.words('english')
#Remove them from our dataframe
data['title'] = data['title'].apply(lambda x: [i for i in x if i not in stop])
data.head()

Image by Author

We create two models using this data for text classification:

An LSTM model (Tensorflow’s wiki-words-250 embeddings)

A BERT model.

LSTM Model for Fake News Classification

We split our data into a 70:30 ratio of train and test.

#Split data into training and testing dataset
title_train, title_test, y_train, y_test = train_test_split(titles, labels, test_size=0.3, random_state=1000)

To get predictions from our model, we first need to encode the text in vector format so that it can be processed by the machine.

TensorFlow’s wiki-words-250 embeddings use the Word2Vec Skip-Gram architecture. Skip-gram is trained to predict the context words given an input word.

Consider this sentence as an example:

I am going on a voyage in my car.

The word voyage is passed as input with a window size of one. The window size is the number of words before and after the target word to predict. In our case, the context words are go and car (excluding stopwords; go is the lemmatized form of going).

We one-hot-encode our word, resulting in an input vector of size 1 × V, where V is the vocabulary size. This representation is multiplied by a weight matrix of V rows (one for each word in our vocabulary) and E columns, where E is a hyperparameter indicating the size of each embedding. Except for one, all values in the input vector are zero because it is one-hot encoded (the one represents the word we are inputting). Finally, multiplying the input by the weight matrix yields a 1×E vector that denotes the embedding for that word.

The output layer, which consists of a softmax regression classifier, receives the 1×E vector. It is built of V neurons (corresponding to the vocabulary’s one-hot encoding) that produce a value between 0 and 1 for each word, indicating the likelihood of that word appearing within the window.

Word embeddings with a size E of 250 are present in TensorFlow’s wiki-words-250. Embeddings are applied to the model by looping through all of the words and computing the embedding for each one. We’ll need to utilize the pad_sequences function to account for samples of variable lengths.

#Convert each series of words to a word2vec embedding
indiv = []
for i in title_train:
    temp = np.array(embed(i))
    indiv.append(temp)
#Accounts for different length of words
indiv = tf.keras.preprocessing.sequence.pad_sequences(indiv, dtype='float')
indiv.shape

Therefore, there are 1466 samples in the training data, the highest length is 46 words, and each word has 250 features.

Now, we build our model. It consists of:

1 LSTM layer with 50 units

2 Dense layers (the first with 20 neurons, the second with 5), each with a ReLU activation function.

1 Dense output layer with activation function sigmoid.

We will use the Adam optimizer, a binary cross-entropy loss, and a performance metric of accuracy. The model will be trained over 10 epochs. Feel free to further adjust these hyperparameters.

#Sequential model has a 50 cell LSTM layer before Dense layers
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.LSTM(50))
model.add(tf.keras.layers.Dense(20, activation='relu'))
model.add(tf.keras.layers.Dense(5, activation='relu'))
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))
#Compile model with binary_crossentropy loss, Adam optimizer, and accuracy metrics
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
#Train model on 10 epochs (test holds the padded embeddings of title_test)
model.fit(indiv, y_train, validation_data=(test, y_test), epochs=10)

We get an accuracy of 59.4% on test data.

Using BERT for Fake News Classification

What would you reply if I asked you to name the English term with the most definitions?

That word is “set,” according to the Oxford English Dictionary’s Second Edition.

If you think about it, we could make a lot of different statements using that term in various settings. Consider the following scenario:

I set the table for lunch

The problem with Word2Vec is that no matter how the word is used, it generates the same embedding. We use BERT, which can build contextualized embeddings, to combat this.

BERT is known as “Bidirectional Encoder Representations from Transformers.” It employs a transformer model to generate contextualized embeddings by utilizing attention mechanisms.

A transformer model uses an encoder-decoder design. The encoder layer creates a continuous representation based on what it has learned from the input. The decoder layer receives the preceding input and generates an output. Because BERT’s purpose is to build a vector representation from text, it only employs an encoder.

Pre-Training & Fine-Tuning

BERT is trained in two ways. The first method is known as masked language modelling. Before sequences are passed in, a [MASK] token is used to replace 15% of the words. Using the context supplied by the unmasked words, the model predicts the masked words.

It is accomplished by

Applying a classification layer on top of the encoder output and multiplying by the embedding matrix, so that the result has the same size as the vocabulary.

Using the softmax function to calculate the likelihood of the word.

The second strategy is next sentence prediction. The model is given two sentences as input and predicts whether the second sentence follows the first. While training, half of the inputs are sequential pairs, while the other half consists of random sentences from the corpus. To distinguish between the two statements,

Here, it adds a [CLS] token at the start of the first sentence and a [SEP] token at the end of each.

Each token (word) contains a positional embedding that allows information to be extracted from the token’s location in the text. Because there is no recurrence in a transformer model, there is no inherent comprehension of the word’s position.

Each token is given a sentence embedding (further differentiating between the sentences).

For Next Sentence Prediction, the output of the [CLS] embedding, which stands for “aggregate sequence representation for sentence classification,” is passed through a classification layer with softmax to return the probability of the two sentences being sequential.

Image by Author

Implementation of BERT

We use the BERT preprocessor and encoder from TensorFlow Hub. Do not run the text through the earlier-mentioned pipeline (which removes capitalization, applies lemmatization, etc.); the BERT preprocessor is used to handle all of this.

We split our data for training and testing in the ratio of 80:20.

from sklearn.model_selection import train_test_split
#Split data into training and testing dataset
title_train, title_test, y_train, y_test = train_test_split(titles, labels, test_size=0.2, random_state=1000)

Now, load Bert preprocessor and encoder

# Use the bert preprocesser and bert encoder from tensorflow_hub
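A typical way to obtain these two layers from TensorFlow Hub is sketched below; the specific model handles are assumptions and may need to be adjusted:

import tensorflow_hub as hub

# Assumed TF Hub handles for a small uncased BERT encoder and its matching preprocessor.
bert_preprocess = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3")
bert_encoder = hub.KerasLayer("https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-4_H-512_A-8/1")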

We can now work on our neural network. It must be a functional model, with each layer’s output serving as an argument to the next.

1 Input layer: Used to pass sentences into the model.

The bert_preprocess layer: Preprocess the input text.

The bert_encoder layer: Pass the preprocessed tokens into the BERT encoder.

1 Dropout layer with 0.2. The BERT encoder pooled_output is passed into it.

2 Dense layers with 10 and 1 neurons. The first uses a ReLU activation function, and the second is sigmoid.

import tensorflow as tf
# Input Layers
input_layer = tf.keras.layers.Input(shape=(), dtype=tf.string, name='news')
# BERT layers
processed = bert_preprocess(input_layer)
output = bert_encoder(processed)
# Fully Connected Layers
layer = tf.keras.layers.Dropout(0.2, name='dropout')(output['pooled_output'])
layer = tf.keras.layers.Dense(10, activation='relu', name='hidden')(layer)
layer = tf.keras.layers.Dense(1, activation='sigmoid', name='output')(layer)
model = tf.keras.Model(inputs=[input_layer], outputs=[layer])

The “pooled output” will be transmitted into the dropout layer, as you can see. This value represents the text’s overall sequence representation. It is, as previously said, the representation of the [CLS] token outputs.

The Adam optimizer, a binary cross-entropy loss, and an accuracy performance metric are used. The model is trained for five epochs. Feel free to tweak these hyperparameters even more.

#Compile model on adam optimizer, binary_crossentropy loss, and accuracy metrics
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
#Train model on 5 epochs
model.fit(title_train, y_train, epochs=5)
#Evaluate model on test data
model.evaluate(title_test, y_test)

Image by Author

Above, you can see that our model achieved an accuracy of 61.33%.

Conclusion

To improve the model performance:

Train the models on a large dataset.

Tweak hyperparameters of the model.

I hope you found this post insightful and gained a better understanding of NLP techniques for fake news classification.

References

Image – 1: Photo by Roman Kraft on Unsplash

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.


Top 10 Deep Learning Projects For Engineering Students In 2023

If you are one of those wanting to start a career in deep learning, then you must read about these top 10 deep learning projects

Deep learning is a domain of diverse technologies, such as tablets and computers that can learn based on programming and other data. Deep learning is emerging as a futuristic concept that can meet the requirements of people. When we take a look at speech recognition technology and virtual assistants, they run using machine learning and deep learning technologies. If you are one of those wanting to start a career in deep learning, then you must read this article, as it features current ideas for your upcoming deep learning project. Here is the list of the top 10 deep learning projects to know about in 2023.

Chatbots

Due to their skillful handling of a profusion of customer queries and messages without any issue, Chatbots play a significant role for industries. They are designed to lessen the customer service workload by automating the hefty part of the process. Nonetheless, chatbots execute this by utilizing their promising methods supported by technologies like machine learning, artificial intelligence, and deep learning. Therefore, creating a chatbot for your final deep learning project will be a great idea.

Forest Fire Prediction

Creating a forest fire prediction system is one of the best deep learning projects and another considerable utilization of the abilities provided by deep learning. A forest fire is an uncontrolled fire in a forest that causes a hefty amount of damage not only to nature but also to animal habitats and human property. To control the chaotic nature of forest fires and even predict them, you can create a deep learning project utilizing k-means clustering to comprehend major fire hotspots and their intensity.

Digit Recognition System

This project involves developing a digit recognition system that can classify digits based on the set tenets. The project aims to create a recognition system that can classify digits ranging from 0 to 9 using a combination of shallow network and deep neural network and by implementing logistic regression. Softmax Regression or Multinomial Logistic Regression is the ideal choice for this project. Since this technique is a generalization of logistic regression, it is apt for multi-class classification, assuming that all the classes are mutually exclusive.

Image Caption Generator Project in Python

This is one of the most interesting deep learning projects. It is easy for humans to describe what is in an image but for computers, an image is just a bunch of numbers that represent the color value of each pixel. This project utilizes deep learning methods where you implement a convolutional neural network (CNN) with a Recurrent Neural Network (LSTM) to build the image caption generator.

Traffic Signs Recognition

Traffic signs and rules are crucial that every driver must obey to prevent accidents. To follow the rule, one must first understand what the traffic sign looks like. In the Traffic signs recognition project, you will learn how a program can identify the type of traffic sign by taking an image as input. For a final-year engineering student, it is one of the best deep learning projects to try.

Credit Card Fraud Detection

With the increase in online transactions, credit card fraud has also increased. Banks are trying to handle this issue using deep learning techniques. In this deep learning project, you can use Python to create a classification model to detect credit card fraud by analyzing previously available data.

Customer Segmentation

This is one of the most popular deep learning projects every student should try. Before running any campaign companies create different groups of customers. Customer segmentation is a popular application of unsupervised learning. Using clustering, companies identify segments of customers to target the potential user base.

Movie Recommendation System

In this deep learning project, you have to utilize R to perform movie recommendation through technologies like Machine Learning and Artificial Intelligence. A recommendation system sends out suggestions to users through a filtering process based on other users’ preferences and browsing history. If A and B like Home Alone and B also likes Mean Girls, it can be suggested to A – they might like it too. This keeps customers engaged with the platform.

Visual tracking system

A visual tracking system is designed to track and locate moving object(s) in a given time frame via a camera. It is a handy tool that has numerous applications such as security and surveillance, medical imaging, augmented reality, traffic control, video editing and communication, and human-computer interaction.

Drowsiness detection system

Deep Learning For Computer Vision – Introduction To Convolution Neural Networks

Introduction

The power of artificial intelligence is beyond our imagination. We all know robots have already reached a testing phase in some of the powerful countries of the world. Governments and large companies are spending billions on developing this ultra-intelligent creature. The recent rise of robots has gained the attention of many research houses across the world.

Does it excite you as well ? Personally for me, learning about robots & developments in AI started with a deep curiosity and excitement in me! Let’s learn about computer vision today.

The earliest research in computer vision started way back in the 1950s. Since then, we have come a long way but still find ourselves far from the ultimate objective. But with neural networks and deep learning, we have become empowered like never before.

Applications of deep learning in vision have taken this technology to a different level and made sophisticated things like self-driving cars possible in the near future. In this article, I will also introduce you to Convolution Neural Networks which form the crux of deep learning applications in computer vision.

Note: This article is inspired by Stanford’s Class on Visual Recognition. Understanding this article requires prior knowledge of Neural Networks. If you are new to neural networks, you can start here. Another useful resource on basics of deep learning can be found here.

You can also learn Convolutional neural Networks in a structured and comprehensive manner by enrolling in this free course: Convolutional Neural Networks (CNN) from Scratch

Table of Contents

Challenges in Computer Vision

Overview of Traditional Approaches

Review of Neural Networks Fundamentals

Introduction to Convolution Neural Networks

Case Study: Increasing power of CNNs in IMAGENET competition

Implementing CNNs using GraphLab (Practical in Python)

1. Challenges in Computer Vision (CV)

As the name suggests, the aim of computer vision (CV) is to imitate the functionality of human eye and brain components responsible for your sense of sight.

Doing actions such as recognizing an animal, describing a view, or differentiating among visible objects is really a cake-walk for humans. You’d be surprised to know that it took decades of research to discover and impart the ability of detecting an object to a computer with reasonable accuracy.

Let’s get familiar with it a bit more:

Object detection is considered to be the most basic application of computer vision. Rest of the other developments in computer vision are achieved by making small enhancements on top of this. In real life, every time we(humans) open our eyes, we unconsciously detect objects.

Since it is super-intuitive for us, we fail to appreciate the key challenges involved when we try to design systems similar to our eye. Let’s start by looking at some of the key roadblocks:

Variations in Viewpoint

The same object can have different positions and angles in an image depending on the relative position of the object and the observer.

There can also be different positions. For instance look at the following images:

Though it’s obvious to us that these are the same object, it is not very easy to teach this aspect to a computer (robots or machines).

Difference in Illumination

Though this image is so dark, we can still recognize that it is a cat. Teaching this to a computer is another challenge.

Hidden parts of images

Here, only the face of the puppy is visible and that too partially, posing another challenge for the computer to recognize.

Background Clutter

If you observe carefully, you can find a man in this image. As simple as it looks, it’s an uphill task for a computer to learn.

These are just some of the challenges which I brought up so that you can appreciate the complexity of the tasks which your eye and brain duo does with such utter ease. Breaking up all these challenges and solving individually is still possible today in computer vision. But we’re still decades away from a system which can get anywhere close to our human eye (which can do everything!).

This brilliance of our human body is the reason why researchers have been trying to break the enigma of computer vision by analyzing the visual mechanics of humans or other animals. Some of the earliest work in this direction was done by Hubel and Wiesel with their famous cat experiment in 1959. Read more about it here.

This was the first study which emphasized the importance of edge detection for solving the computer vision problem. They were awarded the Nobel Prize for their work.

Before diving into convolutional neural networks, let’s take a quick overview of the traditional, or rather elementary, techniques used in computer vision before deep learning became popular.

2. Overview of Traditional Approaches

Various techniques other than deep learning are available for computer vision. They work well for simpler problems, but as the data becomes huge and the task becomes complex, they are no substitute for deep CNNs. Let’s briefly discuss two simple approaches.

KNN (K-Nearest Neighbours)

Each image is matched with all images in training data. The top K with minimum distances are selected. The majority class of those top K is predicted as output class of the image.

Various distance metrics can be used like L1 distance (sum of absolute distance), L2 distance (sum of squares), etc.

Drawbacks:

Here the same dog is on the right side in the first image and on the left side in the second. Though it is the same dog, KNN would give a highly non-zero distance between the two images.

Similar to above, other challenges mentioned in section 1 will be faced by KNN.

Linear Classifiers

They use a parametric approach where each pixel value is considered as a parameter.

It’s like a weighted sum of the pixel values with the dimension of the weights matrix depending on the number of outcomes.

Intuitively, we can understand this in terms of a template. The weighted sum of pixels forms a template image which is matched with every image. This will also face difficulty in overcoming the challenges discussed in section 1 as single template is difficult to design for all the different cases.

I hope this gives some intuition into the challenges faced by approaches other than deep learning. Please note that more sophisticated techniques can be used than the ones discussed above but they would rarely beat a deep learning model.

3. Review of Neural Networks Fundamentals

Let’s discuss some properties of a neural networks. I will skip the basics of neural networks here as I have already covered that in my previous article – Fundamentals of Deep Learning – Starting with Neural Networks.

Once your fundamentals are sorted, let’s learn in detail some important concepts such as activation functions, data preprocessing, initializing weights and dropouts.

Activation Functions

There are various activation functions which can be used and this is an active area of research. Let’s discuss some of the popular options:

Sigmoid Function

Sigmoid activation, also used in logistic regression, squashes the input space from (-inf, inf) to (0, 1)

But it has various problems and it is almost never used in CNNs:

Saturated neurons kill the gradient

If you observe the above graph carefully, if the input is beyond -5 or 5, the output will be very close to 0 and 1 respectively. Also, in this region the gradients are almost zero. Notice that the tangents in this region will be almost parallel to x-axis thus ~0 slope.

As we know, gradients get multiplied in back-propagation, so this small gradient will virtually stop back-propagation into further layers, thus killing the gradient.

Outputs are not zero-centered

As you can see, all the outputs are between 0 and 1. As these become inputs to the next layer, all the gradients of the next layer will be either positive or negative. So the path to the optimum will be zig-zag. I will skip the mathematics here; please refer to the Stanford class referenced above for details.

Taking the exp() is computationally expensive

Though not a big drawback, it has a slight negative impact

tanh activation

It is always preferred over sigmoid because it solved problem #2, i.e. the outputs are in range (-1,1).

But it will still result in killing the gradient and is thus not a recommended choice.

 ReLU (Rectified Linear Unit)

Gradient won’t saturate in the positive region

Computationally very efficient as simple thresholding is required

Empirically found to converge faster than sigmoid or tanh.

Output is not zero-centered and always positive

Gradient is killed for x<0. Few techniques like leaky ReLU and parametric ReLU are used to overcome this and I encourage you to find these

Gradient is not defined at x=0. But this can be easily catered using sub-gradients and posts less practical challenges as x=0 is generally a rare case

To summarize, ReLU is mostly the activation function of choice. If the caveats are kept in mind, these can be used very efficiently.
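A quick numerical check of the saturation behaviour described above (plain NumPy, not from the original article):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-10.0, -5.0, 0.0, 5.0, 10.0])

# The sigmoid gradient sigma(x) * (1 - sigma(x)) collapses toward 0 for |x| > 5.
print(sigmoid(x) * (1 - sigmoid(x)))   # ~4.5e-05 at x = -10, 0.25 at x = 0

# The ReLU gradient is 1 for x > 0 (no saturation) and 0 for x < 0 (dead units).
print((x > 0).astype(float))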

Data Preprocessing

For images, generally the following preprocessing steps are done:

Same Size Images: All images are converted to the same size and generally in square shape.

Mean Centering: For each pixel, its mean value among all images can be subtracted from each pixel. Sometimes (but rarely) mean centering along red, green and blue channels can also be done

Note that normalization is generally not done in images.

Weight Initialization

There can be various techniques for initializing weights. Lets consider a few of them:

All zeros

This is generally a bad idea because in this case all the neurons will generate the same output initially and similar gradients would flow back in back-propagation

The results are generally undesirable as network won’t train properly.

Gaussian Random Variables

The weights can be initialized with random gaussian distribution of 0 mean and small standard deviation (0.1 to 1e-5)

This works for shallow networks, i.e. ~5 hidden layers but not for deep networks

In case of deep networks, the small weights make the outputs small and as you move towards the end, the values become even smaller. Thus the gradients will also become small resulting in gradient killing at the end.

Note that you need to play with the standard deviation of the gaussian distribution which works well for your network.

Xavier Initialization

It suggests that variance of the gaussian distribution of weights for each neuron should depend on the number of inputs to the layer.

The recommended scaling divides by the square root of the number of inputs (i.e., the variance is 1/n_in). So the numpy code for initializing the weights of a layer with n_in inputs is: np.random.randn(n_in, n_out)*sqrt(1/n_in)

A recent research suggested that for ReLU neurons, the recommended update is: np.random.randn(n_in, n_out)*sqrt(2/n_in). Read this blog post for more details.

One more thing must be remembered while using ReLU as activation function. It is that the weights initialization might be such that some of the neurons might not get activated because of negative input. This is something that should be checked. You might be surprised to know that 10-20% of the ReLUs might be dead at a particular time while training and even in the end.

These were just some of the concepts I discussed here. Some more concepts can be of importance like batch normalization, stochastic gradient descent, dropouts which I encourage you to read on your own.

4. Introduction to Convolution Neural Networks

Before going into the details, let’s first try to get some intuition into why deep networks work better.

As we learned from the drawbacks of earlier approaches, they are unable to cater to the vast amount of variations in images. Deep CNNs work by consecutively modeling small pieces of information and combining them deeper in network.

One way to understand them is that the first layer will try to detect edges and form templates for edge detection. Then subsequent layers will try to combine them into simpler shapes and eventually into templates of different object positions, illumination, scales, etc. The final layers will match an input image with all the templates and the final prediction is like a weighted sum of all of them. So, deep CNNs are able to model complex variations and behaviour giving highly accurate predictions.

There is an interesting paper on visualization of deep features in CNNs which you can go through to get more intuition – Understanding Neural Networks Through Deep Visualization.

For the purpose of explaining CNNs and finally showing an example, I will be using the CIFAR-10 dataset for explanation here and you can download the data set from here. This dataset has 60,000 images with 10 labels and 6,000 images of each type. Each image is colored and 32×32 in size.

A CNN typically consists of 3 types of layers:

Convolution Layer

Pooling Layer

Fully Connected Layer

You might find some batch normalization layers in some old CNNs but they are not used these days. We’ll consider these one by one.

Convolution Layer

Since convolution layers form the crux of the network, I’ll consider them first. Each layer can be visualized in the form of a block or a cuboid. For instance in the case of CIFAR-10 data, the input layer would have the following form:

Here you can see, this is the original image which is 32×32 in height and width. The depth here is 3 which corresponds to the Red, Green and Blue colors, which form the basis of colored images. Now a convolution layer is formed by running a filter over it. A filter is another block or cuboid of smaller height and width but same depth which is swept over this base block. Let’s consider a filter of size 5x5x3.

We start this filter from the top left corner and sweep it till the bottom right corner. This filter is nothing but a set of weights, i.e. 5x5x3 = 75 + 1 bias = 76 weights. At each position, the weighted sum of the pixels is calculated as WᵀX + b and a new value is obtained. A single filter will result in a volume of size 28x28x1 as shown above.

Note that multiple filters are generally run at each step. Therefore, if 10 filters are used, the output would look like:

Here the filter weights are parameters which are learned during the back-propagation step. You might have noticed that we got a 28×28 block as output when the input was 32×32. Why so? Let’s look at a simpler case.

Suppose the initial image had size 6x6xd and the filter has size 3x3xd. Here I’ve kept the depth as d because it can be anything and it’s immaterial as it remains the same in both. Since depth is same, we can have a look at the front view of how filter would work:

Here we can see that the result would be a 4x4x1 volume block. Notice there is a single output for the entire depth at each location of the filter. But you need not do this visualization all the time. Let’s define a generic case where the image has dimension NxNxd and the filter has FxFxd. Also, let’s define another term, stride (S), which is the number of cells (in the above matrix) to move in each step. In the above case, we had a stride of 1, but it can be a higher value as well. So the size of the output will be:

output size = (N – F)/S + 1

You can validate the first case where N=32, F=5, S=1. The output had 28 pixels which is what we get from this formula as well. Please note that some S values might result in non-integer result and we generally don’t use such values.

Let’s consider an example to consolidate our understanding. Starting with the same image as before of size 32×32, we need to apply 2 filters consecutively, first 10 filters of size 7, stride 1 and next 6 filters of size 5, stride 2. Before looking at the solution below, just think about 2 things:

What should be the depth of each filter?

What will be the resulting size of the images in each step?

Here is the answer:

Notice here that the size of the images is getting shrunk consecutively. This will be undesirable in case of deep networks where the size would become very small too early. Also, it would restrict the use of large size filters as they would result in faster size reduction.

To prevent this, we generally use a stride of 1 along with zero-padding of size (F-1)/2. Zero-padding is nothing but adding additional zero-value pixels towards the border of the image.

Consider the example we saw above with 6×6 image and 3×3 filter. The required padding is (3-1)/2=1. We can visualize the padding as:

Here you can see that the image now becomes 8×8 because of padding of 1 on each side. So now the output will be of size 6×6 same as the original image.

Now let’s summarize a convolution layer as follows:

Input size: W1 x H1 x D1

Hyper-parameters:

K: #filters

F: filter size (FxF)

S: stride

P: amount of padding

Output size: W2 x H2 x D2

W2 = (W1 − F + 2P)/S + 1

H2 = (H1 − F + 2P)/S + 1

D2 = K (these formulas are checked numerically in the short sketch after this list)

#parameters = (F.F.D).K + K

F.F.D : Number of parameters for each filter (analogous to volume of the cuboid)

(F.F.D).K : Volume of each filter multiplied by the number of filters

+K: adding K parameters for the bias term

Some additional points to be taken into consideration:

K should be set as a power of 2 for computational efficiency

F is generally taken as an odd number

F=1 is sometimes used, and it makes sense because there is still a depth component involved, so a 1×1 filter mixes information across channels

Filters are sometimes called kernels

Having understood the convolution layer, let’s move on to the pooling layer.

Pooling Layer

When we use padding in the convolution layer, the image size remains the same, so pooling layers are used to reduce the size of the image. They work by sliding a filter over each depth slice and down-sampling it, keeping a summary value (such as the maximum) from each window. Consider the following 4×4 layer. If we use a 2×2 filter with stride 2 and max-pooling, we get the following response:

Here you can see that four 2×2 blocks are each reduced to a single value, their maximum. Max-pooling is the most common choice, but other options like average pooling can also be used.
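Here is a minimal numpy sketch (my own illustration) of 2×2 max-pooling with stride 2 applied to a 4×4 matrix:

import numpy as np

x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 2],
              [7, 2, 9, 1],
              [3, 4, 5, 6]])

#Split the 4x4 matrix into non-overlapping 2x2 blocks and keep the maximum of each block
pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)   #[[6 4]
                # [7 9]]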

Fully Connected Layer

At the end of the convolution and pooling layers, networks generally use fully-connected layers, in which each value of the final feature maps is treated as a separate input neuron, just like in a regular neural network. The last fully-connected layer contains as many neurons as the number of classes to be predicted. For instance, in the CIFAR-10 case, the last fully-connected layer will have 10 neurons.

5. Case Study: AlexNet

I recommend reading the prior section multiple times and getting a hang of the concepts before moving forward.

In this section, I will discuss the AlexNet architecture in detail. To give you some background, AlexNet is the winning solution of the ImageNet Challenge 2012. This is one of the most reputed computer vision challenges, and 2012 was the first time that a deep learning network was used to win it.

Also, it performed significantly better than the previous solutions. I will share the network architecture here and review all the concepts learned above.

The detailed solution has been explained in this paper. I will explain the overall architecture of the network here. AlexNet consists of an 11-layer CNN (counting the convolution, pooling and fully-connected layers) with the following architecture:

Here you can see 11 layers between input and output. Let’s discuss each of them individually. Note that the output of each layer is the input of the next layer, so keep that in mind.

Layer 0: Input image

Size: 227 x 227 x 3

Note that in the paper referenced above, the network diagram has 224x224x3 printed which appears to be a typo.

Layer 1: Convolution with 96 filters, size 11×11, stride 4, padding 0

Size: 55 x 55 x 96

(227-11)/4 + 1 = 55 is the size of the outcome

The depth is 96 because each filter produces one output slice and there are 96 filters

Layer 2: Max-Pooling with 3×3 filter, stride 2

Size: 27 x 27 x 96

(55 – 3)/2 + 1 = 27 is the size of the outcome

The depth is the same as before, i.e. 96, because pooling is done independently on each depth slice

Layer 3: Convolution with 256 filters, size 5×5, stride 1, padding 2

Size: 27 x 27 x 256

Because of padding of (5-1)/2=2, the original size is restored

256 depth because of 256 filters

Layer 4: Max-Pooling with 3×3 filter, stride 2

Size: 13 x 13 x 256

(27 – 3)/2 + 1 = 13 is the size of the outcome

The depth is the same as before, i.e. 256, because pooling is done independently on each depth slice

Layer 5: Convolution with 384 filters, size 3×3, stride 1, padding 1

Size: 13 x 13 x 384

Because of padding of (3-1)/2=1, the original size is restored

384 depth because of 384 filters

Layer 6: Convolution with 384 filters, size 3×3, stride 1, padding 1

Size: 13 x 13 x 384

Because of padding of (3-1)/2=1, the original size is restored

384 depth because of 384 filters

Layer 7: Convolution with 256 filters, size 3×3, stride 1, padding 1

Size: 13 x 13 x 256

Because of padding of (3-1)/2=1, the original size is restored

256 depth because of 256 filters

Layer 8: Max-Pooling with 3×3 filter, stride 2

Size: 6 x 6 x 256

(13 – 3)/2 + 1 = 6 is the size of the outcome

The depth is the same as before, i.e. 256, because pooling is done independently on each depth slice

Layer 9: Fully Connected with 4096 neurons

In this layer, each of the 6x6x256 = 9216 values is fed into each of the 4096 neurons, and the weights are determined by back-propagation.

Layer 10: Fully Connected with 4096 neurons

Similar to layer #9

Layer 11: Fully Connected with 1000 neurons

This is the last layer and has 1000 neurons because the ImageNet data has 1000 classes to be predicted.
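To tie the whole architecture together, here is a small sketch (my own, not from the paper) that traces the spatial size of each layer using the formulas from the previous section:

def conv(n, f, s, p):
    return (n - f + 2 * p) // s + 1

def pool(n, f, s):
    return (n - f) // s + 1

n = 227                             #Layer 0: 227 x 227 x 3
n = conv(n, 11, 4, 0); print(n)     #55  (Layer 1, depth 96)
n = pool(n, 3, 2); print(n)         #27  (Layer 2, depth 96)
n = conv(n, 5, 1, 2); print(n)      #27  (Layer 3, depth 256)
n = pool(n, 3, 2); print(n)         #13  (Layer 4, depth 256)
n = conv(n, 3, 1, 1); print(n)      #13  (Layer 5, depth 384)
n = conv(n, 3, 1, 1); print(n)      #13  (Layer 6, depth 384)
n = conv(n, 3, 1, 1); print(n)      #13  (Layer 7, depth 256)
n = pool(n, 3, 2); print(n)         #6   (Layer 8, depth 256)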

I understand this is a complicated structure, but once you understand the layers it will give you a much better understanding of the architecture. Note that you will find a different representation of the structure if you look at the AlexNet paper. This is because GPUs were not very powerful at that time, and the authors used 2 GPUs for training the network, so the processing was divided between the two.

ZFNet: winner of 2013 challenge

GoogleNet: winner of 2014 challenge

VGGNet: a good solution from 2014 challenge

ResNet: winner of 2015 challenge, designed by the Microsoft Research team

This video gives a brief overview and comparison of these solutions towards the end.

6. Implementing CNNs using GraphLab

Having understood the theoretical concepts, let’s move on to the fun (practical) part and build a basic model on the CIFAR-10 dataset which we’ve downloaded before.

I’ll be using GraphLab for running the algorithms. Instead of GraphLab, you are free to use alternative tools such as Torch, Theano, Keras, Caffe, TensorFlow, etc. But GraphLab allows a quick and dirty implementation as it takes care of the weight initialization and network architecture on its own.

We’ll work on the CIFAR-10 dataset which you can download from here. The first step is to load the data. This data is packed in a specific format which can be loaded using the following code:

import pandas as pd
import numpy as np
import cPickle

#Define a function to load each batch as dictionary:
def unpickle(file):
    fo = open(file, 'rb')
    dict = cPickle.load(fo)
    fo.close()
    return dict

#Make dictionaries by calling the above function:
batch1 = unpickle('data/data_batch_1')
batch2 = unpickle('data/data_batch_2')
batch3 = unpickle('data/data_batch_3')
batch4 = unpickle('data/data_batch_4')
batch5 = unpickle('data/data_batch_5')
batch_test = unpickle('data/test_batch')

#Define a function to convert this dictionary into a dataframe with image pixel array and labels:
def get_dataframe(batch):
    df = pd.DataFrame(batch['data'])
    df['image'] = df.as_matrix().tolist()
    df.drop(range(3072), axis=1, inplace=True)
    df['label'] = batch['labels']
    return df

#Define train and test files:
train = pd.concat([get_dataframe(batch1), get_dataframe(batch2), get_dataframe(batch3),
                   get_dataframe(batch4), get_dataframe(batch5)], ignore_index=True)
test = get_dataframe(batch_test)

We can verify this data by looking at the head and shape of the data as follows:

print train.head()

print train.shape, test.shape

Since we’ll be using GraphLab, the next step is to convert this into a GraphLab SFrame and run a neural network. Let’s convert the data first:

import graphlab as gl

gltrain = gl.SFrame(train)
gltest = gl.SFrame(test)
model = gl.neuralnet_classifier.create(gltrain, target='label', validation_set=None)

Here GraphLab used a simple fully connected network with 2 hidden layers of 10 neurons each. Let’s evaluate this model on the test data.

model.evaluate(gltest)

As you can see, we have a pretty low accuracy of ~15%. This is because it is a very basic network. Let’s try to make a CNN now. But if we go about training a deep CNN from scratch, we will face the following challenges:

The available data is too little to learn all the required features

Training deep CNNs generally requires a GPU, as a CPU is not powerful enough to perform the required calculations, so we won’t be able to run it on our system. We could probably rent an Amazon AWS instance.

To overcome these challenges, we can use pre-trained networks. These are networks like AlexNet which have been pre-trained on many images, so the weights of the deep layers have already been determined. The only challenge is to find a pre-trained network which has been trained on images similar to the ones we want to classify. If the pre-trained network was not built on images of a similar domain, the features will not make much sense and the classifier will not achieve high accuracy.

Before proceeding further, we need to convert these images into the size used by the ImageNet model we’ll use for feature extraction. The GraphLab model is based on 256×256 images, so we need to convert our images to that size. Let’s do it using the following code:

#Convert pixels to graphlab image format
gltrain['glimage'] = gl.SArray(gltrain['image']).pixel_array_to_image(32, 32, 3, allow_rounding=True)
gltest['glimage'] = gl.SArray(gltest['image']).pixel_array_to_image(32, 32, 3, allow_rounding=True)

#Remove the original column
gltrain.remove_column('image')
gltest.remove_column('image')
gltrain.head()

Here we can see that a new column of type graphlab image has been created, but the images are still 32×32. So we convert them to 256×256 using the following code:

#Convert into 256x256 size
gltrain['image'] = gl.image_analysis.resize(gltrain['glimage'], 256, 256, 3)
gltest['image'] = gl.image_analysis.resize(gltest['glimage'], 256, 256, 3)

#Remove old column:
gltrain.remove_column('glimage')
gltest.remove_column('glimage')
gltrain.head()

Now we can see that the image has been converted into the desired size. Next, we will load the ImageNet pre-trained model in GraphLab, feed the features created in its last layer into a simple classifier, and make predictions.

Let’s start by loading the pre-trained model.

#Load the pre-trained model:
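The actual loading code is not reproduced here; a minimal sketch of what this step could look like, assuming the pre-trained GraphLab ImageNet model has been downloaded locally (the path below is a placeholder, not the real location):

#Hypothetical path; replace with the actual location of the pre-trained ImageNet model
pretrained_model = gl.load_model('path/to/imagenet_pretrained_model')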

Now we have to use this model to extract features, which will then be passed into a classifier. Note that the following operations may take a lot of computing time. I use a MacBook Pro 15″ and I had to leave it running for a whole night!

gltrain['features'] = pretrained_model.extract_features(gltrain)
gltest['features'] = pretrained_model.extract_features(gltest)

Let’s have a look at the data to make sure we have the features:

gltrain.head()

Though we have the features with us, notice that a lot of them are zeros. You can understand this as a result of the domain mismatch and the smaller dataset: ImageNet was built from 1.2 million images, so many features learned from those images don’t activate for this data, resulting in zero values.
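As a quick sanity check (my own addition, assuming the 'features' column stores one feature vector per row), you can measure how sparse the extracted features are:

import numpy as np

first_features = np.array(gltrain['features'][0])   #feature vector of the first image
print np.mean(first_features == 0)   #fraction of zero activations

Now let’s train a simple classifier on top of these extracted features: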

simple_classifier = gl.classifier.create(gltrain, features=['features'], target='label')

The various outputs are:

The final model selection is based on a validation set with 5% of the data. The results are:

So we can see that the Boosted Trees Classifier has been chosen as the final model. Let’s look at the results on the test data:

simple_classifier.evaluate(gltest)

So we can see that the test accuracy is now ~50%. It’s a decent jump from 15% to 50%, but there is still huge potential to do better. The idea here was to get you started, so I will skip the next steps. Here are some things which you can try:

Remove the redundant features in the data

Perform hyper-parameter tuning in models

Search for pre-trained models which are trained on images similar to this dataset

Projects

Now, it’s time to take the plunge and actually play with some other real datasets. So are you ready to take on the challenge? Accelerate your deep learning journey by working through some practice problems.

End Notes

In this article, we covered the basics of computer vision using deep Convolution Neural Networks (CNNs). We started by appreciating the challenges involved in designing artificial systems which mimic the eye. Then, we looked at some of the traditional techniques, prior to deep learning, and got some intuition into their drawbacks.

We moved on to understanding some aspects of tuning neural networks, such as activation functions, weight initialization and data pre-processing. Next, we got some intuition into why deep CNNs should work better than traditional approaches, and we understood the different elements present in a general deep CNN.

Subsequently, we consolidated our understanding by analyzing the architecture of AlexNet, the winning solution of the ImageNet 2012 challenge. Finally, we took the CIFAR-10 data, extracted features from it using a pre-trained ImageNet network, and trained a classifier on top of those features.

You can test your skills and knowledge. Check out Live Competitions and compete with the best data scientists from all over the world.

