Fake News Classification Using Deep Learning
This article was published as a part of the Data Science Blogathon.
Introduction

Here’s a quick puzzle for you. I’ll give you two titles, and you’ll have to tell me which one is fake. Ready? Let’s get started:
“Wipro is planning to buy an EV-based startup.”
Well, it turns out that both of those headlines were fake news. In this article, you will learn how to build a fake news classifier using deep learning.
Image – 1
The grim reality is that there is a lot of misinformation and disinformation on the internet. Ninety per cent of Canadians have fallen for fake news, according to a survey conducted by Ipsos Public Affairs for Canada’s Centre for International Governance Innovation.
It got me thinking: is it feasible to build an algorithm that can tell whether an article’s title is fake news? Well, it appears to be the case!
In this post, we explore classification models built with LSTMs and BERT to identify fake news.
Go through this Github link to view the complete code.
Dataset for Fake News Classification

We use a dataset from Kaggle. It consists of 2095 article details that include author, title, and other information. Go through the link to get the dataset.
EDA

Let us start analyzing our data to get better insights from it. The dataset looks clean; now we map the class labels Real and Fake to 0 and 1.
import pandas as pd

data = pd.read_csv('/content/news_articles.csv')
data = data[['title', 'label']]
data['label'] = data['label'].map({'Real': 0, 'Fake': 1})
data.head()

Image by Author
Since we have 1294 samples of real news and 801 samples of fake news, the class ratio is approximately 62:38, which means our dataset is somewhat imbalanced. For our project, we consider the title and label columns.
Now, we can analyze the trends present in our dataset. To get an idea of dataset size, we get the mean, min, and max character lengths of titles. We use a histogram to visualize the data.
# Character Length of Titles - Min, Mean, Max
print('Mean Length', data['title'].apply(len).mean())
print('Min Length', data['title'].apply(len).min())
print('Max Length', data['title'].apply(len).max())
x = data['title'].apply(len).plot.hist()

Image by Author
We can observe that title lengths range from 2 to 443 characters, with most titles falling between 0 and 100 characters. The mean title length is around 61 characters.
Preprocessing Data

Now we will use the NLTK library to preprocess our dataset, which includes:
Tokenization:
It is the process of dividing text into smaller units called tokens (each word becomes an entry in an array).
Lemmatization:
It reduces a word to its root form; for example, children becomes child.
Stop words Removal:
Common words like the and for will be removed from our dataset because they take up space while adding little meaning.
#Import nltk preprocessing library to convert text into a readable format
import nltk
from nltk.tokenize import sent_tokenize
from nltk.stem import WordNetLemmatizer
from nltk.corpus import stopwords

nltk.download('punkt')
nltk.download('wordnet')
nltk.download('stopwords')

data['title'] = data.apply(lambda row: nltk.word_tokenize(row['title']), axis=1)

#Define text lemmatization model (eg: walks will be changed to walk)
lemmatizer = WordNetLemmatizer()

#Loop through title dataframe and lemmatize each word
def lemma(data):
    return [lemmatizer.lemmatize(w) for w in data]

#Apply to dataframe
data['title'] = data['title'].apply(lemma)

#Define all stopwords in the English language (it, was, for, etc.)
stop = stopwords.words('english')

#Remove them from our dataframe
data['title'] = data['title'].apply(lambda x: [i for i in x if i not in stop])
data.head()

Image by Author
We create two models using this data for text classification:
An LSTM model (using TensorFlow Hub’s wiki-words-250 embeddings)
A BERT model.
LSTM Model for Fake News Classification

We split our data into train and test sets in a 70:30 ratio.
#Split data into training and testing dataset
from sklearn.model_selection import train_test_split

title_train, title_test, y_train, y_test = train_test_split(titles, labels, test_size=0.3, random_state=1000)

To get predictions from our model, the text first needs to be encoded into vectors that the machine can process.
TensorFlow’s wiki-words-250 embeddings use the Word2Vec skip-gram architecture. Skip-gram is trained to predict the surrounding context words given an input word.
Consider this sentence as an example:
I am going on a voyage in my car.
Suppose the word voyage is passed as input with a window size of one. The window size is the number of words before and after the target word that the model tries to predict. In our case, those words are go and car (excluding stop words; go is the lemmatized form of going).
We one-hot encode our word, resulting in an input vector of size 1 x V, where V is the vocabulary size. This representation is multiplied by a weight matrix with V rows (one for each word in our vocabulary) and E columns, where E is a hyperparameter indicating the size of each embedding. Because the input vector is one-hot encoded, all of its values are zero except for one (representing the word we are inputting). Finally, multiplying the input vector by the weight matrix produces a 1 x E vector that denotes the embedding for that word.
The 1 x E vector is then fed to the output layer, which consists of a softmax regression classifier. It is built of V neurons (corresponding to the one-hot encoding of the vocabulary) that each produce a value between 0 and 1, indicating the likelihood of that word appearing within the window.
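To make the embedding lookup concrete, here is a minimal NumPy sketch of the one-hot multiplication described above; the vocabulary size, embedding size, and word index are toy values invented for illustration, not taken from the article.

import numpy as np

V, E = 10, 4                     # toy vocabulary size and embedding size
W_in = np.random.rand(V, E)      # input weight matrix, one row per word

word_index = 3                   # index of "voyage" in our toy vocabulary
one_hot = np.zeros((1, V))
one_hot[0, word_index] = 1       # 1 x V one-hot input vector

embedding = one_hot @ W_in       # 1 x E vector for the input word
print(embedding.shape)                              # (1, 4)
print(np.allclose(embedding, W_in[word_index]))     # True: the product simply selects a row

In other words, the multiplication just picks out the row of the weight matrix belonging to the input word, which is exactly the learned embedding.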
TensorFlow’s wiki-words-250 provides word embeddings with a size E of 250. We apply the embeddings by looping through all of the words and computing the embedding for each one. We’ll also need to use the pad_sequences function to account for titles of variable length.
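The snippet below uses an embed object that is not defined earlier in the article, so here is a hedged sketch of how it could be loaded from TensorFlow Hub; the module handle is my assumption of the one being referred to, not confirmed by the article.

import tensorflow_hub as hub

# Assumed handle for the Word2Vec skip-gram embeddings described above
embed = hub.load("https://tfhub.dev/google/Wiki-words-250/2")
# embed(["some", "words"]) returns one 250-dimensional vector per token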
#Convert each series of words to a word2vec embedding
indiv = []
for i in title_train:
    temp = np.array(embed(i))   # 'embed' is the wiki-words-250 module from TensorFlow Hub
    indiv.append(temp)

#Accounts for different length of words
indiv = tf.keras.preprocessing.sequence.pad_sequences(indiv, dtype='float')
indiv.shape
Therefore, there are 1466 samples in the training data, the longest title is 46 words, and each word has 250 features.
Now, we build our model. It consists of:
1 LSTM layer with 50 units
2 Dense layers (the first with 20 neurons, the second with 5), each with a ReLU activation function.
1 Dense output layer with a sigmoid activation function.
We will use the Adam optimizer, a binary cross-entropy loss, and accuracy as the performance metric. The model will be trained for 10 epochs. Feel free to further adjust these hyperparameters.
#Sequential model has a 50 cell LSTM layer before Dense layers
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.LSTM(50))
model.add(tf.keras.layers.Dense(20, activation='relu'))
model.add(tf.keras.layers.Dense(5, activation='relu'))
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

#Compile model with binary_crossentropy loss, Adam optimizer, and accuracy metrics
model.compile(optimizer='adam',
              loss="binary_crossentropy",
              metrics=['accuracy'])

#Train model on 10 epochs
# 'test' holds the padded test-title embeddings, prepared the same way as 'indiv'
model.fit(indiv, y_train, validation_data=[test, y_test], epochs=10)

We get an accuracy of 59.4% on test data.
Using BERT for Fake News Classification

What would you reply if I asked you to name the English term with the most definitions?
That word is “set,” according to the Oxford English Dictionary’s Second Edition.
If you think about it, we could make a lot of different statements using that term in various settings. Consider the following scenario:
I set the table for lunch
The problem with Word2Vec is that no matter how the word is used, it generates the same embedding. To combat this, we use BERT, which can build contextualized embeddings.
BERT is known as “Bidirectional Encoder Representations from Transformers.” It employs a transformer model to generate contextualized embeddings by utilizing attention mechanisms.
A transformer model uses an encoder-decoder design. The encoder layer creates a continuous representation based on what it has learned from the input. The decoder layer is fed the preceding outputs and generates an output. Because BERT’s purpose is to build a vector representation from text, it only employs an encoder.
Pre-Training & Fine-TuningBERT had trained using two ways. The first method is known to be veiled language modelling. Before transmitting sequences, a [MASK] token had used to replace 15% of the words. Using the context supplied by the unmasked words, the model will predict the masked words.
It is accomplished by
Applying a classification layer to the encoder output and multiplying it by the embedding matrix, so the result has the same size as the vocabulary.
Using the softmax function to calculate the likelihood of each word.
The second strategy is next sentence prediction. The model is given two sentences as input and predicts whether the second sentence follows the first. During training, half of the inputs are consecutive sentence pairs, while the other half pairs sentences with random sentences from the corpus. To distinguish between the two sentences:
A [CLS] token is added at the start of the first sentence and a [SEP] token at the end of each sentence.
Each token (word) receives a positional embedding so that information about its location in the text is retained. Because a transformer model has no recurrence, it has no inherent notion of word order.
Each token is given a sentence embedding (further differentiating between the sentences).
For Next Sentence Prediction, the output of the [CLS] embedding, which stands for “aggregate sequence representation for sentence classification,” is passed through a classification layer with softmax to return the probability of the two sentences being sequential.
Image by Author
Implementation of BERT
We use the BERT preprocessor and encoder from TensorFlow Hub. Do not run the text through the earlier pipeline (which removes capitalization, applies lemmatization, etc.); the BERT preprocessor handles this internally.
We split our data for training and testing in the ratio of 80:20.
from sklearn.model_selection import train_test_split

#Split data into training and testing dataset
title_train, title_test, y_train, y_test = train_test_split(titles, labels, test_size=0.2, random_state=1000)

Now, load the BERT preprocessor and encoder.
# Use the bert preprocessor and bert encoder from tensorflow_hub

We can now build our neural network. It must be a functional model, with each layer’s output serving as an argument to the next.
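The loading lines themselves are missing from the snippet above, so here is a sketch of how the two hub layers might be created; the exact model handles (an uncased English preprocessor and a small BERT encoder) are assumptions, as the article does not state which BERT variant it uses.

import tensorflow_hub as hub
import tensorflow_text  # registers the ops required by the BERT preprocessing model

# Assumed TF Hub handles; swap in the variant you want to use
bert_preprocess = hub.KerasLayer(
    "https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3")
bert_encoder = hub.KerasLayer(
    "https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-4_H-512_A-8/1")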
1 Input layer: Used to pass sentences into the model.
The bert_preprocess layer: Preprocess the input text.
The bert_encoder layer: Pass the preprocessed tokens into the BERT encoder.
1 Dropout layer with a rate of 0.2. The BERT encoder’s pooled_output is passed into it.
2 Dense layers with 10 and 1 neurons. The first uses a ReLU activation function, and the second uses sigmoid.
import tensorflow as tf

# Input Layers
input_layer = tf.keras.layers.Input(shape=(), dtype=tf.string, name='news')

# BERT layers
processed = bert_preprocess(input_layer)
output = bert_encoder(processed)

# Fully Connected Layers
layer = tf.keras.layers.Dropout(0.2, name='dropout')(output['pooled_output'])
layer = tf.keras.layers.Dense(10, activation='relu', name='hidden')(layer)
layer = tf.keras.layers.Dense(1, activation='sigmoid', name='output')(layer)

model = tf.keras.Model(inputs=[input_layer], outputs=[layer])

As you can see, the pooled_output is passed into the dropout layer. This value represents the text’s overall sequence representation. It is, as previously mentioned, the output representation of the [CLS] token.
We use the Adam optimizer, a binary cross-entropy loss, and accuracy as the performance metric. The model is trained for five epochs. Feel free to tweak these hyperparameters further.
#Compile model on adam optimizer, binary_crossentropy loss, and accuracy metrics
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

#Train model on 5 epochs
model.fit(title_train, y_train, epochs=5)

#Evaluate model on test data
model.evaluate(title_test, y_test)

Image by Author
Above, you can see that our model achieved an accuracy of 61.33%.
Conclusion

To improve the model performance:
Train the models on a larger dataset.
Tweak hyperparameters of the model.
I hope you found this post insightful and gained a better understanding of NLP techniques for fake news classification.
References

Image – 1: Photo by Roman Kraft on Unsplash
The media shown in this article is not owned by Analytics Vidhya and are used at the Author’s discretion.
Implementing Audio Classification Project Using Deep Learning
This article was published as a part of the Data Science Blogathon.
In this article, we will be exploring audio and sound classification using machine learning and deep learning. It is amazing and interesting to know how machines are capable of understanding human language and responding in the same way. NLP (Natural Language Processing) is one of the most researched and studied topics of today’s generation; it helps make machines capable of handling human language in the form of speech as well as text.
Image source – Created using Canva
Table of Contents
Introduction to Audio Classification
Project Overview
Dataset Overview
Hands-on Implementing Audio Classification project
EDA On Audio Data
Data Preprocessing
Building ANN for Audio Classification
Testing some unknown Audio
End Notes
Introduction to Audio Classification

Audio classification means categorizing certain sounds into categories, such as environmental sound classification and speech recognition. The task is the same as image classification of cats and dogs or text classification of spam and ham; only the type of data differs. Instead of images or text, we now have audio files of a certain length.
Why is audio classification considered more difficult than other types of classification?
There are many techniques to classify images, as we have various built-in neural network architectures under CNNs designed specifically for images. It is also easy to extract features from images because images already come in the form of numbers: an image is a collection of pixels, and pixels are numbers. When the data is text, we use sequential encoder- and decoder-based techniques to find features. But when it comes to recognizing speech, it is more difficult than text because audio is based on frequency and time, so you need to extract the proper pitch and frequency.
Audio classification is employed across different domains, such as voice-lock features, music genre identification, natural language classification, environmental sound classification, and capturing and identifying different types of sound. It is also used to give chatbots the next level of power.
Project Overview

Sound classification is a growing area of research that everyone is trying to learn and implement in projects. The project we will build in this article is simple enough for a beginner to follow: the problem statement is to apply deep learning techniques to classify environmental sounds, specifically focusing on identifying urban sounds.
Given an audio sample of some category with a certain duration in the .wav format, we must determine whether it contains one of the target urban sounds. This lies under the supervised machine learning category, so we have a dataset as well as target categories.
Dataset Overview

The dataset we will use is called the UrbanSound8K dataset. It contains 8732 sound files belonging to the 10 classes listed below. Our task is to extract different features from these files and classify the corresponding audio files into their respective categories. You can download the dataset from the official website here, and it is also available on Kaggle. The dataset is a little large, so if downloading is not possible, you can create a Kaggle Notebook and practice on Kaggle itself.
Air Conditioner
Car Horn
Children Playing
Dog Bark
Drilling Machine
Engine Idling
Gun Shot
Jackhammer
Siren
Street Music
Hands-On Practice of Audio Classification Project

Libraries Installation

A very important library that supports audio and music analysis is Librosa. Simply use the pip command to install it. It provides the building blocks required to construct a music information retrieval model. Another great library we will use, for deep learning modeling purposes, is TensorFlow, and I hope everyone has already installed it.
pip install librosa
pip install tensorflow

Exploratory Data Analysis of Audio Data

We have 10 different folders under the UrbanSound8K dataset folder. Before applying any preprocessing, we will try to understand how to load audio files and how to visualize them as waveforms. If you want to load an audio file and listen to it, you can use the IPython library and directly give it the audio file path. We have taken the first audio file in the fold1 folder, which belongs to the dog bark category.
import IPython.display as ipd

filepath = "../input/urbansound8k/fold1/101415-3-0-2.wav"
ipd.Audio(filepath)

Image Source – screenshot by Author
Now we will use Librosa to load the audio data. When we load any audio file with Librosa, it gives us two things: one is the sample rate, and the other is a two-dimensional array. Let us load the above audio file with Librosa and plot the waveform.
2-D array – The first axis represents the recorded samples of amplitude, and the second axis represents the number of channels. There are different types of channels – monophonic (audio that has one channel) and stereo (audio that has two channels).
import librosa
import librosa.display
import matplotlib.pyplot as plt

data, sample_rate = librosa.load(filepath)
plt.figure(figsize=(12, 5))
librosa.display.waveshow(data, sr=sample_rate)

Source – Author
If you print the sample rate, the output will be 22050, because when we load data with Librosa it resamples everything to a single default sample rate. We can achieve something similar with the SciPy library; it also gives us two pieces of information – the sample rate and the data.
from scipy.io import wavfile as wav
wave_sample_rate, wave_audio = wav.read(filepath)
print(wave_sample_rate)
print(wave_audio)
import matplotlib.pyplot as plt
plt.figure(figsize=(12, 4))
plt.plot(wave_audio)
When you print the sample rate using SciPy, it is different from Librosa’s. Now let us visualize the wave audio data. One important difference between the two is that the data retrieved from Librosa is normalized, whereas the data read with SciPy is not. Librosa is becoming popular for audio signal processing for the following three reasons.
It tries to converge the signal into mono(one channel).
It can represent the audio signal between -1 to +1(in normalized form), so a regular pattern is observed.
It reads the sample rate and, by default, converts it to 22.05 kHz, while other libraries keep the file’s original sample rate.
Imbalanced Dataset Check
Now we know about the audio files and how to visualize them. Moving on to data exploration, we will load the CSV metadata file provided with the audio files and check how many records we have for each class.
import pandas as pd

metadata = pd.read_csv('/urbansound8k/UrbanSound8K.csv')
metadata.head(10)

Source – Author
The metadata gives us each file name and where it is located. For example, the first file is in fold 5 with the category dog bark. Now use the value_counts function to check the number of records for each class.
metadata['class'].value_counts()

From the output, we can see the data is not imbalanced; most of the classes have an approximately equal number of records. We can also visualize the count of records in each category using a bar plot or count plot.
import seaborn as sns

plt.figure(figsize=(10, 6))
sns.countplot(metadata['class'])
plt.title("Count of records in each class")
plt.xticks(rotation="vertical")
plt.show()

Image Source – Code Output
Data Preprocessing

Some audio files are recorded at different rates, such as 44 kHz or 22 kHz. With Librosa, everything is loaded at 22 kHz, and we see the data in a normalized form. Now, our task is to extract some important information and keep our data in the form of independent features (extracted from the audio signal) and dependent features (class labels). We will use Mel-Frequency Cepstral Coefficients (MFCC) to extract independent features from the audio signals.
MFCC – The MFCC summarizes the frequency distribution across the window size, so it is possible to analyze both the frequency and time characteristics of the sound. This audio representation allows us to identify features for classification; it converts the audio into features based on time and frequency characteristics. To read more about MFCC, you can watch this video and also read this research paper by Springer.
To demonstrate how we apply MFCC in practice, first, we will apply it on a single audio file that we are already using.
mfccs = librosa.feature.mfcc(y=data, sr=sample_rate, n_mfcc=40)
print(mfccs.shape)
print(mfccs)

def features_extractor(file):
    #load the file (audio)
    audio, sample_rate = librosa.load(file, res_type='kaiser_fast')
    #we extract mfcc
    mfccs_features = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=40)
    #in order to find out scaled feature we do mean of transpose of value
    mfccs_scaled_features = np.mean(mfccs_features.T, axis=0)
    return mfccs_scaled_features

👉 Now, to extract the features for every audio file, we loop over each row in the dataframe. We also use the tqdm library to track progress. Inside the loop, we prepare the file path for each file, call the function to extract MFCC features, and append the features and corresponding label to a newly formed list.
#Now we need to extract the features from all the audio files, so we use tqdm
import os
import numpy as np
from tqdm import tqdm

### Now we iterate through every audio file and extract features
### using Mel-Frequency Cepstral Coefficients
# 'audio_dataset_path' should point to the root folder of the UrbanSound8K audio files
extracted_features = []
for index_num, row in tqdm(metadata.iterrows()):
    file_name = os.path.join(os.path.abspath(audio_dataset_path), 'fold' + str(row["fold"]) + '/', str(row["slice_file_name"]))
    final_class_labels = row["class"]
    data = features_extractor(file_name)
    extracted_features.append([data, final_class_labels])

The loop will take a little time to run because it iterates over more than 8000 rows. After that, you can observe the dataframe of extracted features as shown below.
### converting extracted_features to Pandas dataframe
extracted_features_df = pd.DataFrame(extracted_features, columns=['feature', 'class'])
extracted_features_df.head()

Image source – Screenshot by Author
Train Test split
First, we split the data into independent and dependent features. Since we have 10 classes, we use integer label encoding and then convert the labels to categorical (one-hot) form. After that, we split the data into train and test sets in an 80:20 ratio.
### Split the dataset into independent and dependent dataset
X = np.array(extracted_features_df['feature'].tolist())
y = np.array(extracted_features_df['class'].tolist())

from tensorflow.keras.utils import to_categorical
from sklearn.preprocessing import LabelEncoder

labelencoder = LabelEncoder()
y = to_categorical(labelencoder.fit_transform(y))

### Train Test Split
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

In total, we have 6985 records in the train set and 1747 samples in the test set. Let’s head over to model creation.
Audio Classification Model Creation

We have extracted features from the audio samples and split them into train and test sets. Now we will implement an ANN model using the Keras sequential API. The number of classes is 10, which determines the output shape, and we will create an ANN with 3 dense hidden layers; the architecture is explained below.
The first layer has 100 neurons. The input shape is 40, according to the number of features, with ReLU as the activation function; to avoid overfitting, we use a Dropout layer at a rate of 0.5.
The second layer has 200 neurons with ReLU activation and dropout at a rate of 0.5.
The third layer again has 100 neurons with ReLU activation and dropout at a rate of 0.5.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation, Flatten
from tensorflow.keras.optimizers import Adam
from sklearn import metrics

### No of classes
num_labels = y.shape[1]

model = Sequential()
###first layer
model.add(Dense(100, input_shape=(40,)))
model.add(Activation('relu'))
model.add(Dropout(0.5))
###second layer
model.add(Dense(200))
model.add(Activation('relu'))
model.add(Dropout(0.5))
###third layer
model.add(Dense(100))
model.add(Activation('relu'))
model.add(Dropout(0.5))
###final layer
model.add(Dense(num_labels))
model.add(Activation('softmax'))

👉 You can observe the model summary using the summary function.
Image Source – Screenshot by Author
Compile the Model
To compile the model, we need to define the loss function, which is categorical cross-entropy, the metric, which is accuracy, and the optimizer, which is Adam.
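The compile call itself is not shown in the article’s snippets; a minimal sketch matching the description above would be:

model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='adam')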
Train the Model
We will train the model and save it in HDF5 format. We will train the model for 100 epochs with a batch size of 32. We use a ModelCheckpoint callback to save the best model, and we record how much time the training takes.
## Training my model
from tensorflow.keras.callbacks import ModelCheckpoint
from datetime import datetime

num_epochs = 100
num_batch_size = 32

checkpointer = ModelCheckpoint(filepath='./audio_classification.hdf5', verbose=1, save_best_only=True)
start = datetime.now()

model.fit(X_train, y_train, batch_size=num_batch_size, epochs=num_epochs, validation_data=(X_test, y_test), callbacks=[checkpointer], verbose=1)

duration = datetime.now() - start
print("Training completed in time: ", duration)

Check the Test Accuracy
Now we will evaluate the model on the test data. We got roughly 77 percent accuracy on the training dataset and 76 percent on the test data.
test_accuracy = model.evaluate(X_test, y_test, verbose=0)
print(test_accuracy[1])

If you are using a TensorFlow version below 2.6, you can use the predict_classes function to predict the corresponding class for each audio file. If you are using 2.6 or above, use predict followed by argmax.
#model.predict_classes(X_test)
predict_x = model.predict(X_test)
classes_x = np.argmax(predict_x, axis=1)
print(classes_x)

Testing Some Unknown Audio Samples

Now it is time to test some random audio samples. Whenever we get new audio, we have to perform three steps to get the predicted label and class.
First, preprocess the audio file (load it using Librosa and extract MFCC features)
Predict the label to which audio belongs.
Inverse-transform the predicted label to get the name of the class to which it belongs.
filename = "../input/urbansound8k/fold7/101848-9-0-0.wav"

#preprocess the audio file
audio, sample_rate = librosa.load(filename, res_type='kaiser_fast')
mfccs_features = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=40)
mfccs_scaled_features = np.mean(mfccs_features.T, axis=0)

#Reshape MFCC feature to 2-D array
mfccs_scaled_features = mfccs_scaled_features.reshape(1, -1)

#predicted_label=model.predict_classes(mfccs_scaled_features)
x_predict = model.predict(mfccs_scaled_features)
predicted_label = np.argmax(x_predict, axis=1)
print(predicted_label)
prediction_class = labelencoder.inverse_transform(predicted_label)
print(prediction_class)

Congratulations! 👏 If you have followed the article till here and tried to implement it along with the reading, you have learned how to deal with audio data, use MFCC to extract important features from audio samples, and build a simple ANN model on top of it to classify audio into different classes. To practice further, try the environmental sound classification dataset available on Kaggle and implement this project on it. The learnings of this article can be summarised in the bullet points below.
Audio classification is a technique to classify sounds into different categories.
We can visualize any audio in the form of a waveform.
MFCC method is used to extract important features from audio files.
Scaling the audio samples to a common scale is important before feeding data to the model to understand it better.
You can also build a CNN model to classify audio, or try a much deeper ANN than the one we have built.
👉 The complete Notebook implementation is available here.
👉 Connect with me on Linkedin.
👉 Check out my other articles on Analytics Vidhya and crazy-techie
Thanks for giving your time!
The media shown in this article is not owned by Analytics Vidhya and are used at the Author’s discretion.
AI vs. Machine Learning vs. Deep Learning
Since before the dawn of the computer age, scientists have been captivated by the idea of creating machines that could behave like humans. But only in the last decade has technology enabled some forms of artificial intelligence (AI) to become a reality.
Interest in putting AI to work has skyrocketed, with a burgeoning array of AI use cases. Many surveys have found that upwards of 90 percent of enterprises are either already using AI in their operations today or plan to in the near future.
Eager to capitalize on this trend, software vendors – both established AI companies and AI startups – have rushed to bring AI capabilities to market. Among vendors selling big data analytics and data science tools, two types of artificial intelligence have become particularly popular: machine learning and deep learning.
While many solutions carry the “AI,” “machine learning,” and/or “deep learning” labels, confusion about what these terms really mean persists in the marketplace. The diagram below provides a visual representation of the relationships among these different technologies:
As the graphic makes clear, machine learning is a subset of artificial intelligence. In other words, all machine learning is AI, but not all AI is machine learning.
Similarly, deep learning is a subset of machine learning. And again, all deep learning is machine learning, but not all machine learning is deep learning.
Also see: Top Machine Learning Companies
AI, machine learning and deep learning are each interrelated, with deep learning nested within ML, which in turn is part of the larger discipline of AI.
Computers excel at mathematics and logical reasoning, but they struggle to master other tasks that humans can perform quite naturally.
For example, human babies learn to recognize and name objects when they are only a few months old, but until recently, machines have found it very difficult to identify items in pictures. While any toddler can easily tell a cat from a dog from a goat, computers find that task much more difficult. In fact, captcha services sometimes use exactly that type of question to make sure that a particular user is a human and not a bot.
In the 1950s, scientists began discussing ways to give machines the ability to “think” like humans. The phrase “artificial intelligence” entered the lexicon in 1956, when John McCarthy organized a conference on the topic. Those who attended called for more study of “the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it.”
Critics rightly point out that there is a big difference between an AI system that can tell the difference between cats and dogs and a computer that is truly intelligent in the same way as a human being. Most researchers believe that we are years or even decades away from creating an artificial general intelligence (also called strong AI) that seems to be conscious in the same way that humans beings are — if it will ever be possible to create such a system at all.
If artificial general intelligence does one day become a reality, it seems certain that machine learning will play a major role in the system’s capabilities.
Machine learning is the particular branch of AI concerned with teaching computers to “improve themselves,” as the attendees at that first artificial intelligence conference put it. Another 1950s computer scientist named Arthur Samuel defined machine learning as “the ability to learn without being explicitly programmed.”
In traditional computer programming, a developer tells a computer exactly what to do. Given a set of inputs, the system will return a set of outputs — just as its human programmers told it to.
Machine learning is different because no one tells the machine exactly what to do. Instead, they feed the machine data and allow it to learn on its own.
In general, machine learning takes three different forms:
Reinforcement learning is one of the oldest types of machine learning, and it is very useful in teaching a computer how to play a game.
For example, Arthur Samuel created one of the first programs that used reinforcement learning. It played checkers against human opponents and learned from its successes and mistakes. Over time, the software became much better at playing checkers.
Reinforcement learning is also useful for applications like autonomous vehicles, where the system can receive feedback about whether it has performed well or poorly and use that data to improve over time.
Supervised learning is particularly useful in classification applications such as teaching a system to tell the difference between pictures of dogs and pictures of cats.
In this case, you would feed the application a whole lot of images that had been previously tagged as either dogs or cats. From that training data, the computer would draw its own conclusions about what distinguishes the two types of animals, and it would be able to apply what it learned to new pictures.
By contrast, unsupervised learning does not rely on human beings to label training data for the system. Instead, the computer uses clustering algorithms or other mathematical techniques to find similarities among groups of data.
Unsupervised machine learning is particularly useful for the type of big data analytics that interests many enterprise leaders. For example, you could use unsupervised learning to spot similarities among groups of customers and better target your marketing or tailor your pricing.
Some recommendation engines rely on unsupervised learning to tell people who like one movie or book what other movies or books they might enjoy. Unsupervised learning can also help identify characteristics that might indicate a person’s credit worthiness or likelihood of filing an insurance claim.
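As a rough illustration of the customer-segmentation idea described above, here is a small scikit-learn sketch; the feature columns, values, and cluster count are invented for the example and are not from the article.

import numpy as np
from sklearn.cluster import KMeans

# Toy customer data: [annual_spend, visits_per_month]
customers = np.array([
    [200, 1], [220, 2], [1500, 8], [1600, 9], [800, 4], [750, 5],
])

# Group similar customers together without any labels
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(customers)
print(kmeans.labels_)           # cluster assignment for each customer
print(kmeans.cluster_centers_)  # the "average" customer in each segment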
Various AI applications, such as computer vision, natural language processing, facial recognition, text-to-speech, speech-to-text, knowledge engines, emotion recognition, and other types of systems, often make use of machine learning capabilities. Some combine two or more of the main types of machine learning, and in some cases, are said to be “semi-supervised” because they incorporate some of the techniques of supervised learning and some of the techniques of unsupervised learning. And some machine learning techniques — such as deep learning — can be supervised, unsupervised, or both.
The phrase “deep learning” first came into use in the 1980s, making it a much newer idea than either machine learning or artificial intelligence.
Deep learning describes a particular type of architecture that both supervised and unsupervised machine learning systems sometimes use. Specifically, it is a layered architecture where one layer takes an input and generates an output. It then passes that output on to the next layer in the architecture, which uses it to create another output. That output can then become the input for the next layer in the system, and so on. The architecture is said to be “deep” because it has many layers.
To create these layered systems, many researchers have designed computing systems modeled after the human brain. In broad terms, they call these deep learning systems artificial neural networks (ANNs). ANNs come in several different varieties, including deep neural networks, convolutional neural networks, recurrent neural networks and others. These neural networks use nodes that are similar to the neurons in a human brain.
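To show what “layered” means in code, here is a minimal Keras sketch of a small deep neural network; the layer sizes and input dimension are arbitrary values chosen for illustration.

import tensorflow as tf

# Each Dense layer takes the previous layer's output as its input
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32,)),              # input features
    tf.keras.layers.Dense(64, activation='relu'),    # hidden layer 1
    tf.keras.layers.Dense(64, activation='relu'),    # hidden layer 2
    tf.keras.layers.Dense(1, activation='sigmoid'),  # output layer
])
model.summary()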
Graphics processing units (GPUs), originally designed for rendering graphics, also excel at the type of calculations necessary for deep learning. As GPU performance has improved and costs have decreased, people have been able to create high-performance systems that can complete deep learning tasks in much less time and for much less cost than would have been the case in the past.
Today, anyone can easily access deep learning capabilities through cloud services like Amazon Web Services, Microsoft Azure, Google Cloud and IBM Cloud.
If you are interested in learning more about AI vs machine learning vs deep learning, Datamation has several resources that can help, including the following:
Deep Learning for Image Super-Resolution
This article was published as a part of the Data Science Blogathon
Introduction

Super-resolution (SR) is the process of recovering high-resolution (HR) images from low-resolution (LR) images. It is an important class of image processing techniques in computer vision and image processing and enjoys a wide range of real-world applications, such as medical imaging, satellite imaging, surveillance and security, and astronomical imaging, amongst others.
Problem

The image super-resolution (SR) problem, particularly single image super-resolution (SISR), has gained a lot of attention in the research community. SISR aims to reconstruct a high-resolution image ISR from a single low-resolution image ILR. Generally, the relationship between ILR and the original high-resolution image IHR can vary depending on the situation. Many studies assume that ILR is a bicubic downsampled version of IHR, but other degrading factors such as blur, decimation, or noise can also be considered for practical applications.
In this article, we would be focusing on supervised learning methods for super-resolution tasks. By using HR images as target and LR images as input, we can treat this problem as a supervised learning problem.
Exhaustive table of topics in Supervised Image Super-Resolution
Upsampling Methods

Before understanding the rest of the theory behind super-resolution, we need to understand upsampling (increasing the spatial resolution of images, or simply increasing the number of pixel rows/columns or both in the image) and its various methods.
1. Interpolation-based methods – Image interpolation (image scaling) refers to resizing digital images and is widely used by image-related applications. Traditional methods include nearest-neighbor, linear, bilinear, and bicubic interpolation.
Nearest-neighbor interpolation with the scale of 2
Nearest-neighbor Interpolation – The nearest-neighbor interpolation is a simple and intuitive algorithm. It selects the value of the nearest pixel for each position to be interpolated regardless of any other pixels.
Bilinear Interpolation – The bilinear interpolation (BLI) first performs linear interpolation on one axis of the image and then performs on the other axis. Since it results in a quadratic interpolation with a receptive field-sized 2 × 2, it shows much better performance than nearest-neighbor interpolation while keeping a relatively fast speed.
Bicubic Interpolation – Similarly, the bicubic interpolation (BCI) performs cubic interpolation on each of the two axes. Compared to BLI, BCI takes 4 × 4 pixels into account, resulting in smoother images with fewer artifacts but at a much lower speed. Refer to this for a detailed discussion.
Shortcomings – Interpolation-based methods often introduce some side effects such as computational complexity, noise amplification, blurring results, etc.
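For reference, here is a small OpenCV sketch of applying the three interpolation methods described above; the input image path is a placeholder, not a file from the article.

import cv2

lr = cv2.imread("low_res.png")   # placeholder path to a low-resolution image
h, w = lr.shape[:2]

# 2x upsampling with the three classical interpolation kernels
nearest  = cv2.resize(lr, (w * 2, h * 2), interpolation=cv2.INTER_NEAREST)
bilinear = cv2.resize(lr, (w * 2, h * 2), interpolation=cv2.INTER_LINEAR)
bicubic  = cv2.resize(lr, (w * 2, h * 2), interpolation=cv2.INTER_CUBIC)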
2. Learning-based upsampling – To overcome the shortcomings of interpolation-based methods and learn upsampling in an end-to-end manner, transposed convolution layer and sub-pixel layer are introduced into the SR field.
The blue boxes denote the input, and the green boxes indicate the kernel and the convolution output.
Transposed convolution layer, a.k.a. deconvolution layer, tries to perform a transformation opposite to a normal convolution, i.e., predicting the possible input based on feature maps sized like the convolution output. Specifically, it increases the image resolution by expanding the image (inserting zeros) and then performing convolution.
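A minimal Keras sketch of a transposed convolution used for 2x upsampling follows; the channel count and kernel size are arbitrary choices for illustration.

import tensorflow as tf

x = tf.random.normal((1, 32, 32, 64))   # a batch of low-resolution feature maps
upsample = tf.keras.layers.Conv2DTranspose(
    filters=64, kernel_size=3, strides=2, padding='same')
print(upsample(x).shape)                # (1, 64, 64, 64): spatial size doubled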
Sub-pixel layer – The blue boxes denote the input and the boxes with other colors indicate different convolution operations and different output feature maps.
The sub-pixel layer performs convolution to produce an output with s² times the channels, where s is the scaling factor. Assuming the input size is h × w × c, the output size will be h × w × s²c. After that, a reshaping operation (pixel shuffle) is performed to produce an output of size sh × sw × c.
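In TensorFlow, the reshaping step corresponds to the depth_to_space (pixel shuffle) operation; a small sketch with s = 2 and a toy feature map is shown below.

import tensorflow as tf

s = 2
x = tf.random.normal((1, 32, 32, 3 * s * s))   # h x w x (s^2 * c) feature map
y = tf.nn.depth_to_space(x, block_size=s)      # rearranged to sh x sw x c
print(y.shape)                                  # (1, 64, 64, 3)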
Super-resolution Frameworks

Since image super-resolution is an ill-posed problem, how to perform upsampling (i.e., generating HR output from LR input) is the key problem. There are mainly four model frameworks based on the employed upsampling operations and their locations in the model (refer to the table above).
1. Pre-upsampling Super-resolution –
We don’t do a direct mapping of LR images to HR images since it is considered to be a difficult task. We utilize traditional upsampling algorithms to obtain higher resolution images and then refining them using deep neural networks is a straightforward solution. For example – LR images are upsampled to coarse HR images with the desired size using bicubic interpolation. Then deep CNNs are applied to these images for reconstructing high-quality images.
2. Post-upsampling Super-resolution –
To improve the computational efficiency and make full use of deep learning technology to increase resolution automatically, researchers propose to perform most computation in low-dimensional space by replacing the predefined upsampling with end-to-end learnable layers integrated at the end of the models. In the pioneer works of this framework, namely post-upsampling SR, the LR input images are fed into deep CNNs without increasing resolution, and end-to-end learnable upsampling layers are applied at the end of the network.
Learning Strategies

Loss functions are used in the super-resolution field to measure reconstruction error and to guide the model towards producing more realistic and higher-quality results.
Pixelwise L1 loss – Absolute difference between pixels of ground truth HR image and the generated one.
Pixelwise L2 loss – Mean squared difference between pixels of ground truth HR image and the generated one.
Content loss – the content loss is indicated as the Euclidean distance between high-level representations of the output image and the target image. High-level features are obtained by passing through pre-trained CNNs like VGG and ResNet.
Adversarial loss – Based on GAN where we treat the SR model as a generator, and define an extra discriminator to judge whether the input image is generated or not.
PSNR – Peak Signal-to-Noise Ratio (PSNR) is a commonly used objective metric to measure the reconstruction quality of a lossy transformation. PSNR is inversely proportional to the logarithm of the Mean Squared Error (MSE) between the ground truth image and the generated image.
In MSE, I is a noise-free m×n monochrome image (the ground truth) and K is the generated image (the noisy approximation). In PSNR, MAX_I represents the maximum possible pixel value of the image.
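Since the formula images did not survive, here is a small NumPy sketch of how MSE and PSNR are typically computed for 8-bit images (MAX_I = 255); the function name is my own.

import numpy as np

def psnr(ground_truth, generated, max_i=255.0):
    # MSE: mean squared difference between the two m x n images
    mse = np.mean((ground_truth.astype(np.float64) - generated.astype(np.float64)) ** 2)
    if mse == 0:
        return float('inf')  # identical images
    # PSNR is inversely proportional to the logarithm of the MSE
    return 10 * np.log10((max_i ** 2) / mse)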
Network Design

Various network designs in super-resolution architecture
Enough of the basics! Let’s discuss some of the state-of-art super-resolution methods –
Super-Resolution Methods

Super-Resolution Generative Adversarial Network (SRGAN) – Uses the idea of a GAN for the super-resolution task, i.e., the generator tries to produce an image that is judged by the discriminator. Both keep training so that the generator can generate images that match the true training data.
Architecture of Generative Adversarial Network
There are various ways for super-resolution but there is a problem – how can we recover finer texture details from a low-resolution image so that the image is not distorted?
Results with a high PSNR are considered high quality, but they often lack high-frequency details.
Check the original papers for detailed information.
Steps –
1. We process the HR (high-resolution images) to get downsampled LR images. Now we have HR and LR images for the training dataset.
2. We pass LR images through a generator that upsamples and gives SR images.
3. We use the discriminator to distinguish SR images from HR images and backpropagate the GAN loss to train both the discriminator and the generator.
Network architecture of SRGAN
Key features of the method –
Post upsampling type of framework
Subpixel layer for upsampling
Contains residual blocks
Uses Perceptual loss
Original code of SRGAN
Enhanced Deep Super-Resolution (EDSR) – improves performance by optimizing conventional residual networks.
Check the original papers for detailed information.
Some of the key features of the methods –
Residual blocks – SRGAN successfully applied the ResNet architecture to the super-resolution problem with SRResNet, they further improved the performance by employing a better ResNet structure. In the proposed architecture –
Comparison of the residual blocks
They removed the batch normalization layers from the network used in SRResNet. Since batch normalization layers normalize the features, they remove the range flexibility of the network, so it is better to remove them.
The architecture of EDSR, MDSR
In MDSR, they proposed a multiscale architecture that shares most of the parameters on different scales. The proposed multiscale model uses significantly fewer parameters than multiple single-scale models but shows comparable performance.
Original code of the methods
So now we have come to the end of the blog! To learn about super-resolution, refer to these survey papers.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.
Top 10 Essential Prerequisites For Deep Learning Projects
Deep learning projects are used across industries ranging from medical to e-commerce
Deep learning is clearly the technology of the future and is one of the most sought-after innovations of our day. If you’re interested in learning it, you should be aware of its prerequisites. Knowing the prerequisites for deep learning projects can help you choose a better career path.
Deep learning is an interdisciplinary area of computer science and mathematics with the goal of teaching computers to carry out cognitive tasks in a manner similar to that of humans. Deep learning systems collect input data and study or analyze it, using different methods to automatically identify patterns in datasets that may contain structured data, quantitative data, textual data, visual data, etc. We’ll talk about the top requirements for deep learning projects in this section to help you prepare for learning its more complex ideas.
1. Programming

Deep learning requires programming as a core component and demands the use of a programming language. Python or R are the programming languages of choice for deep learning experts due to their functionality and efficiency. You must study programming and become proficient in one of these two well-known programming languages before you can study the numerous deep learning topics.
2. Statistics

The study of utilizing data and its visualization is known as statistics. It aids in extracting information from your raw data. Data science and the related sciences depend heavily on statistics. You would need to apply statistics to acquire insights from data as a deep learning specialist.
3. Calculus

The foundation of many machine learning algorithms is calculus. Therefore, studying calculus is a requirement for deep learning. You will create models using deep learning based on the features found in your data. You can use such properties and create the model as necessary with the aid of calculus.
4. Linear Algebra

Linear algebra is most likely one of the most crucial requirements for deep learning. Matrix, vector, and linear equations are all topics covered by linear algebra. It focuses on how linear equations are represented in vector spaces. You may design many models (classification, regression, etc.) with the aid of linear algebra, which is also a fundamental building block for many deep-learning ideas.
5. Probability

Probability is the field of mathematics that focuses on using numerical data to express how likely an occurrence is. Any event’s probability can range from 0 to 1, with 0 denoting impossibility and 1 denoting complete certainty.
6. Data Science

Data analysis and use are the focus of the field of data science. You must be knowledgeable about a variety of data science principles to construct models that manage data as a deep learning specialist. Understanding deep learning will assist you in using data to achieve the desired results, but mastering data science is a prerequisite for applying deep learning.
7. Work on Projects

While mastering these topics will aid in the development of a solid foundation, you will also need to work on deep learning projects to ensure that you fully comprehend everything. You can apply what you’ve learned and identify your weak areas with the aid of projects. You can easily find a project that interests you because deep learning has applications in many different fields.
8. Neural Networks

The word “neuron,” which is used to describe a single nerve cell, is where the word “neural” originates. That’s correct; a neural network is essentially a network of neurons that carry out routine tasks for us.
A significant portion of the issues we encounter daily is related to pattern recognition, object detection, and intelligence. The reality is that these reactions are challenging to automate even if they are carried out with such simplicity that we don’t even notice it.
9. Clustering Algorithms

The clustering problem is resolved with the most straightforward unsupervised learning approach. The K-means method divides n observations into k clusters, with each observation belonging to the cluster represented by the nearest mean.
10. Regression

Top 10 Deep Learning Projects For Engineering Students In 2023
If you want to start a career in deep learning, then you must read about these top 10 deep learning projects
Deep learning is a domain with diverse technologies, such as tablets and computers that can learn based on programming and other data. Deep learning is emerging as a futuristic concept that can meet the requirements of people. When we look at speech recognition technology and virtual assistants, they run on machine learning and deep learning technologies. If you want to start a career in deep learning, you should read this article, as it features current ideas for your upcoming deep learning project. Here is the list of the top 10 deep learning projects to know about in 2023.
Chatbots

Due to their skillful handling of a profusion of customer queries and messages without any issue, chatbots play a significant role across industries. They are designed to lessen the customer service workload by automating the hefty part of the process. Chatbots achieve this with the help of technologies like machine learning, artificial intelligence, and deep learning. Therefore, creating a chatbot will be a great idea for your final deep learning project.
Forest Fire Prediction

Creating a forest fire prediction system is one of the best deep learning projects and another considerable application of the abilities provided by deep learning. A forest fire is an uncontrolled fire in a forest that causes a hefty amount of damage not only to nature but also to animal habitats and human property. To control the chaotic nature of forest fires, and even predict them, you can create a deep learning project using k-means clustering to identify major fire hotspots and their intensity.
Digit Recognition System

This project involves developing a digit recognition system that can classify digits based on the set tenets. The project aims to create a recognition system that can classify digits ranging from 0 to 9 using a combination of a shallow network and a deep neural network, implemented with logistic regression. Softmax Regression, or Multinomial Logistic Regression, is the ideal choice for this project. Since this technique is a generalization of logistic regression, it is apt for multi-class classification, assuming that all the classes are mutually exclusive.
Image Caption Generator Project in Python

This is one of the most interesting deep learning projects. It is easy for humans to describe what is in an image, but for computers, an image is just a bunch of numbers that represent the colour value of each pixel. This project uses deep learning methods in which you implement a convolutional neural network (CNN) together with a recurrent neural network (LSTM) to build the image caption generator.
Traffic Signs Recognition

Traffic signs and rules are crucial, and every driver must obey them to prevent accidents. To follow a rule, one must first understand what the traffic sign looks like. In the traffic signs recognition project, you will learn how a program can identify the type of traffic sign by taking an image as input. For a final-year engineering student, it is one of the best deep learning projects to try.
Credit Card Fraud Detection

With the increase in online transactions, credit card fraud has also increased. Banks are trying to handle this issue using deep learning techniques. In this deep learning project, you can use Python to build a classification model that detects credit card fraud by analyzing previously available data.
Customer Segmentation

This is one of the most popular deep learning projects, and every student should try it. Before running any campaign, companies create different groups of customers. Customer segmentation is a popular application of unsupervised learning. Using clustering, companies identify segments of customers to target the potential user base.
Movie Recommendation System

In this deep learning project, you have to utilize R to perform movie recommendation through technologies like machine learning and artificial intelligence. A recommendation system sends out suggestions to users through a filtering process based on other users’ preferences and browsing history. If A and B both like Home Alone, and B also likes Mean Girls, it can be suggested to A – they might like it too. This keeps customers engaged with the platform.
Visual Tracking System

A visual tracking system is designed to track and locate moving object(s) in a given time frame via a camera. It is a handy tool with numerous applications, such as security and surveillance, medical imaging, augmented reality, traffic control, video editing and communication, and human-computer interaction.
Drowsiness Detection System