# Electron Theory Of Matter And Atoms


The electron theory of matter is one of the most successful and experimentally verified theories for explaining the nature of electricity. Although the study of electricity has attracted the attention of scientists for several hundred years, and many experiments and theories were developed to understand its nature, the electron theory of matter is the one that has explained it successfully.

The electron theory of matter is the result of experiments and research conducted by many scientists, including J. J. Thomson, R. A. Millikan, Ernest Rutherford, and Niels Bohr. This article explains the concept of the electron theory of matter and the atom.

Electron Theory of Matter

The nature of electricity can be easily explained by the electron theory of matter. This theory states that all substances, whether solid, liquid, or gas, are composed of small particles called molecules, and a molecule is in turn made up of minute particles called atoms.

According to the electron theory of matter,

A substance whose molecules consist of the same type of atoms is called an element. For example, oxygen is an element because its molecule has two atoms of the same type.

A substance whose molecules consist of different kinds of atoms is called a compound. For example, water is a compound because its molecule contains two atoms of hydrogen and one atom of oxygen.

The Atom

The atom is the basic building block of a substance. Basically, atoms are the sub-parts of molecules. An atom consists of two parts, namely,

The nucleus

The extra-nucleus

The nucleus is the central part of an atom and it contains two subatomic particles, namely protons and neutrons. The proton is a positively charged particle. The magnitude of the charge on a proton is equal to $\mathrm{+1.6\times 10^{-19}}$ C. A proton has a mass of $\mathrm{1.67\times 10^{-27}}$ kg. The other particle inside the nucleus is the neutron. A neutron is an electrically neutral particle, which means it does not carry any charge. However, the mass of a neutron is nearly equal to that of the proton. Therefore, the nucleus of an atom bears a positive charge.


The extra-nucleus is the outer part of an atom, i.e., it is the space in an atom around the nucleus. The extra-nucleus contains electrons only. Thus, an electron is also a subatomic particle. An electron carries a negative charge of magnitude equal to $\mathrm{1.6\times 10^{-19}}$ C. The mass of an electron is equal to $\mathrm{9.1\times 10^{-31}}$ kg. Here, we can observe that the mass of an electron is very small compared to that of a proton or a neutron. Therefore, the nucleus of an atom constitutes almost the entire mass of the atom.
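As a quick sanity check on this claim, here is a short Python sketch using the approximate particle masses quoted in this section, with carbon-12 (6 protons, 6 neutrons, 6 electrons) chosen as an illustrative example:

```python
# Approximate masses from this section
proton_mass = 1.67e-27    # kg
neutron_mass = 1.67e-27   # kg (nearly equal to the proton mass)
electron_mass = 9.1e-31   # kg

ratio = proton_mass / electron_mass
print(f"A proton is roughly {ratio:.0f} times heavier than an electron")

# Carbon-12: 6 protons, 6 neutrons, 6 electrons
nucleus = 6 * proton_mass + 6 * neutron_mass
total = nucleus + 6 * electron_mass
print(f"Fraction of the atom's mass in the nucleus: {nucleus / total:.5f}")
```

The nucleus accounts for well over 99.9% of the atom's mass, which is why the electrons' contribution is usually neglected.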

Electrons in an atom move around the nucleus in different paths (orbits). These electrons obey the following rules while moving in their orbit −

The maximum number of electrons that an orbit can have is equal to $\mathrm{2n^{2}}$, where n is the number of the orbit. Thus, the first orbit can have up to 2 electrons, the second orbit up to 8 electrons, and so on.

The outermost (last) orbit can have maximum 8 electrons.

No orbit can accommodate more than 18 electrons.
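The $\mathrm{2n^{2}}$ rule and the 18-electron cap can be combined into a small sketch (ignoring, for simplicity, the separate 8-electron limit that applies only to the outermost orbit):

```python
def max_electrons(n):
    """Maximum electrons in orbit n: 2n^2, but never more than 18."""
    return min(2 * n * n, 18)

for n in range(1, 5):
    print(f"Orbit {n}: up to {max_electrons(n)} electrons")
# Orbits 1-3 follow 2n^2 (2, 8, 18); orbit 4 is capped at 18
```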

In the normal state of an atom, the number of electrons is equal to the number of protons. Therefore, under normal conditions, an atom is electrically neutral as a whole and does not exhibit electricity.

The following are two important measures of an atom −

Atomic Number − The number of electrons or protons in an atom is known as atomic number, i.e.,

Atomic number = No. of electrons or No. of protons

Atomic Weight − The sum of the number of protons and number of neutrons in the nucleus of an atom is known as atomic weight, i.e.,

Atomic weight = No. of protons + No. of neutrons
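As a quick illustration of the two formulas, using ordinary oxygen (8 protons, 8 neutrons) as an assumed example:

```python
def atomic_number(no_of_protons):
    # In a neutral atom, No. of electrons = No. of protons
    return no_of_protons

def atomic_weight(no_of_protons, no_of_neutrons):
    return no_of_protons + no_of_neutrons

print(atomic_number(8))     # oxygen: atomic number 8
print(atomic_weight(8, 8))  # oxygen: atomic weight 16
```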


In this article, we discussed the electron theory of matter and the concept of the atom. These two concepts play a vital role in understanding the nature of electricity. The electron theory of matter is the only theory that has survived over the years to explain the nature of electricity. It also helps in understanding the basic structure of the atom.


This Giant Exoplanet’s Atmosphere Teems With Glowing Hot Atoms Of Titanium And Iron

For the first time, astronomers have detected iron and titanium vapors in a planet’s sky—the metals glowing hot like the filaments in a light bulb in the searing atmosphere. The strategy used to make this discovery might one day hunt for signs of life on alien worlds, researchers added.

Scientists investigated the exoplanet KELT-9b, the hottest alien world discovered yet, with daytime temperatures reaching more than 4,300 degrees C, hotter than many stars. This planet is located about 650 light years from Earth in the constellation Cygnus the Swan. It circles the young blue star KELT-9, which is nearly twice as hot as our sun. KELT-9b, which is about 2.8 times Jupiter’s mass, orbits its star roughly 10 times closer than Mercury does the sun.

KELT-9b belongs to a class of worlds known as ultra-hot Jupiters that blur the lines between stars and gas giants. The scorching heat of these exoplanets gives researchers an exceptional opportunity to analyze the ingredients of their atmospheres. When chemicals are heated, they each can give off a unique pattern or spectrum of light that can help identify them like fingerprints. The fact that such Jupiter-like worlds are ultra-hot means that atoms or molecules that might not ordinarily reach high enough temperatures in regular planets to give off light, such as iron, might emit detectable spectra.

Now exoplanet astronomer Jens Hoeijmakers at the Universities of Geneva and Bern in Switzerland and his colleagues have detected light from iron and titanium from KELT-9b. Using Italy’s Galileo National Telescope on the Canary Island of La Palma on the night of July 31, 2023, they detected the spectra of neutrally charged iron atoms and positively charged iron and titanium ions. Their results were published this week in Nature.

“It is the first time that iron and titanium have been robustly detected in the atmosphere of any planet beyond the solar system,” said study co-author Kevin Heng, a theoretical astrophysicist at the University of Bern in Switzerland.

Indeed, since none of the worlds in the solar system are hot enough to possess iron or titanium vapors in their atmospheres, this is the first time such metal gases have been detected in the sky of any planet, said astronomer Laura Kreidberg at Harvard University, who did not take part in this research. "You have to have insanely high temperatures — more like stellar temps than planetary — to get these elements in gaseous atomic form," she said. "It's a very exciting result!"

The material in KELT-9b almost certainly had a common origin with its star. “Our understanding of exoplanet formation tells us that the star and the exoplanet formed from a common disk of dust and gas,” Heng said. “Deciphering the chemical composition of KELT-9b’s atmosphere will give us some chance at understanding its formation history.”

Normally, a major portion of the atoms and molecules making up a planet are hidden in clouds or deep under the atmosphere. Think of Earth—most of the molecules that make up our planet, from the core to the surface, aren’t represented in the atmosphere. In contrast, KELT-9b “is so incredibly hot, all the atoms and molecules are uniformly mixed through the atmosphere,” Kreidberg said. “We can therefore see the raw materials that the planet is made of. This planet is an unmatched laboratory for studying the building blocks of planet formation.”

While Kelt-9b is too hot to ever support life as we know it, the strategy used to detect iron and titanium in KELT-9b’s atmosphere “is the same exact technique that can be used to detect molecules interesting for biology in a future, yet-to-be-discovered exoplanet,” Heng said, such as oxygen or organic molecules. “You can say that hot exoplanets are a training ground for us to hone our techniques and prepare for the exciting targets to emerge in the coming decade.”

Knowledge Distillation: Theory And End To End Case Study

This article was published as a part of the Data Science Blogathon

In this article, we will cover the theory of knowledge distillation and then apply it end to end on a business problem to classify x-ray images for pneumonia detection.


What is Knowledge Distillation?

Knowledge Distillation aims to transfer knowledge from a large deep learning model to a small deep learning model. Here size is in the context of the number of parameters present in the model which directly relates to the latency of the model.

Knowledge distillation is therefore a method to compress the model while maintaining accuracy. Here the bigger network which gives the knowledge is called a Teacher Network and the smaller network which is receiving the knowledge is called a Student Network.

 (Image Source: Author, Inspired from Reference [6])

Why make the Model Lighter?

In many applications, the model needs to be deployed on systems that have low computational power such as mobile devices, edge devices. For example, in the medical field, limited computation power systems (example: POCUS – Point of Care Ultrasound) are used in remote areas where it is required to run the models in real-time. From both time(latency) and memory (computation power) it is desirable to have ultra-lite and accurate deep learning models.

But ultra-lite (a few thousand parameters) models may not give us good accuracy. This is where we utilize Knowledge Distillation, taking help from the teacher network. It basically makes the model lite while maintaining accuracy.

Knowledge Distillation Steps

Below are the steps for knowledge distillation:

1) Define and train the teacher network: A large, accurate model is trained on the data in the usual way.

2) Define the student network: A much smaller model is defined, typically with far fewer parameters per layer.

3) Train the student network intelligently in coordination with the teacher network: The student network is trained in coordination with the fully trained teacher network. Here, forward propagation is done on both the teacher and student networks, and backpropagation is done on the student network. Two loss functions are defined: the student loss and the distillation loss. These loss functions are explained in the next section of this article.


Knowledge Distillation Mathematical Equations:

(Image Source: Author, Inspired from Reference [7])

Loss functions for the teacher and student networks are defined as below:

Teacher loss $L_T$ (between actual labels and predictions by the teacher network):

$L_T = H(p, q_T)$

Total student loss $L_{TS}$:

$L_{TS} = \alpha \cdot \text{Student Loss} + \text{Distillation Loss}$

$L_{TS} = \alpha \cdot H(p, q_S) + H(\tilde{q}_T, \tilde{q}_S)$

where,

Student Loss $= H(p, q_S)$

Distillation Loss $= H(\tilde{q}_T, \tilde{q}_S)$

$H$: loss function (categorical cross-entropy or KL divergence)

$p$: true (one-hot) labels

$z_T$ and $z_S$: pre-softmax logits of the teacher and student networks

$\tilde{q}_T = \text{softmax}(z_T / t)$

$\tilde{q}_S = \text{softmax}(z_S / t)$

alpha ($\alpha$) and temperature ($t$) are hyperparameters.

Temperature $t$ is used to reduce the magnitude difference among the class likelihood values.

These mathematical equations are taken from reference [3].
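To see the effect of the temperature $t$ concretely, here is a small NumPy sketch (the logits are made-up values, not taken from the case study):

```python
import numpy as np

def softmax_t(z, t=1.0):
    """q~ = softmax(z / t); a larger t gives a softer distribution."""
    e = np.exp(z / t - np.max(z / t))  # shift logits for numerical stability
    return e / e.sum()

z = np.array([8.0, 2.0, 0.5])  # hypothetical pre-softmax logits

print(softmax_t(z, t=1))  # sharp: almost all probability mass on class 0
print(softmax_t(z, t=6))  # soft: the relative likelihoods of the other classes become visible
```

The softened teacher distribution is what carries the "dark knowledge" about how similar the classes are to each other, which is exactly what the distillation loss transfers to the student.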

End to End Case Study

Here we will look at a case study where we will implement the knowledge distillation concept in an image classification problem for pneumonia detection.

About Data:

The dataset contains chest x-ray images. Each image can belong to one of three classes:

1. NORMAL

2. BACTERIA (bacterial pneumonia)

3. VIRUS (viral pneumonia)

Let’s get started!!

Importing Required Libraries:

import numpy as np
import matplotlib.pyplot as plt
import os
import pandas as pd
import glob
import shutil
import random
import datetime
import cv2
import h5py
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Conv2D, Dropout, MaxPool2D, BatchNormalization, Input, Conv2DTranspose, Concatenate
from tensorflow.keras.losses import SparseCategoricalCrossentropy, CategoricalCrossentropy
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras import layers
from sklearn.model_selection import train_test_split
from IPython.display import display
from PIL import Image as im

Downloading the data

The data set is huge. I randomly selected 1000 images for each class and kept 800 images in the train data, 100 images in the validation data, and 100 images in the test data for each of the classes. I zipped this selected data and uploaded it to my Google Drive.

S. No. Class Train Test Validation

1. NORMAL 800 100 100

2. BACTERIA 800 100 100

3. VIRUS 800 100 100

Downloading the data from google drive to google colab:

# downloading the data and unzipping it
from google.colab import drive
drive.mount('/content/drive')
!unzip "/content/drive/MyDrive/" -d "/content/"

Visualizing the images

We will now look at some images from each of the classes.

plt.figure(figsize=(12, 9))
for i, folder in enumerate(os.listdir(train_path)):
    for j, img_name in enumerate(os.listdir(train_path + "/" + folder)):
        filename = train_path + "/" + folder + "/" + img_name
        img = im.open(filename)
        ax = plt.subplot(3, 4, 4*i + j + 1)
        ax.set_xlabel(folder + ' ' + str(img.size[0]) + 'x' + str(img.size[1]))
        plt.imshow(img, 'gray')
        ax.axes.xaxis.set_ticklabels([])
        ax.axes.yaxis.set_ticklabels([])
        img.close()
        if j == 3:  # show 4 images per class
            break

So the above sample images suggest that each x-ray image can be of a different size.

Creating Data Generators

We will use Keras ImageDataGenerator for image augmentation. Image augmentation is a tool to get multiple transformed copies of an image. These transformations can be cropping, rotating, flipping. This helps in generalizing the model. This will also ensure that we get the same size (224×224) for each image. Below are the codes for train and validation data generators.

def trainGenerator(batch_size, train_path):
    datagen = ImageDataGenerator(rescale=1. / 255,
                                 rotation_range=5,
                                 shear_range=0.02,
                                 zoom_range=0.1,
                                 brightness_range=[0.7, 1.3],
                                 horizontal_flip=True,
                                 vertical_flip=True,
                                 fill_mode='nearest')
    train_gen = datagen.flow_from_directory(train_path, batch_size=batch_size,
                                            target_size=(224, 224), shuffle=True,
                                            seed=1, class_mode="categorical")
    for image, label in train_gen:
        yield (image, label)

def validGenerator(batch_size, valid_path):
    datagen = ImageDataGenerator(rescale=1. / 255)
    valid_gen = datagen.flow_from_directory(valid_path, batch_size=batch_size,
                                            target_size=(224, 224), shuffle=True, seed=1)
    for image, label in valid_gen:
        yield (image, label)

Model 1: Teacher Network

Here we will use the VGG16 model and train it using transfer learning (based on the ImageNet dataset).

We will first define the VGG16 model.

from tensorflow.keras.applications.vgg16 import VGG16

base_model = VGG16(input_shape = (224, 224, 3),  # Shape of our images
                   include_top = False,          # Leave out the ImageNet classifier head
                   weights = 'imagenet')

Out of the total layers, we will make the first 8 layers non-trainable:


for layer in base_model.layers[:8]:
    layer.trainable = False

x = layers.Flatten()(base_model.output)
# Add a fully connected layer with 512 hidden units and ReLU activation
x = layers.Dense(512, activation='relu')(x)
# Add a dropout rate of 0.5
x = layers.Dropout(0.5)(x)
x = layers.Dense(3)(x)  # linear activation to get pre-softmax logits

model = tf.keras.models.Model(base_model.input, x)
opti = Adam(learning_rate=1e-4, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.001)
model.summary()

As we can see, there are 27M parameters in the teacher network.

One important point to note here is that the last layer of the model does not have any activation function (i.e. it has the default linear activation). Generally, there would be a softmax activation function in the last layer, as this is a multi-class classification problem, but here we use the default linear activation to get pre-softmax logits, because these pre-softmax logits will be used along with the student network's pre-softmax logits in the distillation loss function.

Hence, we are using from_logits = True in the CategoricalCrossEntropy loss function. This means that the loss function will calculate the loss directly from the logits. If we had used softmax activation, then it would have been from_logits = False.
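To make the `from_logits` distinction concrete, here is a small NumPy sketch (with made-up logits) showing that cross-entropy computed directly from logits equals cross-entropy computed from the softmax probabilities, which is essentially what `from_logits=True` does internally:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def ce_from_probs(p, q):
    # cross-entropy given probabilities (from_logits=False case)
    return -np.sum(p * np.log(q))

def ce_from_logits(p, z):
    # log-softmax evaluated directly on the logits (from_logits=True case)
    log_q = z - z.max() - np.log(np.exp(z - z.max()).sum())
    return -np.sum(p * log_q)

p = np.array([0.0, 1.0, 0.0])   # one-hot label
z = np.array([1.2, 3.4, -0.5])  # hypothetical pre-softmax logits

print(ce_from_probs(p, softmax(z)), ce_from_logits(p, z))  # the two values match
```

Computing the loss from logits is also numerically more stable, which is why Keras recommends it when the model outputs raw logits.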

We will now define a callback for the early stopping of the model and run the model.

Running the model:

model.compile(optimizer=opti, loss=CategoricalCrossentropy(from_logits=True), metrics=['acc'])

earlystop = EarlyStopping(monitor='val_acc', patience=5, verbose=1)
filepath = "model_save/weights-{epoch:02d}-{val_accuracy:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath=filepath, monitor='val_acc', verbose=1,
                             save_best_only=True, mode='max')
callbacks = [earlystop]
vgg_hist = model.fit(train_generator, validation_data=validation_generator,
                     validation_steps=10, steps_per_epoch=90,
                     epochs=50, callbacks=callbacks)

Checking the accuracy and loss for each epoch:

import matplotlib.pyplot as plt
plt.figure(1)

# summarize history for accuracy
plt.subplot(211)
plt.plot(vgg_hist.history['acc'])
plt.plot(vgg_hist.history['val_acc'])
plt.title('teacher model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'valid'], loc='lower right')

# summarize history for loss
plt.subplot(212)
plt.plot(vgg_hist.history['loss'])
plt.plot(vgg_hist.history['val_loss'])
plt.title('teacher model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'valid'], loc='upper right')

Now we will evaluate the model on the test data:

# First, we are going to load the file names and their respective target labels into a numpy array!

from sklearn.datasets import load_files
import numpy as np

test_dir = '/content/test'
no_of_classes = 3

def load_dataset(path):
    data = load_files(path)
    files = np.array(data['filenames'])
    targets = np.array(data['target'])
    target_labels = np.array(data['target_names'])
    return files, targets, target_labels

x_test, y_test, target_labels = load_dataset(test_dir)

from keras.utils import np_utils
y_test = np_utils.to_categorical(y_test, no_of_classes)

# We just have the file names in the x set. Let's load the images and convert them into arrays.
from keras.preprocessing.image import array_to_img, img_to_array, load_img

def convert_image_to_array(files):
    images_as_array = []
    for file in files:
        # Convert to Numpy Array
        images_as_array.append(tf.image.resize(img_to_array(load_img(file)), (224, 224)))
    return images_as_array

x_test = np.array(convert_image_to_array(x_test))
print('Test set shape : ', x_test.shape)
x_test = x_test.astype('float32') / 255

# Let's visualize test predictions.
y_pred_logits = model.predict(x_test)
y_pred = tf.nn.softmax(y_pred_logits)

# plot a random sample of test images, their predicted labels, and ground truth
fig = plt.figure(figsize=(16, 9))
for i, idx in enumerate(np.random.choice(x_test.shape[0], size=16, replace=False)):
    ax = fig.add_subplot(4, 4, i + 1, xticks=[], yticks=[])
    ax.imshow(np.squeeze(x_test[idx]))
    pred_idx = np.argmax(y_pred[idx])
    true_idx = np.argmax(y_test[idx])
    ax.set_title("{} ({})".format(target_labels[pred_idx], target_labels[true_idx]),
                 color=("green" if pred_idx == true_idx else "red"))

Calculating the accuracy of the test dataset:

print(model.metrics_names)  loss, acc = model.evaluate(x_test, y_test, verbose = 1) print('test loss = ', loss)  print('test accuracy = ',acc)


Model 2: Student Network

The student network defined here has a series of 2D convolution and max-pooling layers, just like our teacher network VGG16. The only difference is that the number of convolution filters in each layer of the student network is much smaller than in the teacher network. This helps us achieve our goal of having far fewer weights (parameters) to learn in the student network during training.

Defining the student network:

# import necessary layers
from tensorflow.keras.layers import Input, Conv2D
from tensorflow.keras.layers import MaxPool2D, Flatten, Dense, Dropout
from tensorflow.keras import Model

# input
input = Input(shape=(224, 224, 3))

# 1st Conv Block
x = Conv2D(filters=8, kernel_size=3, padding='valid', activation='relu')(input)
x = Conv2D(filters=8, kernel_size=3, padding='valid', activation='relu')(x)
x = MaxPool2D(pool_size=2, strides=2, padding='valid')(x)

# 2nd Conv Block
x = Conv2D(filters=16, kernel_size=3, padding='valid', activation='relu')(x)
x = Conv2D(filters=16, kernel_size=3, padding='valid', activation='relu')(x)
x = MaxPool2D(pool_size=2, strides=2, padding='valid')(x)

# 3rd Conv Block
x = Conv2D(filters=32, kernel_size=3, padding='valid', activation='relu')(x)
x = Conv2D(filters=32, kernel_size=3, padding='valid', activation='relu')(x)
x = MaxPool2D(pool_size=2, strides=2, padding='valid')(x)

# 4th Conv Block
x = Conv2D(filters=64, kernel_size=3, padding='valid', activation='relu')(x)
x = Conv2D(filters=64, kernel_size=3, padding='valid', activation='relu')(x)
x = MaxPool2D(pool_size=2, strides=2, padding='valid')(x)

# 5th Conv Block
x = Conv2D(filters=64, kernel_size=3, padding='valid', activation='relu')(x)
x = Conv2D(filters=64, kernel_size=3, padding='valid', activation='relu')(x)
x = MaxPool2D(pool_size=2, strides=2, padding='valid')(x)

# Fully connected layers
x = Flatten()(x)
x = Dense(units=256, activation='relu')(x)
x = Dropout(0.5)(x)
output = Dense(units=3)(x)  # last layer with linear activation

# creating the model
s_model_1 = Model(inputs=input, outputs=output)
s_model_1.summary()

Note that the number of parameters here is only 296k as compared to what we got in the teacher network (27M).

Now we will define the distiller. Distiller is a custom class that we will define in Keras in order to establish coordination/communication with the teacher network.

This Distiller Class takes student-teacher networks, hyperparameters (alpha and temperature as mentioned in the first part of this article), and the train data (x,y) as input. The Distiller Class does forward propagation of teacher and student networks and calculates both the losses: Student Loss and Distillation Loss. Then the backpropagation of the student network is done and weights are updated.

Defining the Distiller:

from tensorflow import keras

class Distiller(keras.Model):
    def __init__(self, student, teacher):
        super(Distiller, self).__init__()
        self.teacher = teacher
        self.student = student

    def compile(self, optimizer, metrics, student_loss_fn, distillation_loss_fn,
                alpha=0.5, temperature=2):
        """Configure the distiller.

        Args:
            optimizer: Keras optimizer for the student weights
            metrics: Keras metrics for evaluation
            student_loss_fn: Loss function of difference between student
                predictions and ground-truth
            distillation_loss_fn: Loss function of difference between soft
                student predictions and soft teacher predictions
            alpha: weight to student_loss_fn and 1-alpha to distillation_loss_fn
            temperature: Temperature for softening probability distributions.
                Larger temperature gives softer distributions.
        """
        super(Distiller, self).compile(optimizer=optimizer, metrics=metrics)
        self.student_loss_fn = student_loss_fn
        self.distillation_loss_fn = distillation_loss_fn
        self.alpha = alpha
        self.temperature = temperature

    def train_step(self, data):
        # Unpack data
        x, y = data

        # Forward pass of teacher
        teacher_predictions = self.teacher(x, training=False)

        with tf.GradientTape() as tape:
            # Forward pass of student
            student_predictions = self.student(x, training=True)

            # Compute losses
            student_loss = self.student_loss_fn(y, student_predictions)
            distillation_loss = self.distillation_loss_fn(
                tf.nn.softmax(teacher_predictions / self.temperature, axis=1),
                tf.nn.softmax(student_predictions / self.temperature, axis=1),
            )
            loss = self.alpha * student_loss + distillation_loss

        # Compute gradients
        trainable_vars = self.student.trainable_variables
        gradients = tape.gradient(loss, trainable_vars)

        # Update weights
        self.optimizer.apply_gradients(zip(gradients, trainable_vars))

        # Update the metrics configured in `compile()`
        self.compiled_metrics.update_state(y, student_predictions)

        # Return a dict of performance
        results = {m.name: m.result() for m in self.metrics}
        results.update({"student_loss": student_loss, "distillation_loss": distillation_loss})
        return results

    def test_step(self, data):
        # Unpack the data
        x, y = data

        # Compute predictions
        y_prediction = self.student(x, training=False)

        # Calculate the loss
        student_loss = self.student_loss_fn(y, y_prediction)

        # Update the metrics
        self.compiled_metrics.update_state(y, y_prediction)

        # Return a dict of performance
        results = {m.name: m.result() for m in self.metrics}
        results.update({"student_loss": student_loss})
        return results

Now we will initialize and compile the distiller. Here for the student loss, we are using the Categorical cross-entropy function and for distillation loss, we are using the KLDivergence loss function.

The KLDivergence loss function is used to calculate the distance between two probability distributions. By minimizing the KL divergence, we are trying to make the student network predict similarly to the teacher network.
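A tiny NumPy sketch of this idea, using made-up softened class distributions (not values from the case study):

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i * log(p_i / q_i); zero when the distributions match."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

teacher       = [0.6, 0.3, 0.1]     # softened teacher prediction
student_early = [0.2, 0.5, 0.3]     # student far from the teacher
student_late  = [0.58, 0.31, 0.11]  # student after distillation training

print(kl_divergence(teacher, student_early))  # larger distance
print(kl_divergence(teacher, student_late))   # much smaller: distributions nearly match
```

Minimizing this quantity during training pushes the student's softened outputs toward the teacher's, which is exactly the role of the distillation loss.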

Compiling and Running the Student Network Distiller:

# Initialize and compile distiller
distiller = Distiller(student=s_model_1, teacher=model)
distiller.compile(
    optimizer=Adam(learning_rate=1e-4, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.001),
    metrics=['acc'],
    student_loss_fn=CategoricalCrossentropy(from_logits=True),
    distillation_loss_fn=tf.keras.losses.KLDivergence(),
    alpha=0.5,
    temperature=2,
)

# Distill teacher to student
distiller_hist = distiller.fit(train_generator, validation_data=validation_generator,
                               epochs=50, validation_steps=10, steps_per_epoch=90)

Checking the plot of accuracy and loss for each epoch:

import matplotlib.pyplot as plt
plt.figure(1)

# summarize history for accuracy
plt.subplot(211)
plt.plot(distiller_hist.history['acc'])
plt.plot(distiller_hist.history['val_acc'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'valid'], loc='lower right')

# summarize history for loss
plt.subplot(212)
plt.plot(distiller_hist.history['student_loss'])
plt.plot(distiller_hist.history['val_student_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'valid'], loc='upper right')
plt.tight_layout()

Checking accuracy on the test data:

print(distiller.metrics_names) acc, loss = distiller.evaluate(x_test, y_test, verbose = 1) print('test loss = ', loss) print('test accuracy = ',acc)

We have got 74% accuracy on the test data. With the teacher network, we had got 77% accuracy. Now we will change the hyperparameter t, to see if we can improve the accuracy in the student network.

Compiling and Running the Distiller with t = 6:

# Initialize and compile distiller
distiller = Distiller(student=s_model_1, teacher=model)
distiller.compile(
    optimizer=Adam(learning_rate=1e-4, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.001),
    metrics=['acc'],
    student_loss_fn=CategoricalCrossentropy(from_logits=True),
    distillation_loss_fn=tf.keras.losses.KLDivergence(),
    alpha=0.5,
    temperature=6,
)

# Distill teacher to student
distiller_hist = distiller.fit(train_generator, validation_data=validation_generator,
                               epochs=50, validation_steps=10, steps_per_epoch=90)

Plotting the loss and accuracy for each epoch:

import matplotlib.pyplot as plt
plt.figure(1)

# summarize history for accuracy
plt.subplot(211)
plt.plot(distiller_hist.history['acc'])
plt.plot(distiller_hist.history['val_acc'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'valid'], loc='lower right')

# summarize history for loss
plt.subplot(212)
plt.plot(distiller_hist.history['student_loss'])
plt.plot(distiller_hist.history['val_student_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'valid'], loc='upper right')
plt.tight_layout()


Checking the test accuracy:

print(distiller.metrics_names) acc, loss = distiller.evaluate(x_test, y_test, verbose = 1) print('test loss = ', loss) print('test accuracy = ',acc)

With t = 6, we have got 75% accuracy which is better than what we got with t = 2.

This way, we can do more iterations by changing the values of the hyperparameters alpha (α) and temperature (t) in order to get better accuracy.

Model 3: Student Model without Knowledge Distillation

Now we will check the student model without Knowledge Distillation. Here there will be no coordination with the teacher network and there will be only one loss function i.e. Student Loss.

The student model architecture remains the same as the previous one; it is now simply trained without distillation.

Compiling and running the model:

s_model_2 = tf.keras.models.clone_model(s_model_1)  # same architecture, fresh weights
opti = Adam(learning_rate=1e-4, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.001)
s_model_2.compile(optimizer=opti, loss=CategoricalCrossentropy(from_logits=True), metrics=['acc'])

earlystop = EarlyStopping(monitor='val_acc', patience=5, verbose=1)
filepath = "model_save/weights-{epoch:02d}-{val_accuracy:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath=filepath, monitor='val_acc', verbose=1,
                             save_best_only=True, mode='max')
callbacks = [earlystop]
s_model_2_hist = s_model_2.fit(train_generator, validation_data=validation_generator,
                               validation_steps=10, steps_per_epoch=90,
                               epochs=50, callbacks=callbacks)

Our model stopped in 13 epochs as we had used early stop callback if there is no improvement in validation accuracy in 5 epochs.

Plotting the loss and accuracy for each epoch:

import matplotlib.pyplot as plt
plt.figure(1)

# summarize history for accuracy
plt.subplot(211)
plt.plot(s_model_2_hist.history['acc'])
plt.plot(s_model_2_hist.history['val_acc'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'valid'], loc='lower right')

# summarize history for loss
plt.subplot(212)
plt.plot(s_model_2_hist.history['loss'])
plt.plot(s_model_2_hist.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'valid'], loc='upper right')
plt.tight_layout()

Checking the Test Accuracy:


loss, acc = s_model_2.evaluate(x_test, y_test, verbose = 1)
print('test loss = ', loss)
print('test accuracy = ', acc)

Here we are able to achieve 64% accuracy on the test data.

Result Summary:

Below is the comparison of all four models that are made in this case study:

S. No. Model No. of Parameters Hyperparameters Test Accuracy

1 Teacher Model 27 M – 77%

2 Student Model with Distillation 296 k α = 0.5, t = 2 74%

3 Student Model with Distillation 296 k α = 0.5, t = 6 75%

4 Student Model without Distillation 296 k – 64%

As seen from the above table, with knowledge distillation we achieved 75% accuracy with a very lite neural network. We can play around with the hyperparameters α and t to improve it further.


In this article, we saw that Knowledge Distillation can compress a Deep CNN while maintaining the accuracy so that it can be deployed on embedded systems that have less storage and computational power.

We used Knowledge Distillation on the pneumonia detection problem from x-ray images. By distilling knowledge from a teacher network having 27M parameters to a student network having only 0.296M parameters (almost 100 times lighter), we were able to achieve almost the same accuracy. With more hyperparameter iterations and ensembling of multiple student networks, as mentioned in reference [3], the performance of the student model can be further improved.


1) Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning 2023.

2) Dataset: Kermany, Daniel; Zhang, Kang; Goldbaum, Michael (2023), “Labeled Optical Coherence Tomography (OCT) and Chest X-Ray Images for Classification”, Mendeley Data, V2, doi: 10.17632/rscbjbr9sj.2

3) Designing Lightweight Deep Learning Models for Echocardiography View Classification 2023.

The media shown in this article are not owned by Analytics Vidhya and are used at the Author's discretion.


Personal Construct Theory: George Kelly

What does the Personal Construct Theory Define?

Personal Construct Theory states that individuals develop their own ideas and rules to interpret the things and events around them. Proposed in the 1950s by George Kelly, an American psychologist, this theory views people as scientists. Like scientists, every person observes their environment, understands events, and draws conclusions. Thus, they formulate hypotheses about how the world works and test them daily. This is why people who may be experiencing the same thing perceive it differently.

What are Constructs?

When people use their experiences and perceptions to conclude something, they use what is known as ‘constructs.’ We behave following the expectation that our constructs will predict and explain the reality of our world. These are used to test the hypotheses that individuals develop. Constructs are constantly evaluated and modified as we go through life experiencing new things.

The Structure of Personal Construct Theory

The Basic Postulate

The basic postulate says, “A person’s processes are psychologically channelized by the ways in which he anticipates events.” This means that humans build a construct based on how they perceive or construe an event, and this construct is later used to verify whether that prediction was true. Hence, the postulate reaffirms the idea of humans acting like scientists, developing and testing hypotheses.

The 11 Corollaries

The corollaries, as suggested by Kelly, expand on the primary postulate.

Construction Corollary − No event or experience can happen again exactly as it did, but an event can still be repeated with some changes. We predict how we will behave in a similar event based on these similarities.

Individuality Corollary − People differ from each other in their constructions of events. Each person is unique and has unique experiences and perceptions, and hence, their constructs are different.

Organization Corollary − We arrange our constructs in patterns and how they relate to each other. We consider their similarities as well as their differences. We organize these constructs in a hierarchy, with some constructs subordinate to others. A construct can include one or more subordinate constructs.

Dichotomy Corollary − A person’s construction system is composed of a finite number of dichotomous constructs. The constructs we store are bipolar or dichotomous. For example, if there is a construct of ‘kindness,’ there will also be a construct of ‘unkindness.’ This is important for us to anticipate future events correctly. Just as we note similarities among people or events, we must also account for dissimilarities.

Choice Corollary − A person chooses for himself that alternative in a dichotomized construct through which he anticipates the greater possibility for the elaboration of his system. This construct explains that people choose the alternative construct that helps them expand on their experiences.

Range Corollary − A construct is applicable for anticipating only a finite range of events, and a construct may have a great range or a very short range. This means that our constructs can apply to a wide set of objects or very narrow objects.

Experience Corollary − We modify or reconstruct our constructs as we experience things we did not expect. If we observe that a construct does not predict the outcome of a situation correctly, then it must be reformulated or replaced.

Modulation Corollary − Constructs differ based on their permeability. A permeable construct will be one in which newer and bigger ideas can be included after the construct has been made. A permeable construct is open to new experiences and is capable of being revised or extended by them.

Fragmentation Corollary − A person may have a variety of construction subsystems that are incompatible with each other.

Commonality Corollary − Kelly suggested that if a group of people interprets an experience similarly, we can conclude that their cognitive processes are similar. This can be true when it comes to cultural experiences.

Sociality Corollary − This talks about how commonalities do not necessarily bring out positive relationships. To have positive relations, people must understand each other’s constructs. We must understand how another person thinks if we are to anticipate how that person will predict events.


The Personal Construct Theory is very significant in cognitive psychology and personality psychology. While it introduced many new concepts and terms, it has also been criticized. Despite the criticism, the theory made its mark soon after its conception.

An Introduction To Graph Theory And Network Analysis (With Python Codes)


“A picture speaks a thousand words” is one of the most commonly used phrases. But a graph says so much more than that. A visual representation of data, in the form of graphs, helps us gain actionable insights and make better data-driven decisions based on them.

But to truly understand what graphs are and why they are used, we will need to understand a concept known as Graph Theory. Understanding this concept makes us better programmers (and better data science professionals!).

But if you have tried to understand this concept before, you’ll have come across tons of formulae and dry theoretical concepts. That is why we decided to write this blog post. We have explained the concepts and then provided illustrations so you can follow along and intuitively understand how the functions are performing. This is a detailed post, because we believe that providing a proper explanation of this concept is a much preferred option over succinct definitions.

In this article, we will look at what graphs are, their applications and a bit of history about them. We’ll also cover some Graph Theory concepts and then take up a case study using python to cement our understanding.

Ready? Let’s dive into it.

Table of Contents

Graphs and their applications

History and why graphs?

Terminologies you need to know

Graph Theory Concepts

Getting familiar with Graphs in python

Analysis on a dataset

Graphs and their applications

Let us look at a simple graph to understand the concept. Look at the image below –

Consider that this graph represents the places in a city that people generally visit, and the path that was followed by a visitor of that city. Let us consider V as the places and E as the path to travel from one place to another.

V = {v1, v2, v3, v4, v5}

E = {(v1, v2), (v2, v5), (v5, v5), (v4, v5), (v4, v4)}

The edge (u,v) is the same as the edge (v,u) – They are unordered pairs.

Concretely – Graphs are mathematical structures used to study pairwise relationships between objects and entities. It is a branch of Discrete Mathematics and has found multiple applications in Computer Science, Chemistry, Linguistics, Operations Research, Sociology etc.

The Data Science and Analytics field has also used Graphs to model various structures and problems. As a Data Scientist, you should be able to solve problems in an efficient manner and Graphs provide a mechanism to do that in cases where the data is arranged in a specific way.


A Graph is a pair of sets. G = (V,E). V is the set of vertices. E is a set of edges. E is made up of pairs of elements from V (unordered pair)

A DiGraph is also a pair of sets. D = (V,A). V is the set of vertices. A is the set of arcs. A is made up of pairs of elements from V (ordered pair)

In the case of digraphs, there is a distinction between `(u,v)` and `(v,u)`. Usually the edges are called arcs in such cases to indicate a notion of direction.
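This distinction is easy to see in code. A quick sketch using networkx, the package used later in this article:

```python
import networkx as nx

# Undirected graph: (u, v) and (v, u) are the same edge
G = nx.Graph()
G.add_edge('u', 'v')
print(G.has_edge('v', 'u'))  # True

# Directed graph (digraph): (u, v) is an arc from u to v only
D = nx.DiGraph()
D.add_edge('u', 'v')
print(D.has_edge('u', 'v'))  # True
print(D.has_edge('v', 'u'))  # False
```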

There are packages that exist in R and Python to analyze data using Graph theory concepts. In this article we will be briefly looking at some of the concepts and analyze a dataset using Networkx Python package.

from IPython.display import Image
Image('images/network.PNG')


The applications of Graphs in Data Analytics are numerous and vast. Let us look at a few use cases:

Marketing Analytics – Graphs can be used to figure out the most influential people in a Social Network. Advertisers and Marketers can estimate the biggest bang for the marketing buck by routing their message through the most influential people in a Social Network

Banking Transactions – Graphs can be used to find unusual patterns helping in mitigating Fraudulent transactions. There have been examples where Terrorist activity has been detected by analyzing the flow of money across interconnected Banking networks

Supply Chain – Graphs help in identifying optimum routes for your delivery trucks and in identifying locations for warehouses and delivery centres

Pharma – Pharma companies can optimize the routes of the salesman using Graph theory. This helps in cutting costs and reducing the travel time for salesman

Telecom – Telecom companies typically use Graphs (Voronoi diagrams) to understand the quantity and location of Cell towers to ensure maximum coverage

History and Why Graphs? History of Graphs

If you want to know more about how the ideas behind graphs were formulated – read on!

The origin of the theory can be traced back to the Konigsberg bridge problem (circa 1730s). The problem asks if the seven bridges in the city of Konigsberg can be traversed under the following constraints

no doubling back

you end at the same place you started

This is the same as asking if the multigraph of 4 nodes and 7 edges has an Eulerian cycle (An Eulerian cycle is an Eulerian path that starts and ends on the same Vertex. And an Eulerian path is a path in a Graph that traverses each edge exactly once. More Terminology is given below). This problem led to the concept of Eulerian Graph. In the case of the Konigsberg bridge problem the answer is no and it was first answered by (you guessed it) Euler.
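The bridge problem can be sketched directly in networkx (an illustrative reconstruction; the land-mass labels A–D are our own):

```python
import networkx as nx

# The four land masses of Konigsberg and the seven bridges between them
K = nx.MultiGraph()
K.add_edges_from([
    ('A', 'B'), ('A', 'B'),   # two bridges between A and B
    ('A', 'C'), ('A', 'C'),   # two bridges between A and C
    ('A', 'D'), ('B', 'D'), ('C', 'D'),
])

# An Eulerian cycle exists iff the graph is connected and every vertex
# has even degree. Here all four vertices have odd degree, so: no.
print(dict(K.degree()))   # {'A': 5, 'B': 3, 'C': 3, 'D': 3}
print(nx.is_eulerian(K))  # False
```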

In 1840, A.F. Mobius gave the idea of the complete graph and the bipartite graph, and Kuratowski, by means of recreational problems, proved that the complete graph K5 and the bipartite graph K3,3 are not planar. The concept of a tree (a connected graph without cycles) was implemented by Gustav Kirchhoff in 1845, and he employed graph-theoretical ideas in the calculation of currents in electrical networks or circuits.

In 1852, Francis Guthrie posed the famous four colour problem. Then in 1856, Thomas P. Kirkman and William R. Hamilton studied cycles on polyhedra and invented the concept of the Hamiltonian graph by studying trips that visited certain sites exactly once. In 1913, H. Dudeney mentioned a puzzle problem. Even though the four colour problem was posed that early, it was solved only a century later by Kenneth Appel and Wolfgang Haken. This period is considered the birth of Graph Theory.

Cayley studied particular analytical forms arising from differential calculus to study trees. This had many implications in theoretical chemistry and led to the invention of enumerative graph theory. The term “Graph” itself was introduced by Sylvester in 1878, when he drew an analogy between “quantic invariants” and covariants of algebra and molecular diagrams.

In 1941, Ramsey worked on colorations, which led to the identification of another branch of graph theory called extremal graph theory. In 1969, Heinrich Heesch published a method for solving the four colour problem using computers. The study of asymptotic graph connectivity gave rise to random graph theory. The histories of Graph Theory and Topology are also closely related; they share many common concepts and theorems.

Image('images/Konigsberg.PNG', width = 800)

Why Graphs?

Here are a few points to motivate you to use graphs in your day-to-day data science problems –

Graphs provide a better way of dealing with abstract concepts like relationships and interactions. They also offer an intuitively visual way of thinking about these concepts. Graphs also form a natural basis for analyzing relationships in a Social context

Graph Databases have become common computational tools and alternatives to SQL and NoSQL databases

Graphs are used to model analytics workflows in the form of DAGs (Directed acyclic graphs)

Some Neural Network Frameworks also use DAGs to model the various operations in different layers

Graph Theory concepts are used to study and model Social Networks, Fraud patterns, Power consumption patterns, Virality and Influence in Social Media. Social Network Analysis (SNA) is probably the best known application of Graph Theory for Data Science

It is used in Clustering algorithms – Specifically K-Means

System Dynamics also uses some Graph Theory concepts – Specifically loops

Path Optimization is a subset of the Optimization problem that also uses Graph concepts

From a Computer Science perspective – Graphs offer computational efficiency. The Big O complexity for some algorithms is better for data arranged in the form of Graphs (compared to tabular data)
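To illustrate the last point with a toy example in plain Python (hypothetical data): finding a node's neighbours in a flat edge table means scanning every row, O(|E|), while an adjacency-list representation answers in O(deg(v)):

```python
# Edge table: neighbour lookup scans all |E| rows
edges = [('A', 'B'), ('A', 'C'), ('B', 'C'), ('C', 'D')]
neighbours_scan = [v for u, v in edges if u == 'A'] + \
                  [u for u, v in edges if v == 'A']

# Adjacency list: neighbour lookup is a single dict access, O(deg(v))
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

print(sorted(neighbours_scan))  # ['B', 'C']
print(sorted(adj['A']))         # ['B', 'C']
```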

Terminology you should know

Before you go any further into the article, it is recommended that you get familiar with these terminologies.

The vertices u and v are called the end vertices of the edge (u,v)

If two edges have the same end vertices they are Parallel

An edge of the form (v,v) is a loop

A Graph is simple if it has no parallel edges or loops

A Graph is said to be Empty if it has no edges. Meaning E is empty

A Graph is a Null Graph if it has no vertices. Meaning V and E are empty

A Graph with only 1 Vertex is a Trivial graph

Edges are Adjacent if they have a common vertex. Vertices are Adjacent if they have a common edge

The degree of the vertex v, written as d(v), is the number of edges with v as an end vertex. By convention, we count a loop twice and parallel edges contribute separately

Isolated Vertices are vertices with degree 0; that is, a vertex v with d(v) = 0 is isolated

A Graph is Complete if its edge set contains every possible edge between ALL of the vertices

A Walk in a Graph G = (V,E) is a finite, alternating sequence of the form v0 e1 v1 e2 v2 … ek vk consisting of vertices and edges of the graph G

A Walk is Open if the initial and final vertices are different. A Walk is Closed if the initial and final vertices are the same

A Walk is a Trail if any edge is traversed at most once

A Trail is a Path if any vertex is traversed at most once (except for a closed walk)

A Closed Path is a Circuit – Analogous to electrical circuits
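Two of the conventions above, a loop counting twice toward the degree and parallel edges counting separately, can be checked quickly with a networkx multigraph:

```python
import networkx as nx

# A multigraph with a loop at v and parallel edges between u and v
M = nx.MultiGraph()
M.add_edge('v', 'v')   # loop
M.add_edge('u', 'v')
M.add_edge('u', 'v')   # parallel edge

# The loop contributes 2 to d(v); the parallel edges count separately
print(M.degree('v'))   # 2 (loop) + 2 (parallel edges) = 4
print(M.degree('u'))   # 2
```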

Graph Theory concepts

In this section, we’ll look at some of the concepts useful for Data Analysis (in no particular order). Please note that there are a lot more concepts that require a depth which is out of scope of this article. So let’s get into it.

Average Path Length

The average of the shortest path lengths for all possible node pairs. Gives a measure of ‘tightness’ of the Graph and can be used to understand how quickly/easily something flows in this Network.
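A small sketch with networkx: on a 4-node path graph the six node pairs have shortest-path lengths 1, 1, 1, 2, 2 and 3, so the average is 10/6:

```python
import networkx as nx

# A 4-node path graph: 0 - 1 - 2 - 3
P = nx.path_graph(4)

# Average = (3*1 + 2*2 + 1*3) / 6 = 10/6
print(nx.average_shortest_path_length(P))  # 1.666...
```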


Breadth First Search and Depth First Search

Breadth first search and Depth first search are two different algorithms used to search for Nodes in a Graph. They are typically used to figure out if we can reach a Node from a given Node. This is also known as Graph Traversal

The aim of BFS is to traverse the Graph staying as close as possible to the root node, while DFS moves as far away from the root node as possible before backtracking.
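A quick sketch of both traversals with networkx (the small 5-node graph is our own toy example):

```python
import networkx as nx

G = nx.Graph([(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)])

# BFS explores level by level from the root; DFS goes as deep as it can first
bfs_edges = list(nx.bfs_edges(G, source=1))
dfs_edges = list(nx.dfs_edges(G, source=1))
print(bfs_edges)
print(dfs_edges)

# Either traversal tells us which nodes are reachable from the root
print(sorted(nx.descendants(G, 1)))  # [2, 3, 4, 5]
```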


Centrality

One of the most widely used and important conceptual tools for analysing networks, centrality aims to find the most important nodes in a network. There may be different notions of “important” and hence there are many centrality measures. Centrality measures themselves have a form of classification (or types of centrality measures). There are measures characterized by flow along the edges and measures characterized by walk structure.

Some of the most commonly used ones are:

Degree Centrality – The first and conceptually the simplest Centrality definition. This is the number of edges connected to a node. In the case of a directed graph, we can have 2 degree centrality measures. Inflow and Outflow Centrality

Closeness Centrality – Of a node is the average length of the shortest path from the node to all other nodes

Betweenness Centrality – Number of times a node is present in the shortest path between 2 other nodes

These centrality measures have variants and the definitions can be implemented using various algorithms. All in all, this means a large number of definitions and algorithms.
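A small illustration with networkx: on a star graph the hub comes out on top under all three measures mentioned above:

```python
import networkx as nx

# A star graph: node 0 is connected to nodes 1..4
S = nx.star_graph(4)

# The hub scores highest on every measure
print(nx.degree_centrality(S)[0])       # 1.0 (connected to all 4 others)
print(nx.closeness_centrality(S)[0])    # 1.0 (distance 1 to everyone)
print(nx.betweenness_centrality(S)[0])  # 1.0 (on every shortest path)
```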

Network Density

A measure of how many edges a Graph has. The actual definition will vary depending on the type of Graph and the context in which the question is asked. For a complete undirected Graph the density is 1, while it is 0 for an empty Graph. Graph density can be greater than 1 in some situations (involving loops).
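For instance, the two extremes in networkx:

```python
import networkx as nx

complete = nx.complete_graph(5)  # every possible edge present
empty = nx.empty_graph(5)        # no edges at all

print(nx.density(complete))  # 1.0
print(nx.density(empty))     # 0.0
```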

Graph Randomizations

While the definitions of some Graph metrics may be easy to calculate, it is not easy to understand their relative importance. We use Network/Graph randomizations in such cases. We calculate the metric for the Graph at hand and for another similar Graph that is randomly generated. This similarity can, for example, be the same number of nodes and edges. Typically we generate 1000 similar random graphs, calculate the Graph metric for each of them, and then compare it with the same metric for the Graph at hand to arrive at some notion of a benchmark.

In Data Science when trying to make a claim about a Graph it helps if it is contrasted with some randomly generated Graphs.
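A sketch of this benchmarking idea with networkx, using the built-in karate club network and 100 (rather than 1000) random graphs with the same number of nodes and edges, to keep it quick:

```python
import networkx as nx

G = nx.karate_club_graph()
observed = nx.average_clustering(G)

# Benchmark: random graphs with the same number of nodes and edges
n, m = G.number_of_nodes(), G.number_of_edges()
random_values = [
    nx.average_clustering(nx.gnm_random_graph(n, m, seed=s))
    for s in range(100)
]
baseline = sum(random_values) / len(random_values)

# The real network clusters far more than chance alone would suggest
print(observed, baseline)
```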

Getting Familiar with Graphs in python

We will be using the networkx package in Python. It can be installed in the Root environment of Anaconda (if you are using the Anaconda distribution of Python). You can also pip install it.

Let us look at some common things that can be done with the Networkx package. These include importing and creating a Graph and ways to visualize it.

Graph Creation

import networkx as nx

# Creating a Graph
G = nx.Graph()  # Right now G is empty

# Add a node
G.add_node(1)
G.add_nodes_from([2, 3])  # You can also add a list of nodes by passing a list argument

# Add edges
G.add_edge(1, 2)
e = (2, 3)
G.add_edge(*e)  # * unpacks the tuple
G.add_edges_from([(1, 2), (1, 3)])  # Just like nodes we can add edges from a list

Node and Edge attributes can be added along with the creation of Nodes and Edges by passing a tuple containing node and attribute dict.

In addition to constructing graphs node-by-node or edge-by-edge, they can also be generated by applying classic graph operations, such as:

subgraph(G, nbunch)       - induced subgraph view of G on nodes in nbunch
union(G1, G2)             - graph union
disjoint_union(G1, G2)    - graph union assuming all nodes are different
cartesian_product(G1, G2) - return Cartesian product graph
compose(G1, G2)           - combine graphs identifying nodes common to both
complement(G)             - graph complement
create_empty_copy(G)      - return an empty copy of the same graph class
convert_to_undirected(G)  - return an undirected representation of G
convert_to_directed(G)    - return a directed representation of G

Separate classes exist for different types of Graphs. For example the nx.DiGraph() class allows you to create a Directed Graph. Specific graphs containing paths can be created directly using a single method. For a full list of Graph creation methods please refer to the full documentation. Link is given at the end of the article.

Image('images/graphclasses.PNG', width = 400)

Accessing edges and nodes

Nodes and Edges can be accessed together using the G.nodes() and G.edges() methods. Individual nodes and edges can be accessed using the bracket/subscript notation.

G.nodes()

NodeView((1, 2, 3))

G.edges()

EdgeView([(1, 2), (1, 3), (2, 3)])

G[1]  # same as G.adj[1]

AtlasView({2: {}, 3: {}})

G.edges[1, 2]


Graph Visualization

Networkx provides basic functionality for visualizing graphs, but its main goal is to enable graph analysis rather than perform graph visualization. Graph visualization is hard and we will have to use specific tools dedicated for this task. Matplotlib offers some convenience functions. But GraphViz is probably the best tool for us as it offers a Python interface in the form of PyGraphViz (link to documentation below).

%matplotlib inline
import matplotlib.pyplot as plt

nx.draw(G)

import pygraphviz as pgv

d = {'1': {'2': None}, '2': {'1': None, '3': None}, '3': {'1': None}}
A = pgv.AGraph(data=d)
print(A)  # This is the 'string' or simple representation of the Graph

Output:

strict graph "" {
  1 -- 2;
  2 -- 3;
  3 -- 1;
}

PyGraphviz provides great control over the individual attributes of the edges and nodes. We can get very beautiful visualizations using it.

# Let us create another Graph where we can individually control the colour of each node
B = pgv.AGraph()

# Setting node attributes that are common for all nodes
B.node_attr['style'] = 'filled'
B.node_attr['shape'] = 'circle'
B.node_attr['fixedsize'] = 'true'
B.node_attr['fontcolor'] = '#FFFFFF'

# Creating and setting node attributes that vary for each node (using a for loop)
for i in range(16):
    B.add_edge(0, i)
    n = B.get_node(i)
    n.attr['fillcolor'] = "#%2x0000" % (i * 16)
    n.attr['height'] = "%s" % (i / 16.0 + 0.5)
    n.attr['width'] = "%s" % (i / 16.0 + 0.5)

B.draw('star.png', prog="circo")  # This creates a .png file in the local directory. Displayed below.
Image('images/star.png', width=650)  # The Graph visualization we created above.

Usually, visualization is thought of as a separate task from Graph analysis. A graph once analyzed is exported as a Dotfile. This Dotfile is then visualized separately to illustrate a specific point we are trying to make.

Analysis on a Dataset

We will take a generic dataset (not one that is specifically intended to be used for Graphs) and do some manipulation (in pandas) so that it can be ingested into a Graph in the form of an edgelist. An edgelist is a list of tuples that contain the vertices defining every edge.

The dataset we will be looking at comes from the Airlines Industry. It has some basic information on the Airline routes. There is a Source of a journey and a destination. There are also a few columns indicating arrival and departure times for each journey. As you can imagine this dataset lends itself beautifully to be analysed as a Graph. Imagine a few cities (nodes) connected by airline routes (edges). If you are an airline carrier, you can then proceed to ask a few questions like

What is the shortest way to get from A to B? In terms of distance and in terms of time

Is there a way to go from C to D?

Which airports have the heaviest traffic?

Which airport is “in between” most other airports? So that it can be converted into a local hub

import pandas as pd
import numpy as np

data = pd.read_csv('data/Airlines.csv')

data.shape
# (100, 16)

data.dtypes
# year                int64
# month               int64
# day                 int64
# dep_time          float64
# sched_dep_time      int64
# dep_delay         float64
# arr_time          float64
# sched_arr_time      int64
# arr_delay         float64
# carrier            object
# flight              int64
# tailnum            object
# origin             object
# dest               object
# air_time          float64
# distance            int64
# dtype: object

We notice that origin and destination look like good choices for Nodes. Everything can then be imagined as either node or edge attributes. A single edge can be thought of as a journey. And such a journey will have various times, a flight number, an airplane tail number etc associated with it

We notice that the year, month, day and time information is spread over many columns. We want to create one datetime column containing all of this information. We also need to keep scheduled and actual time of arrival and departure separate. So we should finally have 4 datetime columns (Scheduled and actual times of arrival and departure)

Additionally, the time columns are not in a proper format. 4:30 pm is represented as 1630 instead of 16:30. There is no delimiter to split that column. One approach is to use pandas string methods and regular expressions

We should also note that sched_dep_time and sched_arr_time are int64 dtype and dep_time and arr_time are float64 dtype

An additional complication is NaN values

# converting sched_dep_time to 'std' - Scheduled time of departure
data['std'] = data.sched_dep_time.astype(str).str.replace(r'(\d{2}$)', '') + ':' + data.sched_dep_time.astype(str).str.extract(r'(\d{2}$)', expand=False) + ':00'

# converting sched_arr_time to 'sta' - Scheduled time of arrival
data['sta'] = data.sched_arr_time.astype(str).str.replace(r'(\d{2}$)', '') + ':' + data.sched_arr_time.astype(str).str.extract(r'(\d{2}$)', expand=False) + ':00'

# converting dep_time to 'atd' - Actual time of departure
data['atd'] = data.dep_time.fillna(0).astype(np.int64).astype(str).str.replace(r'(\d{2}$)', '') + ':' + data.dep_time.fillna(0).astype(np.int64).astype(str).str.extract(r'(\d{2}$)', expand=False) + ':00'

# converting arr_time to 'ata' - Actual time of arrival
data['ata'] = data.arr_time.fillna(0).astype(np.int64).astype(str).str.replace(r'(\d{2}$)', '') + ':' + data.arr_time.fillna(0).astype(np.int64).astype(str).str.extract(r'(\d{2}$)', expand=False) + ':00'

We now have time columns in the format we wanted. Finally we may want to combine the year, month and day columns into a date column. This is not an absolutely necessary step. But we can easily obtain the year, month and day (and other) information once it is converted into datetime format.

data['date'] = pd.to_datetime(data[['year', 'month', 'day']])

# finally we drop the columns we don't need
data = data.drop(columns=['year', 'month', 'day'])

Now import the dataset using the networkx function that ingests a pandas dataframe directly. Just like Graph creation there are multiple ways Data can be ingested into a Graph from multiple formats.

import networkx as nx

FG = nx.from_pandas_edgelist(data, source='origin', target='dest', edge_attr=True)
FG.nodes()


NodeView(('EWR', 'MEM', 'LGA', 'FLL', 'SEA', 'JFK', 'DEN', 'ORD', 'MIA', 'PBI', 'MCO', 'CMH', 'MSP', 'IAD', 'CLT', 'TPA', 'DCA', 'SJU', 'ATL', 'BHM', 'SRQ', 'MSY', 'DTW', 'LAX', 'JAX', 'RDU', 'MDW', 'DFW', 'IAH', 'SFO', 'STL', 'CVG', 'IND', 'RSW', 'BOS', 'CLE'))

FG.edges()

EdgeView([('EWR', 'MEM'), ('EWR', 'SEA'), ('EWR', 'MIA'), ('EWR', 'ORD'), ('EWR', 'MSP'), ('EWR', 'TPA'), ('EWR', 'MSY'), ('EWR', 'DFW'), ('EWR', 'IAH'), ('EWR', 'SFO'), ('EWR', 'CVG'), ('EWR', 'IND'), ('EWR', 'RDU'), ('EWR', 'IAD'), ('EWR', 'RSW'), ('EWR', 'BOS'), ('EWR', 'PBI'), ('EWR', 'LAX'), ('EWR', 'MCO'), ('EWR', 'SJU'), ('LGA', 'FLL'), ('LGA', 'ORD'), ('LGA', 'PBI'), ('LGA', 'CMH'), ('LGA', 'IAD'), ('LGA', 'CLT'), ('LGA', 'MIA'), ('LGA', 'DCA'), ('LGA', 'BHM'), ('LGA', 'RDU'), ('LGA', 'ATL'), ('LGA', 'TPA'), ('LGA', 'MDW'), ('LGA', 'DEN'), ('LGA', 'MSP'), ('LGA', 'DTW'), ('LGA', 'STL'), ('LGA', 'MCO'), ('LGA', 'CVG'), ('LGA', 'IAH'), ('FLL', 'JFK'), ('SEA', 'JFK'), ('JFK', 'DEN'), ('JFK', 'MCO'), ('JFK', 'TPA'), ('JFK', 'SJU'), ('JFK', 'ATL'), ('JFK', 'SRQ'), ('JFK', 'DCA'), ('JFK', 'DTW'), ('JFK', 'LAX'), ('JFK', 'JAX'), ('JFK', 'CLT'), ('JFK', 'PBI'), ('JFK', 'CLE'), ('JFK', 'IAD'), ('JFK', 'BOS')])

nx.draw_networkx(FG, with_labels=True)  # Quick view of the Graph. As expected we see 3 very busy airports

nx.algorithms.degree_centrality(FG)  # Notice the 3 airports from which all of our 100 rows of data originates
nx.density(FG)  # Average edge density of the Graph

0.09047619047619047

nx.average_shortest_path_length(FG)  # Average shortest path length for ALL paths in the Graph

2.36984126984127

nx.average_degree_connectivity(FG)  # For a node of degree k - What is the average of its neighbours' degree?

{1: 19.307692307692307, 2: 19.0625, 3: 19.0, 17: 2.0588235294117645, 20: 1.95}

As is obvious from the Graph visualization above, there are multiple paths from some airports to others. Say we want to calculate the shortest possible route between two such airports. Right off the bat we can think of a couple of ways of doing it:

There is the shortest path by distance

There is the shortest path by flight time

What we can do is run a shortest path algorithm, weighting the paths with either the distance or the air time. Please note that this is an approximate solution – the actual problem is to calculate the shortest path factoring in the availability of a flight when you reach your transfer airport, plus the wait time for the transfer. That is a more complete approach, and it is how humans normally plan their travel. For the purposes of this article we will just assume that a flight is readily available when you reach an airport, and calculate the shortest path using the air time as the weight.

Let us take the example of JAX and DFW airports:

# Let us find all the paths available
for path in nx.all_simple_paths(FG, source='JAX', target='DFW'):
    print(path)

# Let us find the dijkstra path from JAX to DFW.
dijpath = nx.dijkstra_path(FG, source='JAX', target='DFW')
dijpath

['JAX', 'JFK', 'SEA', 'EWR', 'DFW']

# Let us try to find the dijkstra path weighted by airtime (approximate case)
shortpath = nx.dijkstra_path(FG, source='JAX', target='DFW', weight='air_time')
shortpath

['JAX', 'JFK', 'BOS', 'EWR', 'DFW']

Conclusion

This article has at best only managed a superficial introduction to the very interesting field of Graph Theory and Network analysis. Knowledge of the theory and the Python packages will add a valuable toolset to any Data Scientist’s arsenal. For the dataset used above, a series of other questions can be asked like:

Find the shortest path between two airports given Cost, Airtime and Availability?

You are an airline carrier and you have a fleet of airplanes. You have an idea of the demand available for your flights. Given that you have permission to operate 2 more airplanes (or add 2 airplanes to your fleet) which routes will you operate them on to maximize profitability?

Can you rearrange the flights and schedules to optimize a certain parameter (like timeliness or profitability)?

Bibliography and References

About the Author

Srivatsa currently works for TheMathCompany and has over 7.5 years of experience in Decision Sciences and Analytics. He has grown, led & scaled global teams across functions, industries & geographies. He has led India Delivery for a cross industry portfolio totalling $10M in revenues. He has also conducted several client workshops and training sessions to help level up technical and business domain knowledge.

During his career span, he has led premium client engagements with Industry leaders in Technology, e-commerce and retail. He helped set up the Analytics Center of Excellence for one of the world’s largest Insurance companies.


Risk Management In Banks – Introducing Awesome Theory

Risk! Whenever we hear this word, we start panicking and thinking about what type of risk it could be: a physical or a financial risk. Surveys have found that individuals have always feared losing something of value, which majorly consists of finances. And today, not only individuals but also organizations fear losing their money.

As we all know, no one can grow or earn more without taking risks, but with modernization, liberalization, and growing competition, risk and uncertainty have also increased. This has created trouble for individuals as well as for banking sectors and financial institutions. In order to sustain and grow in the market, banks have to mitigate or curb these risks. Thus, the concept of risk management has come into the picture, providing guidelines and acting as a roadmap for a banking organization to reduce its risk exposure.

The article below will focus on questions like what risk management is, what risks banks face, and how they manage them through the risk management process.

What is RISK Management in Bank?

We all come across the word RISK in our lives, but have you ever wondered where this word originates from? What is the origin of this word? So, firstly we will discuss…

What Is Risk?

“Risk” is often traced to the Latin “Rescum,” meaning Risk at Sea. Risk can be defined as losing something of value, weighed against the potential to gain something of value. Values can be of any type, i.e., health, financial, emotional well-being, etc. Risk can also be seen as an interaction with uncertainty. Risk perception is subjective in nature; people make their own judgments about the severity of a risk, and it varies from person to person. Every human being carries some risk and defines those risks according to their own judgment.

What is Risk Management?

We are all aware of what risk is, but how can one tackle risk when facing it? That is where the concept of Risk Management comes in. Risk Management refers to the exercise or practice of forecasting potential risks, analyzing and evaluating those risks, and taking corrective measures to reduce or minimize them.

Today, many organizations practice risk management to curb the risks they may face in the near future. Whenever an organization makes an investment decision, it tries to identify the financial risks attached to it. Financial risks can include high inflation, recession, volatility in capital markets, bankruptcy, and so on. The magnitude of such risks depends on the type of financial instrument in which the organization or individual invests.

To reduce such risk exposure on investments, fund managers and investors practice risk management. For example, an individual may consider investing in fixed deposits less risky than investing in the share market. Since investment in the equity market is riskier than a fixed deposit, equity analysts and investors diversify their portfolios to minimize the risk.
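As a small sketch of why diversification reduces risk, the following computes the volatility of a 50/50 portfolio split between a risky and a safer asset. All the figures (volatilities, correlation, weights) are hypothetical illustrations, not real market data:

```python
import math

# Hypothetical annual volatilities (standard deviation of returns).
vol_equity = 0.25   # a single equity holding
vol_bond = 0.06     # a fixed-income holding (e.g. a deposit-like instrument)
correlation = 0.2   # assumed correlation between the two assets

# Portfolio split 50/50 between the two assets.
w1, w2 = 0.5, 0.5

# Standard two-asset portfolio variance formula.
portfolio_variance = (w1 * vol_equity) ** 2 + (w2 * vol_bond) ** 2 \
    + 2 * w1 * w2 * correlation * vol_equity * vol_bond
portfolio_vol = math.sqrt(portfolio_variance)

print(round(portfolio_vol, 4))  # 0.1343
```

Because the assets are imperfectly correlated, the portfolio volatility (about 13.4%) comes out below the weighted average of the individual volatilities (15.5%), which is the whole point of diversifying.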

How Important is Risk Management for Banks?

So far, we have seen how risk management works and how important it is for curbing or reducing risk. Since risk is inherent in financial institutions, and in banking organizations in particular, this article deals with how risk management matters for banks. For a long time, the banking sector worked in a regulated environment and was not much exposed to risk, but with increasingly severe competition, banks now experience various types of risk, both financial and non-financial.

The function and process of risk management in banks is complex, so banks use models ranging from the simple to the sophisticated for analyzing and evaluating risks. Banks should have the expertise and skills to deal with the risks involved in the integration process. To compete effectively, large-scale banking organizations should develop internal risk management models, and at a more advanced level, head-office staff should be trained in risk modeling and analytic tools for conducting risk management.

Risk Management in Indian Banking Sector

The practice of risk management is relatively new in Indian banks, but the risk management model has gained importance due to growing competition, increased volatility, and fluctuations in markets. The practice of risk management has increased the efficiency with which Indian banks are governed and has also strengthened corporate governance. The essential feature of a risk management model is to minimize or reduce the risks of the products and services offered by the banks; therefore, to mitigate internal and external risks, an efficient risk management framework is needed.

Indian banks must prepare risk management models or frameworks due to increasing global competition from foreign banks, the introduction of innovative financial products and instruments, and increasing deregulation.

Classification of Risks in the Banking Sector

1. Credit Risk

Credit risk involves borrower risk, industry risk, and portfolio risk, and assesses the creditworthiness of the borrower, the industry, and so on.

It is also called default risk: the risk that an industry, counterparty, or customer will be unable to meet its commitments in settling financial transactions.

Internal and external factors both influence the credit risk of a bank portfolio.

Internal factors include weak appraisal of the borrower's financial status, inadequate risk pricing, poorly defined lending limits, absence of post-sanction surveillance, poorly defined loan agreements or policies, and so on.

External factors include trade restrictions, fluctuations in exchange rates and interest rates, fluctuations in commodity or equity prices, tax structure, government policies, the political system, and so on.

How do banks manage this risk?

Top management consent or attention is crucial in order to manage credit risk.

The Credit Risk Management Process includes the following:

The risk management process should be articulated in the bank's loan policy.

Through credit rating or scoring, the degree of risk can be measured.

It can be quantified by estimating expected and unexpected financial losses, and risk pricing can then be done on a scientific basis.

Each bank should have a Credit Policy Committee to review credit policies, procedures, and agreements, so that it can analyze, evaluate, and manage the bank's credit risk on a bank-wide basis.

2. Market Risk

Earlier, managing credit risk was the primary task or challenge for nearly all banks.

But with the modernization and progress of the banking sector, market risks began to arise: fluctuations in interest rates, changes in market variables, fluctuations in commodity or equity prices, and fluctuations in foreign exchange rates.

So, it became essential to manage market risk too. Even a minute change in market variables can result in a substantial change in the economic value of a bank.

Market risk comprises liquidity risk, interest rate risk, foreign exchange rate risk, and hedging risk.

How do banks manage this risk?

Managing market risk is a major concern for the top management of banks.

Top management should clearly articulate market risk policies, agreements, review mechanisms, auditing and reporting systems, and so on. These policies should clearly specify the risk measurement systems that capture the sources of market risk material to the bank and their effect on the bank.

Banks should form an Asset-Liability Management Committee, whose main task is to maintain and manage the balance sheet within the risk and performance parameters.

In order to track the market risk on a real-time basis, banks should set up an independent middle office.

The middle office should consist of members who are experts in analyzing market risk, such as economists, statisticians, and general bankers.

The members of the middle office should be kept separate from the treasury department and from its daily activities.
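One common way a middle office tracks market risk is Value-at-Risk. Here is a minimal historical-simulation VaR sketch; the daily profit-and-loss series is entirely made up for illustration:

```python
# Hypothetical daily P&L history for a trading book (currency units).
pnl_history = [-120, 80, -40, 200, -310, 50, -90, 130, -60, 20,
               -180, 70, -250, 40, 110, -30, 90, -140, 60, -70]

confidence = 0.95

# Express each day as a loss (positive number) and sort ascending.
losses = sorted(-p for p in pnl_history)

# The 95% VaR is the loss exceeded on only ~5% of days.
index = int(confidence * len(losses)) - 1
var_95 = losses[index]

print(var_95)  # 250
```

Read as: under these made-up numbers, the desk would expect to lose more than 250 on only about one day in twenty. Real implementations use much longer histories and interpolated percentiles, but the principle is the same.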

3. Operational Risk

Managing operational risk has become essential for better risk management practice.

Operational risk arose with the modernization of the banking sector and financial markets, which brought structural changes, an increase in transaction volumes, and complex support systems.

Operational risk cannot be categorized as market risk or credit risk. It relates to the settlement of payments, interruption of business activities, and legal and administrative risk.

Since operational risk involves business interruptions or problems that can in turn trigger market or credit risk, it has some linkage with credit and market risks.

How do banks manage this risk?

Measuring operational risk requires an estimation of the probability of operational loss and the potential size of the loss.

Banks can make use of analytical and judgmental techniques to measure operational risk levels.

Indicators of operational risk include audit ratings, data on quality, historical loss experience, data on turnover or volume, and so on. Some international banks have developed rating matrices similar to bond credit ratings.

Operational risk should be assessed & reviewed at regular intervals.

Indian banks have not yet evolved scientific methods for quantifying operational risk and instead use a simple benchmark system based on the level of business activity.
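The probability-times-size estimation described above can be sketched as follows. The event categories, probabilities, and loss amounts are all hypothetical placeholders:

```python
# Hypothetical operational-loss event categories, each with an assumed
# annual probability of occurrence and an assumed loss size if it occurs.
events = {
    "settlement failure":    {"probability": 0.10, "loss": 500_000},
    "business interruption": {"probability": 0.02, "loss": 2_000_000},
    "legal claim":           {"probability": 0.01, "loss": 5_000_000},
}

# Expected annual operational loss: sum of probability x size over categories.
expected_annual_loss = sum(e["probability"] * e["loss"] for e in events.values())

print(expected_annual_loss)  # 140000.0
```

This is the simplest judgmental estimate the text describes; more sophisticated approaches model event frequency and severity as full distributions rather than single point values.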

