An Introduction To Graph Theory And Network Analysis (With Python Codes)



“A picture is worth a thousand words” is one of the most commonly used phrases. But a graph says much more than that. A visual representation of data, in the form of graphs, helps us gain actionable insights and make better data-driven decisions based on them.

But to truly understand what graphs are and why they are used, we will need to understand a concept known as Graph Theory. Understanding this concept makes us better programmers (and better data science professionals!).

But if you have tried to understand this concept before, you’ll likely have come across tons of formulae and dry theoretical concepts. That is why we decided to write this blog post. We have explained the concepts and then provided illustrations, so you can follow along and intuitively understand how the functions perform. This is a detailed post, because we believe that a proper explanation of this concept is far preferable to a handful of succinct definitions.

In this article, we will look at what graphs are, their applications and a bit of their history. We’ll also cover some Graph Theory concepts and then take up a case study in Python to cement our understanding.

Ready? Let’s dive into it.

Table of Contents

Graphs and their applications

History and why graphs?

Terminologies you need to know

Graph Theory Concepts

Getting familiar with Graphs in Python

Analysis on a dataset

Graphs and their applications

Let us look at a simple graph to understand the concept. Look at the image below –

Consider that this graph represents the places in a city that people generally visit, and the path that was followed by a visitor of that city. Let us consider V as the places and E as the path to travel from one place to another.

V = {v1, v2, v3, v4, v5} E = {(v1,v2), (v2,v5), (v5, v5), (v4,v5), (v4,v4)}

The edge (u,v) is the same as the edge (v,u) – They are unordered pairs.

Concretely – Graphs are mathematical structures used to study pairwise relationships between objects and entities. It is a branch of Discrete Mathematics and has found multiple applications in Computer Science, Chemistry, Linguistics, Operations Research, Sociology etc.

The Data Science and Analytics field has also used Graphs to model various structures and problems. As a Data Scientist, you should be able to solve problems in an efficient manner and Graphs provide a mechanism to do that in cases where the data is arranged in a specific way.


A Graph is a pair of sets. G = (V,E). V is the set of vertices. E is a set of edges. E is made up of pairs of elements from V (unordered pair)

A DiGraph is also a pair of sets. D = (V,A). V is the set of vertices. A is the set of arcs. A is made up of pairs of elements from V (ordered pair)

In the case of digraphs, there is a distinction between `(u,v)` and `(v,u)`. Usually the edges are called arcs in such cases to indicate a notion of direction.
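The distinction is easy to see in code. This is a minimal sketch using the networkx package (introduced properly later in this article), with made-up node labels:

```python
import networkx as nx

# In an undirected Graph, (u,v) and (v,u) are the same edge
G = nx.Graph()
G.add_edge('u', 'v')
print(G.has_edge('v', 'u'))  # True

# In a DiGraph, the arc (u,v) does not imply (v,u)
D = nx.DiGraph()
D.add_edge('u', 'v')
print(D.has_edge('v', 'u'))  # False
```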

There are packages in both R and Python for analyzing data using Graph Theory concepts. In this article we will briefly look at some of these concepts and analyze a dataset using the NetworkX Python package.

```python
from IPython.display import Image
Image('images/network.PNG')
```


From the above examples it is clear that the applications of Graphs in Data Analytics are numerous and vast. Let us look at a few use cases:

Marketing Analytics – Graphs can be used to figure out the most influential people in a Social Network. Advertisers and Marketers can estimate the biggest bang for the marketing buck by routing their message through the most influential people in a Social Network

Banking Transactions – Graphs can be used to find unusual patterns helping in mitigating Fraudulent transactions. There have been examples where Terrorist activity has been detected by analyzing the flow of money across interconnected Banking networks

Supply Chain – Graphs help in identifying optimum routes for your delivery trucks and in identifying locations for warehouses and delivery centres

Pharma – Pharma companies can optimize the routes of their salespeople using Graph Theory. This helps in cutting costs and reducing travel time

Telecom – Telecom companies typically use Graphs (Voronoi diagrams) to understand the quantity and location of Cell towers to ensure maximum coverage

History and Why Graphs?

History of Graphs

If you want to know more about how the ideas behind graphs were formulated – read on!

The origin of the theory can be traced back to the Konigsberg bridge problem (circa 1730s). The problem asks whether the seven bridges in the city of Konigsberg can all be traversed, each exactly once, under the following constraints

no doubling back

you end at the same place you started

This is the same as asking if the multigraph of 4 nodes and 7 edges has an Eulerian cycle (An Eulerian cycle is an Eulerian path that starts and ends on the same Vertex. And an Eulerian path is a path in a Graph that traverses each edge exactly once. More Terminology is given below). This problem led to the concept of Eulerian Graph. In the case of the Konigsberg bridge problem the answer is no and it was first answered by (you guessed it) Euler.
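We can verify this directly with the networkx package (used later in this article); the labels for the four land masses below are our own:

```python
import networkx as nx

# The four land masses of Konigsberg and the seven bridges between them
K = nx.MultiGraph()
K.add_edges_from([('A', 'B'), ('A', 'B'), ('A', 'C'), ('A', 'C'),
                  ('A', 'D'), ('B', 'D'), ('C', 'D')])

# Every land mass touches an odd number of bridges, so no Eulerian cycle exists
print(nx.is_eulerian(K))  # False
```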

In 1840, A.F. Mobius gave the idea of the complete graph and the bipartite graph, and Kuratowski later proved, by means of recreational problems, that certain of these (K5 and K3,3) are not planar. The concept of a tree (a connected graph without cycles) was introduced by Gustav Kirchhoff in 1845, and he employed graph-theoretical ideas in the calculation of currents in electrical networks and circuits.

In 1852, Francis Guthrie posed the famous four colour problem. Then in 1856, Thomas P. Kirkman and William R. Hamilton studied cycles on polyhedra and invented the concept of the Hamiltonian graph by studying trips that visited certain sites exactly once. In 1913, H. Dudeney mentioned a puzzle problem. Even though the four colour problem was posed in the 1850s, it was solved only about a century later, by Kenneth Appel and Wolfgang Haken. This period is considered the birth of Graph Theory.

Cayley studied particular analytical forms arising from differential calculus in order to study trees. This had many implications in theoretical chemistry and led to the invention of enumerative graph theory. In any case, the term “Graph” was introduced by Sylvester in 1878, who drew an analogy between “quantic invariants” and covariants of algebra and molecular diagrams.

Ramsey’s work on colorations led to the identification of another branch of graph theory called extremal graph theory. In 1969, Heinrich Heesch published a method for solving the four colour problem using computers; the computer-assisted proof was completed by Appel and Haken in 1976. The study of asymptotic graph connectivity gave rise to random graph theory. The histories of Graph Theory and Topology are also closely related; they share many common concepts and theorems.

Image('images/Konigsberg.PNG', width = 800)

Why Graphs?

Here are a few points to motivate the use of graphs in your day-to-day data science problems –

Graphs provide a better way of dealing with abstract concepts like relationships and interactions. They also offer an intuitively visual way of thinking about these concepts. Graphs also form a natural basis for analyzing relationships in a Social context

Graph Databases have become common computational tools and alternatives to SQL and NoSQL databases

Graphs are used to model analytics workflows in the form of DAGs (Directed acyclic graphs)

Some Neural Network Frameworks also use DAGs to model the various operations in different layers

Graph Theory concepts are used to study and model Social Networks, Fraud patterns, Power consumption patterns, Virality and Influence in Social Media. Social Network Analysis (SNA) is probably the best known application of Graph Theory for Data Science

It is used in Clustering algorithms – Specifically K-Means

System Dynamics also uses some Graph Theory concepts – Specifically loops

Path Optimization is a subset of the Optimization problem that also uses Graph concepts

From a Computer Science perspective – Graphs offer computational efficiency. The Big O complexity for some algorithms is better for data arranged in the form of Graphs (compared to tabular data)
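An informal sketch of that efficiency point (the chain graph and node numbers below are purely illustrative): finding a node's neighbours in a graph's adjacency structure is a dictionary lookup, while the tabular equivalent has to scan every edge row.

```python
import networkx as nx

edges = [(i, i + 1) for i in range(10000)]  # a long chain of 10,000 edges
G = nx.Graph(edges)

# Graph form: neighbour lookup is a dict access on the adjacency structure
print(sorted(G.adj[5000]))  # [4999, 5001]

# Tabular form: finding the same neighbours means scanning every row
neighbours = sorted([v for u, v in edges if u == 5000] +
                    [u for u, v in edges if v == 5000])
print(neighbours)  # [4999, 5001]
```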

Terminology you should know

Before you go any further into the article, it is recommended that you get familiar with these terms.

The vertices u and v are called the end vertices of the edge (u,v)

If two edges have the same end vertices they are Parallel

An edge of the form (v,v) is a loop

A Graph is simple if it has no parallel edges and loops

A Graph is said to be Empty if it has no edges. Meaning E is empty

A Graph is a Null Graph if it has no vertices. Meaning V and E are empty

A Graph with only 1 Vertex is a Trivial graph

Edges are Adjacent if they have a common vertex. Vertices are Adjacent if they have a common edge

The degree of the vertex v, written as d(v), is the number of edges with v as an end vertex. By convention, we count a loop twice and parallel edges contribute separately

Isolated Vertices are vertices with degree zero, i.e. d(v) = 0

A Graph is Complete if its edge set contains every possible edge between ALL of the vertices

A Walk in a Graph G = (V,E) is a finite, alternating sequence of the form v0, e1, v1, e2, v2, …, ek, vk, where each edge ei has end vertices v(i-1) and vi, consisting of vertices and edges of the graph G

A Walk is Open if the initial and final vertices are different. A Walk is Closed if the initial and final vertices are the same

A Walk is a Trail if no edge is traversed more than once

A Trail is a Path if no vertex is visited more than once (except that a Closed Walk starts and ends at the same vertex)

A Closed Path is a Circuit – Analogous to electrical circuits
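The edge set E from the city example earlier illustrates several of these terms. A minimal sketch with networkx (introduced properly in the next section):

```python
import networkx as nx

# V = {1,...,5}, E = {(1,2), (2,5), (5,5), (4,5), (4,4)} from the earlier example
G = nx.MultiGraph()
G.add_nodes_from([1, 2, 3, 4, 5])
G.add_edges_from([(1, 2), (2, 5), (5, 5), (4, 5), (4, 4)])

# Loops count twice towards the degree; vertex 3 is isolated (degree 0)
print(dict(G.degree()))  # {1: 1, 2: 2, 3: 0, 4: 3, 5: 4}
```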

Graph Theory concepts

In this section, we’ll look at some of the concepts useful for Data Analysis (in no particular order). Please note that there are a lot more concepts that require a depth which is out of scope of this article. So let’s get into it.

Average Path Length

The average of the shortest path lengths for all possible node pairs. Gives a measure of ‘tightness’ of the Graph and can be used to understand how quickly/easily something flows in this Network.
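For instance, for a simple path graph on four nodes (a quick sketch with networkx):

```python
import networkx as nx

P = nx.path_graph(4)  # the chain 0 - 1 - 2 - 3

# Average over all node pairs of the shortest path length between them
print(nx.average_shortest_path_length(P))  # 20/12 = 1.666...
```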


Breadth First Search and Depth First Search

Breadth first search (BFS) and Depth first search (DFS) are two different algorithms used to search for Nodes in a Graph. They are typically used to figure out whether we can reach a Node from a given Node. This is also known as Graph Traversal.

The aim of the BFS is to traverse the Graph as close as possible to the root Node, while the DFS algorithm aims to move as far as possible away from the root node.
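A minimal sketch of both traversals with networkx (the graph and its labels below are our own):

```python
import networkx as nx

G = nx.Graph([(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)])

# BFS visits nodes level by level, staying as close to the root as possible
print(list(nx.bfs_tree(G, source=1)))  # [1, 2, 3, 4, 5]

# DFS goes as deep as possible before backtracking
print(list(nx.dfs_tree(G, source=1)))

# Traversal also answers reachability questions
print(nx.has_path(G, 1, 5))  # True
```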


Centrality

One of the most widely used and important conceptual tools for analysing networks. Centrality aims to find the most important nodes in a network. Since there may be different notions of “important”, there are many centrality measures. Centrality measures themselves have a form of classification (or types of centrality measures): there are measures characterized by flow along the edges and measures characterized by walk structure.

Some of the most commonly used ones are:

Degree Centrality – The first and conceptually the simplest Centrality definition. This is the number of edges connected to a node. In the case of a directed graph, we can have 2 degree centrality measures. Inflow and Outflow Centrality

Closeness Centrality – The closeness centrality of a node is based on the average length of the shortest paths from that node to all other nodes

Betweenness Centrality – Number of times a node is present in the shortest path between 2 other nodes

These centrality measures have variants and the definitions can be implemented using various algorithms. All in all, this means a large number of definitions and algorithms.
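A star graph makes the three definitions concrete. In this sketch the hub node comes out maximal under all three measures:

```python
import networkx as nx

S = nx.star_graph(4)  # node 0 is the hub, connected to nodes 1-4

print(nx.degree_centrality(S)[0])       # 1.0 - the hub touches every other node
print(nx.closeness_centrality(S)[0])    # 1.0 - the hub is one hop from everyone
print(nx.betweenness_centrality(S)[0])  # 1.0 - every shortest path between leaves passes through it
```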

Network Density

A measure of how many edges a Graph has. The actual definition will vary depending on type of Graph and the context in which the question is asked. For a complete undirected Graph the Density is 1, while it is 0 for an empty Graph. Graph Density can be greater than 1 in some situations (involving loops).
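For a simple undirected graph, networkx computes density as 2|E| / (|V|(|V|-1)), which gives the two extremes mentioned above:

```python
import networkx as nx

print(nx.density(nx.complete_graph(5)))  # 1.0 - every possible edge is present
print(nx.density(nx.empty_graph(5)))     # 0.0 - no edges at all
```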

Graph Randomizations

While some Graph metrics may be easy to calculate, it is not always easy to understand their relative importance. We use Network/Graph Randomizations in such cases. We calculate the metric for the Graph at hand and for another, similar Graph that is randomly generated. The similarity can, for example, be the same number of nodes and the same density. Typically we generate a thousand similar random graphs, calculate the Graph metric for each of them, and then compare the results with the same metric for the Graph at hand to arrive at some notion of a benchmark.

In Data Science when trying to make a claim about a Graph it helps if it is contrasted with some randomly generated Graphs.
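A sketch of this benchmarking idea, using average clustering as the metric and Zachary's karate club graph as the "Graph at hand" (the graph choice and the 100-sample size here are our own, smaller than the thousand mentioned above):

```python
import networkx as nx

G = nx.karate_club_graph()
observed = nx.average_clustering(G)

# Random graphs with the same number of nodes and edges (the G(n, m) model)
n, m = G.number_of_nodes(), G.number_of_edges()
baseline = sum(nx.average_clustering(nx.gnm_random_graph(n, m, seed=s))
               for s in range(100)) / 100

# The real network clusters far more than its random counterparts
print(observed > baseline)  # True
```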

Getting Familiar with Graphs in Python

We will be using the networkx package in Python. It can be installed in the root environment of Anaconda (if you are using the Anaconda distribution of Python), or you can install it with pip.

Let us look at some common things that can be done with the Networkx package. These include importing and creating a Graph and ways to visualize it.

Graph Creation

```python
import networkx as nx

# Creating a Graph
G = nx.Graph()  # Right now G is empty

# Add a node
G.add_node(1)
G.add_nodes_from([2, 3])  # You can also add a list of nodes by passing a list argument

# Add edges
G.add_edge(1, 2)
e = (2, 3)
G.add_edge(*e)  # * unpacks the tuple
G.add_edges_from([(1, 2), (1, 3)])  # Just like nodes we can add edges from a list
```

Node and Edge attributes can be added along with the creation of Nodes and Edges by passing a tuple containing node and attribute dict.
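For example (the attribute names here are our own):

```python
import networkx as nx

G = nx.Graph()
# Attributes can be attached while creating nodes and edges
G.add_node(1, label='origin')
G.add_nodes_from([(2, {'label': 'hub'})])  # (node, attribute dict) tuples
G.add_edge(1, 2, weight=4.2)

print(G.nodes[1]['label'])      # origin
print(G.edges[1, 2]['weight'])  # 4.2
```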

In addition to constructing graphs node-by-node or edge-by-edge, they can also be generated by applying classic graph operations, such as:

```
subgraph(G, nbunch)       - induced subgraph view of G on nodes in nbunch
union(G1, G2)             - graph union
disjoint_union(G1, G2)    - graph union assuming all nodes are different
cartesian_product(G1, G2) - return Cartesian product graph
compose(G1, G2)           - combine graphs identifying nodes common to both
complement(G)             - graph complement
create_empty_copy(G)      - return an empty copy of the same graph class
convert_to_undirected(G)  - return an undirected representation of G
convert_to_directed(G)    - return a directed representation of G
```

Separate classes exist for different types of Graphs. For example the nx.DiGraph() class allows you to create a Directed Graph. Specific graphs containing paths can be created directly using a single method. For a full list of Graph creation methods please refer to the full documentation. Link is given at the end of the article.

Image('images/graphclasses.PNG', width = 400)

Accessing edges and nodes

Nodes and Edges can be accessed together using the G.nodes() and G.edges() methods. Individual nodes and edges can be accessed using the bracket/subscript notation.


```python
G.nodes()
# NodeView((1, 2, 3))

G.edges()
# EdgeView([(1, 2), (1, 3), (2, 3)])

G[1]  # same as G.adj[1]
# AtlasView({2: {}, 3: {}})

G.edges[1, 2]
```


Graph Visualization

Networkx provides basic functionality for visualizing graphs, but its main goal is to enable graph analysis rather than to perform graph visualization. Graph visualization is hard, and we will have to use tools dedicated to this task. Matplotlib offers some convenience functions, but GraphViz is probably the best tool for us, as it offers a Python interface in the form of PyGraphviz (link to documentation below).

```python
%matplotlib inline
import matplotlib.pyplot as plt

nx.draw(G)
```

```python
import pygraphviz as pgv

d = {'1': {'2': None}, '2': {'1': None, '3': None}, '3': {'1': None}}
A = pgv.AGraph(data=d)
print(A)  # This is the 'string' or simple representation of the Graph
```

Output:

```
strict graph "" {
  1 -- 2;
  2 -- 3;
  3 -- 1;
}
```

PyGraphviz provides great control over the individual attributes of the edges and nodes. We can get very beautiful visualizations using it.

```python
# Let us create another Graph where we can individually control the colour of each node
B = pgv.AGraph()

# Setting node attributes that are common for all nodes
B.node_attr['style'] = 'filled'
B.node_attr['shape'] = 'circle'
B.node_attr['fixedsize'] = 'true'
B.node_attr['fontcolor'] = '#FFFFFF'

# Creating and setting node attributes that vary for each node (using a for loop)
for i in range(16):
    B.add_edge(0, i)
    n = B.get_node(i)
    n.attr['fillcolor'] = "#%2x0000" % (i * 16)
    n.attr['height'] = "%s" % (i / 16.0 + 0.5)
    n.attr['width'] = "%s" % (i / 16.0 + 0.5)

B.draw('star.png', prog="circo")  # This creates a .png file in the local directory. Displayed below.

Image('images/star.png', width=650)  # The Graph visualization we created above.
```

Usually, visualization is thought of as a separate task from Graph analysis. A graph once analyzed is exported as a Dotfile. This Dotfile is then visualized separately to illustrate a specific point we are trying to make.

Analysis on a Dataset

We will take a generic dataset (not one that is specifically intended to be used for Graphs) and do some manipulation (in pandas) so that it can be ingested into a Graph in the form of an edgelist. An edgelist is a list of tuples that contain the vertices defining every edge

The dataset we will be looking at comes from the Airlines Industry. It has some basic information on the Airline routes. There is a Source of a journey and a destination. There are also a few columns indicating arrival and departure times for each journey. As you can imagine this dataset lends itself beautifully to be analysed as a Graph. Imagine a few cities (nodes) connected by airline routes (edges). If you are an airline carrier, you can then proceed to ask a few questions like

What is the shortest way to get from A to B? In terms of distance and in terms of time

Is there a way to go from C to D?

Which airports have the heaviest traffic?

Which airport is “in between” most other airports? So that it can be converted into a local hub

```python
import pandas as pd
import numpy as np

data = pd.read_csv('data/Airlines.csv')

data.shape
# (100, 16)

data.dtypes
# year                int64
# month               int64
# day                 int64
# dep_time          float64
# sched_dep_time      int64
# dep_delay         float64
# arr_time          float64
# sched_arr_time      int64
# arr_delay         float64
# carrier            object
# flight              int64
# tailnum            object
# origin             object
# dest               object
# air_time          float64
# distance            int64
# dtype: object
```

We notice that origin and destination look like good choices for Nodes. Everything else can then be imagined as either node or edge attributes. A single edge can be thought of as a journey, and such a journey will have various times, a flight number, an airplane tail number, etc. associated with it

We notice that the year, month, day and time information is spread over many columns. We want to create one datetime column containing all of this information. We also need to keep scheduled and actual time of arrival and departure separate. So we should finally have 4 datetime columns (Scheduled and actual times of arrival and departure)

Additionally, the time columns are not in a proper format. 4:30 pm is represented as 1630 instead of 16:30. There is no delimiter to split that column. One approach is to use pandas string methods and regular expressions

We should also note that sched_dep_time and sched_arr_time are int64 dtype and dep_time and arr_time are float64 dtype

An additional complication is NaN values

```python
# converting sched_dep_time to 'std' - Scheduled time of departure
data['std'] = (data.sched_dep_time.astype(str).str.replace(r'(\d{2}$)', '', regex=True)
               + ':' + data.sched_dep_time.astype(str).str.extract(r'(\d{2}$)', expand=False) + ':00')

# converting sched_arr_time to 'sta' - Scheduled time of arrival
data['sta'] = (data.sched_arr_time.astype(str).str.replace(r'(\d{2}$)', '', regex=True)
               + ':' + data.sched_arr_time.astype(str).str.extract(r'(\d{2}$)', expand=False) + ':00')

# converting dep_time to 'atd' - Actual time of departure
data['atd'] = (data.dep_time.fillna(0).astype(np.int64).astype(str).str.replace(r'(\d{2}$)', '', regex=True)
               + ':' + data.dep_time.fillna(0).astype(np.int64).astype(str).str.extract(r'(\d{2}$)', expand=False) + ':00')

# converting arr_time to 'ata' - Actual time of arrival
data['ata'] = (data.arr_time.fillna(0).astype(np.int64).astype(str).str.replace(r'(\d{2}$)', '', regex=True)
               + ':' + data.arr_time.fillna(0).astype(np.int64).astype(str).str.extract(r'(\d{2}$)', expand=False) + ':00')
```

We now have time columns in the format we wanted. Finally we may want to combine the year, month and day columns into a date column. This is not an absolutely necessary step. But we can easily obtain the year, month and day (and other) information once it is converted into datetime format.

```python
data['date'] = pd.to_datetime(data[['year', 'month', 'day']])

# finally we drop the columns we don't need
data = data.drop(columns=['year', 'month', 'day'])
```

Now import the dataset using the networkx function that ingests a pandas dataframe directly. Just like Graph creation there are multiple ways Data can be ingested into a Graph from multiple formats.

```python
import networkx as nx

FG = nx.from_pandas_edgelist(data, source='origin', target='dest', edge_attr=True)
FG.nodes()
```


```python
# NodeView(('EWR', 'MEM', 'LGA', 'FLL', 'SEA', 'JFK', 'DEN', 'ORD', 'MIA', 'PBI',
#           'MCO', 'CMH', 'MSP', 'IAD', 'CLT', 'TPA', 'DCA', 'SJU', 'ATL', 'BHM',
#           'SRQ', 'MSY', 'DTW', 'LAX', 'JAX', 'RDU', 'MDW', 'DFW', 'IAH', 'SFO',
#           'STL', 'CVG', 'IND', 'RSW', 'BOS', 'CLE'))

FG.edges()
```


```python
# EdgeView([('EWR', 'MEM'), ('EWR', 'SEA'), ('EWR', 'MIA'), ('EWR', 'ORD'),
#           ('EWR', 'MSP'), ('EWR', 'TPA'), ('EWR', 'MSY'), ('EWR', 'DFW'),
#           ('EWR', 'IAH'), ('EWR', 'SFO'), ('EWR', 'CVG'), ('EWR', 'IND'),
#           ('EWR', 'RDU'), ('EWR', 'IAD'), ('EWR', 'RSW'), ('EWR', 'BOS'),
#           ('EWR', 'PBI'), ('EWR', 'LAX'), ('EWR', 'MCO'), ('EWR', 'SJU'),
#           ('LGA', 'FLL'), ('LGA', 'ORD'), ('LGA', 'PBI'), ('LGA', 'CMH'),
#           ('LGA', 'IAD'), ('LGA', 'CLT'), ('LGA', 'MIA'), ('LGA', 'DCA'),
#           ('LGA', 'BHM'), ('LGA', 'RDU'), ('LGA', 'ATL'), ('LGA', 'TPA'),
#           ('LGA', 'MDW'), ('LGA', 'DEN'), ('LGA', 'MSP'), ('LGA', 'DTW'),
#           ('LGA', 'STL'), ('LGA', 'MCO'), ('LGA', 'CVG'), ('LGA', 'IAH'),
#           ('FLL', 'JFK'), ('SEA', 'JFK'), ('JFK', 'DEN'), ('JFK', 'MCO'),
#           ('JFK', 'TPA'), ('JFK', 'SJU'), ('JFK', 'ATL'), ('JFK', 'SRQ'),
#           ('JFK', 'DCA'), ('JFK', 'DTW'), ('JFK', 'LAX'), ('JFK', 'JAX'),
#           ('JFK', 'CLT'), ('JFK', 'PBI'), ('JFK', 'CLE'), ('JFK', 'IAD'),
#           ('JFK', 'BOS')])

nx.draw_networkx(FG, with_labels=True)  # Quick view of the Graph. As expected we see 3 very busy airports
```

```python
nx.algorithms.degree_centrality(FG)  # Notice the 3 airports from which all of our 100 rows of data originate

nx.density(FG)  # Average edge density of the Graph
# 0.09047619047619047

nx.average_shortest_path_length(FG)  # Average shortest path length for ALL paths in the Graph
# 2.36984126984127

nx.average_degree_connectivity(FG)  # For a node of degree k - what is the average degree of its neighbours?
# {1: 19.307692307692307, 2: 19.0625, 3: 19.0, 17: 2.0588235294117645, 20: 1.95}
```

As is obvious from looking at the Graph visualization (way above) – There are multiple paths from some airports to others. Let us say we want to calculate the shortest possible route between 2 such airports. Right off the bat we can think of a couple of ways of doing it

There is the shortest path by distance

There is the shortest path by flight time

What we can do is calculate the shortest path, weighting the edges with either the distance or the airtime. Please note that this is an approximate solution – the actual problem is to calculate the shortest path factoring in the availability of a connecting flight when you reach your transfer airport, plus the wait time for the transfer. That is the more complete approach, and it is how people normally plan their travel. For the purposes of this article, we will just assume that a flight is readily available when you reach an airport, and calculate the shortest path using the airtime as the weight

Let us take the example of JAX and DFW airports:

```python
# Let us find all the paths available
for path in nx.all_simple_paths(FG, source='JAX', target='DFW'):
    print(path)

# Let us find the dijkstra path from JAX to DFW.
dijpath = nx.dijkstra_path(FG, source='JAX', target='DFW')
dijpath
# ['JAX', 'JFK', 'SEA', 'EWR', 'DFW']

# Let us try to find the dijkstra path weighted by airtime (approximate case)
shortpath = nx.dijkstra_path(FG, source='JAX', target='DFW', weight='air_time')
shortpath
# ['JAX', 'JFK', 'BOS', 'EWR', 'DFW']
```

Conclusion

This article has at best only managed a superficial introduction to the very interesting field of Graph Theory and Network analysis. Knowledge of the theory and the Python packages will add a valuable toolset to any Data Scientist’s arsenal. For the dataset used above, a series of other questions can be asked like:

Find the shortest path between two airports given Cost, Airtime and Availability?

You are an airline carrier and you have a fleet of airplanes. You have an idea of the demand available for your flights. Given that you have permission to operate 2 more airplanes (or add 2 airplanes to your fleet) which routes will you operate them on to maximize profitability?

Can you rearrange the flights and schedules to optimize a certain parameter (like timeliness or profitability)?

Bibliography and References

About the Author

Srivatsa currently works for TheMathCompany and has over 7.5 years of experience in Decision Sciences and Analytics. He has grown, led & scaled global teams across functions, industries & geographies. He has led India Delivery for a cross industry portfolio totalling $10M in revenues. He has also conducted several client workshops and training sessions to help level up technical and business domain knowledge.

During his career span, he has led premium client engagements with Industry leaders in Technology, e-commerce and retail. He helped set up the Analytics Center of Excellence for one of the world’s largest Insurance companies.


You're reading An Introduction To Graph Theory And Network Analysis (With Python Codes)

An Introduction To Mobile Seo

12 key issues you must consider for Mobile SEO

Mobile SEO refers to the search engine optimization of websites combined with flawless viewing on mobile devices, such as smartphones and tablets. Thanks to the increasing boom of portable devices, webmasters should be highly concerned with their mobile SEO plan. After all, more than 50% of Internet users now report surfing websites through their mobile devices daily. Google is already favouring mobile friendly sites.

Based on my experience of different mobile SEO projects,  I have created this detailed guide on mobile SEO. I hope it will help newcomers to mobile SEO missing some of the key issues. Below I will spend time offering my recommendations to boost your understanding on how to properly optimize your website for optimal user experiences across all mobile devices.

First and foremost, according to Google mobile websites typically run on one out of three different configurations:

1. Responsive Web Design

2. Dynamic Serving

3. Separate URLs

Note: It’s critical that Google can clearly understand your website’s mobile setup and which of these three configurations you’re using.

Responsive Web Design

When you use responsive web design, your mobile site will have the same HTML code and content for the same URL regardless of the user’s chosen device. You’ll simply use the meta name=”viewport” tag within your site’s source course to help the Internet browser identify how they should adjust the content. Then, the display settings will change to fit each visitor’s unique screen size.

Benefits of RWD

Responsive web design is very popular among SEO experts everywhere, and it’s even recommended by Google itself. You should definitely consider responsive design because:

It’s easy to share content from a single URL.

Google can easily index your single URL for higher search engine rankings.

You’ll find it convenient to maintain multiple pages for the same content.

This design avoids common SEO and formatting mistakes.

There won’t be much additional setup time.

Googlebot will use less resources and make crawling more efficient.

Users won’t have to deal with redirects, which offers shorter page download times.

Example of RWD for mobile device – how not to do it!

Dynamic Serving

Dynamic serving configurations are designed to have the server respond with different HTML and CSS code on the same URL depending on the user’s device. For this, you’ll need to properly use the Vary HTTP header to signal changes based on the user-agent’s page requests. Valid headers tell the browser how to display the content and help Googlebot discover that your website has mobile-optimized content much faster.

Separate URLs

As the name suggests, this setup configuration involves having different URLs for your website to successfully display your content on different mobile devices. Each URL is equipped with different HTML code for every respective screen size.

Avoiding common mistakes in mobile SEO 1. Don’t block Javascript, CSS, and image files

It’s common for some developers to block some of the Javascripts, CSS, and image files on their website, which is actually against Google’s guidelines. The best approach is to keep all files visible to the search engine’s crawlers. You should also:

Use ‘Fetch as Google’ through the Google Webmaster tools to guarantee your website’s CSS, Javascript, and images are completely crawlable.

Check chúng tôi to make certain you’re not hiding any pages from Google.

Ensure any redirects to separate URLs are functioning properly according to each mobile device.

2. Optimize unplayable content

At times, video content available in a desktop version can’t properly run on mobile devices, which doesn’t lend for a good user experience.

It’s suggested that you use HTML5 for video embedding on all animations to improve your website’s usability on all devices. Plus, you should avoid flash to maintain content that’s easy for search engines to understand.

3. Fix faulty redirects and cross links

As you would for your standard desktop version, carefully remove your crawl errors found in the Google webmaster tool.

You should plan for a regular website health check to make sure your mobile-friendly design is always operating correctly. Well-maintained websites always perform best when vying for spots on Google’s search engine result page.

4. Steer clear of Mobile-only 404s

Some websites serve content to desktop users but show an unsightly 404 error page for mobile users accessing the same URL. Since this is awkward for mobile visitors, it’s recommended that you redirect them to an equivalent mobile page at a different URL instead. Make sure your mobile-friendly page is properly configured to avoid ever showing an error message that will turn away potential business.

5. Keep your site lightning fast

Individuals surfing through websites on their mobile devices don’t want to wait around for pages to load. Users and Google prefer fast-loading, lightweight websites that efficiently open in just a second or two. Check your average download time through the page speed tool and fix any delaying errors you find.

6. Use ‘Rel=Alternate Media’

7. Add the “Vary:User-Agent” HTTP Header

8. Use ‘Rel-Canonical’

Canonical tags are used to avoid issues with duplicated content. Adding the ‘Rel-Canonical’ tag onto your website’s mobile version will help Google properly index all pages and avoid flagging any unoriginal content. This will also prevent confusion by consolidating indexing and ranking signals, such as external links.

9. Optimize Titles and Meta Descriptions

Since mobile devices have smaller screen sizes, it’s important to keep your website’s information as concise and meaningful as possible. Make sure you optimize all on-page factors like titles and meta descriptions with keyword-rich content for the best SEO results.

10. Use Structured data

After the Hummingbird update, structured data became very important for boosting Google ranking factors.

11. Optimize for local search

Thanks to GPS, local businesses tend to get the most mobile traffic that turns into desktop traffic and sales. Thus, it’s critical that you optimize your website for local searches by adding in your company’s name, address, phone number, and calls to action.

12. Build mobile sitemaps

Last but certainly not least, create an XML sitemap for your website’s mobile version. Keep your mobile pages separate from the desktop ones to quickly identify any indexing troubles.

Benefits of Mobile-friendly designs

Overall, when you invest in giving your website a makeover with any of the three mobile SEO configurations, you can look forward to:

More website traffic

Improved user experiences

Higher conversion rates

Increased time spent on your website

Lower bounce rates

Faster page loading times

More customer engagement

Improved search engine performance

Flight Simulators: An Introduction

Think back to when you were younger, did you want to be a farmer, train driver, or even a pilot? If you’ve always wanted to see what it would be like, there is a simulator for just about everything these days. Whether you want to recapture a bit of that childhood dream, or even use it as a tool to progress your career, simulators are an appealing genre.

Flight simulators are some of the oldest and most evolved simulators you can get your hands on, with multiple iterations going back almost 40 years. There are simulators catered towards different users, flight styles, and even budgets.

With the highly anticipated Microsoft Flight Simulator 2020 on the way this year, we thought that now would be a great time to explore a bit of the history of flight simulators, the evolution of the games, and even the best hardware to play them on.

Microsoft Flight Simulator

One of the oldest flight simulators available is Microsoft Flight Simulator, which dates back to the early ’80s with the release of Flight Simulator 1.0. As you can see below, this simulator is pretty dated by today’s standards, but at the time it acted as a gateway into the world of aviation.

Microsoft continued to improve on this success with the release of Flight Simulator 2.0, 3.0, 4.0, 5.0, and 5.1. Each of these versions further refined the graphics, added additional aircraft, airports, and textures. Flight Simulator 5.1 also added the ability to have scenery libraries which included the use of satellite imagery when flying.

When Windows 95 was released, Microsoft also developed a version of its flight simulator for the platform. This featured a lot more 3D modeling, improved frame rates, and expanded scenery to outside of Europe and the USA.

During the 2000s, Microsoft developed Flight Simulator 2000, 2002, and most notably Flight Simulator X. These titles massively increased the number of airports in the game as well as adding more instruments found in real-life aircraft, including a GPS feature. Flight Simulator X even included multiplayer which allowed for two players to pilot the same plane as well as occupy control towers.

Flight Simulator X also made its way onto Steam in 2014, re-released as Microsoft Flight Simulator X: Steam Edition. This edition is updated to use Steam’s functionality and also allows for an incredible amount of content to be easily purchased and installed alongside its 24 aircraft.

Finally, the latest edition of Microsoft’s flight simulators is set to be released on August 18th, 2020. This title is set to simulate the entire Earth and recreates 3D models of buildings and geographical features.

Microsoft has said that Microsoft Flight Simulator 2020 will also feature more than 40,000 airports and over two million cities.

X-Plane


One of the main competitors of Microsoft’s flight simulator series is X-Plane. Originally released back in 1995, X-Plane is now on its 11th version and is well known for its realism and attention to detail.

Featuring an improved model of simulation for its aircraft, X-Plane quickly won over flight enthusiasts who strive for realism. X-Plane also allows pilots to connect with each other in multiplayer and has helped spawn a lot of tight-knit flying communities.

One downside with X-Plane (and many other flight simulators) is that although this is a great looking game, a lot of the scenery and files needed to make it look its best are only available as payware.

Prepar3D


Developed by Lockheed Martin, Prepar3D is a flight simulator that aims more towards the professional crowd. The official website states that it is “Ideal for commercial, academic, professional, or military instruction. Prepar3D can be used to quickly create learning scenarios anywhere in the virtual world”.

This simulator is often used by those training for their real-life pilot license, which speaks volumes for its realism. It’s not cheap though, a professional license will set you back $199.00, but you can pick up an academic license for $59.95.

FlightGear


If you’d like to get a taste for flying before jumping into a paid simulator, FlightGear is a free, open-source option that does a great job. Aiming to become a simulator used in academic environments, pilot training, and of course, a gaming environment, FlightGear features three different flight dynamic models to play around with.

There are over 20,000 accurate real-world airports, a detailed sky model, and even multi-screen support. Although it isn’t the best looking simulator on the market, it doesn’t look horrible and even has some fairly moderate hardware requirements.

Combat Flight Simulators

Moving on from traditional flight simulators, a lot of people fell in love with flight simulators first through a combat simulator. Offering a more fast-paced experience than a traditional flying simulator, combat flight simulators put you in the hot seat of some of the world’s quickest and most dangerous aircraft.

If you’re familiar with the Microsoft Flight Simulator series, they also offer the Microsoft Combat Flight Simulator series to try out – although it’s starting to show its age now.

One of the most popular games available at the moment is the IL-2 Sturmovik series. This is a World War II combat flight simulator with a focus on air battles from the Eastern front. Although this is a fairly old title, there have been numerous updates and content packs released that make it a joy to play today.

If you’re looking for more of a modern feel, Digital Combat Simulator is a realistic simulation of military aircraft which boasts some pretty impressive graphics. You can currently pick up a free version to try out which has a limited amount of vehicles and airspace. This is a continuously developed game that offers some of the most detailed military aircraft found in any sim.

You can typically find an air combat simulator for almost every major military conflict over the last 100 years. If you aren’t striving for simulation, some games such as Tom Clancy’s H.A.W.X offer a more casual approach, without sacrificing any of the action.

Taking Your Simulation to the Next Level

If you’re pretty dedicated to simulators, you’ll have no doubt bought some type of peripheral in the past. There are steering wheels for racing games, rail controllers for train simulators, and of course, a whole load of options for the frequent fliers.

It is possible to experience a flight simulator with just a mouse and keyboard or even a controller, but this can quickly become frustrating and take away a lot of the enjoyment. Luckily there are some pretty cheap options available when it comes to joysticks.

At the most basic level, you can pick up a joystick or HOTAS (hands-on throttle-and-stick) for around $30 which will allow you to control your aircraft with more precision. Some will even have a separate throttle and most will give you a couple of buttons that you can map to functions such as flaps or trim.

As with most things though, the sky’s the limit. You can easily spend up to $500 on a decent joystick and the price starts to further increase if you opt to purchase a yoke and some rudder pedals. If you’re just starting out on your journey, don’t feel the need to splash out. You may want to save some of that money for DLC or different simulators.

Companies such as Thrustmaster, Logitech, and CH Products all offer great products suitable for different budget levels.

In most flight simulators, you can also pick up extra aircraft from third-party developers. This isn’t cheap, however. A lot of models easily match the price of the game and some can cost in excess of $100. You are paying for quality and attention to detail in these models though and they are definitely aimed towards a hardcore user.

Can Your System Handle a Simulator?

If we take another quick look at the upcoming Microsoft Flight Simulator 2020 trailer, we can see that it is a pretty impressive looking game. With so much rendered and calculated at any one time, this can start to take a toll on your system.

You will be able to turn some of these features down, reduce graphics and objects or even tone down the realism settings if needed. However, with a few simple upgrades, you should be able to get the most out of your simulator.

You’ll want to focus on your processor and graphics card as a priority for flight simulators. Although Microsoft has stated that their 2020 flight simulator won’t be as resource-heavy as those in the past, you’ll still want a decent PC if you’re planning on playing other intensive simulators.

Unfortunately, a lot of simulators don’t use multi-threading to the best of their ability at the moment, but a decent AMD Ryzen processor with a high core count would make a good starting point for any flight simulator build.

Pair this with a graphics card with a decent amount of VRAM and a system with at least 16 GB of RAM and you should be in for a comfortable flight. If you’d like some ideas on where to get started with your upgrade, why not check out some of our build guides?

Final Word

Flight Simulators are a perfect way to spend a couple of hours flying through the clouds. We hope this introduction to flight simulators has helped you pick out one to try next or even given you some tips on upgrading your PC for the latest flight sims.

With the launch of Microsoft Flight Simulator 2020 coming in just a few short weeks, we can’t wait to see what it’ll be like. At first glance, the graphics look exceptional, so you’ll definitely want to take a look at the system requirements before purchasing to get the most out of the game.

Introduction To Google Firebase Cloud Storage Using Python

This article was published as a part of the Data Science Blogathon.


Firebase is a very popular Backend as a Service (BaaS) offered by Google. It aims to replace conventional backend servers for web and mobile applications by offering multiple services on the same platform like authentication, real-time database, Firestore (NoSQL database), cloud functions, machine learning, cloud storage, and many more. These services are cloud-based and production-ready that can automatically scale as per demand without any need for configuration.

In my previous article, I covered Google Firestore, a cloud-based NoSQL database offered by Firebase. You can read my previous article on Google Firestore here. One such offering is Cloud Storage, which is a powerful yet simple storage service offered by Firebase. The Cloud Storage offering in Firebase is the Google cloud storage available on the Google Cloud Platform (GCP). The free-tier version provides 5GB of storage space for a bucket. In this article, we will learn about cloud storage and how it can be used to store and access files securely over the internet using python.

Setting up Firebase to access Cloud Storage

Connecting Python to Cloud Storage

To connect to Google Firebase, we need to install a Python package called “firebase-admin.” It can be installed like any other Python package using pip. Ensure that your Python version is 3.6 or below, as this module throws an exception because of the async module added from Python 3.7 onwards. If you have a higher version installed, you can use Anaconda to create a new environment with Python 3.6. Run the following commands to create and activate the new environment in Anaconda.

conda create -n cloud_storage_env python=3.6.5
conda activate cloud_storage_env

To install the “firebase-admin” package, run the following.

pip install firebase-admin

Now that we have the credentials, let’s connect to Firebase and start accessing the cloud storage service. To do so, paste the code snippet shown below and add the file path of the service-account credentials file downloaded from your Firebase project settings. You can find your storage bucket link in your Firebase cloud storage console.

import firebase_admin
from firebase_admin import credentials, storage

cred = credentials.Certificate("path/to/your/credentials.json")
firebase_admin.initialize_app(cred, {'storageBucket': 'your_bucket_link_without_gs://'})  # connecting to firebase

Now that we have connected to Firebase let’s try to use the cloud storage service.

Using Google Cloud Storage

Now consider that you maintain a folder structure on your server and wish to replicate the same folder structure in your storage bucket as well. For this, we can directly use the “upload_from_filename()” function, which is a property of the blob object. This function will replicate the folder structure of each file that is being uploaded. This means that if you have a text file inside a folder named “text_files”, the same folder structure will also be replicated in your storage bucket. Now, let’s see how to use this function to upload files to our storage bucket.

Firstly, I will upload an image file present in the root directory to our storage bucket. Once that is done, I will try to upload a text file present inside a folder named “text_docs” to our storage bucket using the above-described function.

file_path = "sample_image_file.jpg"
bucket = storage.bucket()  # storage bucket
blob = bucket.blob(file_path)
blob.upload_from_filename(file_path)

We can see that the image file has been uploaded to our storage bucket in the root directory. Now let’s try to upload the text file present inside the “text_docs” directory.

file_path = "text_docs/sample_text_file.txt"
bucket = storage.bucket()  # storage bucket
blob = bucket.blob(file_path)
blob.upload_from_filename(file_path)

We can see that the text file has been uploaded inside the text_docs folder, just like it is on our local machine.

Now consider that you do not maintain a folder structure on your server and wish to maintain a proper folder structure in your storage bucket. For this, we can also use the “upload_from_filename()” function with a slight modification. Let’s try to upload the image file inside a folder named “images”. On our local machine, the image file is present in the root directory and there is no folder named images. We will also rename the image file while storing it in the storage bucket.

from google.cloud import storage
from google.oauth2 import service_account

def upload_blob(bucket_name, source_file_name, destination_blob_name):
    credentials = service_account.Credentials.from_service_account_file("path/to/your/credentials.json")
    storage_client = storage.Client(credentials=credentials)
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_filename(source_file_name)
    print(f"File {source_file_name} uploaded to {destination_blob_name}.")

upload_blob("your_bucket_name", 'sample_image_file.jpg', 'images/beautiful_picture.jpg')

Now let’s see if the image from our root directory has been uploaded inside a folder named “images” in our storage bucket. We can see that a new folder called “images” has been created, and the image file has also been uploaded inside that folder.

Now, if you want to access your files from your bucket and want to download them, you can do that easily with a few lines of code. Let’s try downloading the text file we uploaded to our storage bucket inside the text_docs folder and rename the file as “downloaded_file.txt”. The code snippet shown below will download the file to our local machine.

credentials = service_account.Credentials.from_service_account_file("path/to/your/credentials.json")
bucket = storage.Client(credentials=credentials).bucket("your_bucket_name")
bucket.blob('text_docs/sample_text_file.txt').download_to_filename('downloaded_file.txt')

Now, if you want to share the files over the internet or want them to be public, you can directly access the “public_url” property of the blob object that returns a URL for that file. Let’s try to get the URL of all the files present in our storage bucket. To do so, we first need to get all the files present in our storage bucket and then access their public URL.

credentials = service_account.Credentials.from_service_account_file("path/to/your/credentials.json")
files = storage.Client(credentials=credentials).list_blobs("your_bucket_name")  # fetch all the files in the bucket
for i in files:
    print('The public url is ', i.public_url)

Conclusion

In this article, we covered:

Understanding how to set up a Firebase project in detail

Uploading and downloading files to and from the cloud-based storage bucket using python

Extracting a public URL for the files from our storage bucket for sharing across the internet

As mentioned earlier, Google Firebase offers a lot of production-ready services for free that are hosted on the Google Cloud. Firebase has been a lifesaver for many front-end developers, who do not have to explicitly know backend programming and frameworks like nodejs, flask, etc., to build a full-stack web or mobile application. If you are interested in learning about other services offered by Google Firebase, you can refer to my article on Firestore, which is a NoSQL database offered by Google. I will try to cover other services Google Firebase offers in the coming weeks, so stay tuned!

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.


An Introduction To The Powerful Bayes’ Theorem For Data Science Professionals


Bayes’ Theorem is one of the most powerful concepts in statistics – a must-know for data science professionals

Get acquainted with Bayes’ Theorem, how it works, and its multiple and diverse applications

Plenty of intuitive examples in this article to grasp the idea behind Bayes’ Theorem


But I’ve seen a lot of aspiring data scientists shunning statistics, especially Bayesian statistics. It remains incomprehensible to a lot of analysts and data scientists. I’m sure a lot of you are nodding your head at this!

Bayes’ Theorem, a major aspect of Bayesian Statistics, was created by Thomas Bayes, an English minister and statistician who lived during the eighteenth century. The very fact that we’re still learning about it shows how influential his work has been across centuries! Bayes’ Theorem enables us to work on complex data science problems and is still taught at leading universities worldwide.

In this article, we will explore Bayes’ Theorem in detail along with its applications, including in Naive Bayes’ Classifiers and Discriminant Functions, among others. There’s a lot to unpack in this article so let’s get going!

Table of Contents

Prerequisites for Bayes’ Theorem

What is Bayes’ Theorem?

An Illustration of Bayes’ Theorem

Applications of Bayes’ Theorem

Naive Bayes’ Classifiers

Discriminant Functions and Decision Surfaces

Bayesian Parameter Estimation

Demonstration of Bayesian Parameter Estimation

Prerequisites for Bayes’ Theorem

We need to understand a few concepts before diving into the world of Bayes’ Theorem. These concepts are essentially the prerequisites for understanding Bayes’ Theorem.

1. Experiment

What’s the first image that comes to your mind when you hear the word “experiment”? Most people, including me, imagine a chemical laboratory filled with test tubes and beakers. The concept of an experiment in probability theory is actually quite similar:

An experiment is a planned operation carried out under controlled conditions.

Tossing a coin, rolling a die, and drawing a card out of a well-shuffled pack of cards are all examples of experiments.

2. Sample Space

The result of an experiment is called an outcome. The set of all possible outcomes of an experiment is called the sample space. For example, if our experiment is throwing a die and recording the outcome, the sample space will be:

S1 = {1, 2, 3, 4, 5, 6}

What will be the sample space when we’re tossing a coin? Think about it before you see the answer below:

S2 = {H, T}

3. Event

An event is a set of outcomes (i.e. a subset of the sample space) of an experiment.

Let’s get back to the experiment of rolling a die and define events E and F as:

E = An even number is obtained = {2, 4, 6}

F = A number greater than 3 is obtained = {4, 5, 6}

The probability of these events:

P(E) = Number of favourable outcomes / Total number of possible outcomes = 3 / 6 = 0.5
P(F) = 3 / 6 = 0.5

The basic operations in set theory, union and intersection of events, are possible because an event is a set.

Then, E∪F = {2, 4, 5, 6} and E∩F = {4, 6}

Now consider an event G = An odd number is obtained:

Then E ∩ G = empty set = Φ

Such events are called disjoint events. These are also called mutually exclusive events because only one out of the two events can occur at a time:
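Since events are just sets, all of these operations can be checked directly with Python’s built-in set type. A quick illustrative sketch using the die events above:

```python
# Sample space for rolling a die
S = {1, 2, 3, 4, 5, 6}

E = {2, 4, 6}   # an even number is obtained
F = {4, 5, 6}   # a number greater than 3 is obtained
G = {1, 3, 5}   # an odd number is obtained

print(E | F)  # union E ∪ F -> {2, 4, 5, 6}
print(E & F)  # intersection E ∩ F -> {4, 6}
print(E & G)  # E and G are disjoint, so this is the empty set
```
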

4. Random Variable

A Random Variable is exactly what it sounds like – a variable taking on random values with each value having some probability (which can be zero). It is a real-valued function defined on the sample space of an experiment:

Consider that Y is the observed temperature (in Celsius) of a given place on a given day. So, we can say that Y is a continuous random variable defined on the sample space S = [0, 100] (the Celsius scale here is taken to run from zero to 100 degrees).

5. Exhaustive Events

A set of events is said to be exhaustive if at least one of the events must occur at any time. Thus, two events A and B are said to be exhaustive if A ∪ B = S, the sample space.

For example, let’s say that A is the event that a card drawn out of a pack is red and B is the event that the card drawn is black. Here, A and B are exhaustive because the sample space S = {red, black}. Pretty straightforward stuff, right?

6. Independent Events

If the occurrence of one event does not have any effect on the occurrence of another, then the two events are said to be independent. Mathematically, two events A and B are said to be independent if:

P(A ∩ B) = P(AB) = P(A)*P(B)

For example, if A is obtaining a 5 on throwing a die and B is drawing a king of hearts from a well-shuffled pack of cards, then A and B are independent just by their definition. It’s usually not as easy to identify independent events, hence we use the formula I mentioned above.
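The formula can be verified numerically by working over the product sample space of the two experiments. A small sketch using exact fractions (the card indexing is our own convention):

```python
from fractions import Fraction

# Product sample space: (die outcome, card index); index 0 is the king of hearts
space = [(d, c) for d in range(1, 7) for c in range(52)]

A = {s for s in space if s[0] == 5}  # obtaining a 5 on the die
B = {s for s in space if s[1] == 0}  # drawing the king of hearts

def prob(event):
    return Fraction(len(event), len(space))

# Independence: P(A ∩ B) equals P(A) * P(B)
print(prob(A & B) == prob(A) * prob(B))  # True
```
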

7. Conditional Probability

Consider that we’re drawing a card from a given deck. What is the probability that it is a black card? That’s easy – 1/2, right? However, what if we know it was a black card – then what would be the probability that it was a king?

The approach to this question is not as simple.

P(A ∩ B) = P(Obtaining a black card which is a King) = 2/52
P(B) = P(Picking a black card) = 1/2
Therefore, P(A | B) = P(A ∩ B) / P(B) = (2/52) / (1/2) = 1/13

8. Marginal Probability

It is the probability of an event A occurring, independent of any other event B, i.e. marginalizing the event B.

This is just a fancy way of saying:

P(A) = P(A ∩ B) + P(A ∩ ~B) #from our knowledge of conditional probability

where ~B represents the event that B does not occur.

Let’s check if this concept of marginal probability holds true. Here, we need to calculate the probability that a random card drawn out of a pack is red (event A). The answer is obviously 1/2. Let’s calculate the same through marginal probability with event B as drawing a king.

P(A ∩ B) = 2/52 (because there are 2 kings in red suits, one of hearts and the other of diamonds)
P(A ∩ ~B) = 24/52 (remaining cards from the red suit)
Therefore, P(A) = 2/52 + 24/52 = 26/52 = 1/2
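This marginalization check is easy to confirm with exact fractions in Python (a quick sketch):

```python
from fractions import Fraction

p_red_and_king = Fraction(2, 52)       # two red kings: hearts and diamonds
p_red_and_not_king = Fraction(24, 52)  # the remaining red cards

# Marginalizing over "king / not king" must recover P(red)
p_red = p_red_and_king + p_red_and_not_king
print(p_red)  # 1/2
```
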

Perfect! So this is good enough to cover our basics of Bayes’ Theorem. Let’s now take a few moments to understand what exactly Bayes’ Theorem is and how it works.

What is Bayes’ Theorem?

Have you ever seen the popular TV show ‘Sherlock’ (or any crime thriller show)? Think about it – our beliefs about the culprit change throughout the episode. We process new evidence and refine our hypothesis at each step. This is Bayes’ Theorem in real life!

Now, let’s understand this mathematically. This will be pretty simple now that our basics are clear.

Consider that A and B are any two events from a sample space S where P(B) ≠ 0. Using our understanding of conditional probability, we have:

P(A | B) = P(B | A) * P(A) / P(B)

This is the Bayes’ Theorem.

P(A | B) is called the Posterior probability, P(B | A) the Likelihood, P(A) the Prior probability, and P(B) the Evidence.
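In code form the theorem is a one-liner; here is a tiny sketch (the helper name is ours), reusing the black-card example from the conditional probability section:

```python
def bayes(likelihood, prior, evidence):
    """P(A | B) = P(B | A) * P(A) / P(B)"""
    return likelihood * prior / evidence

# Probability that a drawn card is a king, given that it is black:
# P(black | king) = 2/4, P(king) = 4/52, P(black) = 1/2
print(bayes(2 / 4, 4 / 52, 1 / 2))  # 0.0769... = 1/13
```
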

Equivalently, Bayes Theorem can be written as:

posterior = likelihood * prior / evidence

An Illustration of Bayes’ Theorem

Let’s solve a problem using Bayes’ Theorem. This will help you understand and visualize where you can apply it. We’ll take an example which I’m sure almost all of us have seen in school.

There are 3 boxes labeled A, B, and C:

Box A contains 2 red and 3 black balls

Box B contains 3 red and 1 black ball

And box C contains 1 red ball and 4 black balls

The three boxes are identical and have an equal probability of getting picked. Consider that a red ball is chosen. Then what is the probability that this red ball was picked out of box A?
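Before working through the arithmetic by hand, the whole calculation can be sketched in a few lines of Python with exact fractions:

```python
from fractions import Fraction

priors = {'A': Fraction(1, 3), 'B': Fraction(1, 3), 'C': Fraction(1, 3)}
# Likelihood of drawing a red ball from each box
red = {'A': Fraction(2, 5), 'B': Fraction(3, 4), 'C': Fraction(1, 5)}

# Evidence: total probability of drawing a red ball
p_red = sum(red[box] * priors[box] for box in priors)
print(float(p_red))  # 0.45

# Bayes' Theorem: posterior probability the red ball came from box A
p_A_given_red = red['A'] * priors['A'] / p_red
print(p_A_given_red)  # 8/27, roughly 0.296
```
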

We have prior probabilities P(A) = P(B) = P(C) = 1/3, since all boxes have an equal probability of getting picked.

The likelihoods of drawing a red ball from each box are:

P(red | A) = 2/5, P(red | B) = 3/4, P(red | C) = 1/5

By the law of total probability, the evidence is:

P(red) = (2/5) * (1/3) + (3/4) * (1/3) + (1/5) * (1/3) = 0.45

Applying Bayes’ Theorem:

P(A | red) = P(red | A) * P(A) / P(red) = (2/5) * (1/3) / 0.45 = 8/27 ≈ 0.296

Applications of Bayes’ Theorem

There are plenty of applications of the Bayes’ Theorem in the real world. Don’t worry if you do not understand all the mathematics involved right away. Just getting a sense of how it works is good enough to start off.

Bayesian Decision Theory is a statistical approach to the problem of pattern classification. Under this theory, it is assumed that the underlying probability distribution for the categories is known. Thus, we obtain an ideal Bayes Classifier against which all other classifiers are judged for performance.

We will discuss the three main applications of Bayes’ Theorem:

Naive Bayes’ Classifiers

Discriminant Functions and Decision Surfaces

Bayesian Parameter Estimation

Let’s look at each application in detail.

Naive Bayes’ Classifiers

This is probably the most famous application of Bayes’ Theorem, and arguably its most powerful. You’ll come across the Naive Bayes algorithm a lot in machine learning.

Naive Bayes’ Classifiers are a set of probabilistic classifiers based on the Bayes’ Theorem. The underlying assumption of these classifiers is that all the features used for classification are independent of each other. That’s where the name ‘naive’ comes in since it is rare that we obtain a set of totally independent features.

The way these classifiers work is exactly how we solved in the illustration, just with a lot more features assumed to be independent of each other.

Let’s talk about the famous Titanic dataset. We have the following features:

Remember the problem statement? We need to calculate the probability of survival conditional to all the other variables available in the dataset. Then, based on this probability, we predict if the person survived or not, i.e, class 1 or 0.

This is where I pass the buck to you. Refer to our popular article to learn about these Naive Bayes classifiers along with the relevant code in both Python and R, and try solving the Titanic dataset yourself.

You can also enrol in our free course to learn about this interesting algorithm in a structured way: Naive Bayes from Scratch
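To make the idea concrete, here is a minimal categorical Naive Bayes classifier written from scratch on toy data. All names and the tiny dataset are our own, purely for illustration; it omits smoothing and log-probabilities that a production classifier would use:

```python
from collections import Counter, defaultdict

def train_naive_bayes(X, y):
    """Estimate P(class) and per-feature value counts for P(value | class)."""
    priors = {c: cnt / len(y) for c, cnt in Counter(y).items()}
    cond = defaultdict(Counter)  # key: (feature index, class) -> value counts
    for row, label in zip(X, y):
        for j, value in enumerate(row):
            cond[(j, label)][value] += 1
    return priors, cond

def predict(x, priors, cond):
    """Choose the class maximising P(class) * prod_j P(x_j | class),
    using the 'naive' assumption that features are independent."""
    best_class, best_score = None, -1.0
    for c, prior in priors.items():
        score = prior
        for j, value in enumerate(x):
            counts = cond[(j, c)]
            total = sum(counts.values())
            score *= counts[value] / total if total else 0.0
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Toy data: features = (sex, passenger class), target = survived (1) or not (0)
X = [('f', 1), ('f', 2), ('m', 3), ('m', 3), ('f', 1), ('m', 2)]
y = [1, 1, 0, 0, 1, 0]
priors, cond = train_naive_bayes(X, y)
print(predict(('f', 1), priors, cond))  # 1
print(predict(('m', 3), priors, cond))  # 0
```
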

Discriminant Functions and Decision Surfaces

The name is pretty self-explanatory. A discriminant function is used to “discriminate” its argument into its relevant class. Want an example? Let’s take one!

You might have come across Support Vector Machines (SVM) if you have explored classification problems in machine learning. The SVM algorithm classifies the vectors by finding the differentiating hyperplane which best segregates our training examples. This hyperplane can be linear or non-linear:

These hyperplanes are our decision surfaces and the equation of this hyperplane is our discriminant function. Make sure you check out our article on Support Vector Machine. It is thorough and includes code both in R and Python.

Alright – now let’s discuss the topic formally.

Let w_1, w_2, ….., w_c denote the c classes that our data vector X can be classified into. Then the decision rule becomes:

Decide w_i for X if g_i(X) > g_j(X) for all j ≠ i

These functions g_i(X), i = 1, 2, …., c, are known as Discriminant functions. These functions separate the vector space into c decision regions – R_1, R_2, …., R_c corresponding to each of the c classes. The boundaries of these regions are called decision surfaces or boundaries.

If g_i(X) = g_j(X) is the largest value out of the c discriminant functions, then the classification of vector X between classes w_i and w_j is ambiguous. So, X is said to lie on a decision boundary or surface.

Check out the below figure:

Source: Duda, R. O., Hart, P. E., & Stork, D. G. (2012). Pattern classification. John Wiley & Sons.

It’s a pretty cool concept, right? The 2-dimensional vector space is separated into two decision regions, R_1 and R_2, separated by two hyperbolas.

Note that any function f(g_i(X)) can also be used as a discriminant function if f(.) is a monotonically increasing function. The logarithm function is a popular choice for f(.).

Now, consider a two-category case with classes w_1 and w_2. The ‘minimum error-rate classification‘ decision rule becomes:

Decide w_1 if P(w_1 | X) > P(w_2 | X); otherwise decide w_2, where P(w_i | X) = p(X | w_i) * P(w_i) / p(X) by Bayes’ Theorem.

Notice that the ‘evidence’ on the denominator is merely used for scaling and hence we can eliminate it from the decision rule.

Thus, an obvious choice for the discriminant functions is:

g_i(X) = p(X | w_i) * P(w_i)

The 2-category case can generally be classified using a single discriminant function.

g(X) = g_1(X) - g_2(X)
Decide w_1, if g(X) > 0
Decide w_2, if g(X) < 0
If g(X) = 0, X lies on the decision surface.

In the figure above, g(X) is a linear function in a 2-dimensional vector X. However, more complicated decision boundaries are also possible:

Bayesian Parameter Estimation

This is the third application of Bayes’ Theorem. We’ll use univariate Gaussian Distribution and a bit of mathematics to understand this. Don’t worry if it looks complicated – I’ve broken it down into easy-to-understand terms.

You must have heard about the ultra-popular IMDb Top 250. This is a list of 250 top-rated movies of all time. Shawshank Redemption is #1 on the list with a rating of 9.2/10.

How do you think these ratings are calculated? The original formula used by IMDb claimed to use a “true Bayesian estimate”. The formula has since changed and is not publicly disclosed. Nonetheless, here is the previous formula:

W = (v / (v + m)) * R + (m / (v + m)) * C

where R is the average rating of the movie, v is the number of votes it received, m is the minimum number of votes required to be listed, and C is the mean rating across all films. The final rating W is a weighted average of R and C with weights v and m respectively; C serves as the prior estimate.

As the number of votes, v, increases and surpasses m, the minimum votes required, W, approaches the straight average for the movie, R

As v gets closer to zero (fewer votes are cast for the movie), W approaches the mean rating for all films, C
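The two behaviours above can be sketched in code using the widely reported form of the old formula, W = (v/(v+m))*R + (m/(v+m))*C (the function name and the example numbers are our own):

```python
def weighted_rating(R, v, m, C):
    """W = (v / (v + m)) * R + (m / (v + m)) * C -- blend the movie's own
    average rating R (v votes) with the global mean C, which carries weight m."""
    return (v / (v + m)) * R + (m / (v + m)) * C

# With lots of votes, W approaches the movie's own average R
print(round(weighted_rating(R=9.2, v=2_500_000, m=25_000, C=6.9), 2))  # 9.18
# With very few votes, W stays close to the global mean C
print(round(weighted_rating(R=9.2, v=10, m=25_000, C=6.9), 2))  # 6.9
```
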

We generally do not have complete information about the probabilistic nature of a classification problem. Instead, we have a vague idea of the situation along with a number of training examples. We then use this information to design a classifier.

The basic idea is that the underlying probability distribution has a known form. We can, therefore, describe it using a parameter vector Θ. For example, a Gaussian distribution can be described by Θ  = [μ, σ²].

Source: Wikipedia

Then, we need to estimate this vector. This is generally achieved in two ways:

Maximum Likelihood Estimation (MLE): The assumption is that the underlying probability distribution has an unknown but fixed parameter vector Θ. The best estimate maximizes the likelihood function:

p(D | Θ) = ∏ p(x_k | Θ), the product taken over the n training examples x_1, …, x_n

I recommend reading this article to get an intuitive and in-depth explanation of maximum likelihood estimation along with a case study in R.
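To make the MLE idea concrete, here is a minimal sketch (my own illustration, not from the article): for a univariate Gaussian, maximizing the likelihood has a closed-form solution, namely the sample mean and the biased (1/n) sample variance.

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(loc=5.0, scale=2.0, size=10_000)  # true mu=5, sigma^2=4

# Closed-form MLE for a univariate Gaussian:
mu_mle = samples.mean()
var_mle = ((samples - mu_mle) ** 2).mean()  # biased variance, same as np.var(samples)

print(mu_mle, var_mle)  # should land near the true values 5.0 and 4.0
```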

Bayesian Parameter Estimation – In Bayesian Learning, Θ is assumed to be a random variable as opposed to an “unknown but fixed” value in MLE. We use training examples to convert a distribution on this variable into a posterior probability density.

We can write it informally as:

Key points you should be aware of:

Any prior information that we might have about Θ is contained in a known prior probability density p(Θ)

Demonstration of Bayesian Parameter Estimation – Univariate Gaussian Case

Let me demonstrate how Bayesian Parameter Estimation works. This will provide further clarity on the theory we just covered.

First, let p(X) be normally distributed with a mean μ and variance σ², where μ is the only unknown parameter we wish to estimate. Then:

We’ll ease up on the math here. So, let prior probability density p(μ) also be normally distributed with mean µ’ and variance σ’² (which are both known).

My observations:

As n increases, σ_n² decreases. Hence, uncertainty in our estimate decreases

Since uncertainty decreases, the density curve becomes sharply peaked at its mean μ_n:
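The formula images for μ_n and σ_n² are missing above. As a hedged sketch based on the standard conjugate-prior result (a Gaussian likelihood with known variance σ² and a Gaussian prior N(μ', σ'²) yields a Gaussian posterior), the update can be computed as below; the names mu0 and sigma0_2 stand for the prior parameters μ' and σ'².

```python
import numpy as np

def posterior_mu(samples, sigma2, mu0, sigma0_2):
    """Posterior over the unknown Gaussian mean, given known likelihood
    variance sigma2 and prior N(mu0, sigma0_2). Standard conjugate result:
    mu_n = (n*sigma0_2*xbar + sigma2*mu0) / (n*sigma0_2 + sigma2)
    sigma_n2 = sigma0_2*sigma2 / (n*sigma0_2 + sigma2)"""
    n = len(samples)
    xbar = np.mean(samples)
    denom = n * sigma0_2 + sigma2
    mu_n = (n * sigma0_2 * xbar + sigma2 * mu0) / denom
    sigma_n2 = (sigma0_2 * sigma2) / denom
    return mu_n, sigma_n2

rng = np.random.default_rng(1)
data = rng.normal(loc=3.0, scale=1.0, size=500)  # sigma^2 = 1 assumed known
mu_n, sigma_n2 = posterior_mu(data, sigma2=1.0, mu0=0.0, sigma0_2=10.0)
print(mu_n, sigma_n2)  # mu_n near 3.0; sigma_n2 shrinks roughly like 1/n
```

Note how σ_n² decreases as n grows, which is exactly the sharpening of the posterior described above.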


End Notes

The beauty and power of Bayes’ Theorem never cease to amaze me. A simple concept, given by a clergyman who died more than 250 years ago, has its use in some of the most prominent machine learning techniques today.


Implementing Artificial Neural Network(Classification) In Python From Scratch

This article was published as a part of the Data Science Blogathon

Neural networks. One of the booming technological breakthroughs in the 21st century.

Are you interested in creating your own neural network from scratch in Python? Well, you are at the right place. In this article, we will be creating an artificial neural network from scratch in Python. The artificial neural network that we are going to develop here will solve a classification problem. So stretch your fingers, and let’s get started.

Interesting Sidenote

Artificial Neural Networks(ANN) are part of supervised machine learning where we will be having input as well as corresponding output present in our dataset. Our whole aim is to figure out a way of mapping this input to the respective output. ANN can be used for solving both regression and classification problems.

From the perspective of this blog, we will be developing an ANN for solving the classification class of problems.

Pre-Requisites for Artificial Neural Network Implementation

Following will be the libraries and software that we will be needing in order to implement ANN.

1. Python – 3.6 or later

2. Jupyter Notebook ( Google Colab can also be used )

3. Pandas

4. Numpy

5. TensorFlow 2.x

6. Scikit-Learn

Understanding the Problem Statement for Artificial Neural Network

Here we are dealing with a dataset from the finance domain, with 14 dimensions in total and 100000 records. The dimensions that we will be dealing with are as follows:-

RowNumber:- Represents the row number

CustomerId:- Represents customerId

Surname:- Represents surname of the customer

CreditScore:- Represents credit score of the customer

Geography:- Represents the country to which the customer belongs

Gender:- Represents Gender of the customer

Age:- Represents age of the customer

Tenure:- Represents tenure of the customer with a bank

Balance:- Represents the balance held by the customer

NumOfProducts:- Represents the number of bank services used by the customer

HasCrCard:- Represents if a customer has a credit card or not

IsActiveMember:- Represents if a customer is an active member or not

EstimatedSalary:- Represents estimated salary of the customer

Exited:- Represents if a customer is going to exit the bank or not.

Structure of dataset

As we can see from the above data dictionary, we are dealing with a total of 14 dimensions.

Here our main goal is to create an artificial neural network that will take into consideration all independent variables(first 13) and based on that will predict if our customer is going to exit the bank or not(Exited is dependent variable here).

Once we understand the steps for constructing neural networks, we can directly implement those same steps to other datasets as well.

One place to find such datasets is the UCI Machine Learning Repository. These datasets are classified into regression and classification problems. Since we are implementing this neural network to solve a classification problem, you can download any classification dataset from there and apply the same steps to a dataset of your choice! How cool is that?

Importing Necessary Libraries for Artificial Neural Network

Let’s import all the necessary libraries here

#Importing necessary Libraries
import numpy as np
import pandas as pd
import tensorflow as tf

Importing Dataset

In this step, we are going to import our dataset. Since our dataset is in csv format, we are going to use the read_csv() method of pandas in order to load the dataset.

#Loading Dataset
data = pd.read_csv("Churn_Modelling.csv")

Generating Matrix of Features (X)

The basic principle while creating a machine learning model is to generate X, also called the matrix of features. This X contains all our independent variables. Let’s create it here.

Python Code:

Here I have used the iloc method of the Pandas data frame, which allows us to fetch the desired values from the desired columns within the dataset. We fetch all the data from column index 3 up to, but not including, the last column. The reason is that the first 3 columns, i.e. RowNumber, CustomerId, and Surname, have nothing to do with deciding whether the customer is going to exit or not, so we skip them. And since the last column is the dependent variable, the -1 in iloc excludes it from the matrix of features X.
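The embedded code widget above did not survive extraction. Based on that description, the slicing would look like the sketch below; the two-row DataFrame is a made-up stand-in for Churn_Modelling.csv, with the real file you would use pd.read_csv as shown earlier.

```python
import pandas as pd

# Hypothetical rows mimicking the Churn_Modelling.csv column layout.
data = pd.DataFrame({
    "RowNumber": [1, 2], "CustomerId": [101, 102], "Surname": ["A", "B"],
    "CreditScore": [619, 608], "Geography": ["France", "Spain"],
    "Gender": ["Female", "Female"], "Age": [42, 41], "Tenure": [2, 1],
    "Balance": [0.0, 83807.86], "NumOfProducts": [1, 1],
    "HasCrCard": [1, 0], "IsActiveMember": [1, 1],
    "EstimatedSalary": [101348.88, 112542.58], "Exited": [1, 0],
})

# Skip RowNumber/CustomerId/Surname (indices 0-2) and drop the target Exited (-1).
X = data.iloc[:, 3:-1].values
print(X.shape)  # (2, 10): ten independent variables remain
```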

Generating Dependent Variable Vector(Y)

In the same fashion where we have created our matrix of features(X) for the independent variable, we also have to create a dependent variable vector(Y) which will only contain our dependent variable values.

#Generating Dependent Variable Vectors
Y = data.iloc[:,-1].values

Encoding Categorical Variable Gender

Now we have defined our X and Y, from this point on we are going to start with one of the highly time-consuming phases in any machine learning problem-solving. This phase is known as feature engineering. To define it in a simple manner, feature engineering is a phase where we either generate new variables from existing ones or modify existing variables so as to use them in our machine learning model.

In the above image depicting the structure of the dataset, we can see that most of the variables are numeric in nature, with the exception of a few – Gender and Geography. Essentially, a machine learning model is a mathematical formula that only accepts digits as input. If we try to create an ML model using this dataset, which contains a mix of data (numeric + string), the model will simply fail during the creation process itself. Hence we need to convert those string values into their numerical equivalents without losing their significance.

One of the most efficient ways of doing this is by using a technique called encoding. It is a process that will convert strings or categories directly into their numerical equivalent without losing significance.

Since our Gender column has only 2 categories, male and female, we are going to use label encoding. This type of encoding simply converts the column into one of 0s and 1s. In order to use label encoding, we are going to use the LabelEncoder class from the sklearn library.

#Encoding Categorical Variable Gender
from sklearn.preprocessing import LabelEncoder
LE1 = LabelEncoder()
X[:,2] = np.array(LE1.fit_transform(X[:,2]))

Here we have applied label encoding on the Gender column of our dataset.

Encoding Categorical Variable Country

Now let’s deal with another categorical column, Geography. This column has a cardinality of 3, meaning it has 3 distinct categories: France, Germany, Spain.

Here we have 2 options:-

1. We can use Label Encoding here and directly convert those values into 0,1,2 like that

2. We can use One Hot Encoding here which will convert those strings into a binary vector stream. For example – Spain will be encoded as 001, France will be 010, etc.

The first approach is easier and faster to implement, but the encoded values 0, 1, 2 impose an artificial ordering on the categories. One-hot encoding avoids this: all the string values are converted into binary streams of 0s and 1s, which ensures the machine learning algorithm does not assume that higher numbers are more important.

#Encoding Categorical variable Geography
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [1])], remainder="passthrough")
X = np.array(ct.fit_transform(X))

Here we have used OneHotEncoder class from sklearn in order to perform one-hot encoding. Now you might have a query here. What is the use of ColumnTransformer? Well, ColumnTransformer is another class in sklearn that will allow us to select a particular column from our dataset on which we can apply one-hot encoding.

Splitting Dataset into Training and Testing Dataset

In this step, we are going to split our dataset into training and testing datasets. This is one of the bedrocks of the entire machine learning process. The training dataset is the one on which our model is going to train while the testing dataset is the one on which we are going to test the performance of our model.

#Splitting dataset into training and testing dataset
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)

Here we have used the train_test_split function from the sklearn library. We have split our dataset in a configuration such that 80 percent of data will be there in the training phase and 20 percent of data will be in the testing phase.

Additionally, the best part about using the train_test_split function from sklearn is that it also shuffles the data while splitting, which helps create a more generalized split.

Performing Feature Scaling

The very last step in our feature engineering phase is feature scaling. It is a procedure where all the variables are converted to the same scale. Why, you might ask? Sometimes in our dataset, certain variables have very high values while others have very low values. There is a chance that during model creation, the variables with extremely high values dominate those with extremely low values. Because of this, the low-valued variables might be neglected by our model, and hence feature scaling is necessary.

Now here I am going to answer one of the most important questions asked in a machine learning interview. ” When to perform feature scaling – before the train-test split or after the train-test split?”.

Well, the answer is: after we split the dataset into training and testing datasets. The reason is that the training dataset is what our model learns from, while the testing dataset is what the model is evaluated on. If we perform feature scaling before the train-test split, it will cause information leakage into the testing dataset, which defeats the purpose of having a testing dataset. Hence we should always perform feature scaling after the train-test split.

Now, how are we going to perform feature scaling? Well, there are many ways of doing it. The two most common techniques in this context are:-

1. Standardization

2. Normalization

Whenever standardization is performed, all values in the dataset are rescaled to zero mean and unit variance, so for roughly normally distributed data most values fall between -3 and +3. In the case of normalization (min-max scaling), all values are rescaled into a fixed range, typically 0 to 1.

There are a few rules of thumb on which technique to use and when. Normalization is usually preferred when the dataset does not follow a normal distribution, while standardization is often treated as a universal technique that can be used for any dataset irrespective of the distribution. Here we are going to use standardization.
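For intuition, here is a small side-by-side of the two techniques on a toy column (my own illustration, not from the article):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

X = np.array([[1.0], [2.0], [3.0], [100.0]])

std = StandardScaler().fit_transform(X)  # rescaled to mean 0, unit variance
mm = MinMaxScaler().fit_transform(X)     # rescaled into [0, 1]

print(std.mean(), std.std())  # ~0.0 and 1.0
print(mm.min(), mm.max())     # 0.0 and 1.0
```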

#Performing Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

Here we have used the StandardScaler class from the sklearn library in order to perform standardization.

Now we have completed our feature engineering phase. We can now start with the creation of our artificial neural network from the next point onwards.

Initializing Artificial Neural Network

This is the very first step while creating an ANN. Here we are going to create our ann object using the Sequential class from Keras.

#Initialising ANN
ann = tf.keras.models.Sequential()

As part of TensorFlow 2.0, Keras is now integrated with TensorFlow and is considered a sub-library of it. The Sequential class is part of the models module of the Keras library, which is now part of the TensorFlow library.

Creating Hidden Layers

Once we initialize our ann, we are now going to create layers for the same. Here we are going to create a network that will have 2 hidden layers, 1 input layer, and 1 output layer. So, let’s create our very first hidden layer

#Adding First Hidden Layer
ann.add(tf.keras.layers.Dense(units=6, activation="relu"))

Here we have created our first hidden layer by using the Dense class which is part of the layers module. This class accepts 2 inputs:-

1. units:- number of neurons that will be present in the respective layer

2. activation:- specify which activation function to be used

For the first input, I had tested with many values in the past and the optimal value that I had found is 6. Obviously, we can try with any other value as there is no hard rule about the number of neurons that should be present in the layer.

For the second input, we are always going to use “relu” [rectified linear unit] as the activation function for hidden layers. Since we are going to create two hidden layers, we will repeat this same step for the second hidden layer as well.

#Adding Second Hidden Layer
ann.add(tf.keras.layers.Dense(units=6, activation="relu"))

Creating Output Layer

In this step, we are going to create our output layer for ann. The output layer will be responsible for giving output.

#Adding Output Layer
ann.add(tf.keras.layers.Dense(units=1, activation="sigmoid"))

Here again, we are going to use the Dense class in order to create the output layer. Two important things to remember here:-

1. In a binary classification problem(like this one) where we will be having only two classes as output (1 and 0), we will be allocating only one neuron to output this result. For the multiclass classification problem, we have to use more than one neuron in the output layer. For example – if our output contains 4 categories then we need to create 4 different neurons[one for each category].

2. For the binary classification Problems, the activation function that should always be used is sigmoid. For a multiclass classification problem, the activation function that should be used is softmax.

Here, since we are dealing with binary classification, we allocate only one neuron in the output layer, and the activation function used is sigmoid.
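For contrast, a multiclass output layer would look like the hypothetical sketch below (4 categories, so 4 neurons with softmax; this is not part of the churn model):

```python
import numpy as np
import tensorflow as tf

# Hypothetical 4-class variant of the network: 4 output neurons + softmax
# instead of the single sigmoid neuron used for binary classification.
multi = tf.keras.models.Sequential([
    tf.keras.layers.Dense(6, activation="relu"),
    tf.keras.layers.Dense(6, activation="relu"),
    tf.keras.layers.Dense(4, activation="softmax"),
])

# Even untrained, softmax outputs form a probability distribution per row.
probs = multi(np.random.rand(2, 12).astype("float32")).numpy()
print(probs.shape)  # (2, 4); each row sums to 1
```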

Compiling Artificial Neural Network

We have now created layers for our neural network. In this step, we are going to compile our ANN.

#Compiling ANN
ann.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

We have used compile method of our ann object in order to compile our network. Compile method accepts the below inputs:-

1. optimizer:- specifies which optimizer to be used in order to perform stochastic gradient descent. I had experimented with various optimizers like RMSProp, adam and I have found that adam optimizer is a reliable one that can be used with any neural network.

2. loss:- specifies which loss function should be used. For binary classification, the value should be binary_crossentropy. For multiclass classification, it should be categorical_crossentropy.

3. metrics:- which performance metrics to be used in order to compute performance. Here we have used accuracy as a performance metric.

Fitting Artificial Neural Network

This is the last step in our ann creation process. Here we are just going to train our ann on the training dataset.

#Fitting ANN
ann.fit(X_train, Y_train, batch_size=32, epochs=100)

Here we have used the fit method in order to train our ann. The fit method is accepting 4 inputs in this case:-

1.X_train:- Matrix of features for the training dataset

2.Y_train:- Dependent variable vectors for the training dataset

3.batch_size: how many observations should be there in the batch. Usually, the value for this parameter is 32 but we can experiment with any other value as well.

4. epochs: the number of complete passes the network makes over the training dataset. Here the optimal value that I have found from my experience is 100.

Are you interested to see what the training process looks like? Well, here is a snapshot of it.

Training of Artificial Neural Network

Here we can see that in each epoch our loss decreases and our accuracy increases. Our final accuracy is 86.59%, which is pretty remarkable for a neural network of this simplicity.

That’s it :). We have created our artificial neural network from scratch using Python.

As an additional bonus, I am attaching the code below that will allow us to perform single-point prediction for any custom values of input.

Predicting Result for Single Point Observation

#Predicting result for Single Observation



Here our neural network tries to predict whether our customer is going to exit or not, based on the values of the independent variables.
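The prediction code itself was lost in extraction. Here is a hedged, self-contained sketch of what it would look like: in the article, `ann` is the trained network and `sc` is the StandardScaler fitted on X_train, so we rebuild minimal stand-ins just to make the snippet runnable, and the feature values for the single customer are hypothetical.

```python
import numpy as np
import tensorflow as tf
from sklearn.preprocessing import StandardScaler

# Stand-ins for the article's trained `ann` and fitted `sc` (untrained here,
# so the actual prediction value is meaningless without training).
sc = StandardScaler().fit(np.random.rand(100, 12))
ann = tf.keras.models.Sequential([
    tf.keras.layers.Dense(6, activation="relu"),
    tf.keras.layers.Dense(6, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# One customer, encoded exactly like the training rows:
# [Geography one-hot (3), CreditScore, Gender, Age, Tenure, Balance,
#  NumOfProducts, HasCrCard, IsActiveMember, EstimatedSalary]
single = [[1, 0, 0, 600, 1, 40, 3, 60000, 2, 1, 1, 50000]]
prob = ann.predict(sc.transform(single))
print(prob > 0.5)  # True if the customer is predicted to exit
```

Note that the single observation must go through the same scaler (sc.transform) that the training data went through, otherwise the network sees values on a completely different scale.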


Now that we have created our model, here is a pro tip on how you can save your neural network.

#Saving created neural network"ANN.h5")

That’s it. This one line of code saves our ML model. You might have a query here:

What is the h5 file format? Well, h5 is the HDF5 format, a hierarchical file format that Keras uses to save a neural network (architecture plus weights) as a single serialized object. It plays a role similar to the pickle format we use for storing traditional machine learning models.

Well, that’s all about implementing neural networks from scratch in Python.



Hope you like this article.

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.

