Introduction To Google Firebase Cloud Storage Using Python
This article was published as a part of the Data Science Blogathon.
Introduction
Firebase is a very popular Backend as a Service (BaaS) offered by Google. It aims to replace conventional backend servers for web and mobile applications by offering multiple services on the same platform, such as authentication, a real-time database, Firestore (a NoSQL database), cloud functions, machine learning, cloud storage, and more. These services are cloud-based and production-ready, and they scale automatically on demand without any configuration.
In my previous article, I covered Google Firestore, a cloud-based NoSQL database offered by Firebase. You can read my previous article on Google Firestore here. Another such offering is Cloud Storage, a powerful yet simple storage service offered by Firebase. The Cloud Storage offering in Firebase is backed by Google Cloud Storage on the Google Cloud Platform (GCP). The free tier provides 5 GB of storage space for a bucket. In this article, we will learn about Cloud Storage and how it can be used to store and access files securely over the internet using Python.
Setting up Firebase to access Cloud Storage
Connecting Python to Cloud Storage
To connect to Firebase, we need to install a Python package called "firebase-admin." It can be installed like any other Python package using pip. Ensure that your Python version is 3.6 or below, as this module throws an exception because of the async module added from Python 3.7 onwards. If you have a higher version installed, you can use Anaconda to create a new environment with Python 3.6. Run the following commands to create and activate a new environment in Anaconda.
conda create -n cloud_storage_env python=3.6.5
conda activate cloud_storage_env
To install the "firebase-admin" package, run the following.
pip install firebase-admin
Now that we have the credentials, let's connect to Firebase and start accessing the cloud storage service. To do so, paste the code snippet shown below and add the file path of the credentials file downloaded in the previous step. You can find your storage bucket link in your Firebase Cloud Storage console.
import firebase_admin
from firebase_admin import credentials, storage

cred = credentials.Certificate("path/to/your/credentials.json")
firebase_admin.initialize_app(cred, {'storageBucket': 'your_bucket_link_without_gs://'})  # connecting to firebase
Now that we have connected to Firebase, let's try to use the cloud storage service.
Using Google Cloud Storage
Now consider that you maintain a folder structure on your server and wish to replicate the same folder structure in your storage bucket as well. For this, we can directly use the "upload_from_filename()" function, which is a method of the blob object. This function will replicate the folder structure of each file that is being uploaded. This means that if you have a text file inside a folder named "text_files", the same folder structure will also be replicated in your storage bucket. Now, let's see how to use this function to upload files to our storage bucket.
Firstly, I will upload an image file present in the root directory to our storage bucket. Once that is done, I will try to upload a text file present inside a folder named “text_docs” to our storage bucket using the above-described function.
file_path = "sample_image_file.jpg" bucket = storage.bucket() # storage bucket blob = bucket.blob(file_path) blob.upload_from_filename(file_path)We can see that the image file has been uploaded to our storage bucket in the root directory. Now let’s try to upload the text file present inside the “text_docs directory.”
file_path = "text_docs/sample_text_file.txt" bucket = storage.bucket() # storage bucket blob = bucket.blob(file_path) blob.upload_from_filename(file_path)We can see that the text file has been uploaded inside the text_docs folder, just like it is on our local machine.
Now consider that you do not maintain a folder structure on your server and wish to maintain a proper folder structure in your storage bucket. For this, we can also use the “upload_from_filename()” function with a slight modification. Let’s try to upload the image file inside a folder named “images”. On our local machine, the image file is present in the root directory and there is no folder named images. We will also rename the image file while storing it in the storage bucket.
from google.cloud import storage
from google.oauth2 import service_account

def upload_blob(bucket_name, source_file_name, destination_blob_name):
    credentials = service_account.Credentials.from_service_account_file("path/to/your/credentials.json")
    storage_client = storage.Client(credentials=credentials)
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_filename(source_file_name)
    print(f"File {source_file_name} uploaded to {destination_blob_name}.")

upload_blob(firebase_admin.storage.bucket().name, 'sample_image_file.jpg', 'images/beatiful_picture.jpg')
Now let's see if the image from our root directory has been uploaded inside a folder named "images" in our storage bucket. We can see that a new folder called "images" has been created, and the image file has also been uploaded inside that folder.
Now, if you want to access your files from your bucket and want to download them, you can do that easily with a few lines of code. Let’s try downloading the text file we uploaded to our storage bucket inside the text_docs folder and rename the file as “downloaded_file.txt”. The code snippet shown below will download the file to our local machine.
credentials = service_account.Credentials.from_service_account_file("path/to/your/credentials.json")
storage.Client(credentials=credentials).bucket(firebase_admin.storage.bucket().name).blob('text_docs/sample_text_file.txt').download_to_filename('downloaded_file.txt')
Now, if you want to share the files over the internet or want them to be public, you can directly access the "public_url" property of the blob object, which returns a URL for that file. Let's try to get the URL of all the files present in our storage bucket. To do so, we first need to get all the files present in our storage bucket and then access their public URL.
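One caveat before listing the files: public_url simply constructs the storage.googleapis.com URL for an object, so it only resolves if the object is actually publicly readable. A minimal hedged sketch of one way to make a single object public first is shown here (it assumes the default bucket initialized earlier and that the bucket allows per-object ACLs; the file path is the one uploaded above):
from firebase_admin import storage

bucket = storage.bucket()  # default bucket configured during initialize_app()
blob = bucket.blob("text_docs/sample_text_file.txt")
blob.make_public()                       # grants public read access to this single object
print("Public URL:", blob.public_url)    # this URL can now be shared openly
With that in place, the snippet below walks over every file in the bucket and prints its public URL.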
credentials = service_account.Credentials.from_service_account_file("path/to/your/credentials.json")
files = storage.Client(credentials=credentials).list_blobs(firebase_admin.storage.bucket().name)  # fetch all the files in the bucket
for i in files:
    print('The public url is ', i.public_url)
Conclusion
Understanding how to set up a Firebase project in detail
Uploading and downloading files to and from the cloud-based storage bucket using python
Extracting a public URL for the files from our storage bucket for sharing across the internet
As mentioned earlier, Google Firebase offers a lot of production-ready services for free that are hosted on the Google Cloud. Firebase has been a lifesaver for many front-end developers, who do not have to explicitly know backend programming and frameworks like nodejs, flask, etc., to build a full-stack web or mobile application. If you are interested in learning about other services offered by Google Firebase, you can refer to my article on Firestore, which is a NoSQL database offered by Google. I will try to cover other services Google Firebase offers in the coming weeks, so stay tuned!
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.
An Introduction To Graph Theory And Network Analysis (With Python Codes)
Introduction
“A picture speaks a thousand words” is one of the most commonly used phrases. But a graph speaks so much more than that. A visual representation of data, in the form of graphs, helps us gain actionable insights and make better data driven decisions based on them.
But to truly understand what graphs are and why they are used, we will need to understand a concept known as Graph Theory. Understanding this concept makes us better programmers (and better data science professionals!).
But if you have tried to understand this concept before, you’ll have come across tons of formulae and dry theoretical concepts. That is why we decided to write this blog post. We have explained the concepts and then provided illustrations so you can follow along and intuitively understand how the functions are performing. This is a detailed post, because we believe that providing a proper explanation of this concept is a much preferred option over succinct definitions.
In this article, we will look at what graphs are, their applications and a bit of history about them. We’ll also cover some Graph Theory concepts and then take up a case study using python to cement our understanding.
Ready? Let’s dive into it.
Table of Contents
Graphs and their applications
History and why graphs?
Terminologies you need to know
Graph Theory Concepts
Getting familiar with Graphs in python
Analysis on a dataset
Graphs and their applications
Let us look at a simple graph to understand the concept. Look at the image below –
Consider that this graph represents the places in a city that people generally visit, and the path that was followed by a visitor of that city. Let us consider V as the places and E as the path to travel from one place to another.
V = {v1, v2, v3, v4, v5}
E = {(v1,v2), (v2,v5), (v5,v5), (v4,v5), (v4,v4)}
The edge (u,v) is the same as the edge (v,u) – they are unordered pairs.
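As a quick illustration (a minimal sketch using the networkx package that is introduced later in this article), the same graph can be built in Python, which also shows that (u,v) and (v,u) are treated as the same undirected edge:
import networkx as nx

V = ["v1", "v2", "v3", "v4", "v5"]
E = [("v1", "v2"), ("v2", "v5"), ("v5", "v5"), ("v4", "v5"), ("v4", "v4")]

G = nx.Graph()          # undirected graph; self-loops such as (v5, v5) are allowed
G.add_nodes_from(V)
G.add_edges_from(E)

print(G.has_edge("v2", "v1"))   # True - (u, v) and (v, u) are the same edge
print(G.number_of_edges())      # 5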
Concretely – Graphs are mathematical structures used to study pairwise relationships between objects and entities. It is a branch of Discrete Mathematics and has found multiple applications in Computer Science, Chemistry, Linguistics, Operations Research, Sociology etc.
The Data Science and Analytics field has also used Graphs to model various structures and problems. As a Data Scientist, you should be able to solve problems in an efficient manner and Graphs provide a mechanism to do that in cases where the data is arranged in a specific way.
Formally,
A Graph is a pair of sets. G = (V,E). V is the set of vertices. E is a set of edges. E is made up of pairs of elements from V (unordered pair)
A DiGraph is also a pair of sets. D = (V,A). V is the set of vertices. A is the set of arcs. A is made up of pairs of elements from V (ordered pair)
In the case of digraphs, there is a distinction between `(u,v)` and `(v,u)`. Usually the edges are called arcs in such cases to indicate a notion of direction.
There are packages that exist in R and Python to analyze data using Graph theory concepts. In this article we will be briefly looking at some of the concepts and analyze a dataset using Networkx Python package.
From the above examples it is clear that the applications of Graphs in Data Analytics are numerous and vast. Let us look at a few use cases:
Marketing Analytics – Graphs can be used to figure out the most influential people in a Social Network. Advertisers and Marketers can estimate the biggest bang for the marketing buck by routing their message through the most influential people in a Social Network
Banking Transactions – Graphs can be used to find unusual patterns helping in mitigating Fraudulent transactions. There have been examples where Terrorist activity has been detected by analyzing the flow of money across interconnected Banking networks
Supply Chain – Graphs help in identifying optimum routes for your delivery trucks and in identifying locations for warehouses and delivery centres
Pharma – Pharma companies can optimize the routes of the salesman using Graph theory. This helps in cutting costs and reducing the travel time for salesman
Telecom – Telecom companies typically use Graphs (Voronoi diagrams) to understand the quantity and location of Cell towers to ensure maximum coverage
History and Why Graphs?
History of Graphs
If you want to know more about how the ideas behind graphs were formulated – read on!
The origin of the theory can be traced back to the Konigsberg bridge problem (circa 1730s). The problem asks if the seven bridges in the city of Konigsberg can be traversed under the following constraints
no doubling back
you end at the same place you started
This is the same as asking if the multigraph of 4 nodes and 7 edges has an Eulerian cycle (An Eulerian cycle is an Eulerian path that starts and ends on the same Vertex. And an Eulerian path is a path in a Graph that traverses each edge exactly once. More Terminology is given below). This problem led to the concept of Eulerian Graph. In the case of the Konigsberg bridge problem the answer is no and it was first answered by (you guessed it) Euler.
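As a small aside, the Königsberg multigraph can be built and checked directly with networkx; this is a hedged sketch in which the land-mass names are our own labels, not standard notation:
import networkx as nx

# Four land masses connected by the seven bridges of Konigsberg
K = nx.MultiGraph()
K.add_edges_from([
    ("North", "Island"), ("North", "Island"),   # two bridges from the north bank to the island
    ("South", "Island"), ("South", "Island"),   # two bridges from the south bank to the island
    ("North", "East"), ("South", "East"),
    ("Island", "East"),
])

print(K.number_of_edges())   # 7
print(nx.is_eulerian(K))     # False - every land mass has odd degree, so no Eulerian cycle exists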
In 1840, A.F. Mobius gave the idea of the complete graph and the bipartite graph, and Kuratowski later characterized planar graphs in terms of them by means of recreational problems. The concept of a tree (a connected graph without cycles) was introduced by Gustav Kirchhoff in 1845, and he employed graph-theoretical ideas in the calculation of currents in electrical networks or circuits.
In 1852, Francis Guthrie posed the famous four color problem. Then in 1856, Thomas P. Kirkman and William R. Hamilton studied cycles on polyhedra and invented the concept called the Hamiltonian graph by studying trips that visited certain sites exactly once. In 1913, H. Dudeney mentioned a puzzle problem. Even though the four color problem was posed that early, it was solved only after a century by Kenneth Appel and Wolfgang Haken. This period is considered the birth of Graph Theory.
Cayley studied particular analytical forms arising from differential calculus to study trees. This had many implications in theoretical chemistry and led to the invention of enumerative graph theory. In any case, the term "graph" was introduced by Sylvester in 1878, where he drew an analogy between "quantic invariants" and covariants of algebra and molecular diagrams.
In 1941, Ramsey worked on colorations, which led to the identification of another branch of graph theory called extremal graph theory. In 1969, Heinrich Heesch published a method for solving the four color problem using computers. The study of asymptotic graph connectivity gave rise to random graph theory. The histories of Graph Theory and Topology are also closely related, and they share many common concepts and theorems.
Why Graphs?
Here are a few points that will help motivate you to use graphs in your day-to-day data science problems –
Graphs provide a better way of dealing with abstract concepts like relationships and interactions. They also offer an intuitively visual way of thinking about these concepts. Graphs also form a natural basis for analyzing relationships in a Social context
Graph Databases have become common computational tools and alternatives to SQL and NoSQL databases
Graphs are used to model analytics workflows in the form of DAGs (Directed acyclic graphs)
Some Neural Network Frameworks also use DAGs to model the various operations in different layers
Graph Theory concepts are used to study and model Social Networks, Fraud patterns, Power consumption patterns, Virality and Influence in Social Media. Social Network Analysis (SNA) is probably the best known application of Graph Theory for Data Science
It is used in Clustering algorithms – Specifically K-Means
System Dynamics also uses some Graph Theory concepts – Specifically loops
Path Optimization is a subset of the Optimization problem that also uses Graph concepts
From a Computer Science perspective – Graphs offer computational efficiency. The Big O complexity for some algorithms is better for data arranged in the form of Graphs (compared to tabular data)
Terminology you should know
Before you go any further into the article, it is recommended that you get familiar with these terminologies.
The vertices u and v are called the end vertices of the edge (u,v)
If two edges have the same end vertices they are Parallel
An edge of the form (v,v) is a loop
A Graph is simple if it has no parallel edges and loops
A Graph is said to be Empty if it has no edges. Meaning E is empty
A Graph is a Null Graph if it has no vertices. Meaning V and E is empty
A Graph with only 1 Vertex is a Trivial graph
Edges are Adjacent if they have a common vertex. Vertices are Adjacent if they have a common edge
The degree of the vertex v, written as d(v), is the number of edges with v as an end vertex. By convention, we count a loop twice and parallel edges contribute separately
Isolated Vertices are vertices with degree 0, i.e., vertices that are not an end vertex of any edge
A Graph is Complete if its edge set contains every possible edge between ALL of the vertices
A Walk in a Graph G = (V,E) is a finite, alternating sequence of vertices and edges of the graph G, of the form v1 e1 v2 e2 … ek-1 vk, where each edge ei joins the vertices vi and vi+1
A Walk is Open if the initial and final vertices are different. A Walk is Closed if the initial and final vertices are the same
A Walk is a Trail if any edge is traversed at most once
A Trail is a Path if any vertex is traversed at most once (except for a closed walk)
A Closed Path is a Circuit – Analogous to electrical circuits
Graph Theory concepts
In this section, we'll look at some of the concepts useful for Data Analysis (in no particular order). Please note that there are many more concepts that require a depth that is out of the scope of this article. So let's get into it.
Average Path Length
The average of the shortest path lengths for all possible node pairs. It gives a measure of the ‘tightness’ of the Graph and can be used to understand how quickly/easily something flows in this Network.
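For instance, a minimal hedged sketch of the call on an arbitrary random graph (the graph itself is only an illustration; networkx requires a connected graph, or a connected component, for this metric):
import networkx as nx

G = nx.erdos_renyi_graph(n=20, p=0.3, seed=42)   # an arbitrary random graph
if nx.is_connected(G):
    print(nx.average_shortest_path_length(G))    # smaller value => 'tighter' network
else:
    # the metric is only defined per connected component
    largest = G.subgraph(max(nx.connected_components(G), key=len))
    print(nx.average_shortest_path_length(largest))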
BFS and DFS
Breadth-first search and Depth-first search are two different algorithms used to search for Nodes in a Graph. They are typically used to figure out if we can reach a Node from a given Node. This is also known as Graph Traversal.
The aim of the BFS is to traverse the Graph as close as possible to the root Node, while the DFS algorithm aims to move as far as possible away from the root node.
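A minimal sketch of both traversals in networkx on an arbitrary toy graph (the node labels are illustrative only):
import networkx as nx

G = nx.Graph([(1, 2), (1, 3), (2, 4), (3, 5), (4, 6)])

# BFS explores level by level, staying close to the root before moving outward
print(list(nx.bfs_edges(G, source=1)))   # e.g. [(1, 2), (1, 3), (2, 4), (3, 5), (4, 6)]

# DFS follows one branch as deep as possible before backtracking
print(list(nx.dfs_edges(G, source=1)))   # e.g. [(1, 2), (2, 4), (4, 6), (1, 3), (3, 5)]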
Centrality
One of the most widely used and important conceptual tools for analysing networks. Centrality aims to find the most important nodes in a network. There may be different notions of "important" and hence there are many centrality measures. Centrality measures themselves have a form of classification (or types of centrality measures). There are measures that are characterized by flow along the edges and those that are characterized by Walk Structure.
Some of the most commonly used ones are:
Degree Centrality – The first and conceptually the simplest Centrality definition. This is the number of edges connected to a node. In the case of a directed graph, we can have 2 degree centrality measures. Inflow and Outflow Centrality
Closeness Centrality – Of a node is the average length of the shortest path from the node to all other nodes
Betweenness Centrality – Number of times a node is present in the shortest path between 2 other nodes
These centrality measures have variants and the definitions can be implemented using various algorithms. All in all, this means a large number of definitions and algorithms.
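The three measures above map directly onto networkx functions; here is a quick hedged sketch on the small karate-club graph that ships with networkx:
import networkx as nx

G = nx.karate_club_graph()   # a classic small social network bundled with networkx

print(nx.degree_centrality(G))       # fraction of other nodes each node is connected to
print(nx.closeness_centrality(G))    # based on average shortest-path distance to all other nodes
print(nx.betweenness_centrality(G))  # share of shortest paths passing through each node

# Inflow/outflow (in-/out-degree) centrality only makes sense for directed graphs:
D = nx.DiGraph([(1, 2), (2, 3), (3, 1), (1, 3)])
print(nx.in_degree_centrality(D), nx.out_degree_centrality(D))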
Network Density
A measure of how many edges a Graph has. The actual definition will vary depending on the type of Graph and the context in which the question is asked. For a complete undirected Graph the Density is 1, while it is 0 for an empty Graph. Graph Density can be greater than 1 in some situations (involving loops).
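A two-line sanity check of the boundary cases mentioned above (a sketch):
import networkx as nx

print(nx.density(nx.complete_graph(5)))   # 1.0 for a complete undirected graph
print(nx.density(nx.empty_graph(5)))      # 0.0 for a graph with no edges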
Graph Randomizations
While the definitions of some Graph metrics may be easy to calculate, it is not easy to understand their relative importance. We use Network/Graph Randomizations in such cases. We calculate the metric for the Graph at hand and for another similar Graph that is randomly generated. This similarity can, for example, be the same density and number of nodes. Typically we generate 1000 similar random graphs, calculate the Graph metric for each of them, and then compare with the same metric for the Graph at hand to arrive at some notion of a benchmark.
In Data Science when trying to make a claim about a Graph it helps if it is contrasted with some randomly generated Graphs.
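A hedged sketch of that benchmarking idea, comparing an observed metric against random graphs generated with the same number of nodes and edges (the clustering coefficient is used here purely as an example metric):
import networkx as nx

G = nx.karate_club_graph()                    # stand-in for 'the graph at hand'
observed = nx.average_clustering(G)

n, m = G.number_of_nodes(), G.number_of_edges()
random_values = [
    nx.average_clustering(nx.gnm_random_graph(n, m, seed=s))
    for s in range(1000)                      # 1000 random graphs with the same n and m
]

baseline = sum(random_values) / len(random_values)
print(f"observed: {observed:.3f}, random baseline: {baseline:.3f}")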
Getting Familiar with Graphs in Python
We will be using the networkx package in Python. It can be installed in the root environment of Anaconda (if you are using the Anaconda distribution of Python). You can also pip install it.
Let us look at some common things that can be done with the Networkx package. These include importing and creating a Graph and ways to visualize it.
Graph Creation
import networkx as nx

# Creating a Graph
G = nx.Graph()  # Right now G is empty

# Add a node
G.add_node(1)
G.add_nodes_from([2, 3])  # You can also add a list of nodes by passing a list argument

# Add edges
G.add_edge(1, 2)
e = (2, 3)
G.add_edge(*e)  # * unpacks the tuple
G.add_edges_from([(1, 2), (1, 3)])  # Just like nodes we can add edges from a list
Node and Edge attributes can be added along with the creation of Nodes and Edges by passing a tuple containing node and attribute dict.
In addition to constructing graphs node-by-node or edge-by-edge, they can also be generated by applying classic graph operations, such as:
subgraph(G, nbunch) – induced subgraph view of G on nodes in nbunch
union(G1, G2) – graph union
disjoint_union(G1, G2) – graph union assuming all nodes are different
cartesian_product(G1, G2) – return Cartesian product graph
compose(G1, G2) – combine graphs identifying nodes common to both
complement(G) – graph complement
create_empty_copy(G) – return an empty copy of the same graph class
convert_to_undirected(G) – return an undirected representation of G
convert_to_directed(G) – return a directed representation of G
Separate classes exist for different types of Graphs. For example, the nx.DiGraph() class allows you to create a Directed Graph. Specific graphs containing paths can be created directly using a single method. For a full list of Graph creation methods please refer to the full documentation. The link is given at the end of the article.
Accessing edges and nodes
Nodes and Edges can be accessed together using the G.nodes() and G.edges() methods. Individual nodes and edges can be accessed using the bracket/subscript notation.
G.nodes()NodeView((1, 2, 3))
G.edges()EdgeView([(1, 2), (1, 3), (2, 3)])
G[1] # same as G.adj[1]AtlasView({2: {}, 3: {}})
G[1][2]{}
G.edges[1, 2]{}
Graph Visualization
Networkx provides basic functionality for visualizing graphs, but its main goal is to enable graph analysis rather than perform graph visualization. Graph visualization is hard, and we will have to use specific tools dedicated to this task. Matplotlib offers some convenience functions. But GraphViz is probably the best tool for us, as it offers a Python interface in the form of PyGraphviz (link to documentation below).
%matplotlib inline import matplotlib.pyplot as plt nx.draw(G) import pygraphviz as pgv d={'1': {'2': None}, '2': {'1': None, '3': None}, '3': {'1': None}} A = pgv.AGraph(data=d) print(A) # This is the 'string' or simple representation of the Graph Output: strict graph "" { 1 -- 2; 2 -- 3; 3 -- 1; }PyGraphviz provides great control over the individual attributes of the edges and nodes. We can get very beautiful visualizations using it.
# Let us create another Graph where we can individually control the colour of each node B = pgv.AGraph() # Setting node attributes that are common for all nodes B.node_attr['style']='filled' B.node_attr['shape']='circle' B.node_attr['fixedsize']='true' B.node_attr['fontcolor']='#FFFFFF' # Creating and setting node attributes that vary for each node (using a for loop) for i in range(16): B.add_edge(0,i) n=B.get_node(i) n.attr['fillcolor']="#%2x0000"%(i*16) n.attr['height']="%s"%(i/16.0+0.5) n.attr['width']="%s"%(i/16.0+0.5) B.draw('star.png',prog="circo") # This creates a .png file in the local directory. Displayed below. Image('images/star.png', width=650) # The Graph visualization we created above.Usually, visualization is thought of as a separate task from Graph analysis. A graph once analyzed is exported as a Dotfile. This Dotfile is then visualized separately to illustrate a specific point we are trying to make.
Analysis on a Dataset
We will take a generic dataset (not one that is specifically intended to be used for Graphs) and do some manipulation (in pandas) so that it can be ingested into a Graph in the form of an edgelist. An edgelist is a list of tuples that contain the vertices defining every edge.
The dataset we will be looking at comes from the Airlines Industry. It has some basic information on the Airline routes. There is a Source of a journey and a destination. There are also a few columns indicating arrival and departure times for each journey. As you can imagine this dataset lends itself beautifully to be analysed as a Graph. Imagine a few cities (nodes) connected by airline routes (edges). If you are an airline carrier, you can then proceed to ask a few questions like
What is the shortest way to get from A to B? In terms of distance and in terms of time
Is there a way to go from C to D?
Which airports have the heaviest traffic?
Which airport is “in between” most other airports? So that it can be converted into a local hub
import pandas as pd import numpy as np data = pd.read_csv('data/Airlines.csv') data.shape (100, 16) data.dtypes year int64 month int64 day int64 dep_time float64 sched_dep_time int64 dep_delay float64 arr_time float64 sched_arr_time int64 arr_delay float64 carrier object flight int64 tailnum object origin object dest object air_time float64 distance int64 dtype: object
We notice that origin and destination look like good choices for Nodes. Everything can then be imagined as either node or edge attributes. A single edge can be thought of as a journey. And such a journey will have various times, a flight number, an airplane tail number etc associated with it
We notice that the year, month, day and time information is spread over many columns. We want to create one datetime column containing all of this information. We also need to keep scheduled and actual time of arrival and departure separate. So we should finally have 4 datetime columns (Scheduled and actual times of arrival and departure)
Additionally, the time columns are not in a proper format. 4:30 pm is represented as 1630 instead of 16:30. There is no delimiter to split that column. One approach is to use pandas string methods and regular expressions
We should also note that sched_dep_time and sched_arr_time are int64 dtype and dep_time and arr_time are float64 dtype
An additional complication is NaN values
# converting sched_dep_time to 'std' - Scheduled time of departure
data['std'] = data.sched_dep_time.astype(str).str.replace(r'(\d{2}$)', '', regex=True) + ':' + data.sched_dep_time.astype(str).str.extract(r'(\d{2}$)', expand=False) + ':00'

# converting sched_arr_time to 'sta' - Scheduled time of arrival
data['sta'] = data.sched_arr_time.astype(str).str.replace(r'(\d{2}$)', '', regex=True) + ':' + data.sched_arr_time.astype(str).str.extract(r'(\d{2}$)', expand=False) + ':00'

# converting dep_time to 'atd' - Actual time of departure
data['atd'] = data.dep_time.fillna(0).astype(np.int64).astype(str).str.replace(r'(\d{2}$)', '', regex=True) + ':' + data.dep_time.fillna(0).astype(np.int64).astype(str).str.extract(r'(\d{2}$)', expand=False) + ':00'

# converting arr_time to 'ata' - Actual time of arrival
data['ata'] = data.arr_time.fillna(0).astype(np.int64).astype(str).str.replace(r'(\d{2}$)', '', regex=True) + ':' + data.arr_time.fillna(0).astype(np.int64).astype(str).str.extract(r'(\d{2}$)', expand=False) + ':00'
We now have the time columns in the format we wanted. Finally, we may want to combine the year, month and day columns into a date column. This is not an absolutely necessary step, but we can easily obtain the year, month and day (and other) information once it is converted into datetime format.
data['date'] = pd.to_datetime(data[['year', 'month', 'day']])
# finally we drop the columns we don't need
data = data.drop(columns=['year', 'month', 'day'])
Now import the dataset using the networkx function that ingests a pandas dataframe directly. Just like Graph creation, there are multiple ways data can be ingested into a Graph from multiple formats.
import networkx as nx
FG = nx.from_pandas_edgelist(data, source='origin', target='dest', edge_attr=True)
FG.nodes()
Output:
NodeView(('EWR', 'MEM', 'LGA', 'FLL', 'SEA', 'JFK', 'DEN', 'ORD', 'MIA', 'PBI', 'MCO', 'CMH', 'MSP', 'IAD', 'CLT', 'TPA', 'DCA', 'SJU', 'ATL', 'BHM', 'SRQ', 'MSY', 'DTW', 'LAX', 'JAX', 'RDU', 'MDW', 'DFW', 'IAH', 'SFO', 'STL', 'CVG', 'IND', 'RSW', 'BOS', 'CLE')) FG.edges()Output:
EdgeView([('EWR', 'MEM'), ('EWR', 'SEA'), ('EWR', 'MIA'), ('EWR', 'ORD'), ('EWR', 'MSP'), ('EWR', 'TPA'), ('EWR', 'MSY'), ('EWR', 'DFW'), ('EWR', 'IAH'), ('EWR', 'SFO'), ('EWR', 'CVG'), ('EWR', 'IND'), ('EWR', 'RDU'), ('EWR', 'IAD'), ('EWR', 'RSW'), ('EWR', 'BOS'), ('EWR', 'PBI'), ('EWR', 'LAX'), ('EWR', 'MCO'), ('EWR', 'SJU'), ('LGA', 'FLL'), ('LGA', 'ORD'), ('LGA', 'PBI'), ('LGA', 'CMH'), ('LGA', 'IAD'), ('LGA', 'CLT'), ('LGA', 'MIA'), ('LGA', 'DCA'), ('LGA', 'BHM'), ('LGA', 'RDU'), ('LGA', 'ATL'), ('LGA', 'TPA'), ('LGA', 'MDW'), ('LGA', 'DEN'), ('LGA', 'MSP'), ('LGA', 'DTW'), ('LGA', 'STL'), ('LGA', 'MCO'), ('LGA', 'CVG'), ('LGA', 'IAH'), ('FLL', 'JFK'), ('SEA', 'JFK'), ('JFK', 'DEN'), ('JFK', 'MCO'), ('JFK', 'TPA'), ('JFK', 'SJU'), ('JFK', 'ATL'), ('JFK', 'SRQ'), ('JFK', 'DCA'), ('JFK', 'DTW'), ('JFK', 'LAX'), ('JFK', 'JAX'), ('JFK', 'CLT'), ('JFK', 'PBI'), ('JFK', 'CLE'), ('JFK', 'IAD'), ('JFK', 'BOS')]) nx.draw_networkx(FG, with_labels=True) # Quick view of the Graph. As expected we see 3 very busy airports nx.algorithms.degree_centrality(FG) # Notice the 3 airports from which all of our 100 rows of data originates nx.density(FG) # Average edge density of the GraphsOutput:
0.09047619047619047
nx.average_shortest_path_length(FG)  # Average shortest path length for ALL paths in the Graph
Output:
2.36984126984127
nx.average_degree_connectivity(FG)  # For a node of degree k - What is the average of its neighbours' degree?
Output:
{1: 19.307692307692307, 2: 19.0625, 3: 19.0, 17: 2.0588235294117645, 20: 1.95}As is obvious from looking at the Graph visualization (way above) – There are multiple paths from some airports to others. Let us say we want to calculate the shortest possible route between 2 such airports. Right off the bat we can think of a couple of ways of doing it
There is the shortest path by distance
There is the shortest path by flight time
What we can do is calculate the shortest path by weighing the paths with either the distance or the airtime. Please note that this is an approximate solution – the actual problem to solve is to calculate the shortest path factoring in the availability of a flight when you reach your transfer airport plus the wait time for the transfer. That is a more complete approach, and it is how humans normally plan their travel. For the purposes of this article, we will just assume that a flight is readily available when you reach an airport and calculate the shortest path using the airtime as the weight.
Let us take the example of JAX and DFW airports:
# Let us find all the paths available
for path in nx.all_simple_paths(FG, source='JAX', target='DFW'):
    print(path)

# Let us find the dijkstra path from JAX to DFW.
dijpath = nx.dijkstra_path(FG, source='JAX', target='DFW')
dijpath
Output:
['JAX', 'JFK', 'SEA', 'EWR', 'DFW']

# Let us try to find the dijkstra path weighted by airtime (approximate case)
shortpath = nx.dijkstra_path(FG, source='JAX', target='DFW', weight='air_time')
shortpath
Output:
['JAX', 'JFK', 'BOS', 'EWR', 'DFW']
Conclusion
This article has at best only managed a superficial introduction to the very interesting field of Graph Theory and Network analysis. Knowledge of the theory and the Python packages will add a valuable toolset to any Data Scientist's arsenal. For the dataset used above, a series of other questions can be asked, like:
Find the shortest path between two airports given Cost, Airtime and Availability?
You are an airline carrier and you have a fleet of airplanes. You have an idea of the demand available for your flights. Given that you have permission to operate 2 more airplanes (or add 2 airplanes to your fleet) which routes will you operate them on to maximize profitability?
Can you rearrange the flights and schedules to optimize a certain parameter (like Timeliness or Profitability etc)
Bibliography and References
About the Author
Srivatsa currently works for TheMathCompany and has over 7.5 years of experience in Decision Sciences and Analytics. He has grown, led, and scaled global teams across functions, industries, and geographies. He has led India Delivery for a cross-industry portfolio totalling $10M in revenues. He has also conducted several client workshops and training sessions to help level up technical and business domain knowledge.
During his career span, he has led premium client engagements with Industry leaders in Technology, e-commerce and retail. He helped set up the Analytics Center of Excellence for one of the world’s largest Insurance companies.
How To Deal With Missing Data Using Python
This article was published as a part of the Data Science Blogathon
Overview of Missing Data
Real-world data is messy and usually holds a lot of missing values. Missing data can skew an analysis, and a data scientist doesn't want to produce biased estimates that point to invalid results. After all, any analysis is only as good as its data. Missing data appear when no value is available for one or more variables of an individual. Missing data can reduce the statistical power of the analysis, which can impact the validity of the results.
This article will guide you through the following topics.
The reason behind missing data?
What are the types of missing data?
Missing Completely at Random (MCAR)
Missing at Random (MAR)
Missing Not at Random (MNAR)
Detecting Missing values
Detecting missing values numerically
Detecting missing data visually using Missingno library
Finding relationship among missing data
Using matrix plot
Using a Heatmap
Treating Missing values
Deletions
Pairwise Deletion
Listwise Deletion/ Dropping rows
Dropping complete columns
Basic Imputation Techniques
Imputation with a constant value
Imputation using the statistics (mean, median, mode)
K-Nearest Neighbor Imputation
let’s start…..
What are the reasons behind missing data?
Missing data can occur for many reasons. The data is collected from various sources and, while mining the data, there is a chance to lose some of it. However, most of the time the cause of missing data is item nonresponse: people are not willing (due to a lack of knowledge about the question) to answer questions in a survey, or they are unwilling to respond to sensitive questions about age, salary, or gender.
Types of Missing data
Before dealing with the missing values, it is necessary to understand the category of missing values. There are 3 major categories of missing values.
Missing Completely at Random (MCAR):
A variable is missing completely at random (MCAR) if the missing values on a given variable (Y) don't have a relationship with other variables in the data set or with the variable (Y) itself. In other words, when data is MCAR, there is no relationship between the data missing and any values, and there is no particular reason for the missing values.
Missing at Random (MAR):
Let's understand the following examples:
Women are less likely to talk about age and weight than men.
Men are less likely to talk about salary and emotions than women.
familiar right?… This sort of missing content indicates missing at random.
MAR occurs when the missingness is not random, but there is a systematic relationship between missing values and other observed data but not the missing data.
Let me explain: suppose you are working on the dataset of an ABC survey. You find that many emotion observations are null. You dig deeper and discover that most of the null emotion observations belong to men.
Missing Not at Random (MNAR):
The final and most difficult situation of missingness. MNAR occurs when the missingness is not random and there is a systematic relationship between the missing value, the observed values, and the missingness itself. To check, if the missingness appears in 2 or more variables holding the same pattern, you can sort the data by one variable and visualize it.
Source: Medium
‘Housing’ and ‘Loan’ variables referred to the same missingness pattern.
Detecting missing data
Detecting missing values numerically:
First, detecting the percentage of missing values in every column of the dataset will give an idea about the distribution of missing values.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

# Ignores any warning
warnings.filterwarnings("ignore")

train = pd.read_csv("Train.csv")

mis_val = train.isna().sum()
mis_val_per = train.isna().sum() / len(train) * 100
mis_val_table = pd.concat([mis_val, mis_val_per], axis=1)
mis_val_table_ren_columns = mis_val_table.rename(
    columns={0: 'Missing Values', 1: '% of Total Values'})
mis_val_table_ren_columns = mis_val_table_ren_columns[
    mis_val_table_ren_columns.iloc[:, :] != 0].sort_values(
    '% of Total Values', ascending=False).round(1)
mis_val_table_ren_columns
Detecting missing values visually using the Missingno library:
Missingno is a simple Python library that presents a series of visualizations to recognize the behavior and distribution of missing data inside a pandas data frame. It can be in the form of a barplot, matrix plot, heatmap, or a dendrogram.
To use this library, we require to install and import it
pip install missingno

import missingno as msno
msno.bar(train)
The above bar chart gives a quick graphical summary of the completeness of the dataset. We can observe that the Item_Weight and Outlet_Size columns have missing values. But it would be even more useful if we could find out the location of the missing data.
The msno.matrix() is a nullity matrix that will help to visualize the location of the null observations.
The plot appears white wherever there are missing values.
Once you get the location of the missing data, you can easily find out the type of missing data.
Let’s check out the kind of missing data……
Both the Item_Weight and the Outlet_Size columns have a lot of missing values. The missingno package additionally lets us sort the chart by a selective column. Let’s sort the value by Item_Weight column to detect if there is a pattern in the missing values.
sorted = train.sort_values('Item_Weight') msno.matrix(sorted)The above chart shows the relationship between Item_Weight and Outlet_Size.
Let’s examine is any relationship with observed data.
data = train.loc[(train["Outlet_Establishment_Year"] == 1985)]data
The above chart shows that all the Item_Weight are null that belongs to the 1985 establishment year.
The Item_Weight is null that belongs to Tier3 and Tier1, which have outlet_size medium, low, and contain low and regular fat. This missingness is a kind of Missing at Random case(MAR) as all the missing Item_Weight relates to one specific year.
msno. heatmap() helps to visualize the correlation between missing features.
msno.heatmap(train)Item_Weight has a negative(-0.3) correlation with Outlet_Size.
After classified the patterns in missing values, it needs to treat them.
Deletion:
The Deletion technique deletes the missing values from a dataset. The following are the types of deletion:
Listwise deletion:
Listwise deletion is preferred when there is a Missing Completely at Random case. In Listwise deletion entire rows(which hold the missing values) are deleted. It is also known as complete-case analysis as it removes all data that have one or more missing values.
In Python, we use the dropna() function for listwise deletion.
train_1 = train.copy()
train_1.dropna()
Listwise deletion is not preferred if the size of the dataset is small: because it removes entire rows, eliminating rows with missing data can leave the dataset very short, and a machine learning model will not give good outcomes on a small dataset.
Pairwise Deletion:
Pairwise Deletion is used if missingness is missing completely at random i.e MCAR.
Pairwise deletion is preferred to reduce the loss that happens in Listwise deletion. It is also called an available-case analysis as it removes only null observation, not the entire row.
All methods in pandas like mean, sum, etc. intrinsically skip missing values.
train_2 = train.copy()
train_2['Item_Weight'].mean()  # pandas skips the missing values and calculates the mean of the remaining values
Dropping complete columns
If a column holds a lot of missing values, say more than 80%, and the feature is not meaningful, we can drop the entire column.
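A hedged sketch of this rule of thumb on the train dataframe loaded earlier (the 80% cutoff is just the threshold mentioned above, not a universal constant):
train_3 = train.copy()

# fraction of missing values per column
missing_fraction = train_3.isna().mean()

# drop every column where more than 80% of the values are missing
cols_to_drop = missing_fraction[missing_fraction > 0.8].index
train_3 = train_3.drop(columns=cols_to_drop)
print(list(cols_to_drop))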
Imputation techniques:
The imputation technique replaces missing values with substituted values. The missing values can be imputed in many ways depending upon the nature of the data and the problem. Imputation techniques can be broadly classified as follows:
Imputation with constant value:
As the title hints — it replaces the missing values with either zero or any constant value.
We will use the SimpleImputer class from sklearn.
from sklearn.impute import SimpleImputer

train_constant = train.copy()
# setting strategy to 'constant'
mean_imputer = SimpleImputer(strategy='constant')  # imputing using a constant value
train_constant.iloc[:, :] = mean_imputer.fit_transform(train_constant)
train_constant.isnull().sum()
Imputation using Statistics:
The syntax is the same as imputation with constant only the SimpleImputer strategy will change. It can be “Mean” or “Median” or “Most_Frequent”.
“Mean” will replace missing values using the mean in each column. It is preferred if data is numeric and not skewed.
“Median” will replace missing values using the median in each column. It is preferred if data is numeric and skewed.
“Most_frequent” will replace missing values using the most_frequent in each column. It is preferred if data is a string(object) or numeric.
Before using any strategy, the foremost step is to check the type of data and distribution of features(if numeric).
train['Item_Weight'].dtype
sns.distplot(train['Item_Weight'])
The Item_Weight column satisfies both conditions: it is numeric and is not skewed (it roughly follows a Gaussian distribution). Here, we can use any strategy.
from sklearn.impute import SimpleImputer

train_most_frequent = train.copy()
# setting strategy to 'most_frequent'
mean_imputer = SimpleImputer(strategy='most_frequent')  # strategy can also be 'mean' or 'median'
train_most_frequent.iloc[:, :] = mean_imputer.fit_transform(train_most_frequent)
train_most_frequent.isnull().sum()
Advanced Imputation Technique:
Unlike the previous techniques, Advanced imputation techniques adopt machine learning algorithms to impute the missing values in a dataset. Followings are the machine learning algorithms that help to impute missing values.
K_Nearest Neighbor Imputation:
The KNN algorithm helps to impute missing data by finding the closest neighbors using the Euclidean distance metric to the observation with missing data and imputing them based on the non-missing values in the neighbors.
train_knn = train.copy(deep=True)

from sklearn.impute import KNNImputer
knn_imputer = KNNImputer(n_neighbors=2, weights="uniform")
train_knn['Item_Weight'] = knn_imputer.fit_transform(train_knn[['Item_Weight']])
train_knn['Item_Weight'].isnull().sum()
The fundamental weakness of KNN is that it doesn't work on categorical features; we need to convert them into numeric form using an encoding method. It also requires normalizing the data, as the KNN imputer is a distance-based imputation method and different scales of data generate biased replacements for the missing values.
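Because the KNN imputer is distance-based, a common pattern is to scale the numeric columns first and invert the scaling afterwards. A hedged sketch of that idea (the column selection and the neighbour count here are illustrative, not part of the original example):
from sklearn.preprocessing import MinMaxScaler
from sklearn.impute import KNNImputer

train_knn_scaled = train.copy()
num_cols = train_knn_scaled.select_dtypes(include='number').columns

scaler = MinMaxScaler()
scaled = scaler.fit_transform(train_knn_scaled[num_cols])        # bring numeric features to [0, 1]

imputed = KNNImputer(n_neighbors=5).fit_transform(scaled)        # distance-based imputation on the scaled data
train_knn_scaled[num_cols] = scaler.inverse_transform(imputed)   # back to the original scale

print(train_knn_scaled[num_cols].isnull().sum())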
Conclusion
There is no single method to handle missing values. Before applying any method, it is necessary to understand the type of missing values, then check the datatype and skewness of the missing column, and then decide which method is best for a particular problem.
The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.
Using The Find() Method In Python
One of the most useful functions in Python is the find() method, which allows you to search for a specific substring within a string and return its index position. In this article, we will explore the find() method in detail, including its syntax, usage, and related concepts.
What is find()?
The find() method is a built-in string method in Python that allows you to search for a substring within a string and return its index position. It is commonly used to extract a specific part of a string, or to check if a certain character or sequence of characters exists within a larger string. The find() method is case-sensitive, which means that it will only match substrings that have the same case as the search string.
The syntax for the find() method is as follows:
string.find(substring, start, end)Here, string is the string that you want to search within, substring is the string that you want to find, start is the index position from where the search should start (optional), and end is the index position where the search should end (optional).
If the substring is found within the string, the find() method returns the index position of the first occurrence of the substring. If the substring is not found within the string, the find() method returns -1.
Examples of Python FindLet’s take a look at some examples of how to use the find() method in Python.
Example 1: Finding a Substring within a String
string = "Hello, world!"
substring = "world"
index = string.find(substring)
print(index)
Output:
7
In this example, we have a string string that contains the substring world at index position 7. We use the find() method to search for the substring world within the string string, and it returns the index position of the first occurrence of the substring.
Example 2: Finding a Substring within a String (Case-Sensitive)
string = "Hello, World!"
substring = "world"
index = string.find(substring)
print(index)
Output:
-1
In this example, we have a string string that contains the substring World at index position 7. However, we are searching for the substring world (with a lowercase w), which does not exist in the string. Since the find() method is case-sensitive, it returns -1 to indicate that the substring was not found.
Example 3: Specifying a Start Position for the Search
string = "Hello, world!"
substring = "o"
index = string.find(substring, 5)
print(index)
Output:
8
In this example, we are searching for the first occurrence of the character o within the string string, starting from index position 5. The o at index position 4 is skipped, and the next o occurs at index position 8 (inside world), so the find() method returns 8.
Example 4: Specifying a Start and End Position for the Search
string = "Hello, world!"
substring = "l"
index = string.find(substring, 3, 7)
print(index)
Output:
3
In this example, we are searching for the first occurrence of the character l within the string string, starting from index position 3 and ending at index position 7. Since an l occurs at index position 3, the find() method returns 3.
Example 5: Checking if a Substring Exists within a String
string = "Hello, world!"
substring = "Python"
if string.find(substring) == -1:
    print("Substring is not found")
else:
    print("Substring is found")
Output:
Substring is not found
In this example, we are searching for the substring Python within the string string. Since the substring does not exist in the string, the find() method returns -1, and we print a message indicating that the substring was not found.
Conclusion
The find() method is a powerful and versatile function in Python that allows you to search for substrings within strings and return their index positions. It is useful for a variety of applications, ranging from data analysis to web development. By understanding the syntax and usage of the find() method, you can easily extract specific parts of strings and check for the existence of certain characters or sequences of characters.
Beginner’s Guide To Web Scraping In Python Using Beautifulsoup
Overview
Learn web scraping in Python using the BeautifulSoup library
Web Scraping is a useful technique to convert unstructured data on the web to structured data
BeautifulSoup is an efficient library available in Python to perform web scraping other than urllib
A basic knowledge of HTML and HTML tags is necessary to do web scraping in Python
Introduction
The need and importance of extracting data from the web is becoming increasingly loud and clear. Every few weeks, I find myself in a situation where we need to extract data from the web to build a machine learning model.
For example, last week we were thinking of creating an index of hotness and sentiment about various data science courses available on the internet. This would not only require finding new courses, but also scraping the web for their reviews and then summarizing them in a few metrics!
This is one of the problems / products whose efficacy depends more on web scraping and information extraction (data collection) than the techniques used to summarize the data.
Note: We have also created a free course for this article – Introduction to Web Scraping using Python. This structured format will help you learn better.
Ways to extract information from the web
There are several ways to extract information from the web. Using APIs is probably the best way to extract data from a website. Almost all large websites like Twitter, Facebook, Google, and StackOverflow provide APIs to access their data in a more structured manner. If you can get what you need through an API, it is almost always the preferred approach over web scraping. This is because if you are getting access to structured data from the provider, why would you want to create an engine to extract the same information?
Sadly, not all websites provide an API. Some do it because they do not want the readers to extract huge information in a structured way, while others don’t provide APIs due to lack of technical knowledge. What do you do in these cases? Well, we need to scrape the website to fetch the information.
There might be a few other ways like RSS feeds, but they are limited in their use and hence I am not including them in the discussion here.
What is Web Scraping?
You can perform web scraping in various ways, ranging from Google Docs to almost every programming language. I would resort to Python because of its ease and rich ecosystem. It has a library known as ‘BeautifulSoup’ which assists with this task. In this article, I'll show you the easiest way to learn web scraping using Python programming.
For those of you, who need a non-programming way to extract information out of web pages, you can also look at import.io . It provides a GUI driven interface to perform all basic web scraping operations. The hackers can continue to read this article!
Libraries required for web scraping
As we know, Python is an open-source programming language, and you may find many libraries that perform one function. Hence, it is necessary to find the best library to use. I prefer BeautifulSoup (a Python library), since it is easy and intuitive to work with. Precisely, I'll use two Python modules for scraping data:
Urllib2: It is a Python module which can be used for fetching URLs. It defines functions and classes to help with URL actions (basic and digest authentication, redirections, cookies, etc.). For more detail refer to the documentation page. Note: urllib2 is the name of the library included in Python 2. You can use the urllib.request library included with Python 3 instead. The urllib.request library works the same way urllib2 works in Python 2. Because it is already included, you don't need to install it.
BeautifulSoup: It is an incredible tool for pulling out information from a webpage. You can use it to extract tables, lists, paragraph and you can also put filters to extract information from web pages. In this article, we will use latest version BeautifulSoup 4. You can look at the installation instruction in its documentation page.
BeautifulSoup does not fetch the web page for us. That’s why, I use urllib2 in combination with the BeautifulSoup library.
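For readers on Python 3, a minimal hedged sketch of the same combination using urllib.request (the URL is only a placeholder):
from urllib.request import urlopen
from bs4 import BeautifulSoup

page = urlopen("https://example.com")       # fetch the raw HTML
soup = BeautifulSoup(page, "html.parser")   # parse it into a navigable tree
print(soup.title.string)                    # e.g. the text of the page <title> tag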
Python has several other options for HTML scraping in addition to BeatifulSoup. Here are some others:
Basics – Get familiar with HTML (Tags)
While performing web scraping, we deal with HTML tags. Thus, we must have a good understanding of them. If you already know the basics of HTML, you can skip this section. Below is the basic syntax of HTML. This syntax has various tags, as elaborated below:
Other useful HTML tags are:
If you are new to this HTML tags, I would also recommend you to refer HTML tutorial from W3schools. This will give you a clear understanding about HTML tags.
Scraping a web page using BeautifulSoup
Here, I am scraping data from a Wikipedia page. Our final goal is to extract the list of state and union territory capitals in India, along with some basic details like establishment and former capital, from this Wikipedia page. Let's learn by doing this project step by step:
# import the library used to query a website
import urllib2
# if you are using a python3+ version, import urllib.request

# specify the url

# Query the website and return the html to the variable 'page'
page = urllib2.urlopen(wiki)  # For python 3 use urllib.request.urlopen(wiki)

# import the Beautiful Soup functions to parse the data returned from the website
from bs4 import BeautifulSoup

# Parse the html in the 'page' variable, and store it in Beautiful Soup format
soup = BeautifulSoup(page)
Above, you can see the structure of the HTML tags. This will help you to know about the different available tags and how you can play with them to extract information.
Work with HTML tags
In [30]: soup.title
In [38]: soup.title.string
Out[38]: u'List of state and union territory capitals in India - Wikipedia, the free encyclopedia'
In [40]: soup.a
Above, it is showing the links along with their titles and other information. Now, to show only the links, we need to iterate over each a tag and then return the link using the attribute "href" with get, as sketched below.
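A short sketch of that loop, using the standard find_all() and get() calls on the soup object created earlier:
# iterate over every <a> tag and print only the value of its href attribute
for link in soup.find_all("a"):
    print(link.get("href"))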
Find the right table: As we are seeking a table to extract information about state capitals, we should identify the right table first. Let's write the command to extract the information within all table tags.
all_tables = soup.find_all('table')
right_table = soup.find('table', class_='wikitable sortable plainrowheaders')
right_table
Above, we are able to identify the right table.
# Generate lists
A = []
B = []
C = []
D = []
E = []
F = []
G = []
for row in right_table.findAll("tr"):
    cells = row.findAll('td')
    states = row.findAll('th')  # To store second column data
    if len(cells) == 6:  # Only extract table body not heading
        A.append(cells[0].find(text=True))
        B.append(states[0].find(text=True))
        C.append(cells[1].find(text=True))
        D.append(cells[2].find(text=True))
        E.append(cells[3].find(text=True))
        F.append(cells[4].find(text=True))
        G.append(cells[5].find(text=True))

# import pandas to convert list to data frame
import pandas as pd
df = pd.DataFrame(A, columns=['Number'])
df['State/UT'] = B
df['Admin_Capital'] = C
df['Legislative_Capital'] = D
df['Judiciary_Capital'] = E
df['Year_Capital'] = F
df['Former_Capital'] = G
df
Similarly, you can perform various other types of web scraping using "BeautifulSoup". This will reduce your manual efforts to collect data from web pages. You can also look at other attributes like .parent, .contents, .descendants, .next_sibling and .prev_sibling, and various attributes to navigate using tag names. These will help you to scrape web pages effectively.
But, why can’t I just use Regular Expressions?
Now, if you know regular expressions, you might be thinking that you can write code using regular expressions which can do the same thing for you. I definitely had this question. In my experience with BeautifulSoup and regular expressions to do the same thing, I found out:
Code written in BeautifulSoup is usually more robust than the one written using regular expressions. Codes written with regular expressions need to be altered with any changes in pages. Even BeautifulSoup needs that in some cases, it is just that BeautifulSoup is relatively better.
Regular expressions are much faster than BeautifulSoup, usually by a factor of 100 in giving the same outcome.
So, it boils down to speed vs. robustness of the code and there is no universal winner here. If the information you are looking for can be extracted with simple regex statements, you should go ahead and use them. For almost any complex work, I usually recommend BeautifulSoup more than regex.
End Note
In this article, we looked at web scraping methods using "BeautifulSoup" and "urllib2" in Python. We also looked at the basics of HTML and performed web scraping step by step while solving a challenge. I'd recommend you practice this and use it for collecting data from web pages.
Note: We have also created a free course for this article – Introduction to Web Scraping using Python. This structured format will help you learn better.
Companies Using Cloud Security: Top Use Cases
Utilizing cloud computing has become more essential than ever as companies look to scale up their remote operations.
Along with the convenience and flexibility cloud tech provides comes the need to secure the cloud, especially for businesses in industries with strict regulations.
Here are some examples of how companies are working with cybersecurity providers to implement cloud security solutions:
Qlik is a software vendor that provides data visualization, executive dashboards, and self-service business intelligence (BI) to customers worldwide.
They rely on AWS cloud to accelerate their business operations and adoption of Kubernetes for containers.
But when it comes to cloud sharing, keeping customer data secure is of utmost importance.
Industry: Software
Cloud Security Product: Palo Alto Networks Prism Cloud
Outcomes:
Static container scanning and run-time protection
Accurate image-scanning results
Centralized workload monitoring
Run-time protection
Read the full Qlik and Palo Alto Networks Prism Cloud case study here.
OneLink provides management consulting, outsourcing services, and custom integration solutions for clients in Latin America.
With over 14,000 employees spread across 16 locations, OneLink’s business model requires reliable network connections and cloud infrastructure.
As they shifted the majority of their service agents into remote work, OneLink engaged in a massive deployment of virtual private networks (VPNs) to all of its employees to allow them to safely connect to the network.
“We chose FortiGSLB Cloud to improve the stability of the VPN connections of all our ‘incredibles’ working from home, due to its ease of integration with our network architecture and cybersecurity,” says Alejandro Mata, director of IT operations at OneLink.
Industry: Technology
Cloud Security Products: FortiGSLB Cloud, FortiGate, and FortiAuthenticator
Outcomes:
Secure and reliable remote access to company systems and applications
Stable connection for over 3,000 remote employees
Scalable, growth-friendly solution
Read the full OneLink and FortiGSLB Cloud case study here.
Mercedes-AMG is an engineering company that contracts manufacturers and engineers to customize Mercedes-Benz AMG vehicles.
They collect a continuous stream of data from 18,000 channels from their racing cars, measuring variables from over 300 sensors, and generating 1 TB of data each race weekend.
Mercedes-AMG needed a way to protect their intellectual property and data flowing in from cars, while continuously monitoring the landscape for potential threats. However, they also wanted to eliminate the burden of having to manage a cybersecurity program in its entirety.
They selected CrowdStrike’s Falcon Complete Managed Endpoint Security as a cloud security solution.
“As a team, we generate, process and analyze significant amounts of data, very quickly — we must ensure our information systems are an enabler for performance, not a blocker. But conversely, we also need to ensure they are secure,” says Michael Taylor, IT director at Mercedes-AMG.
Industry: Motorsports and engineering
Cloud Security Product: CrowdStrike Falcon Complete Managed Endpoint Security
Outcomes:
24/7 threat-hunting support team
Access to globally sourced threat intelligence in over 20 countries
Real-time data analysis for threat detection and mitigation
Read the full Mercedes-AMG and CrowdStrike Falcon Complete case study here.
Akamai is a provider of edge security, web and mobile performance, and enterprise access and video-delivery solutions and services globally.
Due to the nature of their work, Akamai’s platform relies on 250,000 edge servers deployed in thousands of locations worldwide, processing enormous volumes of traffic each day.
This reliance on long-distance connectivity makes cloud security and data protection a priority.
“OneTrust PreferenceChoice is run by our marketing team, and the nice thing about the tool is we have had to do very little on the legal side,” says Jim Casey, associate general counsel and chief data protection officer at Akamai.
“That’s a really powerful aspect of the tool — it doesn’t require a team of lawyers and can be used cross-functionally throughout the business.”
Industry: Technology
Cloud Security Product: OneTrust PreferenceChoice and Website Scanning
Outcomes:
Gaining customer trust
Tool and management flexibility
Direct support through implementation, scaling, and upgrading
Read the full Akamai and OneTrust PreferenceChoice case study here.
Corix is a utility and energy solutions provider.
They harness natural resources and provide sustainable water, wastewater, electricity generation, and gas distribution solutions for districts and communities in the U.S. and Canada.
Corix also stores, manages, and analyzes massive volumes of customer and business data, ranging from individual consumers and business partners to municipalities and military installations.
“It became very apparent how incredibly difficult it would be for our small team to respond to a major incident at Corix,” says Carol Vorster, CIO at Corix.
“Deploying FireEye was more cost-effective than paying for the eight separate, independent security products we had deployed at the time.”
Industry: Utility and energy
Cloud security products: FireEye Email Security Cloud Edition, FireEye Helix, and Mandiant Managed Defense
Outcomes:
Saved money by cutting personnel costs and independent products
Streamlined security operations
Increased visibility across threat vectors
Fortified security posture with Mandiant experts on call
Read the full Corix and FireEye case study here.