Interesting Python Projects With Code For Beginners – Part 2 (updated December 2023)
1. Convert the image to grayscale using cv2.COLOR_BGR2GRAY.
cv2.cvtColor(input_image, cv2.COLOR_BGR2GRAY)
2. Find the contours in the image.
To find contours, use cv2.findContours(). It takes three parameters: the source image, the contour retrieval mode, and the contour approximation method. It returns the contours it found together with their hierarchy. Each contour is a NumPy array of the (x, y) coordinates of the boundary points of an object.
3. Apply OCR.
Loop through the contours and get the x, y, width, and height of each one using the cv2.boundingRect() function. Then draw a rectangle on the image using cv2.rectangle(). It takes five parameters: the input image, (x, y), (x+w, y+h), the boundary colour of the rectangle, and the thickness of the boundary.
4. Crop the rectangular region and pass it to Tesseract to extract the text. Save the text in a file opened in append mode.
Code:
import cv2
import pytesseract

# path to the Tesseract-OCR executable on your computer
pytesseract.pytesseract.tesseract_cmd = 'path_to_tesseract.exe'

img = cv2.imread("input.png")  # input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # convert the image to grayscale

# perform Otsu thresholding (this call was missing from the listing, which used
# img_thresh without defining it; the inverted binary mode is an assumption)
ret, img_thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_OTSU | cv2.THRESH_BINARY_INV)

# structuring element shape and kernel size;
# the kernel size increases or decreases the area of the rectangles to be detected
rect_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (18, 18))

# dilation on the thresholded image
dilation = cv2.dilate(img_thresh, rect_kernel, iterations=1)

img_contours, hierarchy = cv2.findContours(dilation, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)

im2 = img.copy()

# create (or empty) the text file that will hold the results
file = open("Output.txt", "w+")
file.write("")
file.close()

# loop through each contour
for contour in img_contours:
    x, y, w, h = cv2.boundingRect(contour)
    rect = cv2.rectangle(im2, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cropped_image = im2[y:y + h, x:x + w]  # crop the text block
    file = open("Output.txt", "a")
    text = pytesseract.image_to_string(cropped_image)  # apply OCR
    file.write(text)
    file.write("\n")
    file.close()
Input image:
Output image:
2. Convert your PDF File to Audio Speech
Say you have a book as a PDF to read but feel too lazy to scroll. It would be nice to convert that PDF into an audiobook, so let’s implement this using Python.
We will need these two packages:
pyttsx3: It is for Text to Speech, and it will help the machine speak.
PyPDF2: It is a PDF toolkit. It is capable of extracting document information, merging documents, etc.
Install them using these commands:
pip install pyttsx3
pip install PyPDF2
Steps:
Import the required modules.
Use PdfFileReader() to read the PDF file.
Use the getPage() method to select the page to read from.
Extract the text using extractText().
Use pyttsx3 to speak the text.
Note: PyPDF2 3.0 and later renamed these to PdfReader, reader.pages[n], and extract_text(); the old names no longer work there.
Code:
# import the modules
import PyPDF2
import pyttsx3

# open your PDF file in binary mode
pdf_file = open('Book.pdf', 'rb')

# reader object (PdfFileReader in PyPDF2 versions before 3.0)
pdf_reader = PyPDF2.PdfReader(pdf_file)

# the page you want to start from (getPage(12) in older versions)
from_page = pdf_reader.pages[12]
content = from_page.extract_text()  # extractText() in older versions
pdf_file.close()

# read the text aloud
speak = pyttsx3.init()
speak.say(content)
speak.runAndWait()
That’s it! This small script is handy when you would rather listen than read.
Next, you can add a GUI to this project using Tkinter or any other toolkit, with a field for the PDF path, a field for the page number to start from, and a stop button. Try this!
Let’s move to the next project.
3. Reading mails and downloading attachments from the mailbox
Let’s first understand the benefit of reading the mailbox with Python. Suppose we are working on a project where some data arrives daily as a Word or Excel file, and this file is the input to a script or a machine learning model. Downloading the file every day and feeding it to the script by hand would be tedious. If we can automate this step, so that the script reads the mail and downloads the required attachment, it would be a great help. So, let’s implement this.
We will use pywin32 to automatically download attachments from a particular mail. It can access Windows applications such as Excel, PowerPoint, Word, and Outlook to perform actions in them. We will focus on Outlook and download attachments from the Outlook mailbox.
Note: This does not need authentication such as a user email ID or password. It accesses the Outlook account that is already logged in on your machine. (Keep the Outlook app open while running the script.)
smtplib, which we chose in an earlier example, can only send emails and cannot download attachments. So, we will go with pywin32 to download attachments from Outlook, and it will be pretty straightforward. Let’s look at the code.
Command to install: pip install pywin32
Import the module:
import win32com.client
Now, establish a connection to Outlook.
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
Let’s try to access the Inbox:
inbox = outlook.GetDefaultFolder(number)
This function takes an integer as input, which is the index of the folder in our Outlook app.
To check the index of all folders, just run this code snippet:
import win32com.client

outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
for i in range(50):
    try:
        box = outlook.GetDefaultFolder(i)
        name = box.Name
        print(i, name)
    except Exception:
        pass
Output:
3 Deleted Items
4 Outbox
5 Sent Items
6 Inbox
9 Calendar
As you can see in the output, the Inbox index is 6, so we will use 6 in the function.
inbox = outlook.GetDefaultFolder(6)
If you want to print the subject of all the emails in the inbox, use this:
messages = inbox.Items
# get the first email
message = messages.GetFirst()
# loop through all the emails in the inbox; GetNext() returns None after the last one
while message is not None:
    try:
        print(message.Subject)  # get the subject of the email
    except Exception:
        pass  # skip items that have no subject
    message = messages.GetNext()
There are other properties as well, such as message.Subject and message.SentOn, which can be used accordingly.
Downloading Attachment
If you want to print the names of all attachments in a mail:
for attachment in message.Attachments:
    print(attachment.FileName)
Let’s download an attachment (an Excel file with the extension .xlsx) from a specific sender.
import win32com.client
import re
import os

# folder to save attachments in (placeholder: replace with your own path)
download_folder_path = 'path_to_download_folder'

outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
inbox = outlook.GetDefaultFolder(6)
messages = inbox.Items
message = messages.GetFirst()
while message is not None:
    try:
        # the text is lowercased before matching, so the patterns are lowercase too
        if re.search('data report', str(message.Subject).lower()) is not None and re.search('abc prasad', str(message.Sender).lower()) is not None:
            for attachment in message.Attachments:
                attachment_name = str(attachment.FileName).lower()
                if ".xlsx" in attachment_name:
                    attachment.SaveAsFile(os.path.join(download_folder_path, attachment_name))
    except Exception:
        pass  # skip items that cannot be read
    message = messages.GetNext()
Explanation
This is the complete code to download an attachment from the Outlook inbox. You can change the conditions inside the try block. For example, here I search for mails whose subject contains “Data Report” and whose sender name is “ABC prasad”. The loop starts from the first mail in the inbox, and whenever the condition is true, it checks whether that mail has an attachment with the extension .xlsx or .XLSX. You can change the subject, sender, and file type to download the file you want. Once a matching file is found, it is saved to the path given as “download_folder_path”.
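The filtering conditions described above can be tried out without Outlook. The following is a minimal sketch, assuming a hypothetical helper named should_download (not part of pywin32 or the original script) that takes plain strings in place of the Outlook message fields:

```python
import re


def should_download(subject, sender, filename,
                    subject_pattern="data report",
                    sender_pattern="abc prasad",
                    extension=".xlsx"):
    """Hypothetical helper: True when subject, sender and extension all match."""
    # Lowercase the text first so the lowercase patterns match any capitalisation.
    subject_ok = re.search(subject_pattern, str(subject).lower()) is not None
    sender_ok = re.search(sender_pattern, str(sender).lower()) is not None
    extension_ok = str(filename).lower().endswith(extension)
    return subject_ok and sender_ok and extension_ok


print(should_download("Daily Data Report", "ABC Prasad", "Sales.XLSX"))  # True
print(should_download("Daily Data Report", "ABC Prasad", "notes.docx"))  # False
print(should_download("Lunch plans", "ABC Prasad", "Sales.xlsx"))        # False
```

Keeping the conditions in one function like this makes it easy to change the subject, sender, or file type in a single place and to test them before running the script against a real mailbox.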
End Notes
We discussed three projects in a previous article and three in this article. I hope these Python projects with code helped you polish your skill set. Try them hands-on; you will enjoy coding them. I hope you find this article helpful. Let’s connect on LinkedIn.
Thanks for reading 🙂
Happy coding!
The media shown in this article is not owned by Analytics Vidhya and are used at the Author’s discretion.
Python Numpy Tutorial For Beginners: Learn With Examples
What is NumPy in Python?
NumPy is an open source Python library that helps with mathematical, scientific, engineering, and data science programming. It is very useful for performing mathematical and statistical operations in Python, and it works well with multi-dimensional arrays and matrix multiplication. It is easy to integrate with C/C++ and Fortran.
For any scientific project, NumPy is the tool to know. It has been built to work with N-dimensional arrays, linear algebra, random numbers, Fourier transforms, etc.
NumPy is a library for working with multi-dimensional arrays and matrices. On top of the arrays and matrices, NumPy supports a large number of mathematical operations. In this part, we will review the essential functions that you need to know for the tutorial on ‘TensorFlow.’
Why use NumPy?
NumPy is memory efficient, meaning it can handle large amounts of data more easily than plain Python lists. Besides, NumPy is very convenient to work with, especially for matrix multiplication and reshaping. On top of that, NumPy is fast. In fact, TensorFlow and scikit-learn use NumPy arrays to compute matrix multiplications in the back end.
How to Install NumPy
To install the NumPy library, please refer to our tutorial How to install TensorFlow. NumPy is installed by default with Anaconda.
In the unlikely case that NumPy is not installed, you can install it using Anaconda:
conda install -c anaconda numpy
In Jupyter Notebook:
import sys
!conda install --yes --prefix {sys.prefix} numpy
Import NumPy and Check Version
The command to import numpy is:
import numpy as np
The above code renames the NumPy namespace to np. This permits us to prefix NumPy functions, methods, and attributes with “np” instead of typing “numpy”. It is the standard shortcut you will find in the NumPy literature.
To check your installed version of NumPy, use the below command:
print(np.__version__)
Output:
1.18.0
What is Python NumPy Array?
NumPy arrays are a bit like Python lists, but still very much different at the same time. For those of you who are new to the topic, let’s clarify what exactly it is and what it’s good for.
As the name suggests, a NumPy array is the central data structure of the NumPy library. The library’s name is short for “Numeric Python” or “Numerical Python”.
Creating a NumPy Array
The simplest way to create an array in NumPy is to use a Python list.
myPythonList = [1,9,8,3]
To convert the Python list to a NumPy array, use the function np.array.
numpy_array_from_list = np.array(myPythonList)
To display the contents of the array:
numpy_array_from_list
Output:
array([1, 9, 8, 3])
In practice, there is no need to declare a Python list first. The operations can be combined.
a = np.array([1,9,8,3])
NOTE: The NumPy documentation also describes np.ndarray for creating an array. However, np.array is the recommended method.
You can also create a NumPy array from a tuple.
Mathematical Operations on an Array
You can perform mathematical operations such as addition, subtraction, division, and multiplication on an array. The syntax is the array name followed by the operation (+, -, *, /) followed by the operand.
Example:
numpy_array_from_list + 10
Output:
array([11, 19, 18, 13])
This operation adds 10 to each element of the NumPy array.
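The other three operators work element by element in the same way. A short sketch, reusing the same four values (the variable names for the results are chosen here for illustration):

```python
import numpy as np

numpy_array_from_list = np.array([1, 9, 8, 3])

added = numpy_array_from_list + 10
subtracted = numpy_array_from_list - 1
multiplied = numpy_array_from_list * 2
divided = numpy_array_from_list / 2  # true division always produces floats

# tolist() converts each result back to a plain Python list for printing
print(added.tolist())       # [11, 19, 18, 13]
print(subtracted.tolist())  # [0, 8, 7, 2]
print(multiplied.tolist())  # [2, 18, 16, 6]
print(divided.tolist())     # [0.5, 4.5, 4.0, 1.5]
```

None of these operations modifies numpy_array_from_list; each one returns a new array.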
Shape of Array
You can check the shape of an array with the attribute shape written after the name of the array. In the same way, you can check the type with dtype.
import numpy as np
a = np.array([1,2,3])
print(a.shape)
print(a.dtype)
Output:
(3,)
int64
An integer is a value without a decimal part. If you create an array with decimals, the type changes to float.
#### Different type
b = np.array([1.1,2.0,3.2])
print(b.dtype)
Output:
float64
2 Dimension Array
You can add a dimension by separating the rows with a comma “,”.
Note that the rows have to be within the brackets [].
### 2 dimension
c = np.array([(1,2,3), (4,5,6)])
print(c.shape)
Output:
(2, 3)
3 Dimension Array
Higher dimensions can be constructed as follows:
### 3 dimension
d = np.array([
    [[1, 2,3], [4, 5, 6]],
    [[7, 8,9], [10, 11, 12]]
])
print(d.shape)
Output:
(2, 2, 3)
Objective | Code
Create array | array([1,2,3])
Print the shape | array([.]).shape
What is numpy.zeros()?
The numpy.zeros() function, or np.zeros, is used to create a matrix full of zeroes. It can be used, for example, when you initialize the weights during the first iteration in TensorFlow and in other statistical tasks.
numpy.zeros() function Syntax
numpy.zeros(shape, dtype=float, order='C')
Python numpy.zeros() Parameters
Here,
Shape: is the shape of the numpy zero array
Dtype: is the datatype in numpy zeros. It is optional. The default value is float64
Order: Default is C, which is row-major (C-style) order for numpy.zeros() in Python.
Python numpy.zeros() Example
import numpy as np
np.zeros((2,2))
Output:
array([[0., 0.],
       [0., 0.]])
Example of numpy zero with Datatype
import numpy as np
np.zeros((2,2), dtype=np.int16)
Output:
array([[0, 0],
       [0, 0]], dtype=int16)
What is numpy.ones()?
The np.ones() function is used to create a matrix full of ones. It can be used, for example, when you initialize the weights during the first iteration in TensorFlow and in other statistical tasks.
Python numpy.ones() Syntax
numpy.ones(shape, dtype=float, order='C')
Python numpy.ones() Parameters
Here,
Shape: is the shape of the np.ones Python array
Dtype: is the datatype in numpy ones. It is optional. The default value is float64
Order: Default is C, which is row-major (C-style) order.
Python numpy.ones() 3D Array with Datatype Example
import numpy as np
np.ones((1,2,3), dtype=np.int16)
Output:
array([[[1, 1, 1],
        [1, 1, 1]]], dtype=int16)
numpy.reshape() function in Python
The NumPy reshape function is used to give an array a new shape without changing its data. On some occasions, you may need to reshape the data from wide to long. You can use the np.reshape function for this.
Syntax of np.reshape()
numpy.reshape(a, newShape, order='C')
Here,
a: Array that you want to reshape
newShape: The new desired shape
Order: Default is C, which is row-major (C-style) order.
Example of NumPy Reshape
import numpy as np
e = np.array([(1,2,3), (4,5,6)])
print(e)
e.reshape(3,2)
Output:
[[1 2 3]
 [4 5 6]]
array([[1, 2],
       [3, 4],
       [5, 6]])
numpy.flatten() in Python
The NumPy flatten function returns a copy of the array collapsed into one dimension. When you work with a neural network such as a convnet, you need to flatten the array. Note that flatten() is a method of the array itself (there is no top-level np.flatten function), so you call it as e.flatten().
Syntax of flatten()
ndarray.flatten(order='C')
Order: Default is C, which is row-major (C-style) order.
Example of NumPy Flatten
e.flatten()
Output:
array([1, 2, 3, 4, 5, 6])
What is numpy.hstack() in Python?
numpy.hstack is a function that horizontally stacks a sequence of input arrays to make a single array. With the hstack() function, you can append data horizontally. It is a very convenient function in NumPy.
Let’s study hstack in Python with an example:
Example:
## Horizontal Stack
import numpy as np
f = np.array([1,2,3])
g = np.array([4,5,6])
print('Horizontal Append:', np.hstack((f, g)))
Output:
Horizontal Append: [1 2 3 4 5 6]
What is numpy.vstack() in Python?
numpy.vstack is a function that vertically stacks a sequence of input arrays to make a single array. With the vstack() function, you can append data vertically.
Let’s study it with an example:
Example:
## Vertical Stack
import numpy as np
f = np.array([1,2,3])
g = np.array([4,5,6])
print('Vertical Append:', np.vstack((f, g)))
Output:
Vertical Append: [[1 2 3]
 [4 5 6]]
After studying NumPy vstack and hstack, let’s look at an example that generates random numbers in NumPy.
Generate Random Numbers using NumPy
To generate random numbers from a Gaussian distribution, use:
numpy.random.normal(loc, scale, size)
Here,
Loc: the mean. The center of the distribution
Scale: the standard deviation
Size: the number of values to return
Example:
## Generate random numbers from a normal distribution
normal_array = np.random.normal(5, 0.5, 10)
print(normal_array)
Output:
[5.56171852 4.84233558 4.65392767 4.946659 4.85165567 5.61211317 4.46704244 5.22675736 4.49888936 4.68731125]
If plotted, the distribution will look similar to the following plot:
[Figure: Example to Generate Random Numbers using NumPy]
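Because the values above change on every run, it can help to fix a seed when you want reproducible numbers. The sketch below uses np.random.default_rng, NumPy's newer generator interface, instead of the np.random.normal call shown above; the seed value 42 and the sample size are arbitrary choices for illustration:

```python
import numpy as np

# Two generators created with the same seed produce the same sequence.
rng = np.random.default_rng(seed=42)
samples = rng.normal(loc=5, scale=0.5, size=10_000)

rng_again = np.random.default_rng(seed=42)
samples_again = rng_again.normal(loc=5, scale=0.5, size=10_000)

print(samples.shape)                           # (10000,)
print(np.array_equal(samples, samples_again))  # True
# With this many samples, the mean and standard deviation land close to loc and scale.
print(round(float(samples.mean()), 1), round(float(samples.std()), 1))
```

The more samples you draw, the closer the sample mean and standard deviation get to the loc and scale you asked for.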
NumPy Asarray Function
The asarray() function is used when you want to convert an input to an array. The input could be a list, tuple, ndarray, etc.
Syntax:
numpy.asarray(data, dtype=None, order=None)
Here,
data: Data that you want to convert to an array
dtype: This is an optional argument. If not specified, the data type is inferred from the input data
Order: Default is C, which is row-major (C-style) order. The other option is F (Fortran-style)
Example:
Consider the following 2-D matrix with four rows and four columns filled with 1
import numpy as np
A = np.matrix(np.ones((4,4)))
If you try to change the values of the matrix through np.array(A), the matrix itself does not change. The reason is that np.array() returns a copy, so the assignment modifies the copy and not A.
np.array(A)[2]=2
print(A)
Output:
[[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]
The original matrix is unchanged. You can use asarray if you want the modification to reach the original array, because asarray does not copy its input when the input is already an array. Let’s see what happens when you set the third row to the value 2.
np.asarray(A)[2]=2
print(A)
Code Explanation:
np.asarray(A): converts the matrix A to an array
[2]: selects the third row
Output:
[[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [2. 2. 2. 2.] # new value
 [1. 1. 1. 1.]]
What is numpy.arange()?
numpy.arange() is a built-in NumPy function that returns an ndarray object containing evenly spaced values within a defined interval. For instance, if you want to create values from 1 to 10, you can use the np.arange() function.
Syntax:
numpy.arange(start, stop, step, dtype)
Python NumPy arange Parameters:
Start: Start of the interval.
Stop: End of the interval. The stop value itself is not included.
Step: Spacing between values. Default step is 1.
Dtype: The type of the output array.
Example:
import numpy as np
np.arange(1, 11)
Output:
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
Example:
If you want to change the step in this example, you can add a third number in the parentheses. It will change the step.
import numpy as np
np.arange(1, 14, 4)
Output:
array([ 1, 5, 9, 13])
NumPy Linspace Function
Linspace gives evenly spaced samples.
Syntax:
numpy.linspace(start, stop, num, endpoint)
Here,
Start: Starting value of the sequence
Stop: End value of the sequence
Num: Number of samples to generate. Default is 50
Endpoint: If True (default), stop is the last value. If False, the stop value is not included.
Example:
For instance, it can be used to create 10 evenly spaced values from 1 to 5.
import numpy as np
np.linspace(1.0, 5.0, num=10)
Output:
array([1. , 1.44444444, 1.88888889, 2.33333333, 2.77777778, 3.22222222, 3.66666667, 4.11111111, 4.55555556, 5. ])
If you do not want to include the last value of the interval, you can set endpoint to False
np.linspace(1.0, 5.0, num=5, endpoint=False)
Output:
array([1. , 1.8, 2.6, 3.4, 4.2])
LogSpace NumPy Function in Python
LogSpace returns evenly spaced numbers on a log scale. Logspace has the same parameters as np.linspace.
Syntax:
numpy.logspace(start, stop, num, endpoint)
Example:
np.logspace(3.0, 4.0, num=4)
Output:
array([ 1000. , 2154.43469003, 4641.58883361, 10000. ])
Finally, if you want to check the memory size of an element in an array, you can use itemsize. Here x is an array whose elements take 16 bytes each, for example one with dtype np.complex128.
x.itemsize
Output:
16
Each element takes 16 bytes.
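The item size depends only on the dtype of the array. A small sketch comparing three dtypes (the dtypes and the three-element array are chosen here for illustration):

```python
import numpy as np

sizes = {}
for dtype in (np.int16, np.float64, np.complex128):
    x = np.array([1, 2, 3], dtype=dtype)
    # itemsize is the bytes per element, nbytes the bytes for the whole array
    sizes[x.dtype.name] = (x.itemsize, x.nbytes)
    print(x.dtype.name, x.itemsize, x.nbytes)

# int16 2 6
# float64 8 24
# complex128 16 48
```

Choosing a smaller dtype is one simple way to reduce the memory an array needs, as long as the values still fit in it.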
Indexing and Slicing in Python
Example:
## Slice
import numpy as np
e = np.array([(1,2,3), (4,5,6)])
print(e)
Output:
[[1 2 3]
 [4 5 6]]
Remember that with NumPy the first row/column starts at index 0.
## First row
print('First row:', e[0])
## Second row
print('Second row:', e[1])
Output:
First row: [1 2 3]
Second row: [4 5 6]
In Python, like many other languages,
The value before the comma stands for the rows
The value after the comma stands for the columns.
If you want to select a column, you need to add : before the column index.
: means you want all the rows from the selected column.
print('Second column:', e[:,1])
Output:
Second column: [2 5]
To return the first two values of the second row, use :2 to select the columns up to (but not including) index 2.
## Second Row, two values
print(e[1, :2])
Output:
[4 5]
Statistical Functions in Python
NumPy has quite a few useful statistical functions for finding the minimum, maximum, percentile, standard deviation, variance, etc. of the elements in an array. The functions are explained as follows.
NumPy is equipped with the robust statistical functions listed below
Function | NumPy
Min | np.min()
Max | np.max()
Mean | np.mean()
Median | np.median()
Standard deviation | np.std()
Consider the following array:
Example:
import numpy as np
normal_array = np.random.normal(5, 0.5, 10)
print(normal_array)
Output:
[5.56171852 4.84233558 4.65392767 4.946659 4.85165567 5.61211317 4.46704244 5.22675736 4.49888936 4.68731125]
Example of NumPy Statistical function
### Min
print(np.min(normal_array))
### Max
print(np.max(normal_array))
### Mean
print(np.mean(normal_array))
### Median
print(np.median(normal_array))
### Sd
print(np.std(normal_array))
Output:
4.467042435266913
5.612113171990201
4.934841002270593
4.846995625786663
0.3875019367395316
What is numpy dot product?
numpy.dot is a powerful function for matrix computation. For instance, you can compute the dot product with np.dot. The result of np.dot(a, b) is the dot product of a and b. numpy.dot() in Python handles 2-D arrays and performs matrix multiplication.
Syntax:
numpy.dot(x, y, out=None)
Parameters
Here,
x,y: Input arrays. x and y should both be 1-D or 2-D for the np.dot() function to work
out: Optional output argument in which the result is stored. For 1-D inputs a scalar is returned; otherwise an ndarray is returned.
Returns
The function numpy.dot() returns the dot product of the two arrays x and y. It returns a scalar if both x and y are 1-D; otherwise, it returns an array. If ‘out’ is given, then it is returned.
Raises
numpy.dot() raises a ValueError exception if the last dimension of x does not have the same size as the second-to-last dimension of y.
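The shape rule behind that ValueError can be checked with small arrays of ones. A minimal sketch (the array names and shapes are chosen here for illustration):

```python
import numpy as np

a = np.ones((2, 3))
b = np.ones((3, 4))  # second-to-last dimension is 3, matching a's last dimension
c = np.ones((2, 4))  # second-to-last dimension is 2, which does not match

product_shape = np.dot(a, b).shape
print(product_shape)  # (2, 4)

try:
    np.dot(a, c)
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

print(type(mismatch_error).__name__)  # ValueError
```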
Example:
## Linear algebra
### Dot product: product of two arrays
f = np.array([1,2])
g = np.array([4,5])
### 1*4+2*5
np.dot(f, g)
Output:
14
Matrix Multiplication in Python
The NumPy matmul() function is used to return the matrix product of 2 arrays. Here is how it works
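1) For 2-D arrays, it returns the normal matrix product
2) For arrays with more than 2 dimensions, the product is treated as a stack of matrices
3) A 1-D array is first promoted to a matrix, and then the product is calculated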
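These cases can be seen side by side in a short sketch. It also includes np.matmul's handling of inputs with more than two dimensions, which it treats as a stack of matrices; the vector v and the stacked input are chosen here for illustration:

```python
import numpy as np

h = np.array([[1, 2], [3, 4]])
i = np.array([[5, 6], [7, 8]])
v = np.array([1, 1])

# 2-D with 2-D: the ordinary matrix product
two_d = np.matmul(h, i)

# 2-D with 1-D: v is promoted to a column, multiplied, and the extra axis is dropped
one_d = np.matmul(h, v)

# 3-D with 2-D: each matrix in the stack is multiplied by i
stacked = np.matmul(np.stack([h, h]), i)

print(two_d.tolist())  # [[19, 22], [43, 50]]
print(one_d.tolist())  # [3, 7]
print(stacked.shape)   # (2, 2, 2)
```

The @ operator is shorthand for the same operation on arrays, so h @ i gives the same result as np.matmul(h, i).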
Syntax:
numpy.matmul(x, y, out=None)
Here,
x,y: Input arrays. Scalars are not allowed
out: This is an optional parameter. Usually the output is stored in an ndarray
Example:
In the same way, you can compute matrix multiplication with np.matmul
### Matmul: matrix product of two arrays
h = [[1,2],[3,4]]
i = [[5,6],[7,8]]
### 1*5+2*7 = 19
np.matmul(h, i)
Output:
array([[19, 22],
       [43, 50]])
Determinant
Last but not least, if you need to compute the determinant, you can use np.linalg.det(). Note that NumPy takes care of the dimensions.
Example:
## Determinant of a 2*2 matrix
### 5*8-7*6
np.linalg.det(i)
Output:
-2.000000000000005
Summary
NumPy is an open source library available in Python, which helps in mathematical, scientific, engineering, and data science programming.
numpy.zeros() or np.zeros Python function is used to create a matrix full of zeroes.
numpy.ones() in Python can be used when you initialize the weights during the first iteration in TensorFlow and other statistic tasks.
Python NumPy Reshape function is used to shape an array without changing its data.
Python NumPy Flatten function is used to return a copy of the array in one-dimension.
Numpy.hstack is a function in Python that is used to horizontally stack sequences of input arrays in order to make a single array.
Numpy.vstack is a function in Python which is used to vertically stack sequences of input arrays in order to make a single array.
numpy.arange() is an inbuilt numpy function that returns an ndarray object containing evenly spaced values within a defined interval.
numpy.dot is a powerful function for matrix computation.
The Numpy matmul() function is used to return the matrix product of 2 arrays.
Python Generators And Iterators In 2 Minutes For Data Science Beginners
This article was published as a part of the Data Science Blogathon
Introduction
We are continuing our Python: Understanding in 2 minutes series, where we cover medium-level topics that are frequently asked about in Python and Data Science interviews. Last time, we talked about an important topic called *args and **kwargs in 2 minutes. This series is dedicated to aspiring data scientists who want to take the “next step” in Python after learning the basics. Today, we’ll continue our discussion with yet another important topic called Generator and Iterator.
Iterators in Python
The dictionary meaning of “iterate” is to “perform a task repeatedly”.
In computer programming, Wikipedia defines iterators as:
An iterator is an object that enables a programmer to traverse a container, particularly lists.
So, we get the idea that iterators have something to do with traversing elements.
Now, what does it mean when something is iterable? It simply means that its items can be looped over. A list is an example of an iterable because we can loop over its elements.
Image source: Github user Ethen8181
Let’s try a very simple example. We will first create a list and then apply Python’s built-in iter() function to it.
my_list = [1,2,3,5,8,13]
# converting to a list_iterator with iter()
final_list = iter(my_list)
final_list
The output would look something like this:
<list_iterator object at 0x...>
Let’s try applying the next() function to our final_list.
next(final_list)
Output:
1
This is the first item on our list.
Again, try doing the same thing:
next(final_list)
Output:
2
This is the second item on our list.
One more time:
next(final_list)
Output:
3
This is the third item on our list.
So, basically, we get the idea that the iter() function converts an iterable (such as a list) to an iterator.
To summarize:
An iterable is an object that can be converted into an iterator (just the way we converted a list into a list_iterator).
An iterator is an object that has a __next__() method, which is what the built-in next() function calls.
I assume we don’t have any confusion with iterable and iterator now.
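To make the distinction concrete, here is a minimal sketch of a hand-written iterator. The class name Countdown is hypothetical and chosen for illustration; it implements the two methods that iter() and next() rely on:

```python
class Countdown:
    """Counts down from start to 1, acting as its own iterator."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # iter(obj) calls this; an iterator simply returns itself
        return self

    def __next__(self):
        # next(obj) calls this; StopIteration signals that the values ran out
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


counter = Countdown(3)
print(next(counter))  # 3
print(list(counter))  # [2, 1]
print(list(counter))  # []
```

The last line prints an empty list because an iterator can only be traversed once; after it is exhausted, it keeps raising StopIteration.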
Generators
Image Source: Morioh
Wikipedia defines Generators as:
One way of implementing iterators is to use a restricted form of coroutine, known as a generator. By contrast with a subroutine, a generator coroutine can yield values to its caller multiple times, instead of returning just once.
We shift our focus to generators now. Python generators are a simple way of creating iterators. A generator is a function that returns an object (an iterator) which we can iterate over, one value at a time. Let’s see a simple example without a generator and then implement the same operation with a generator. We would like to create a function that squares all the elements in a list. Let’s see how we perform this operation normally.
def square(my_list):
    result = []
    for i in my_list:
        result.append(i**2)
    return result
And now, let’s pass a list and see the result.
final = square([1,2,3,4,5])
final
Output:
[1, 4, 9, 16, 25]
The process was pretty straightforward. We implemented a function in which we initialized a new empty list called “result”. Then, we looped through the “my_list” that was passed in and appended each squared value to “result” one by one. On top of that, it calculates everything at once. This means it consumes more memory, and performance-wise this process may be inefficient.
What if we try out the same thing with a generator?
def square(my_list):
    for i in my_list:
        yield i**2
And let’s pass a list:
final = square([1,2,3,4,5])
final
Output:
<generator object square at 0x...>
Notice that it created a generator object, and therefore we can apply the next() function to our final variable. Let’s try:
next(final)
Output:
1
Let’s do it again!
next(final)
Output:
4
One more time:
next(final)
Output:
9
What did we do differently here? In our second example, we created a function like the previous one. Then, instead of initializing an empty list, we looped directly through the list that was passed in. In each iteration, we yield the corresponding squared value, and that was it! Finally, we created a “final” variable by calling the function with our list. This is our generator. Each time we applied next(), we obtained the next squared value. This means the results were not all calculated at once. This is called lazy evaluation. In short, lazy evaluation is a process in which a value is computed when it is needed, not when the object is created.
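Lazy evaluation can be observed directly by recording when each element is actually processed. In this sketch the calls list is an addition made purely for illustration; the generator itself is the same square function as above:

```python
calls = []  # records which elements the generator has processed so far


def square(my_list):
    for i in my_list:
        calls.append(i)
        yield i ** 2


final = square([1, 2, 3, 4, 5])
print(calls)        # [] because creating the generator runs none of its body
print(next(final))  # 1
print(calls)        # [1] because only the first element has been processed
print(list(final))  # [4, 9, 16, 25]
print(calls)        # [1, 2, 3, 4, 5]
```

Each call to next() runs the function body only up to the next yield and then pauses there until the following value is requested.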
What is “yield” doing?
Yield simply produces a sequence of values. We generally use yield when we want to iterate over a sequence without storing the entire sequence in memory: the function body runs only when the next value is requested. Note that a function can contain several yield statements that each produce a value, whereas a return statement ends the function the first time it runs.
Closing up, Generators do not store all the values in memory. They yield one result at a time. They are best for calculating large result sets where you don’t want to allocate the memory for all results at the same time.
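The memory difference can be measured with sys.getsizeof from the standard library. This is a rough sketch: getsizeof reports only the size of the container object itself, and the element count n is an arbitrary choice:

```python
import sys

n = 100_000

squares_list = [i ** 2 for i in range(n)]  # builds all values immediately
squares_gen = (i ** 2 for i in range(n))   # builds nothing until iterated

list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)

print(list_size > gen_size)  # True
# Both produce the same values, so their sums agree.
print(sum(squares_gen) == sum(squares_list))  # True
```

The list grows with n, while the generator object stays the same small size no matter how many values it will eventually produce.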
In the end
The concepts of iterators, iterables, yield, and generators are mostly intermediate-level material that beginners often aren’t familiar with. Also, from my professional experience, these topics are frequently asked about in interviews. Understanding these concepts demands practice.
About the Author:
Hi there! My name is Akash and I’ve been working as a Python developer for over 4 years now. I began my career as a Junior Python Developer at Nepal’s biggest job portal site, Merojob. Later, I was involved in Data Science and research at Nepal’s first ride-sharing company, Tootle. Currently, I am actively involved in Data Science as well as Web Development with Django.
You can find my other projects on:
Connect with me on LinkedIn
End Notes:
Thanks for reading!
Previous blog posts in this series:
*args and **kwargs in 2 minutes
I am also planning to start The Data Science Blog on my Github page. I will try to include how real companies have been working in the field of Data Science, how to excel in Data Science and/or tech interviews, and other useful content related to Python and general programming. Feel free to check it once in a while.
The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.
Machine Learning With Python: Top 10 Projects For Freshers To Pursue
With source program in Python, check these top 10 Machine Learning with Python projects for freshers
Machine learning is same as how it sounds. It is the idea that multiple types of technology, such as computers and tablets, can learn something from programming and other data. It appears to be an abstract idea. However, this type of technology is used by several people each day. Speech identification is a good example of this. Virtual assistants including Siri and Alexa use technology to present messages, answer questions and respond to instructions.
In this tutorial, you will find top 10 machine learning project ideas for freshers, intermediates, and professionals to gain real-world experience of this developing technology in 2023. These machine learning project ideas will assist you in learning all the practicalities that you want to with prevailing in your profession and to make you employable in the business.
1. Movie Recommendations from Movielens Dataset
Many people currently use technology to stream TV shows and films. Choosing what to watch next can be complex and time-consuming, so recommendations are generally built from customer habits and history. This is accomplished by machine learning and is a good, simple task for beginners to tackle. Starting developers can learn by writing programs in one of two languages, Python or R, using data from the Movielens Dataset. The dataset currently contains more than 1 million ratings of about 3,900 movies from over 6,000 users.
2. Music Recommendation System ML Project
This is one of the most popular machine learning projects and can be used across multiple domains. You will be familiar with recommendation systems if you have used any e-commerce site or movie/music website. On some e-commerce sites such as Amazon, the system recommends items that can be added to the cart at checkout.
3. BigMart Sales Prediction ML Project
As a fresher, you should work on multiple machine learning project ideas to expand your skillset. We have therefore added a project that will teach you how machine learning algorithms work on the sales dataset of a grocery supermarket chain.
4. TensorFlow
This open-source artificial intelligence library is a great place for freshers to sharpen their machine learning skills. With TensorFlow, they can build data flow graphs, Java-based projects, and an array of applications. It also includes APIs for Java.
5. Iris Classification
This is one of the simplest machine learning projects, with the Iris flowers dataset being one of the most elementary datasets in the classification literature. The problem is often described as the “Hello World” of machine learning. The dataset has numeric attributes, and ML freshers need to figure out how to load and handle the data. The Iris dataset is small enough to fit easily into memory and does not need any special transformations or scaling to start with.
6. Sales Forecasting with Walmart
While predicting future sales exactly may not be possible, businesses can come close with machine learning. For example, Walmart provides datasets for 98 products across 45 outlets, so programmers can access weekly sales data by location and branch. The main objective of this project is to make better data-driven decisions in channel optimization and stock planning.
7. Stock Price Predictions
Similar to sales forecasting, stock price forecasts can be derived from previous prices, volatility indexes, and various fundamental indicators. Freshers can start with a project like this and use stock market data to make predictions over the coming months. It is a great way to get familiar with making predictions from huge data sets.
8. Breast Cancer Prediction
This project uses machine learning to help decide whether a tumour in the breast is benign or malignant. Multiple factors are considered, including the thickness of the lump, the number of bare nuclei, and mitosis. It is also a good way for a newcomer to machine learning to get familiar with using R.
9. Sorting of Specific Tweets on Twitter
In an ideal world, quickly filtering tweets that contain specific words and elements would be easy. A great fresher-level machine learning project lets programmers develop an algorithm that takes scraped tweets, processed by a natural language processor, and recognizes which tweets are more likely to be associated with specific topics or to talk about specific individuals.
10. Making Handwritten Documents Digital Versions
Fuzzywuzzy Python Library: Interesting Tool For Nlp And Text Analytics
This article was published as a part of the Data Science Blogathon
Introduction
There are many ways to compare text in Python. But, often we search for an easy way to compare text. Comparing text is needed for various text analytics and Natural Language Processing purposes.
One of the easiest ways of comparing text in Python is using the FuzzyWuzzy library. Here, we get a score out of 100, based on the similarity of the strings. Basically, we are given the similarity index. The library uses Levenshtein distance to calculate the difference between two strings.
Levenshtein Distance
The Levenshtein distance is a string metric to calculate the difference between two different strings. Soviet mathematician Vladimir Levenshtein formulated this method and it is named after him.
In the recursive definition of the distance, the tail of a string x is the string of all but the first character of x, and x[n] is the nth character of x, counting from character 0.
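As a rough illustration of the metric, the distance can be computed with dynamic programming: it is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other. The sketch below is a plain standard-library version written for this explanation (the function name `levenshtein` is an assumption, not part of FuzzyWuzzy):

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    # previous[j] is the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning i characters into an empty prefix takes i deletions
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep when equal)
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Only two rows of the table are kept at a time, so memory grows with the length of the second string rather than with the product of both lengths.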
FuzzyWuzzy
FuzzyWuzzy is an open-source library developed and released by SeatGeek. You can read their original blog here. The simple implementation and the unique score (out of 100) metric make it interesting to use FuzzyWuzzy for text comparison, and it has numerous applications.
Installation:
pip install fuzzywuzzy
pip install python-Levenshtein
These are the requirements that must be installed.
Let us now get started with the code by importing the necessary libraries.
Python Code:
Here, in this case, even though the two different strings had different cases, conversion of both to the lower case was done and the score was 100.
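A minimal standard-library sketch of this kind of comparison, assuming the basic ratio behaves like difflib's `SequenceMatcher` ratio scaled to 0-100 (FuzzyWuzzy is built on difflib). The helper name `simple_ratio` and the two strings are made up for illustration:

```python
from difflib import SequenceMatcher


def simple_ratio(s1: str, s2: str) -> int:
    """Similarity of two strings as an integer score out of 100."""
    return int(round(100 * SequenceMatcher(None, s1, s2).ratio()))


# Hypothetical example strings that differ only in letter case.
a = "Hello World"
b = "hello WORLD"

print(simple_ratio(a, b))                  # below 100: the cases differ
print(simple_ratio(a.lower(), b.lower()))  # 100 once both are lower-cased
```

Lower-casing both inputs before scoring is what removes the effect of the different cases.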
Substring Matching
Often, cases arise in text matching where we need to compare two strings and one might be a substring of the other. For example, suppose we are testing a text summarizer and have to check how well the summarizer is performing. The summarized text will be a substring of the original string. FuzzyWuzzy has powerful functions to deal with such cases.
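#fuzzywuzzy functions to work with substring matching
from fuzzywuzzy import fuzz
b1 = "The Samsung Group is a South Korean multinational conglomerate headquartered in Samsung Town, Seoul."
b2 = "Samsung Group is a South Korean company based in Seoul"
#compare the lower-cased strings
Ratio = fuzz.ratio(b1.lower(), b2.lower())
Partial_Ratio = fuzz.partial_ratio(b1.lower(), b2.lower())
print("Ratio:",Ratio)
print("Partial Ratio:",Partial_Ratio)
Output: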
Ratio: 64
Partial Ratio: 74
Here, we can see that the score for the Partial Ratio function is higher. This indicates that it is able to recognize the fact that the string b2 has words from b1.
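The idea behind a partial ratio can be sketched with the standard library: slide the shorter string across the longer one and keep the best score of any same-length window. This is a simplified brute-force illustration, not FuzzyWuzzy's actual implementation, and the names `simple_ratio` and `partial_ratio_sketch` are assumptions made for this example:

```python
from difflib import SequenceMatcher


def simple_ratio(s1: str, s2: str) -> int:
    """Similarity of two strings as an integer score out of 100."""
    return int(round(100 * SequenceMatcher(None, s1, s2).ratio()))


def partial_ratio_sketch(s1: str, s2: str) -> int:
    """Best score of the shorter string against any equal-length window of the longer."""
    shorter, longer = sorted((s1, s2), key=len)
    if not shorter:
        return 0
    window = len(shorter)
    return max(
        simple_ratio(shorter, longer[start:start + window])
        for start in range(len(longer) - window + 1)
    )


# Hypothetical strings: the first is an exact substring of the second.
print(simple_ratio("south korean", "a south korean company"))
print(partial_ratio_sketch("south korean", "a south korean company"))  # 100
```

Because one window lines up exactly with the shorter string, the partial score reaches 100 while the plain ratio is pulled down by the extra words.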
Token Sort Ratio
But, the above method of substring matching is not foolproof. Often the words are jumbled up and do not follow an order. Similarly, in the case of similar sentences, the order of words is different or mixed up. In this case, we use a different function.
Output:
Ratio: 56
Partial Ratio: 60
Token Sort Ratio: 100
So, in this case, we can see that the strings are just jumbled up versions of each other. The two strings show the same sentiment and also mention the same entity. The standard fuzz function shows the score between them to be 56, while the Token Sort Ratio function shows the similarity to be 100.
So, it becomes clear that in some situations or applications, the Token Sort Ratio will be more useful.
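The reason word order stops mattering can be sketched in a few lines: split each string into tokens, sort them, join them again, and only then compare. The version below is a simplified standard-library illustration (it lower-cases and splits on whitespace only); the function names and example strings are assumptions, not the library's own code:

```python
from difflib import SequenceMatcher


def simple_ratio(s1: str, s2: str) -> int:
    """Similarity of two strings as an integer score out of 100."""
    return int(round(100 * SequenceMatcher(None, s1, s2).ratio()))


def token_sort_ratio_sketch(s1: str, s2: str) -> int:
    """Compare two strings after sorting their lower-cased tokens alphabetically."""
    sorted1 = " ".join(sorted(s1.lower().split()))
    sorted2 = " ".join(sorted(s2.lower().split()))
    return simple_ratio(sorted1, sorted2)


# Hypothetical strings containing the same words in a different order.
c1 = "new york mets vs atlanta braves"
c2 = "atlanta braves vs new york mets"

print(simple_ratio(c1, c2))             # below 100: the order differs
print(token_sort_ratio_sketch(c1, c2))  # 100: same tokens once sorted
```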
Token Set Ratio
But what if the two strings have different lengths? The Token Sort Ratio function might not perform well in this situation. For this purpose, we have the Token Set Ratio function.
Output:
Ratio: 41
Partial Ratio: 65
Token Sort Ratio: 59
Token Set Ratio: 100
Ah! A score of 100. The reason is that the components of string d2 are entirely present in string d1.
Now, let us slightly modify string d2.
By slightly modifying the text d2, we can see that the score is reduced to 92. This is because the text “10” is not present in string d1.
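A rough standard-library sketch of the token set idea: split both strings into sets of tokens, build one string from the shared tokens and two more from the shared tokens plus each side's leftovers, then take the best pairwise score. This is a simplified illustration under those assumptions, not FuzzyWuzzy's exact code, and the names and example strings are made up:

```python
from difflib import SequenceMatcher


def simple_ratio(s1: str, s2: str) -> int:
    """Similarity of two strings as an integer score out of 100."""
    return int(round(100 * SequenceMatcher(None, s1, s2).ratio()))


def token_set_ratio_sketch(s1: str, s2: str) -> int:
    """Score two strings by their shared tokens and each side's leftover tokens."""
    tokens1 = set(s1.lower().split())
    tokens2 = set(s2.lower().split())
    common = " ".join(sorted(tokens1 & tokens2))
    only1 = " ".join(sorted(tokens1 - tokens2))
    only2 = " ".join(sorted(tokens2 - tokens1))
    combined1 = (common + " " + only1).strip()
    combined2 = (common + " " + only2).strip()
    return max(
        simple_ratio(common, combined1),
        simple_ratio(common, combined2),
        simple_ratio(combined1, combined2),
    )


# Hypothetical strings: every token of the second appears in the first.
long_text = "the quick brown fox jumps over the lazy dog"
print(token_set_ratio_sketch(long_text, "lazy fox"))     # 100
print(token_set_ratio_sketch(long_text, "lazy fox 10"))  # lower: "10" is not in long_text
```

When every token of one string occurs in the other, the shared-token string equals one of the combined strings, which is why the score reaches 100 regardless of the length difference.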
WRatio()
This function helps to manage the upper case, lower case, and some other parameters.
#fuzz.WRatio()
Output:
Slightly change of cases: 100
Let us try removing a space.
#fuzz.WRatio()
Output:
Slightly change of cases and a space removed: 97
Let us try some punctuation.
#handling some random punctuations
from fuzzywuzzy import fuzz
g1='Microsoft Windows is good, but takes up lof of ram!!!'
g2='Microsoft Windows is good but takes up lof of ram?'
print(fuzz.WRatio(g1, g2))
Output: 99
Thus, we can see that FuzzyWuzzy has a lot of interesting functions which can be used to do interesting text comparison tasks.
Some Suitable Applications:
FuzzyWuzzy can have some interesting applications.
It can be used to assess summaries of larger texts and judge their similarity. This can be used to measure the performance of text summarizers.
Based on the similarity of texts, it can also be used to check the authenticity of a text, article, news item, book, etc. We often come across incorrect text or data, and cross-checking every piece of text by hand is not possible. Using text similarity, this cross-checking can be automated.
FuzzyWuzzy can also come in handy in selecting the most similar text out of a number of texts. So, the applications of FuzzyWuzzy are numerous.
Text similarity is an important metric that can be used for various NLP and Text Analytics purposes. The interesting thing about FuzzyWuzzy is that similarities are given as a score out of 100. This allows relative scoring and also generates a new feature that can be used for analytics or ML purposes.
Summary Similarity:
#uses of fuzzy wuzzy
#summary similarity
The above is the original text.
output_text="Text Analytics involves the use of unstructured text data, processing them into usable structured data. Text Analytics is an interesting application of Natural Language Processing. Text Analytics has various processes including cleaning of text, removing stopwords, word frequency calculation, and much more. Text Analytics is used to understand patterns and trends in text data. Keywords, topics, and important features of Text are found using Text Analytics. There are many more interesting aspects of Text Analytics, now let us proceed with our resume dataset. The dataset contains text from various resume types and can be used to understand what people mainly use in resumes."
Output:
Ratio: 54
Partial Ratio: 79
Token Sort Ratio: 54
Token Set Ratio: 100
We can see the various scores. The partial ratio does show that they are quite similar, which should be the case. Also, the token set ratio is 100, which is evident as the summary is completely taken from the original text.
Best possible String match:
Let us use the process library to find the best possible string match among a list of strings.
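#choosing the possible string match
#using process library
from fuzzywuzzy import process
query = 'Stack Overflow'
choices = ['Stock Overhead', 'Stack Overflowing', 'S. Overflow',"Stoack Overflow"]
print("List of ratios: ")
print(process.extract(query, choices))
print("Best choice:", process.extractOne(query, choices))
Output: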
List of ratios:
[('Stoack Overflow', 97), ('Stack Overflowing', 90), ('S. Overflow', 85), ('Stock Overhead', 64)]
Best choice: ('Stoack Overflow', 97)
Hence, the similarity scores and the best match are given.
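The ranking step itself is simple to sketch with the standard library: score every choice against the query and sort by score. The version below uses a plain difflib ratio as the scorer, so its scores will not necessarily match the library output above; the names `simple_ratio` and `extract_sketch` are assumptions made for this illustration:

```python
from difflib import SequenceMatcher


def simple_ratio(s1: str, s2: str) -> int:
    """Similarity of two strings as an integer score out of 100."""
    return int(round(100 * SequenceMatcher(None, s1, s2).ratio()))


def extract_sketch(query, choices, limit=5):
    """Return up to `limit` (choice, score) pairs, best match first."""
    scored = [(choice, simple_ratio(query.lower(), choice.lower())) for choice in choices]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:limit]


query = 'Stack Overflow'
choices = ['Stock Overhead', 'Stack Overflowing', 'S. Overflow', "Stoack Overflow"]

ranked = extract_sketch(query, choices)
print("List of ratios:", ranked)
print("Best choice:", ranked[0])
```

Swapping in a different scoring function changes the ranking behaviour without touching the sorting logic.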
Final Words
The FuzzyWuzzy library is built on top of the difflib library, and python-Levenshtein is used to optimize its speed. This makes FuzzyWuzzy one of the best ways to compare strings in Python.
Do check out the code on Kaggle here.
About me:
Prateek Majumder
Thank You.
Filezilla Guide For Beginners – Webnots
When you use a self-hosted platform, it is necessary to know how to work with FTP applications. FileZilla is one of the easy-to-use File Transfer Protocol applications widely used by webmasters. You can use FileZilla to download files from your hosting server, upload new files and edit existing files remotely. If you want to learn how to use FTP, here is a FileZilla guide for beginner-level users on how to set up the FileZilla application and transfer files.
Downloading FileZilla App
Download FileZilla
Earlier, you had to download the application from another website. However, this is not required anymore and you can download the app directly from the FileZilla site. You have three options available for downloading: standard FileZilla, FileZilla with manual, and FileZilla Pro.
FileZilla Versions
The manual version includes a PDF manual along with the app download and costs $4.99. The basic app is the same in all three versions, except that the pro version also lets you use cloud services. You can download the free app from their site or get the pro app from the App Store.
FileZilla Pro App
Connecting to Server
Site Manager Option in FileZilla
Add the details of the FTP account you created with your hosting company. If you don’t know how to create an FTP account, read our article on how to connect a Bluehost FTP account with FileZilla.
FileZilla Screen Overview
Once you are connected to the hosting server, the screen will look something like below, with various sections. Don’t panic at the number of sections; it only takes a few minutes to understand the FileZilla screen.
Copy Current Connection Settings to Site Manager
Message Log – Here you can view the connection status, command and response when you open each directory of your site on the server side. The file transfer status will also be shown here.
Local Site – This section is divided into two halves. All the folders of the selected path on your local system are shown in the first half. When a folder is selected, all files and folders under it are shown in the second half.
Remote Site – Like the local site, this section contains two halves: the remote site’s folders, and below them the content of the selected folder. The remote site is simply your live site hosted on the server.
Transfer Queues – Here you can view the status of the transfer when you download, upload or edit a file on the server.
If you feel the screen is congested, you can enable or disable each section of the screen from the “View” menu or resize the viewing area by dragging the horizontal and vertical section dividers.
View or Hide FileZilla Screen Sections
Note: The FileZilla toolbar shown above the quickconnect bar has shortcut icons for the site manager, toggling each section, disconnecting from the server and a few other options. The filelist status bars show the number of files and directories under the selected folders of the local and remote site sections.
Upload, Download, View and Edit Files
Drag and drop files between the local and remote sites to download files from the server or upload them to it.
File Transfer
The status of file uploads and downloads can be seen in both the message log and the transfer queue sections.
Check File Transfer Status in FileZilla
The transfer queue has three sections – Queued files, Failed transfers and Successful transfers – and each section shows the status of the corresponding files.
File Permission Settings
View and Change File Permissions
In the pop-up menu, select the read, write or execute option for each category of owner, group and public. The numeric value changes based on your selection; this is the value shown under the “Permissions” column.
Setting File Permissions in FileZilla
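The numeric value follows the usual Unix convention: within each category, read counts as 4, write as 2 and execute as 1, and the three per-category sums are written side by side. A small Python sketch of that arithmetic (the function names are made up for illustration and are not part of FileZilla):

```python
def permission_digit(read: bool, write: bool, execute: bool) -> int:
    """Sum the weights of the selected options for one category."""
    return (4 if read else 0) + (2 if write else 0) + (1 if execute else 0)


def numeric_permissions(owner, group, public) -> str:
    """Build the three-digit value from (read, write, execute) flags per category."""
    return "".join(str(permission_digit(*flags)) for flags in (owner, group, public))


# Owner can read and write; group and public can only read.
print(numeric_permissions((True, True, False), (True, False, False), (True, False, False)))  # 644

# Owner has full access; group and public can read and execute.
print(numeric_permissions((True, True, True), (True, False, True), (True, False, True)))  # 755
```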
Viewing Hidden Files
Depending on the server settings, hidden files are sometimes not shown in FileZilla, and hence you will not be able to see files like .htaccess. Select the “Force showing hidden files” option from the “Server” menu to view and edit hidden files.
Showing Hidden Files in FileZilla
You will see a warning message like below when enabling this option. If you are not able to view normal directory structure properly then disable this option and try again.
Warning When Force Showing Hidden Files in FileZilla
Disconnecting from Server
After you have completed your transfers, it is recommended to close the connection by choosing the “Disconnect” option from the “Server” menu.
Disconnect Server in FileZilla
Additional Options
Following are some of the general operations you can do with FileZilla:
FileZilla is an ideal FTP program for connecting to a hosting server remotely to upload, download and modify files. You can also use this app when migrating a whole site's content to another hosting company.
Pros
Very easy to configure and connect to server
Supports FTP, FTPS and SFTP protocols
Drag and drop support
Directory comparison
Available in 47 languages
Open source and free
Tabbed and easy graphical user interface
Works on macOS, Windows and Linux based computers
Cons
Missing on-screen explanation of error messages.