Implementing Particle Swarm Optimization Using Python (updated November 2023)
This article was published as a part of the Data Science Blogathon.
Introduction
Particle Swarm Optimization (PSO) takes a candidate solution and iteratively improves it with respect to a given measure of quality, stopping when no better solution can be found, at which point the swarm is taken to have reached an optimum. The language here is Python, and we will work through a hands-on implementation using the Python package “PySwarms”. We will cover the following topics:
PSO: Particle Swarm Optimization
The inner workings
Variants or types of PSO
Implementing PSO with PySwarms
What is Particle Swarm Optimization (PSO)?
Image 1
The studies concluded that animals share knowledge with one another, which helps them gain insight into situations they have not faced before. In simpler words, finding food or evading predators becomes easier when all the birds of a flock share their observations and knowledge. With swarm intelligence, the optimal solution becomes easier to pin down.
Inner working of Particle Swarm Optimization
Swarm behaviour can be of two types. The first is exploratory behaviour, where animals seek their objective across a larger solution space. The second is exploitative behaviour, where swarm intelligence helps them search a smaller, more promising area. PSOs and their parameters are mostly designed to balance exploration and exploitation, which supports a good rate of convergence to the optimum.
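To make the balance between exploration and exploitation concrete, here is a minimal pure-Python sketch of a global-best PSO minimising the sphere function. The function names (`sphere`, `pso`) and the parameter values (inertia `w`, cognitive weight `c1`, social weight `c2`, swarm size, bounds) are illustrative assumptions, not values from this article. A larger `w` keeps particles moving and favours exploration, while the `c1` and `c2` terms pull particles toward known good positions and favour exploitation.

```python
import random


def sphere(x):
    """Sphere function: sum of squares, with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


def pso(objective, dim=2, n_particles=20, iters=100,
        w=0.7, c1=1.5, c2=1.5, bound=5.0, seed=0):
    """Minimal global-best PSO (illustrative sketch, not a tuned implementation)."""
    rng = random.Random(seed)
    # Random initial positions inside [-bound, bound]; zero initial velocities.
    pos = [[rng.uniform(-bound, bound) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Personal bests (P) and the swarm's global best (G).
    pbest = [p[:] for p in pos]
    pbest_cost = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]                              # inertia
                             + c1 * r1 * (pbest[i][d] - pos[i][d])     # pull to own best
                             + c2 * r2 * (gbest[d] - pos[i][d]))       # pull to swarm best
                pos[i][d] += vel[i][d]
            cost = objective(pos[i])
            if cost < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], cost
                if cost < gbest_cost:
                    gbest, gbest_cost = pos[i][:], cost
    return gbest, gbest_cost


best_position, best_cost = pso(sphere, iters=200)
print(best_position, best_cost)
```

Raising `w` or lowering `c2` in this sketch is one way to observe slower but broader searching.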
2.1 Convergence of Particle Swarm Optimization
When we speak of convergence in PSO, what matters is not how the swarm moves but that the personal bests (call them P) or the best-known position of the swarm (G) approach a local optimum of the problem. A PSO algorithm's ability to explore and exploit is affected by its topology. The algorithm can converge differently under different topologies, the most common being the local ring and the global star. The topology determines how particles search and share information, how quickly that information spreads, and the direction each particle follows.
Image 2
A PSO with a global star structure, where all particles are connected with each other, has the shortest average distance between particles, whereas a local ring structure, where each particle is connected only to its two nearest neighbours, has the highest average distance in the whole swarm, as shown in the above image. The figure on the left is the global star structure, whereas the one on the right is the local ring structure. Each group consists of 16 particles, and you can see why the average distance differs between the two cases.
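The difference in average distance can be checked numerically. The sketch below builds both topologies for 16 particles as plain adjacency lists and measures the average number of hops between pairs of particles with a breadth-first search. The helper names are hypothetical, and the "global star" is modelled as described above, with every particle connected to every other.

```python
from collections import deque


def ring_neighbors(n):
    """Local ring: each particle is linked to its two nearest neighbours."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def star_neighbors(n):
    """Global star as described in the text: every particle linked to every other."""
    return {i: [j for j in range(n) if j != i] for i in range(n)}


def average_distance(graph):
    """Average shortest-path length (in hops) over all ordered pairs of particles."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in graph[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


print(average_distance(star_neighbors(16)))  # 1.0
print(average_distance(ring_neighbors(16)))  # 64/15, roughly 4.27
```

Information about a new best position therefore needs about four times as many hops, on average, to spread through the ring as through the fully connected swarm.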
2.2 Adaptive mechanism
An adaptive mechanism lets you tune the trade-off between convergence and divergence, that is, between exploitation and exploration. An APSO (Adaptive Particle Swarm Optimisation) can outperform a regular PSO: along with faster convergence, it can also perform a global search across the whole search space.
Variants of Particle Swarm Optimization
PSO algorithms come in different variants, even among the simple ones. The particles and velocities can be initialised in different ways, and variants also differ in how they update the swarm and then set the values of the personal bests Pi and the global best G.
3.1 Gradient Particle Swarm Optimization
We can construct gradient-based PSOs by combining the efficiency of PSO at exploring many local minima with gradient-based local search algorithms, which can locate a local minimum accurately. In a gradient-based PSO, the PSO stage discovers several local minima and identifies the basin of attraction of a deep local minimum. That deep minimum can then be confirmed using local gradient-based search techniques.
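The refinement stage can be illustrated on its own. The sketch below assumes a PSO run has already returned a rough candidate (the starting point `[0.8, -1.7]` is made up for illustration) and polishes it with plain gradient descent using a central-difference gradient. The function names, learning rate, and step count are assumptions, and this is only one of many possible local search methods.

```python
def numerical_gradient(f, x, h=1e-6):
    """Central-difference estimate of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        up, down = x[:], x[:]
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


def refine(f, start, lr=0.1, steps=200):
    """Plain gradient descent starting from a candidate found by a global search."""
    x = list(start)
    for _ in range(steps):
        g = numerical_gradient(f, x)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x


def bowl(x):
    """Toy objective with its minimum at (1, -2)."""
    return (x[0] - 1) ** 2 + (x[1] + 2) ** 2


rough_candidate = [0.8, -1.7]   # hypothetical output of a PSO stage
print(refine(bowl, rough_candidate))
```

The global stage supplies a starting point inside the right basin, and the local stage supplies the precision.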
3.2 Hybrid Particle Swarm Optimization
Image 4
4. Implementing Particle Swarm Optimization using PySwarms
As the name suggests, PySwarms is a Python-based tool for swarm optimisation. Researchers, practitioners, and students alike use it to apply the PSO algorithm through a high-level interface. PySwarms is a convenient way to integrate swarm optimisation with basic optimization.
This tool allows you to implement and use a number of particle swarm optimisation techniques. It is also user-friendly and adaptable to different projects, and its support modules can help with your optimisation problem.
Now we will use a global-best optimizer through PySwarms' functional API, “pyswarms.single.GlobalBestPSO”. We will plot the result in both 2D and 3D.
You can install PySwarms with the following command:

!pip install pyswarms

Now we will begin by importing some modules:
Code :
# Import all the modules
import numpy as np
# Importing PySwarms
import pyswarms as ps
from pyswarms.utils.functions import single_obj as fx
import matplotlib.pyplot as plt
from pyswarms.utils.plotters import plot_contour, plot_surface
from pyswarms.utils.plotters.formatters import Designer
from pyswarms.utils.plotters.formatters import Mesher

After importing them all, we will begin optimizing our sphere function; you can start the optimizer with any arbitrary settings. The three main steps here are:
1. Set the hyperparameters that configure the swarm, as a dictionary
2. Create an instance of the optimizer, passing the dictionary along with the other required input parameters
3. Invoke the “optimize()” method and store the best cost and position it returns in variables
4.1 Now on Visualizing the function
PySwarms comes with tools that help you visualise the behaviour of the swarm. They are built on top of matplotlib, which makes the charts both highly customizable and user-friendly. There are two animation methods, “plot_contour()” and “plot_surface()”, which plot the particles in 2D and 3D space respectively. To plot the sphere function, we need to add a mesh function to our swarm.
This helps to visualise the particles graphically. We have already imported the “Mesher” class, which does this for us. The “pyswarms.utils.plotters.formatters” module has several formatters for customizing the plots. Apart from Mesher, there is a “Designer” class that modifies the figure size, font size, and more, along with an “Animator” class for setting repeats and animation delays.
Code for 2D Plot :

m = Mesher(func=fx.sphere)
animation = plot_contour(pos_history=optimizer.pos_history, mesher=m, mark=(0,0))

Output :

Code for 3D Plot :

# The preprocessing: build the position-fitness history needed for the 3D plot
pos_history_3d = m.compute_history_3d(optimizer.pos_history)
# Adjusting the figure
d = Designer(limits=[(-1,1), (-1,1), (-0.1,1)], label=['x-axis', 'y-axis', 'z-axis'])
animation3d = plot_surface(pos_history=pos_history_3d,  # The history that we computed
                           mesher=m, designer=d,        # Various customizations
                           mark=(0,0,0))                # Mark the minima

Output :
Conclusion
In this post, we have seen the theory behind PSO and explored its inner mechanism. We looked at some of its variants and how it has been used in various domains and communities. Last but not least, we implemented a small Python-based swarm of our own. Feel free to contact me if you face any issues or run into any problems.
Ankita Roy
Class of 2023 IT, Open to full-time/internships
The media shown in this article is not owned by Analytics Vidhya and are used at the Author’s discretion.
Introduction To Redis Using Python
This article was published as a part of the Data Science Blogathon.
Introduction
Redis is a popular in-memory key-value data store, which is a type of NoSQL database. Redis is chiefly used as a cache, but its applications do not end there. You can find many articles explaining how Redis can be the all-in-one database for all kinds of applications. In this article, we will understand how to connect to and use Redis in Python.
Installing Redis on Windows
Redis is not officially supported on Windows. However, you can install Redis by setting up and configuring Windows Subsystem for Linux 2. Alternatively, you can run Redis in a container using Docker, which is the approach I will cover in this article. The first step is to install Docker on your Windows machine. You can download Docker Desktop from the official Docker website. The installation process is fairly simple and direct. Now that we have Docker on our machine, enter the following command at a command prompt to fetch the Redis image from Docker Hub, which can be used to build and run a container.

docker pull redis

Once this is done, the third step is to start a container using the Redis image that we downloaded earlier.
Please refer to the below picture for reference.
Congrats, you have now successfully started the Redis server on your machine.
Installing Redis-py
To connect to and use Redis in Python, we will be using a Python module called redis-py. It can be installed by running the following command in the command prompt.

pip install redis

Now that we have everything ready, let’s get our hands dirty and dive into the programming part.
Using Redis-py
Before performing any CRUD operations, we first need to connect to the Redis server, which is straightforward in redis-py. We set the decode_responses parameter to True so that responses are returned as strings rather than bytes.

import redis
r = redis.Redis(decode_responses=True)
r.ping()  # True

Now that we have connected to the Redis server, let’s start performing simple CRUD operations.
To set a key-value pair, we use the “set” function, which accepts the key and the value as parameters. Please note that the key should always be of either the string or the bytes data type.

r.set('hello', 'world')

To get the value for a specific key, we use the “get” function, which accepts the key whose value we want returned.

r.get('hello')  # 'world'

To set multiple key-value pairs, we use the “mset” function, which stands for multiple-set and accepts a dictionary of key-value pairs.
data = {
    'hello': 'world',
    'lorem': 'ipsum'
}
r.mset(data)
r.get('lorem')  # 'ipsum'

To store a value of the set data type under a key, we use the “sadd” function. The set data type contains only unique elements, unlike the list data type. Let’s now store a set of fruits without the fruits being repeated.

fruits = ["avocado", "strawberry", "strawberry", "mango", "orange"]
r.sadd('fruits', *fruits)

To get all the values of fruits that we just stored, we can use the “smembers” function.

r.smembers('fruits')  # {'avocado', 'mango', 'orange', 'strawberry'}

Since we have seen how to store set data type values in Redis, let us now learn to store list data types. It can be done using the “lpush” function. Let us store a list of programming languages using this function.

programming_languages = ['python', 'C#', 'C++', 'C++', 'javascript']
r.lpush('languages', *programming_languages)

To get all the elements in the list, we use the “lrange” function, which returns the elements between a start index and an end index. An end index of “-1” refers to the last element, so the call returns all the elements; if you want only the first 3 elements of the list, replace “-1” with “2”. Note that it is not “3”, because the end index is inclusive: the function would traverse up to index three and return four elements in total.
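Both behaviours described above, the head insertion done by “lpush” and the inclusive end index of “lrange”, can be mimicked in plain Python without a server. The `lpush` and `lrange` helpers below are hypothetical stand-ins written for illustration, not part of redis-py, and they cover only the cases discussed here.

```python
from collections import deque


def lpush(store, *values):
    """Mimic LPUSH: each value is pushed onto the head, so the order ends up reversed."""
    for value in values:
        store.appendleft(value)
    return len(store)


def lrange(store, start, stop):
    """Mimic LRANGE: both indices are inclusive; negative indices count from the end."""
    items = list(store)
    n = len(items)
    if start < 0:
        start = max(n + start, 0)
    if stop < 0:
        stop = n + stop
    if stop < 0 or start > stop:
        return []
    return items[start:stop + 1]   # +1 because Python slices exclude the end index


languages = deque()
lpush(languages, 'python', 'C#', 'C++', 'C++', 'javascript')
print(lrange(languages, 0, -1))  # ['javascript', 'C++', 'C++', 'C#', 'python']
print(lrange(languages, 0, 2))   # ['javascript', 'C++', 'C++']
```

The last element pushed ends up first, which is why the output order is the reverse of the original Python list.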
r.lrange('languages', 0, -1)  # ['javascript', 'C++', 'C++', 'C#', 'python']
r.lrange('languages', 0, 2)   # ['javascript', 'C++', 'C++']

Now, let us try to save a nested object in Redis. For storing nested objects, we can use the “hset” function, but it allows only one level of nesting.

r.hset('person', 'name', 'Ram')
r.hget('person', 'name')  # 'Ram'

If you want to store deeply nested objects with different data types, serialization techniques such as json or pickle can be used. Let’s see this in action.

import json
personal_information = {
    'name': 'Rahul',
    'age': 20,
    'address': {
        'house_no': 189,
        'flat_name': 'Golden Flower',
        'area': 'Guindy'
    },
    'languages_known': ['english', 'hindi', 'tamil']
}
r.set('personal_information', json.dumps(personal_information))

To extract the stored information, we can use the “get” function directly and then undo the stringification performed by json.
json.loads(r.get('personal_information'))
# {'name': 'Rahul', 'age': 20, 'address': {'house_no': 189, 'flat_name': 'Golden Flower', 'area': 'Guindy'}, 'languages_known': ['english', 'hindi', 'tamil']}

Since Redis is an in-memory data store, it is important that old key-value pairs get deleted, or rather expired, to make room for new data. For this, Redis has a key expiry option. Let us now try to store a key-value pair with an expiration time. We can use the “setex” function to set the expiry for a key-value pair. It accepts the TTL in seconds. If you want to set the TTL in milliseconds, you can use the “psetex” function.

r.setex('lorem', 10, 'ipsum')   # 10 seconds
r.psetex('hello', 10, 'world')  # 10 milliseconds

To know how much time remains before a key expires, we can use the “ttl” function, which returns the time remaining in seconds. Similarly, the “pttl” function returns the same in milliseconds. If the key has expired and therefore no longer exists, the returned value is -2 (a key that exists but has no expiry returns -1).

r.ttl('lorem')   # 2
r.pttl('hello')  # -2

Suppose we want to check whether a key has expired; we can use the “exists” function, which returns 1 if the key is available and 0 if it has expired.

r.exists("lorem")  # 1
r.exists("hello")  # 0

To delete a key-value pair in Redis, we can use the “delete” function. It accepts the key that you want to delete.

r.delete('lorem')

Conclusion
In this article, we discussed and covered the following:
What is Redis?
Installing docker on windows
Running it on a docker container
Connecting using redis-py
Performing simple CRUD operations using redis-py
That’s it for this article. Hope you enjoyed reading this article and learned something new. Thanks for reading and happy learning!
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.
Using The Replace() Method In Python
The replace() method is a built-in string method in Python that replaces a specific substring or character in a string with another substring or character. It takes two required arguments, the substring or character you want to replace and the new substring or character to replace it with, plus an optional third argument that limits how many occurrences are replaced. The replace() method returns a new string with the replacements applied.
Syntax for the Replace Method

string.replace(old, new[, count])

Where:
string: The string to be modified.
old: The substring or character to be replaced.
new: The new substring or character to replace the old one.
count (optional): The maximum number of occurrences to replace. If not specified, all occurrences will be replaced.
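One consequence of the parameters above is worth seeing directly: because Python strings are immutable, replace() never changes the original string, and when `old` does not occur the result is simply equal to the original. The variable names in this short sketch are illustrative.

```python
original = 'Hello World!'

replaced = original.replace('o', '0')       # all occurrences
limited = original.replace('o', '0', 1)     # at most one occurrence
untouched = original.replace('xyz', '?')    # 'xyz' does not occur

print(original)   # Hello World!  (the original is unchanged)
print(replaced)   # Hell0 W0rld!
print(limited)    # Hell0 World!
print(untouched)  # Hello World!
```

To keep the result, assign it to a variable, as the examples that follow do with `new_string`.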
Examples of Python Replace Method
Let’s look at some examples to understand how the replace() method works in Python.
Example 1: Replace a Character in a String
In this example, we will replace a character in a string with another character.

string = 'Hello World!'
new_string = string.replace('o', '0')
print(new_string)

Output:

Hell0 W0rld!

In the above example, we replaced all occurrences of the character ‘o’ with ‘0’ in the string ‘Hello World!’.
Example 2: Replace a Substring in a String
In this example, we will replace a substring in a string with another substring.

string = 'Python is a popular programming language'
new_string = string.replace('programming', 'scripting')
print(new_string)

Output:

Python is a popular scripting language

In the above example, we replaced the substring ‘programming’ with ‘scripting’ in the string ‘Python is a popular programming language’.
Example 3: Replace Multiple Occurrences of a Character
In this example, we will replace multiple occurrences of a character in a string.

string = 'Mississippi'
new_string = string.replace('s', 'z')
print(new_string)

Output:

Mizzizzippi

In the above example, we replaced all occurrences of the character ‘s’ with ‘z’ in the string ‘Mississippi’.
Example 4: Replace a Character in a String with a Limit
In this example, we will replace a character in a string with a limit on the number of occurrences to replace.

string = 'Hello World!'
new_string = string.replace('o', '0', 1)
print(new_string)

Output:

Hell0 World!

In the above example, we replaced the first occurrence of the character ‘o’ with ‘0’ in the string ‘Hello World!’.
Example 5: Replace a Substring in a String with a Limit
In this example, we will replace a substring in a string with a limit on the number of occurrences to replace.

string = 'Python is a popular programming language'
new_string = string.replace('language', 'tool', 1)
print(new_string)

Output:

Python is a popular programming tool

In the above example, we replaced the first occurrence of the substring ‘language’ with ‘tool’ in the string ‘Python is a popular programming language’.
Conclusion
In this article, we discussed what the replace() method is, how it works, and how to use it in Python. The replace() method lets you replace specific characters or substrings in a string with new characters or substrings. We hope that this guide has helped you understand the replace() method better and how to use it in your Python projects.
Python’s replace() method is an important tool to have in your arsenal as a developer, whether you’re working with text manipulation, data cleaning, or any other string-related tasks. The examples provided in this tutorial demonstrate various use cases and should serve as a solid foundation for utilizing the replace() method effectively in your own Python projects.
Write Better Python Functions Using Type Dispatch
Iteration 4
Objective:
Improve the documentation and avoid code repetition with the help of Type Dispatch.
Type Dispatch:
Type dispatch allows you to change the way a function behaves based upon the input types it receives. This is a prominent feature in some programming languages like Julia & Swift. All we have to do is add a decorator typedispatch before our function. Probably, it is easier to demonstrate than to explain.
Example for Type Dispatch:
Function definitions:
import numpy as np
from fastcore.dispatch import *
from typing import List

# Function to multiply two ndarrays
@typedispatch
def multiple(x:np.ndarray, y:np.ndarray ):
    return x * y

# Function to multiply a List by an integer
@typedispatch
def multiple(lst:List, x:int):
    return [ x*val for val in lst]

Calling 1st function:

x = np.arange(1,3)
print(f'x is {x}')
y = np.array(10)
print(f'y is {y}')
print(f'Result of multiplying two numpy arrays: { multiple(x, y)}')

Calling 2nd function:

x = [1, 2]
print(f'x is {x}')
y = 10
print(f'y is {y}')
print(f'Result of multiplying a List of integers by an integer: {multiple(x, y)}')
The life of a programmer can be made so much better if they don’t have to come up with different function names with no change in purpose (for various data types). If this doesn’t encourage you to use Type Dispatch whenever possible then I don’t know what will 🤷
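For readers who want to try the idea without installing anything, the standard library offers a more limited relative: `functools.singledispatch`. Note that this is a swapped-in technique, not fastcore's `typedispatch`; it dispatches on the type of the first argument only, so the sketch below cannot distinguish overloads by their second argument. The function bodies are illustrative.

```python
from functools import singledispatch


@singledispatch
def multiple(x, y):
    """Fallback for unsupported types of the first argument."""
    raise TypeError(f'Unsupported type: {type(x).__name__}')


@multiple.register
def _(x: list, y: int):
    """Multiply every element of a list by an integer."""
    return [y * val for val in x]


@multiple.register
def _(x: tuple, y: int):
    """Same operation for tuples, returning a tuple."""
    return tuple(y * val for val in x)


print(multiple([1, 2], 10))   # [10, 20]
print(multiple((1, 2), 10))   # (10, 20)
```

As with Type Dispatch, a single function name serves several input types, and the fallback plays the role of the catch-all `object` overload.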
We will be using fastcore package for implementing Type Dispatch to our use case. For more details on fastcore and Type Dispatch, check this awesome blog by Hamel Husain. Also check out fastai, which inspired me to write this post.
Code:
# Import Libraries
import numpy as np
import torch
from PIL import Image as PILImage
from pathlib import Path, PurePath
from fastcore.dispatch import *

@typedispatch
def to_imageTensor(arr: np.ndarray) -> torch.Tensor:
    """Change ndarray to torch tensor.

    The ndarray would be of the shape (Height, Width, # of Channels)
    but pytorch tensor expects the shape as
    (# of Channels, Height, Width) before putting
    the Tensor on GPU if it's available.

    Args:
        arr[ndarray]: Ndarray which needs to be
        converted to torch tensor

    Returns:
        Torch tensor on GPU (if it's available)
    """
    # Set Torch device to GPU if CUDA supported GPU is available
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    # Transpose the array before converting to tensor
    imgArr = arr.transpose(2, 0, 1) if arr.ndim == 3 else arr
    return torch.Tensor(imgArr).to(device)

@typedispatch
def to_imageTensor(image: PILImage.Image) -> torch.Tensor:
    """Change image to torch tensor.

    The PIL image cast as numpy array with dtype as uint8,
    and then passed to to_imageTensor(arr: np.ndarray) function
    for converting numpy array to torch tensor.

    Args:
        image[PILImage.Image]: PIL Image which
        needs to be converted to torch tensor

    Returns:
        Torch tensor on GPU (if it's available)
    """
    return to_imageTensor(np.asarray(image, np.uint8))

@typedispatch
def to_imageTensor(file: (str, PurePath)) -> torch.Tensor:
    """Change image file to torch tensor.

    Read the image from disk as 3 channels (RGB) using PIL,
    and passed on to to_imageTensor(image: PILImage.Image)
    function for converting Image to torch tensor.

    Args:
        file[str, PurePath]: Image file name which needs to
        be converted to torch tensor

    Returns:
        Torch tensor on GPU (if it's available)

    Raises:
        Any error thrown while reading the image file,
        Mostly FileNotFoundError will be raised.
    """
    try:
        img = PILImage.open(file).convert('RGB')
    except Exception as error:
        raise error
    return to_imageTensor(img)

@typedispatch
def to_imageTensor(x: object):
    """For unsupported data types, raise TypeError."""
    raise TypeError('Input must be of Type - String or Path or PIL Image or Numpy array')

What have we done?
By utilizing the Type Dispatch functionality, we managed to use the same name for all 3 functions, with each one’s behavior differentiated by its input type. The function name is also shorter now that the input type is no longer part of it, which makes the name easier to remember.
By calling the function name, we can see the different input types supported by the function. Fastcore by default expects two input parameters; since we have only one, it assigns the second one as object. The second parameter will not have any impact on the function behavior.
to_imageTensor
# OUTPUT
With the help of the inspect module, we can access the input & output types of a particular function.
import inspect
inspect.signature(to_imageTensor[np.ndarray])
The docstrings implemented in this iteration make the code more readable. The docstring for a particular input type can be accessed through __doc__ along with the input type.
print(to_imageTensor[np.ndarray].__doc__)
As discussed in the last section, we managed to move the TypeError message to a separate function with input type as object.
Quick Check:
Still working!
is_same_tensor(file_ToImageTensor(filename), to_imageTensor(filename))
Validating error message when the unsupported data type is passed to the function.
to_imageTensor([filename])
What can we improve?
This is it. I believe we are done here.
A Beginners’ Guide To Image Similarity Using Python
If you believe that our Test Image is similar to our first reference image you are right. If you do believe otherwise then let’s find out together with the power of mathematics and programming.
Every image is stored in our computer in the form of numbers and a vector of such numbers that can completely describe our image is known as an Image Vector.
Euclidean Distance:
Euclidean Distance represents the distance between any two points in an n-dimensional space. Since we are representing our images as image vectors they are nothing but a point in an n-dimensional space and we are going to use the euclidean distance to find the distance between them.
Histogram:
A histogram is a graphical display of how often each numerical value occurs. We are going to build a histogram from the image vector of each of the three images and then find the euclidean distance between the histograms. Of the two reference images, the one at the smaller distance from the test image is the more similar one.
To find the similarity between the two images we are going to use the following approach :
Read the image files as an array.
Since the image files are colored there are 3 channels for RGB values. We are going to flatten them such that each image is a single 1-D array.
Once we have our image files as an array we are going to generate a histogram for each image where, for each index 0 – 255, we count the occurrences of that pixel value in the image.
Once we have our histograms we are going to use the L2-Norm or Euclidean Distance to find the difference between the two histograms.
Based on the distance between the histogram of our test image and the reference images we can find the image our test image is most similar to.
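The approach above can be sketched end to end on toy data using only the standard library. The tiny pixel lists and the function names here are illustrative assumptions, not the article's code, which follows below with NumPy and PIL; real images would supply hundreds of thousands of flattened pixel values instead of four.

```python
from collections import Counter
import math


def histogram(pixels, levels=256):
    """Count how often each pixel value 0..levels-1 occurs (0 if absent)."""
    counts = Counter(pixels)
    return [counts[value] for value in range(levels)]


def l2_distance(h1, h2):
    """Euclidean (L2) distance between two equal-length histograms."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))


def most_similar(test_pixels, references):
    """Return the name of the reference whose histogram is closest to the test image."""
    test_hist = histogram(test_pixels)
    return min(references,
               key=lambda name: l2_distance(histogram(references[name]), test_hist))


# Toy "flattened images": made-up pixel values for illustration only
references = {
    'reference_1': [0, 0, 255, 255],
    'reference_2': [128, 128, 128, 128],
}
test_image = [0, 255, 255, 255]

print(most_similar(test_image, references))  # reference_1
```

Because every histogram has exactly 256 entries, images of any content can be compared entry by entry.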
Coding for Image Similarity in Python
Import the dependencies we are going to use

from PIL import Image
from collections import Counter
import numpy as np

We are going to use NumPy for storing the image as a NumPy array, Image to read the image in terms of numerical values, and Counter to count the number of times each pixel value (0-255) occurs in the images.
Reading the Image
We can see that our image has been successfully read as a 3-D array. In the next step, we need to flatten this 3-D array into a 1-dimensional array.

flat_array_1 = array1.flatten()
print(np.shape(flat_array_1))
>>> (245760, )

We are going to follow the same steps for the other two images. I will skip that here so that you can try your hand at it too.
Generating the Count-Histogram-Vector :

RH1 = Counter(flat_array_1)

The above line of code returns a dictionary where each key is a pixel value and its value is the number of times that pixel value occurs in the image.
One requirement of Euclidean distance is that both vectors have the same number of dimensions. To ensure that every histogram vector has the same length, we loop over the pixel values 0-255 and, for each value, append its count if it occurs in the image and 0 otherwise.

H1 = []
for i in range(256):
    if i in RH1.keys():
        H1.append(RH1[i])
    else:
        H1.append(0)

The above piece of code generates a vector of size (256, ) where each index corresponds to a pixel value and each entry is the count of that pixel value in the image.
We follow the same steps for the other two images and obtain their corresponding Count-Histogram-Vectors. At this point we have our final vectors for both the reference images and the test image and all we need to do is calculate the distances and predict.
Euclidean Distance Function :

def L2Norm(H1,H2):
    distance = 0
    for i in range(len(H1)):
        distance += np.square(H1[i]-H2[i])
    return np.sqrt(distance)

The above function takes in two histograms and returns the euclidean distance between them.
Evaluation :
Since we have everything we need to find the image similarities, let us find out the distance between the test image and our first reference image.

dist_test_ref_1 = L2Norm(H1,test_H)
print("The distance between Reference_Image_1 and Test Image is : {}".format(dist_test_ref_1))
>>> The distance between Reference_Image_1 and Test Image is : 9882.175468994668

Let us now find out the distance between the test image and our second reference image.

dist_test_ref_2 = L2Norm(H2,test_H)
print("The distance between Reference_Image_2 and Test Image is : {}".format(dist_test_ref_2))
>>> The distance between Reference_Image_2 and Test Image is : 137929.0223122023

The distance to the first reference image is far smaller, so the test image is more similar to the first reference image.

Optimizing Exploratory Data Analysis Using Functions In Python!
This article was published as a part of the Data Science Blogathon.
“The more, the merrier”.
It is a perfect saying for the amount of analysis done on any dataset.
As more and more people opt for a career in data science, the need grows for a fast-track way to guide everyone along the path. I learned Python as the base to start and then gradually added skills that helped me grow in the data science domain.
In this post, I will be adding all the important steps and python functions you can use for Exploratory Data Analysis (EDA) on any dataset.
Okay, today’s plan is to run our fingers through data and figure out as much as we can but all in an optimized way. I am writing this article to share user-defined functions to help and shorten the EDA coding time.
The most important steps to follow in a project are:
Importing the data
Data validation: column datatypes, imputing null/missing values
Data exploration (EDA): univariate, bivariate, multivariate
Feature Engineering
Transformation/Scaling
Model building (applying machine-learning algorithms) and tuning
Score calculation
Index
Introduction
Univariate analysis
Bi-variate analysis
Multi-variate analysis
Helpful functions
Summary
Introduction
The most important and time-consuming part of any analytics problem is understanding the data. It is better to spend time studying the data rather than coding the same thing again and again.
The functions we are going to build today are pretty general and you can adapt them as per your requirement.
The pseudo-code for a user-defined function in Python is:

Function definition:

def func_name(parameters):  # function name and parameters
    "function_steps"
    function_commands
    return [return_value]

Function call:

func_name(parameters)

Function for Univariate analysis:
Categorical:
The function below draws a count plot for the feature passed to it.

import matplotlib.pyplot as plt
import seaborn as sns

# df1 is the DataFrame being analysed; it is assumed to be loaded already
def plot_cat(var, l=8, b=5):
    plt.figure(figsize=(l, b))
    sns.countplot(x=df1[var], order=df1[var].value_counts().index)

Continuous:
1. For a simple distplot of a continuous feature:

def plot_cont(var, l=8, b=5):
    plt.figure(figsize=(l, b))
    sns.distplot(df1[var])
    plt.xlabel(var)

2. To view a detailed kde plot with all details:

# plot kde plot with mean, median and std values
def plot_cont_kde(var, l=8, b=5):
    mini = df1[var].min()
    maxi = df1[var].max()
    ran = df1[var].max() - df1[var].min()
    mean = df1[var].mean()
    skew = df1[var].skew()
    kurt = df1[var].kurtosis()
    median = df1[var].median()
    st_dev = df1[var].std()
    points = mean - st_dev, mean + st_dev

    fig, axes = plt.subplots(1, 2)
    sns.boxplot(data=df1, x=var, ax=axes[0])
    sns.distplot(a=df1[var], ax=axes[1], color='#ff4125')
    sns.lineplot(x=list(points), y=[0, 0], color='black', label="std_dev")
    sns.scatterplot(x=[mini, maxi], y=[0, 0], color='orange', label="min/max")
    sns.scatterplot(x=[mean], y=[0], color='red', label="mean")
    sns.scatterplot(x=[median], y=[0], color='blue', label="median")
    fig.set_size_inches(l, b)
    plt.title('std_dev = {}; kurtosis = {};\nskew = {}; range = {}\nmean = {}; median = {}'.format(
        (round(points[0], 2), round(points[1], 2)),
        round(kurt, 2), round(skew, 2),
        (round(mini, 2), round(maxi, 2), round(ran, 2)),
        round(mean, 2), round(median, 2)))

Functions for Bi-variate analysis:
Bi-variate analysis is very helpful for finding correlation patterns and testing our hypotheses. This will help us infer and build different features to feed into our model.
Categorical-Categorical:
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def BVA_categorical_plot(data, tar, cat):
    '''Take data and two categorical variables, calculate the chi2
    significance between the two variables, and plot the result
    with a countplot and a crosstab.
    '''
    # isolating the variables
    data = data[[cat, tar]][:]
    # forming a crosstab
    table = pd.crosstab(data[tar], data[cat])
    f_obs = np.array([table.iloc[0][:].values, table.iloc[1][:].values])
    # performing chi2 test
    chi, p, dof, expected = chi2_contingency(f_obs)
    # checking whether results are significant
    sig = p < 0.05
    # plotting grouped plot
    sns.countplot(x=cat, hue=tar, data=data)
    plt.title("p-value = {}\n difference significant? = {}\n".format(round(p, 8), sig))
    # plotting percent stacked bar plot
    ax1 = data.groupby(cat)[tar].value_counts(normalize=True).unstack()
    ax1.plot(kind='bar', stacked=True, title=str(ax1))
    int_level = data[cat].value_counts()

Categorical-Continuous:
Here, I have used two functions: one to calculate the p-value of a two-sample z-test and the other to plot the relation between our features.
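Before reading the plotting code, the z-test arithmetic can be checked in isolation with the standard library. This sketch mirrors the formula used in the article's function but is written with `statistics.NormalDist` instead of SciPy; the function name and the sample figures (means, standard deviations, and sample sizes) are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_pvalue(mean1, mean2, std1, std2, n1, n2):
    """Two-sided p-value of a two-sample z-test from summary statistics."""
    standard_error = sqrt(std1 ** 2 / n1 + std2 ** 2 / n2)
    z = (mean1 - mean2) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Identical means: no evidence of a difference, so the p-value is 1
print(two_sample_z_pvalue(10, 10, 2, 2, 50, 50))

# Means two units apart with a standard error of 0.4 give z = -5,
# so the p-value is far below the usual 0.05 threshold
print(two_sample_z_pvalue(10, 12, 2, 2, 50, 50))
```

Note that the standard error uses the standard deviation of each sample, so both inputs `std1` and `std2` must be standard deviations and not means.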
def TwoSampleZ(X1, X2, sigma1, sigma2, N1, N2):
    '''
    Takes the means, standard deviations, and numbers of observations
    of two samples and returns the p-value calculated for a
    2-sample Z-test.
    '''
    from numpy import sqrt, abs, round
    from scipy.stats import norm
    ovr_sigma = sqrt(sigma1**2/N1 + sigma2**2/N2)
    z = (X1 - X2)/ovr_sigma
    pval = 2*(1 - norm.cdf(abs(z)))
    return pval

def Bivariate_cont_cat(data, cont, cat, category):
    # creating 2 samples
    x1 = data[cont][data[cat]==category][:]      # rows in the given category
    x2 = data[cont][~(data[cat]==category)][:]   # all remaining rows
    # calculating descriptives
    n1, n2 = x1.shape[0], x2.shape[0]
    m1, m2 = x1.mean(), x2.mean()      # calculates mean
    std1, std2 = x1.std(), x2.std()    # calculates standard deviation
    # calculating p-values
    z_p_val = TwoSampleZ(m1, m2, std1, std2, n1, n2)
    # table
    table = pd.pivot_table(data=data, values=cont, columns=cat, aggfunc='mean')
    # plotting
    plt.figure(figsize=(15, 6), dpi=140)
    # barplot
    plt.subplot(1, 2, 1)
    sns.barplot(x=[str(category), 'not {}'.format(category)], y=[m1, m2])
    plt.ylabel('mean {}'.format(cont))
    plt.xlabel(cat)
    plt.title(' \n z-test p-value = {}\n {}'.format(z_p_val, table))
    # boxplot
    plt.subplot(1, 2, 2)
    sns.boxplot(x=cat, y=cont, data=data)
    plt.title('categorical boxplot')

Continuous-Continuous:

# Defining a function to calculate correlation among columns:
def corr_2_cols(Col1, Col2):
    res = pd.crosstab(df1[Col1], df1[Col2])
    # res = df1.groupby([Col1, Col2]).size().unstack()
    res['perc'] = (res[res.columns[1]]/(res[res.columns[0]] + res[res.columns[1]]))
    return res

Functions for Multi-variate analysis:

def Grouped_Box_Plot(data, cont, cat1, cat2):
    # boxplot
    sns.boxplot(x=cat1, y=cont, hue=cat2, data=data, orient='v')
    plt.title('Boxplot')

Summary
All the above functions help us cut down time and reduce redundancy in our code.
There are times when you will need to change the type of plot or add more details to it. You can alter any function as per your requirement. Do note: “Always follow a structure to complete your EDA”. The steps I shared above are the ones you should follow while working with a dataset.
-Rohit