A Comprehensive Guide On Data Visualization In Python

This article was published as a part of the Data Science Blogathon

Data visualization is the process of finding, interpreting, and comparing data so that complex ideas can be communicated more clearly, making logical patterns easier to identify during analysis.

Data visualization is important for many analytical tasks, including data summaries, exploratory data analysis, and model output analysis. A good visual is also one of the easiest ways to communicate findings to other people.

Fortunately, Python has many libraries that provide useful tools for visualizing data. The most popular of these are Matplotlib, Seaborn, Bokeh, Altair, etc.

Introduction

The ways we collect and visualize data change quickly and grow more complex with each passing day. Due to the proliferation of social media, the availability of mobile devices, and the spread of digital services, data is generated by almost any human activity that uses technology. The information produced is very valuable: it enables us to analyze trends and patterns and to use big data to draw connections between events. Data visualization can therefore be an effective way to present comprehensible details to the end-user in real time.

Image 1

Data visualization can be important for strategic communication: it helps us interpret available data; identify patterns, tendencies, and inconsistencies; make decisions; and analyze existing processes. All told, it can have a profound effect on the business world. Every company has data, whether it comes from contact with customers and senior management or from running the organization itself. Only through analysis and interpretation can this data be converted into information. This article guides readers through a series of basic concepts to help them understand data visualization and its components, and equips them with the tools and platforms they need to create interactive views and analyze data. It also provides a crash course on the design principles that govern data visualization so that readers can create and analyze market research reports.

Table of Contents

What is Data Visualization?

Importance of data visualization

Data Visualization Process

Basic principles for data visualization

Data visualization formats

Data Visualization in Python

Color Schemes for Visualization of Data in Python

Other tools for data visualization

Conclusion

End Notes

Data visualization is the practice of translating data into visual contexts, such as a map or graph, to make it easier for the human brain to understand and draw insights from. The main goal of data visualization is to make it easier to identify patterns, trends, and outliers in large data sets. The term is often used interchangeably with others, including information graphics, information visualization, and statistical graphics.

Image 2

Data visualization is one of the steps in the data science process: once data has been collected, processed, and modeled, it must be visualized in order to draw conclusions. Data visualization is also an element of the broader data presentation architecture (DPA) discipline, which aims to identify, retrieve, manage, format, and deliver data as efficiently as possible.

Data visualization is important for almost every career. It can be used by teachers to display student test results, by computer scientists working on artificial intelligence (AI), or by executives looking to share information with stakeholders. It also plays an important role in big data projects. As businesses accumulated massive collections of data during the early years of the big data trend, they needed a way to get a quick and easy overview of their data, and visualization tools were a natural fit.

Importance of Data Visualization

We live in a time of visual information, and visual content plays an important role in every moment of our lives. Research conducted by SHIFT Disruptive Learning has shown that we usually process images 60,000 times faster than a table or text, and that our brains do a better job of remembering them afterwards. The study found that after three days, participants retained between 10% and 20% of written or spoken information, compared with 65% of visual information.

The human brain can perceive imagery in just 13 milliseconds and store information, as long as it is associated with the concept. Our eyes can capture 36,000 visual messages per hour.  

40% of nerve fibers are connected to the retina.

All of this shows that people are better at processing visual information, which is embedded in our long-term memory. As a result, in reports and presentations, visual representation using images is a more effective way of communicating information than text or tables, and it takes up very little space. This means that visualized data is more engaging, easier to interact with, and easier to remember.

Data Visualization Process

Several different stages are involved in the data visualization process, which aims to reveal existing relationships or discover something new in a dataset.

1. Filtering and processing.

Filtering and processing transform raw data into information by analyzing, interpreting, summarizing, comparing, and exploring it.

2. Translation & visual representation.

The information is then translated into a visual representation by choosing the imagery, visual language, and context appropriate for the recipient.

3. Visualization and interpretation.

Finally, the visualization is effective if it has a cognitive impact on knowledge construction.

Basic principles for data visualization

The purpose of data visualization is to help us understand something that the raw data alone does not reveal. It is a way of telling stories and presenting research results, as well as a platform for data analysis and exploration. A good understanding of how to create data visualizations will therefore help us produce meaningful and easy-to-remember reports, infographics, and dashboards. Creating the right visualization helps us solve problems and analyze the subject matter in detail. The first step is to understand the basic principles of data visualization.

1. Overview first: Give viewers an overall picture of the data as their starting point for exploration. This means presenting a visual summary of the different types of data and describing their relationships at the same time. This strategy helps viewers take in the data, at all its different levels, simultaneously.

2. Zoom in and filter: The second step builds on the first so that viewers can dig into the underlying data. Zooming in and out lets us select subsets of the data that meet certain criteria while keeping a sense of position and context.

Data visualization formats

1. Bar Charts

Bar charts are one of the most popular ways to visualize data because they present data quickly in an understandable format that allows viewers to see highs and lows at a glance.

They are very versatile and are often used for comparing different categories, analyzing changes over time, or comparing parts of a whole. The three variations on the bar chart are (a short matplotlib sketch follows below):

Vertical column:

Data is arranged chronologically, and it should run in left-to-right order.

Horizontal column:

It is used to visualize categories

Full stacked column:

Used to visualize the categories that together add up to 100%

Source: Netquest- A Comprehensive Guide to Data Visualization (Melisa Matias)
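
As a minimal matplotlib sketch of these three variations (the categories and values below are made up purely for illustration):

import matplotlib.pyplot as plt

categories = ['A', 'B', 'C']
part1 = [3, 5, 2]
part2 = [4, 1, 6]

fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(12, 3))

# Vertical column: categories along the x-axis
ax1.bar(categories, part1)

# Horizontal column: categories along the y-axis
ax2.barh(categories, part1)

# Full stacked column: segments are converted to percentages so each bar adds up to 100%
totals = [a + b for a, b in zip(part1, part2)]
pct1 = [100 * a / t for a, t in zip(part1, totals)]
pct2 = [100 * b / t for b, t in zip(part2, totals)]
ax3.bar(categories, pct1)
ax3.bar(categories, pct2, bottom=pct1)

plt.show()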

2. Histograms

Histograms represent a distribution in the form of bars, where the area of each bar is proportional to the number of values it represents. They offer an overview of how a population or sample is distributed with respect to a particular characteristic. The two variations of the histogram are:

Vertical columns

Horizontal columns

Source: Netquest- A Comprehensive Guide to Data Visualization (Melisa Matias)

3. Pie charts

A pie chart is a circle divided into segments, each representing a portion of the whole. Pie charts work best with no more than five categories and are useful for comparing the relative size of the parts of a whole.

The two variations of the pie chart are (a short matplotlib sketch follows below):

Standard: Used to show relationships between components.

Donut: A variation that leaves the center open for a total value or a design element.

Source: Netquest- A Comprehensive Guide to Data Visualization (Melisa Matias)
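
As a minimal matplotlib sketch of the standard and donut variations (the labels and shares below are made up for illustration):

import matplotlib.pyplot as plt

labels = ['Category A', 'Category B', 'Category C']
shares = [45, 30, 25]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4))

# Standard: each slice shows its share of the whole
ax1.pie(shares, labels=labels, autopct='%1.0f%%')

# Donut: the same chart with a white circle drawn over the centre
ax2.pie(shares, labels=labels, autopct='%1.0f%%')
ax2.add_artist(plt.Circle((0, 0), 0.6, color='white'))

plt.show()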

4. Scatter Plot

Scatter plots use points spread over a Cartesian coordinate plane to show the relationship between two variables. They also help us determine whether different groups of data are related or not.

Source: Netquest- A Comprehensive Guide to Data Visualization (Melisa Matias)

5. Heat Maps

Heat maps use variations in color intensity across a matrix or map to represent the magnitude of values, which makes concentrations and outliers easy to spot at a glance.

Source: Netquest- A Comprehensive Guide to Data Visualization (Melisa Matias)

6. Line Plot

Line plots are used to display changes or trends in data over time. They are especially useful for showing relationships, acceleration, deceleration, and volatility in a data set.

Source: Netquest- A Comprehensive Guide to Data Visualization (Melisa Matias)

Color Schemes for Data Visualization in Python

Color is one of the most powerful resources in data visualization, and it matters if we want the details to be understood correctly. Color can be used to separate elements, to represent values, and to play on the cultural symbolism associated with a particular color. Because it shapes our understanding, before we use it we should first understand its three properties:

Hue: This is what we usually think of when we talk about color. Hues have no inherent order; they can only be distinguished by their identity (blue, red, yellow, etc.).

Brightness: This is a relative measure that describes the amount of light reflected by one object compared with another. Brightness is measured on a scale, and we can talk about light and dark values of a single color.

Saturation: This refers to the intensity of a given color. It varies according to lightness: as a color becomes less saturated, it approaches gray, in other words, a neutral color. The following diagram provides a summary of how color can be applied, and a short matplotlib sketch follows it.

Source: Netquest- A Comprehensive Guide to Data Visualization (Melisa Matias)
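
As a minimal matplotlib sketch of how these properties are applied (the data is made up for illustration): distinct hues separate unordered categories, while a sequential colormap, which varies the brightness and saturation of a single hue, encodes ordered values.

import matplotlib.pyplot as plt
import numpy as np

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Distinct hues: good for separating unordered categories
ax1.bar(['A', 'B', 'C'], [3, 5, 2], color=['tab:red', 'tab:blue', 'tab:orange'])

# Sequential colormap ('Blues'): light-to-dark shades of one hue encode magnitude
values = np.random.rand(20)
ax2.scatter(np.arange(20), values, c=values, cmap='Blues')

plt.show()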

Data Visualization in Python

We’ll start with a basic look at the data, then move on to plotting charts, and finally we’ll create interactive charts.

We will work with two datasets that match the visualizations shown in this article; the datasets can be downloaded here.

The data describes the popularity of Internet searches for three terms related to artificial intelligence (data science, machine learning, and deep learning). It was extracted from a popular search engine.

There are two data files. The first one, which we will use in most of the examples, contains data on the popularity of the three terms over time (from 2004 to 2023). In addition, I have added a categorical variable (ones and zeros) to demonstrate charts that vary by category.

The second file contains search-interest data by country. We will use it in the final section of the article when working with maps.

Before we move on to the more sophisticated methods, let’s start with the most basic way of visualizing data. We will simply use pandas to look at the data and get an idea of how it is distributed.

The first thing we have to do is look at a few rows to see which columns there are, what information they contain, and how the values are formatted.
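
The snippet that produced this first look is not shown in the text; a minimal sketch, assuming the dataset has been downloaded as a CSV file (the filename below is only a placeholder), could look like this:

import pandas as pd

# Placeholder filename: replace with the actual name of the downloaded dataset
df = pd.read_csv('ai_terms_popularity.csv')

# Show the first rows to see which columns exist and how the values are formatted
df.head()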



With the describe command, we will see how the data is distributed: the count, mean, standard deviation, minimum, maximum, and quartiles.

df.describe()

With the info command, we will see what type of data each column contains. We may find a column that looks numeric when viewed with head, but which is actually stored as a string because its values follow a text format.

df.info()

Data Visualization in Python using Matplotlib

Matplotlib is the most basic library for visualizing data graphically. It includes as many kinds of plot as we can think of. Being basic does not mean that it is weak; many of the other visualization libraries we will talk about are built on top of it.

Matplotlib charts are made up of two main elements: the figure (the overall canvas or window) and the axes (the area where the data is drawn, with an x-axis and a y-axis). Now let’s build the simplest graph:

import matplotlib.pyplot as plt

plt.plot(df['Mes'], df['data science'], label='data science')

We can plot several variables on the same figure and compare them.

plt.plot(df['Mes'], df['data science'], label='data science')
plt.plot(df['Mes'], df['machine learning'], label='machine learning')
plt.plot(df['Mes'], df['deep learning'], label='deep learning')
plt.xlabel('Date')
plt.ylabel('Popularity')
plt.title('Popularity of AI terms by date')
plt.grid(True)
plt.legend()

If you are working with Python from a terminal or script, call plt.show() after defining the graph with the functions listed above. If you are working from a Jupyter notebook, add %matplotlib inline at the beginning of the notebook and run it before creating any chart.
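
A minimal sketch of the script pattern, assuming df is the DataFrame loaded earlier:

import matplotlib.pyplot as plt

plt.plot(df['Mes'], df['data science'], label='data science')
plt.legend()

# Needed when running from a terminal or a .py script;
# in a Jupyter notebook, run %matplotlib inline once at the top instead
plt.show()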

We can draw multiple plots in one figure. This is handy for comparing charts or for sharing several kinds of chart easily in a single image.

fig, axes = plt.subplots(2, 2)
axes[0, 0].hist(df['data science'])
axes[0, 1].scatter(df['Mes'], df['data science'])
axes[1, 0].plot(df['Mes'], df['machine learning'])
axes[1, 1].plot(df['Mes'], df['deep learning'])

We can plot several series with a different line style and marker for each:

plt.plot(df['Mes'], df['data science'], 'r-')
plt.plot(df['Mes'], df['data science']*2, 'bs')
plt.plot(df['Mes'], df['data science']*3, 'g^')

Now let’s look at a few examples of different graphics we can make with Matplotlib. We start with the scatterplot:

plt.scatter(df['data science'], df['machine learning'])

With Bar chart:

plt.bar(df['Mes'], df['machine learning'], width=20)

With Histogram:

plt.hist(df['deep learning'], bins=15)

Data Visualization in Python using Seaborn

Seaborn is a library built on top of Matplotlib. Essentially, what it offers are nicer default styles and functions for creating complex plot types with just one line of code.

We import the library and activate its style with sns.set(); without this command the plots keep the default Matplotlib style. Let's start with one of the simplest plots, the scatter plot.

import seaborn as sns

sns.set()
sns.scatterplot(x='Mes', y='data science', data=df)

We can add information about more than two variables to the same graph. In this case, we use color and size. We also create a separate panel depending on the value of the categorical column:

sns.relplot(x='Mes', y='deep learning', hue='data science', size='machine learning', col='categorical', data=df)

One of the most popular plots provided by Seaborn is the heatmap. It is very commonly used to show the correlations between all the variables in a dataset:

sns.heatmap(df.corr(), annot=True, fmt='.2f')

Another favorite is the pair plot, which shows the relationships between all pairs of variables. Be careful with this function if you have a large dataset, since it has to draw every data point for every pair of columns, which means the processing time grows quickly with the data size.

sns.pairplot(df)

Now let’s make a pair plot with the charts split by the value of the categorical column:

sns.pairplot(df, hue='categorical')

Another very informative plot is the joint plot, which shows a scatter plot together with the histograms of the two variables, so we can see how each is distributed:

sns.jointplot(x='data science', y='machine learning', data=df)

Another interesting plot is the violin plot:

sns.catplot(x='categorical', y='data science', kind='violin', data=df)

Data Visualization in Python using Bokeh

Bokeh is a library that allows you to produce interactive graphics. We can export them to an HTML file that we can share with anyone who has a web browser.

It is a very useful library when we want to explore the plots interactively, zooming in and panning around the picture, or when we want to share a plot and let someone else explore the data.

We start by importing the library and defining the file where the graph will be saved:

from bokeh.plotting import figure, output_file, save

output_file('data_science_popularity.html')

We draw what we want and save it to a file:

p = figure(title='data science', x_axis_label='Mes', y_axis_label='data science')
p.line(df['Mes'], df['data science'], legend='popularity', line_width=2)
save(p)

Other Tools for Data Visualization

Some data visualization tools help visualize data effectively and faster than writing Python code by hand. Here are a few examples:

Databox

Databox is a data visualization tool used by more than 15,000 businesses and marketing agencies. Databox pulls your data into one place to track real-time performance with attractive dashboards.

Databox is ideal for marketing teams that want to get set up with dashboards quickly. With more than 70 integrations and no need to code, it is a very easy tool to use.

Zoho Analytics

Zoho Analytics is probably one of the most popular BI tools on this list. One thing you can be sure of is that with Zoho Analytics, you can upload your data securely. Additionally, you can use a variety of charts, tables, and components to present your data concisely.

Tableau

If you want to easily explore and visualize data, Tableau is a strong choice. It helps you create charts, maps, and many other professional graphics. To improve your visual presentations, you can also get its desktop app.

Additionally, if installing a third-party application is a problem for you, it offers a server-based solution that supports visualization in online and mobile applications.

You can check out my article on Analytics Vidhya, Top 10 Data Visualization Tools, for more information on trending data visualization tools.

Conclusion

With all these different libraries you may be wondering which library is right for your project. The quick answer is a library that lets you easily create the image you want.

In the initial stages of a project, pandas and pandas profiling give us a quick look to understand the data. If we need to visualize more detail, we can use simple plots such as scatter plots or histograms.
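
As a hedged sketch of that quick first pass (the package is named pandas_profiling in older releases and ydata_profiling in newer ones; df is the DataFrame loaded earlier):

from pandas_profiling import ProfileReport  # 'ydata_profiling' in newer releases

# Build an HTML report with distributions, correlations, and missing values
profile = ProfileReport(df, title='AI terms popularity')
profile.to_file('report.html')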

End Notes

In this article, we discussed data visualization: some basic data visualization formats and practical implementations using Python libraries. Finally, we wrapped up with some tools that can perform data visualization in Python effectively.

Thanks For Reading!

About Me:

Hey, I am Sharvari Raut. I love to write!


The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.


A Comprehensive Guide On Kubernetes

This article was published as a part of the Data Science Blogathon.

Image-1

Introduction

Today, In this guide, we will dive in to learn about Kubernetes and use it to deploy and manage containers at scale.

Container and microservice architectures are increasingly used to build modern apps. Kubernetes is open-source software that allows you to deploy and manage containers at scale. It groups containers into logical units to make your application’s management, discovery, and scaling easier.

The main goal of this guide is to provide a complete overview of the Kubernetes ecosystem while keeping it basic and straightforward. It covers Kubernetes’ core ideas before applying them to a real-world scenario.

Even if you have no prior experience with Kubernetes, this article will serve as an excellent starting point for your journey.

So, without further ado, let’s get this learning started.

Why Kubernetes?

Before we go into the technical ideas, let us start with why a developer should use Kubernetes in the first place. Here are a few reasons why developers should use Kubernetes in their projects.

Portability

When using Kubernetes, moving containerized applications from development to production appears to be an easy process. Kubernetes enables developers to orchestrate containers in various environments, including on-premises infrastructure, public and hybrid clouds.

Scalability

Kubernetes simplifies defining complex containerized applications and deploying them globally across multiple clusters of servers, adjusting resources to match your desired state. While horizontally scaling applications, Kubernetes automatically checks and maintains container health.

Extensibility

Kubernetes has a vast and ever-expanding collection of extensions and plugins created by developers and businesses that make it simple to add unique capabilities to your clusters such as security, monitoring, or management.

Concepts

Using Kubernetes necessitates an understanding of the various abstractions it employs to represent the state of the system. That is the focus of this section. We get acquainted with the essential concepts and provide you with a clearer picture of the overall architecture.

Pods

A Pod is a group of one or more application containers that share storage, a unique cluster IP address, and the instructions for running them (e.g. ports, container image, and restart or failure policies).

They are the foundation of the Kubernetes platform. While creating a service or a deployment, Kubernetes creates a Pod with the container inside.

Each pod runs on the node where it is scheduled and remains there until it is terminated or deleted. If the node fails or stops, Kubernetes will automatically schedule identical Pods on the cluster’s other available Nodes.

Image-2

Node

A node is a worker machine in a Kubernetes cluster that can be virtual or physical depending on the cluster type. Each node is managed by the master, which automatically schedules pods across all nodes in the cluster based on their available resources and current configuration.

Each node is required to run at least two services:

Kubelet is a process that communicates between the Kubernetes master and the node.

A container runtime is in charge of downloading and running a container image (Eg: Docker)

Image-3

Services

A Service is an abstraction that describes a logical set of Pods and the policies for accessing them. Services allow for the loose coupling of dependent Pods.

Even though each pod has a distinct IP-Address, those addresses are not visible to the outside world. As a result, a service enables your deployment to receive traffic from external sources.

We can expose services in a variety of ways:

ClusterIP (default) – Expose the service only inside the cluster.

NodePort – Use NAT to expose the service on the same port on every node in the cluster.

LoadBalancer – Create an external load balancer to expose the service on a specified IP address.

Image-4

Deployments

Deployments describe your application’s desired state. The deployment controller works to ensure that the application’s current state matches that description.

A deployment automatically runs the requested number of replicas of your application and replaces any instances that fail or become unresponsive. In this way, deployments help ensure that your application is ready to serve user requests.

Image-5

Installation

Before we dive into building our cluster, we must first install Kubernetes on our local workstation.

Docker Desktop

If you’re using Docker desktop on Windows or Mac, you may install Kubernetes directly from the user interface’s settings pane.

Others

If you are not using the Docker desktop, I recommend that you follow the official installation procedure for Kubectl and Minikube.

Basics

Now that we’ve covered the fundamental ideas, let’s move on to the practical side of Kubernetes. This chapter walks you through the basics required to deploy apps in a cluster.

Creating cluster

When you launch Minikube, it immediately forms a cluster.

minikube start

After installation, the Docker desktop should also automatically construct a cluster. You may use the following commands to see if your cluster is up and running:

# Get information about the cluster
kubectl cluster-info

# Get all nodes of the cluster
kubectl get nodes

Deploying an application:

Now that we’ve completed the installation and established our first cluster, we’re ready to deploy an application to Kubernetes.

kubectl create deployment nginx --image=nginx:latest

We use the create deployment command, passing inputs as the deployment name and the container image. This example deploys Nginx with one container and one replica.

Using the get deployments command, you may view your active deployments.

kubectl get deployments

Information about deployments

Here are a few commands you may use to learn more about your Kubernetes deployments and pods.

Obtaining all of the pods

Using the kubectl get pods command, you can get a list of all running pods:

kubectl get pods

Detail description of a pod

Use describe command to get more detailed information about a pod.

kubectl describe pods

Logs of a pod

The data that your application writes to STDOUT becomes the container logs. The following command gives you access to those logs.

kubectl logs $POD_NAME

Note: You may find out the name of your pod by using the get pods or describe pods commands.

Execute command in Container

The kubectl exec command, which takes the pod name and the command to run as arguments, allows us to execute commands directly in our container.

kubectl exec $POD_NAME command

Let’s look at an example where we start a bash terminal in the container to see what I mean.

kubectl exec -it $POD_NAME bash

Exposing app publicly

A service, as previously mentioned, defines the policy by which a deployment can be accessed. In this section we’ll look at how this is done and the other options you have when exposing your services to the public.

Developing a service:

We can build a service with the create service command, which takes the service type and the port we wish to expose as parameters.

kubectl create service nodeport nginx --tcp=80:80

It will generate a service for our Nginx deployment and expose our container’s port 80 on a port of our host machine.

On the host system, use the kubectl get services command to obtain the port:

Image By Author

As you can see, port 80 of the container has been routed to port 31041 of my host machine. Once you have the port, you can test your deployment by accessing localhost on that port.

Deleting a service

kubectl delete service nginx

Scale up the app

Scaling your application up and down is a breeze with Kubernetes. By using this command, you may alter the number of replicas, and Kubernetes will generate and maintain everything for you.

kubectl scale deployments/nginx --replicas=5

This command scales our Nginx deployment up to five replicas.

This way of application deployment works well for tiny one-container apps but lacks the overview and reusability required for larger applications. YAML files are helpful in this situation.

YAML files allow you to specify your deployments, services, and pods using a markup language, making them more reusable and scalable. The following chapters will go over YAML files in detail.

Kubernetes object in YAML

Every object in Kubernetes can be expressed as a declarative YAML object that specifies what should run and how. These files are frequently used to make resource configurations such as deployments, services, and volumes more reusable.

This section walks you through the fundamentals of YAML and how to get a list of all available parameters and attributes for a Kubernetes object. We then glance through deployment and service files to understand the syntax and how they are deployed.

Parameters of different objects

There are numerous Kubernetes objects, and it is difficult to remember every setting. That’s where the explain command comes in.
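
For example, to list the documented fields of a deployment:

kubectl explain deployment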

You can also acquire documentation for a specific field by using the syntax:

kubectl explain deployment.spec.replicas

Deployment file

For ease of reusability and changeability, more sophisticated deployments are typically written in YAML.

The basic file structure is as follows:

apiVersion: apps/v1
kind: Deployment
metadata:
  # The name and label of your deployment
  name: mongodb-deployment
  labels:
    app: mongo
spec:
  # How many copies of each pod do you want
  replicas: 3
  # Which pods are managed by this deployment
  selector:
    matchLabels:
      app: mongo
  # Regular pod configuration / Defines containers, volumes and environment variables
  template:
    metadata:
      # label the pod
      labels:
        app: mongo
    spec:
      containers:
        - name: mongo
          image: mongo:4.2
          ports:
            - containerPort: 27017

There are several crucial sections in the YAML file:

apiVersion – Specifies the API version.

kind – The Kubernetes object type defined in the file (e.g. deployment, service, persistent volume, …)

metadata – A description of your YAML component that includes the component’s name, labels, and other information.

spec – Specifies the attributes of your deployment, such as replicas and resource constraints.

template – The deployment file’s pod configuration.

Now that you understand the basic format, you can use the apply command to deploy the file.
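
For example, assuming the YAML above was saved as mongodb-deployment.yaml (a filename chosen here only for illustration):

kubectl apply -f mongodb-deployment.yaml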

Service file

Service files are structured similarly to deployments, with slight variations in the parameters.

apiVersion: v1
kind: Service
metadata:
  name: mongo
spec:
  selector:
    app: mongo
  ports:
    - port: 27017
      targetPort: 27017
  type: LoadBalancer

Storage

When a container restarts or a pod is deleted, its entire file system is deleted with it. That is a good thing in many cases, since it keeps your stateless application from getting clogged up with unnecessary data. In other cases, persisting your file system’s data is critical for your application.

There are several types of storage available:

The container file system stores the data of a single container for as long as that container exists.

Volumes allow you to save data and share it between containers as long as the pod is active.

Persistent volumes keep data even if the pod is deleted or restarted. They are your Kubernetes cluster’s long-term storage.

Volumes

Volumes allow you to save, exchange, and preserve data among the containers of a pod for as long as the pod exists. They are helpful if you have pods with multiple containers that share data.

In Kubernetes, there are two phases to using a volume:

The pod defines the volume.

The container uses volume mounts to add the volume at a given filesystem path.

You can add a volume to your pod by using the syntax:

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
    - name: nginx
      image: nginx
      volumeMounts:
        - name: nginx-storage
          mountPath: /etc/nginx
  volumes:
    - name: nginx-storage
      emptyDir: {}

Here, the volumes tag provides a volume that is mounted to a particular directory of the container filesystem (in this case, /etc/nginx).

Persistent Volumes

These are nearly identical to conventional volumes, with the unique difference that their data is preserved even if the pod is deleted. That is why they are used for long-term storage needs, such as a database.

A Persistent Volume Claim (PVC) object, which connects to backend storage volumes via a series of abstractions, is the most typical way to define a persistent volume.

Example of YAML Configuration file.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pv-claim
  labels:
    app: sampleAppName
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi

There are more options to save your data in Kubernetes, and you may automate as much of the process as feasible. Here’s a list of a few interesting subjects to look into.

Compute Resources

When orchestrating containers, managing compute resources for your containers and applications is critical.

When your containers have defined resource amounts, the scheduler can make better decisions about which node to place each pod on. You will also have fewer resource-contention issues across different deployments.

There are two types of resource definitions, listed below; a minimal YAML sketch follows the list.

Requests

Limits
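
The detailed walk-through is not reproduced here, but as a minimal sketch, both are set per container in the pod template; the names and values below are illustrative only:

apiVersion: v1
kind: Pod
metadata:
  name: resource-demo
spec:
  containers:
    - name: app
      image: nginx
      resources:
        # Requests: what the scheduler reserves for the container
        requests:
          cpu: "250m"
          memory: "64Mi"
        # Limits: the maximum the container is allowed to use
        limits:
          cpu: "500m"
          memory: "128Mi"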

Secrets

Secrets in Kubernetes allow you to securely store and manage sensitive data such as passwords, API tokens, and SSH keys.

To use a secret in your pod, you must first refer to it. It can happen in many different ways:

As an environment variable, or as a file on a volume mounted into the container.

By the kubelet, when it pulls an image from a private registry.

Creating a secret

Secrets are created either with the kubectl command-line tool or by declaring a Secret Kubernetes object in YAML.

Using kubectl

kubectl allows you to create secrets with a create command that requires only the secret name and the data. The data is provided as a file or a literal.

kubectl create secret generic admin-credentials --from-literal=user=poweruser --from-literal=password='test123'

Using a file, the same functionality would look like this.

kubectl create secret generic admin-credentials --from-file=./username.txt --from-file=./password.txt

Making use of definition files

Secrets, like other Kubernetes objects, can be declared in a YAML file.

apiVersion: v1
kind: Secret
metadata:
  name: secret-apikey
data:
  apikey: YWRtaW4=

Your sensitive information is stored in the secret as a key-value pair, with apikey as the key and YWRtaW4= as the base64-encoded value.

Using the apply command, you can now generate the secret.

kubectl apply -f secret.yaml

Use the stringData attribute instead if you wish to give plain data and let Kubernetes handle the encoding.

apiVersion: v1
kind: Secret
metadata:
  name: plaintext-secret
stringData:
  password: test

ImagePullSecrets

If you’re pulling an image from a private registry, you may need to authenticate first. When your nodes need to pull a specific image, an imagePullSecrets entry holds the authentication info and makes it available to them.

apiVersion: v1
kind: Pod
metadata:
  name: private-image
spec:
  containers:
    - name: privateapp
      image: gabrieltanner/graphqltesting
  imagePullSecrets:
    - name: authentification-secret

Namespaces

Namespaces are virtual clusters used to manage large projects and divide cluster resources between many users. They provide a scope for names; resources in different namespaces can reuse the same name, but namespaces cannot be nested inside one another.

Managing and using namespaces with kubectl is simple. This section will walk you through the most common namespace actions and commands.

Look at the existing Namespaces

You can use the kubectl get namespaces command to see all of your cluster’s presently accessible namespaces.

kubectl get namespaces

# Output
NAME          STATUS   AGE
default       Active   32d
docker        Active   32d
kube-public   Active   32d
kube-system   Active   32d

Creating Namespace

Namespaces can be created with the kubectl CLI or by using YAML to create a Kubernetes object.

kubectl create namespace testnamespace

# Output
namespace/testnamespace created

The same functionality may be achieved with a YAML file.

apiVersion: v1
kind: Namespace
metadata:
  name: testnamespace

The kubectl apply command can then be used to apply the configuration file.

kubectl apply -f testNamespace.yaml

Namespace Filtering

When a new object is created in Kubernetes without a custom namespace property, it is added to the default namespace.

If you want to create your object in a different namespace, you can do so like this:

kubectl create deployment --image=nginx nginx --namespace=testnamespace

You may now use the get command to filter for your deployment.

kubectl get deployment --namespace=testnamespace

Change Namespace

You’ve now learned how to create objects in a namespace other than the default. However, adding the namespace to each command you want to run takes time and is error-prone.

As a result, you can use the set-context command to change the default namespace to which commands are applied.

kubectl config set-context $(kubectl config current-context) --namespace=testnamespace

The get-context command can be used to validate the modifications.

kubectl config get-contexts

# Output
CURRENT   NAME      CLUSTER   AUTHINFO   NAMESPACE
*         Default   Default   Default    testnamespace

Kubernetes with Docker Compose

For people coming from the Docker community, writing Docker Compose files may be easier than writing Kubernetes objects. This is where Kompose comes into play: it is a simple CLI (command-line interface) that converts or deploys your docker-compose file to Kubernetes.

How to Install Kompose

It installs easily and quickly on all three major operating systems.

To install Kompose on Linux or Mac, curl the binaries.

# Linux
# macOS
chmod +x kompose
sudo mv ./kompose /usr/local/bin/kompose

Deploying using Kompose

Kompose uses your existing Docker Compose files to deploy the application on Kubernetes. Consider the following compose file as an example.

version: "2"
services:
  redis-master:
    image: k8s.gcr.io/redis:e2e
    ports:
      - "6379"
  redis-slave:
    image: gcr.io/google_samples/gb-redisslave:v1
    ports:
      - "6379"
    environment:
      - GET_HOSTS_FROM=dns
  frontend:
    image: gcr.io/google-samples/gb-frontend:v4
    ports:
      - "80:80"
    environment:
      - GET_HOSTS_FROM=dns
    labels:
      kompose.service.type: LoadBalancer

Kompose, like Docker Compose, lets us deploy our setup with a single command.

kompose up

You should now be able to see the resources that have been created.

kubectl get deployment,svc,pods,pvc

Converting Kompose

Kompose can also turn your existing Docker Compose file into the Kubernetes object you need.

kompose convert

The apply command is then used to deploy your application.

kubectl apply -f filenames

Application Deployment

Now that you’ve mastered the theory and all of Kubernetes’ core ideas, it’s time to put what you’ve learned into practice. This chapter will show you how to use Kubernetes to deploy a backend application.

This tutorial’s specific application is a GraphQL boilerplate for the Nest.js backend framework.

First, let’s clone the repository.

Images to a Registry

We must first push the images to a publicly accessible image registry before starting to build the Kubernetes objects. This can be a public registry like Docker Hub or a private registry of your own.

Visit this post for additional information on creating your own private Docker Image.

To push the image, include an image tag in your Compose file that contains the registry you want to push to.

version: '3'
services:
  nodejs:
    build:
      context: ./
      dockerfile: Dockerfile
    image: gabrieltanner.dev/nestgraphql
    restart: always
    environment:
      - DATABASE_HOST=mongo
      - PORT=3000
    ports:
      - '3000:3000'
    depends_on: [mongo]
  mongo:
    image: mongo
    ports:
      - '27017:27017'
    volumes:
      - mongo_data:/data/db
volumes:
  mongo_data: {}

I used a private registry that I had previously set up, but DockerHub would work just as well.

Creating Kubernetes objects

Now that you’ve published your image to a registry, we’ll write our Kubernetes objects.

To begin, create a new directory in which to save the deployments.

mkdir deployments
cd deployments
touch mongo.yaml
touch nestjs.yaml

This is how the MongoDB service and deployment will look:

apiVersion: v1
kind: Service
metadata:
  name: mongo
spec:
  selector:
    app: mongo
  ports:
    - port: 27017
      targetPort: 27017
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mongo
spec:
  selector:
    matchLabels:
      app: mongo
  template:
    metadata:
      labels:
        app: mongo
    spec:
      containers:
        - name: mongo
          image: mongo
          ports:
            - containerPort: 27017

The file includes a deployment object with a single MongoDB container called mongo. It also includes a service that exposes port 27017 to the Kubernetes network.

Because the container requires some additional configuration, such as environment variables and imagePullSecrets, the Nest.js Kubernetes object is a little more complicated.

The service uses a load balancer to make the port available on the host machine.
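
The file itself is not reproduced in the text; a minimal sketch of what nestjs.yaml could contain, reusing the image name from the Compose file and using illustrative names for the labels and the secret, might look like this:

apiVersion: v1
kind: Service
metadata:
  name: nestjs
spec:
  selector:
    app: nestjs
  ports:
    - port: 3000
      targetPort: 3000
  type: LoadBalancer
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nestjs
spec:
  selector:
    matchLabels:
      app: nestjs
  template:
    metadata:
      labels:
        app: nestjs
    spec:
      containers:
        - name: nestjs
          image: gabrieltanner.dev/nestgraphql
          ports:
            - containerPort: 3000
          env:
            - name: DATABASE_HOST
              value: mongo
            - name: PORT
              value: "3000"
      # Illustrative secret name for pulling from a private registry
      imagePullSecrets:
        - name: registry-credentials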

Deploy the application

Now that the Kubernetes object files are ready, let us use kubectl to deploy them.

kubectl apply -f mongo.yaml
kubectl apply -f nestjs.yaml

You should now see the GraphQL playground at localhost/graphql.

Congratulations, you’ve just deployed your first Kubernetes application.

Image-6

You persevered to the end! I hope this guide has given you a better understanding of Kubernetes and of how to use it to improve your development process with production-grade solutions.

Kubernetes was created using Google’s ten years of expertise running containerized apps at scale. It has already been adopted by the top public cloud providers and technology vendors and is now being adopted by the majority of software manufacturers and companies. It even led to the formation of the Cloud Native Computing Foundation (CNCF) in 2015, was the first project to graduate under the CNCF, and began streamlining the container ecosystem alongside other container-related projects such as CNI, containerd, Envoy, Fluentd, gRPC, Jaeger, Linkerd, and Prometheus. Its clean design, cooperation with industry leaders, open-source nature, and constant openness to ideas and contributions may be the main reasons for its popularity and endorsement at such a high level.

Share this with other developers, if you find it useful.

To know more about Kubernetes, Check out the links below

Learn basic tenets from our blog.

References

Image-1 – Photo by  Ian Taylor On Unsplash

Image-6 – Photo by  Xan Griffin On Unsplash

The media shown in this article is not owned by Analytics Vidhya and are used at the Author’s discretion. 


Comprehensive Guide On Tcp/Ip Model

What is TCP/IP Model?


Understanding TCP/IP Model?

The United States Department of Defense initially developed the Internet Protocol Suite during the 1970s. It connects heterogeneous systems and contains a popular set of communication protocols, of which TCP and IP are the most widely used. The TCP/IP model itself differs from the OSI model.

How does TCP/IP work?

Below are a few points explaining the working of TCP/IP:

1. Network Access Layer

Here, the OSI model’s physical layer and data link layer combine to form the network access layer. It handles the physical transmission of data through the protocols and hardware elements of this layer. ARP logically belongs to layer 3 but is carried by layer 2 protocols.

2. Internet Layer

IP: Stands for “Internet Protocol,” and it is in charge of packet delivery. This delivery happens between the source and the destination through the IP addresses in the packet headers. IPv4 and IPv6 are the most widely used versions. Most current websites still use IPv4, while IPv6 adoption is steadily growing.

ICMP: Stands for “Internet Control Message Protocol.” It carries control and error information about the network and is encapsulated within IP datagrams.

ARP: Stands for “Address Resolution Protocol.” ARP determines the hardware address from the specified internet protocol address. The major classifications of ARP are: Reverse ARP, Gratuitous ARP, Proxy ARP, and Inverse ARP.

3. Host-to-Host Layer

It is much equivalent to the OSI model transport layer. All the complexities of data are shielded from the upper layers. The key protocols here are:

Transmission Control Protocol (TCP): It is a connection-oriented protocol that provides reliable, ordered delivery of data and is used when accuracy matters more than speed.

User Datagram Protocol (UDP): It is very lightweight and cost-effective and can be used when guaranteed delivery is not a major requirement. UDP is a connectionless protocol.
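
As a small illustration of UDP's connectionless nature, here is a hedged Python sketch that sends a single datagram without setting up a connection first (the address and port are placeholders):

import socket

# SOCK_DGRAM means UDP: no connection setup and no delivery guarantee
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# Fire-and-forget: no handshake and no acknowledgement at this layer
sock.sendto(b"hello", ("127.0.0.1", 9999))
sock.close()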

4. Process Layer

All the functions of the top three layers of the OSI model, i.e., the application layer, session layer, and presentation layer, are performed here. It is responsible for node-to-node communication and controls user-interface specifications. The most commonly used protocols are HTTP, HTTPS, SNMP, NTP, NFS, FTP, TFTP, Telnet, SSH, SMTP, DNS, DHCP, X Window, and LPD.

HTTP and HTTPS: Stand for Hypertext Transfer Protocol and its secure variant. These protocols manage communication between the server and the browser; in HTTPS, HTTP is combined with SSL/TLS encryption. HTTPS is the standard protocol for browser forms that involve signing in, validation, or bank transactions.

SSH: Stands for Secure Shell and is very similar to Telnet; it is terminal emulation software. The primary reason for preferring SSH is its encrypted connection, which makes the session highly secure.

NTP: Stands for Network Time Protocol. It synchronizes the clocks of computer systems to a single standard time source. NTP plays a crucial role in time-sensitive operations such as bank transactions.

Advantages of TCP/IP Model

Deployable for network-oriented problems.

The model allows communication between heterogeneous networks.

It is an open network protocol suite, which makes it available to an individual or an institute.

A scalable, client-server architecture allows networks to be added without disrupting existing services.

Every system on the network has its own IP address.

Scope of TCP/IP Model:

In the communication world, the base unit is the packet, and these packets are built using the TCP/IP protocols. Every operating system has several unique values coded into its implementation of the TCP/IP stack, and OS fingerprinting works on this basis: by studying these unique values, such as the MTU and MSS. As mentioned previously, to identify what is irregular, one first needs to recognize what is usual.

Thus, TCP/IP is a powerful network communication protocol and program that allows access to remote terminals and computers through internet systems.

Recommended Articles

For related articles to the subject, please visit the following links:

Variables In Python: A Complete Guide (Updated 2023)

In Python, you can store values to variables.

The value of a variable can then be changed throughout the execution of the program.

For example, let’s create a bunch of variables:

x = 10
name = "Alice"
age = 58

Why variables?

As you can imagine, storing data is essential to a computer program.

For instance, a typical game application keeps track of some kind of score. Behind the scenes, the score is a variable that is updated based on certain actions.

This is a comprehensive guide on variables in Python. You are going to learn important concepts, such as:

Variable Assignment

Variable Types

References and Identity

Naming conventions

All the theory is backed up by great examples.

Variable Assignment

To create a variable in Python:

Come up with a name for the variable.

Use the assignment operator (=) to give it a value.

For example, let’s create a variable called name and assign the name “Alice” to it:

name = "Alice"

Once you have created a variable in your program, you can use it. This is the whole point of variables in Python.

For example, let’s display the value of the variable name:

print(name)

Output:

Alice

Now you can also change the value of the name variable.

For instance, let’s turn the name into Bob:

name = "Bob"
print(name)

Output:

Bob

As you can see, now the name is no longer Alice. Instead, it was changed to Bob. It will remain Bob until you change it again.

In this example, you learned how to create variables with the assignment operator.

Chained Assignment

Sometimes when you create multiple variables all with the same value, you introduce unnecessary repetition in your code.

To fix this, Python supports what is called a chained assignment.

A chained assignment allows you to assign the same value to multiple variables in a single expression.

To demonstrate this, let’s create three variables on separate lines:

x = 0.0
y = 0.0
z = 0.0

However, because the value of each variable is the same, you could have used a chained assignment like this:

x = y = z = 0.0

Let’s print these variables to see that they are indeed 0.0 each:

print(x, y, z)

Output:

0.0 0.0 0.0

Now you understand the basics of variables in Python.

Next, let’s talk about the types of variables there are in Python.

Variable Types in Python

Variables are by no means a Python-only concept. Instead, variables are broadly used in each and every programming language as one of the building blocks of code.

In many other programming languages, variables are statically typed.

This means a variable is restricted to a single data type. In other words, no value of a different data type can be assigned to it.

As an example, if you declare an integer variable, you cannot assign a string to it.

This is what is meant by static typing.

However, Python does not have this restriction!

In Python, variables are dynamically typed.

In other words, the data type of a variable is open to change.

For example, let’s create an integer variable and turn it into a string on the next line:

x = 10
x = "Test"
print(x)

Output:

Test

As you can see, this change of data type causes no problems in Python.

In some other languages, doing this kind of change would cause errors in your code.

Object Identity

In Python, every single object is automatically associated with a unique identifier number, or ID for short.

In a Python program, it is guaranteed there can be no two objects with the same ID.

Occasionally, it can be useful to be able to check the ID of the object.

To do this, use the id() function.

For example, let’s store two integer objects into variables and check their identity:

a = 10
b = 20
print(id(a))
print(id(b))

Output:

9789280
9789600

As you can see, the two IDs are different.

In other words, the integer objects 10 and 20 are stored in different memory addresses.

Object References

Although this is a beginner-friendly guide, we need to take a behind-the-scenes look at variable assignments to truly understand what happens.

This is important to understand right off the get-go because:

It causes strange behavior if you do not know how it works.

It works differently in some common programming languages.

Objects in Python

In the previous chapter, you already heard the word “object”.

But what is an object? Is it the same as a variable?

Python is an object-oriented programming language. This means practically everything in Python is an object.

You are going to hear this a lot.

But what does it mean?

Consider this piece of code:

print(1000)

Apparently, this displays the number 1000 in the console.

But behind the scenes, there is a lot more going on.

When you run this piece of code, the Python interpreter:

Creates an integer object to the memory.

Assigns a value of 1000 to the object.

Displays the value of the object in the console.

How about when we assign a value to a variable?

In Python, a variable is just a symbolic name for an object behind the scenes.

When you assign an object to a variable, you use the variable as a reference to the object somewhere in memory.

In other words, the variable itself is not an object. Instead, it is a pointer to an object.

All of this probably sounds confusing to you. Let’s see an example with an illustration to help to understand it better.

For instance, let’s create a variable x and assign 1000 to it:

x = 1000

Behind the scenes:

An integer object is created with a value of 1000. This object is stored somewhere in the memory.

In addition, a variable x is created to reference the integer object. The x acts as a middleman between you and the actual object in memory. Without x you could not access the integer object.

Here is an illustration of the situation:

The variable x points to an integer object in memory.

But here is where it gets interesting.

Consider creating another variable y where you assign the variable x.

y = x

This does not copy the value of x to a y! Instead, it creates a new symbolic name for the same object the variable x points to.

Here is a behind-the-scenes look at the situation:

To verify these variables refer to the same object, let’s see their identities using the id() function:

print(id(x))
print(id(y))

Output:

140106690603920
140106690603920

As you can see, the IDs match.

In other words, the variables x and y are both aliases to the same object in memory.

But now, let’s assign a new value to one of the variables:

y = 5000

This creates a new integer object somewhere else in the memory and makes y point to it.

If you print the identities of the variables x and y again, you can see that now they reference different objects in memory:

print(id(x))
print(id(y))

Output:

140508061916048
140508061916016

As expected, the IDs were different.

To take home, Python variable assignment means creating a reference to an object in the memory. The variable itself is not an object. Instead, it is a name via which the object can be accessed.

Notice that in Python tutorials, variables are often called objects for short. However, they are still only references to the objects. If someone says “x is an integer object” they mean “x is a variable that refers to an integer object in memory”.
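
The practical consequence is easiest to see with a mutable object such as a list; this short sketch shows how a change made through one name is visible through the other:

a = [1, 2, 3]
b = a          # b is another name for the same list object, not a copy

b.append(4)    # mutate the object through b

print(a)               # [1, 2, 3, 4] -- the change is visible through a as well
print(id(a) == id(b))  # True: both names refer to the same object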

Variable Naming

Thus far you have mostly used short variable names such as x or y.

But variable names can and usually should be longer than that.

In this part, you learn how to create descriptive and valid variable names in Python.

Why Does Variable Naming Matter?

Understanding code is a brain-heavy task.

As a programmer, your task is always to write understandable code.

This is beneficial to yourself and your teammates or project members.

Writing clean and understandable code starts with variable naming.

When it comes to variable naming, you need to be descriptive. The variables should not be cryptic. Instead, the variable names should express themselves.

Before talking about the best naming conventions of Python variables, let’s briefly discuss the restrictions and rules.

Variable Naming Rules and Restrictions

In Python, a variable name can:

Contain as many characters as you want.

Consist of both lowercase and uppercase letters.

Consist of digits.

Contain the underscore (_).

In Python, a variable cannot:

Start with a digit.

Contain special characters (except for underscore).

Be a reserved keyword.

Here are some valid and invalid variable name examples:

# Valid names
x = 10
VERSION10 = "test"
distance_to_house = 100.00

# Invalid names
5_letters = "Hello"
my name = "Artturi"
first&second = "Alice, Bob"
for = "test"

With the variable naming rules out of the way, let’s talk about the naming conventions.

Variable Naming Conventions

There are many different blueprints for naming variables in Python.

There is not a single agreed way to name variables.

But there is a bunch of naming conventions that are commonly used.

In this part, you are going to see a couple of commonly used naming conventions in Python.

When it comes to variable naming, the most important part is to be clear and descriptive with the name of the variable.

When your variables are self-descriptive, you or one of your teammates can understand the code easily. This makes it easier to manage and build on top of the existing code.

Do not be afraid to use longer names and combinations of different words if needed.

For example, here is a great example of a descriptive variable naming:

distanceToHouse = 100.00

And here is a really bad example:

dth = 100.00

It is easy to understand the first variable. On the other hand, it is impossible to understand what the second variable is without context.

This is a great example of a long variable name that does its job really well.

When you need to combine multiple words in a variable name, there are many ways to do it.

For example, these all are valid variable names in Python:

distancetohouse = 100.00
DISTANCETOHOUSE = 100.00
distanceToHouse = 100.00
DistanceToHouse = 100.00
distance_to_house = 100.00

But for some of these names, there is room for improvement.

For example, take a look at the first two names:

distancetohouse = 100.00
DISTANCETOHOUSE = 100.00

Because you cannot use a space in a variable name, the consecutive words are not visually separated.

To “separate” the multiple words in a variable name, there are a bunch of commonly used conventions.

Camel Case. Starting from the second word, every consecutive word starts with a capital letter.

For example, distanceToHouse

Pascal Case. Each separate word starts with a capital letter, including the first one.

For example, DistanceToHouse

Snake Case. Words are separated by underscores.

For example, distance_to_house

In Python, you most commonly see the first (camel case) or the third (snake case) approach being used with variable names.

At the end of the day, naming conventions boil down to personal preference.

In the Python community, there is no single naming convention that everyone would follow.

Thus, you should stick with one that is visually most appealing to you. At least do not use different naming conventions in the same project.

At some point, you should also read the Style Guide for Python Code (PEP 8) to learn more. However, when you are learning about variables in Python, it is too early to worry too much about conventions.


Conclusion

In this guide, you learned how to use variables in Python.

To recap, every computer program needs to be able to store values so that it can use/modify them later.

In Python, you can store values to variables.

For example, let’s create a variable that holds a number:

x = 10

Behind the scenes, a variable is a symbolic name that points to an object somewhere in the memory. In other words, the variable itself is not the object but rather the “key” to it.

When you give a name to a variable, be specific. A good variable name reveals what the variable is about. This in turn makes the code easier to read and manage. Do not be afraid to use names that combine multiple words.

For instance, here is a good variable name:

distanceToHome = 100.00

Read next: Operators in Python

A Simple Guide To Perform A Comprehensive Content Audit

No matter why you have a website, it needs to be filled with great content.

Without good content, you might as well not have a website at all.

But how exactly do you know when you have good content?

You might read through a piece of content and think it’s perfectly fine, but there’s a more reliable way of figuring it out.

If you’re wondering if your content is performing well, there’s a good chance it’s time for a content audit to check for sure.

By following the right steps, knowing what to look for, and what you’re hoping to get out of your content audit, you can look forward to creating a better website.

What Is a Content Audit?

At some point, every website will need a content audit.

A content audit gives you the opportunity to review closely all of the content on your website and evaluate how it’s working for you and your current goals.

This helps show you:

What content is good.

What needs to be improved.

What should just be tossed away.

What your content goals for the future should look like.

There are also some types of websites that are more in need of content audits than others.

If you have a relatively new website where all of your content is still fresh, you won’t really be in need of a content audit for a while.

Older sites have a lot more to gain from having a content audit done, as well as websites that have a large amount of content.

This makes websites like a news site a great contender for audits. The size of a website will also affect how often a content audit is necessary.

What Is the Purpose of Content Audits?

Content is known for being a great digital marketing investment because it will continue to work for you long into the future, but that doesn’t mean that it doesn’t require some upkeep from time to time.

What worked for your website at one point might not anymore, so it only makes sense to go back and review it.

Improve Organic Ranking

If you aren’t ranking highly, it could be a problem with your content.

Some of the content you have might not be SEO-friendly, and although it might be valuable content to have, there’s no way for it to rank highly.

If the content you have is already good, optimizing it to be more SEO-friendly can be a simple change that makes a big difference in your rankings.

Revitalize Older Content

Even the best content gets old at some point.

After a while, you might find you are missing important keywords, or that content has broken links or outdated information, among other issues.

If older content isn’t performing well, that doesn’t mean it can’t serve a new purpose for your website.

Giving new life to some of your older content can give you the same effect as having something totally brand new, without requiring you to put in the amount of work that an entirely new piece of content would need.

Get Rid of Irrelevant Content

Some content that’s great at the moment only benefits you for a short while.

While you might find older content on your website that can be updated to be more useful, sometimes it has just become irrelevant.

When this is the case, you don’t have to keep it if it’s only taking up space.

Eliminate Similar & Duplicate Content

In addition to unimportant pages, you can also find duplicate content to get rid of during a content audit.

Duplicate content often occurs by accident rather than as an attempt to cheat the system, but regardless of why you have it, search engines can penalize you for it.

If you do find that you have extremely similar or duplicate content, but you can’t get rid of it, you can fix the problem by canonicalizing your preferred URL.

Plan for the Future

When you go through the content you currently have, you might end up seeing some gaps that need to be filled.

When you realize you’re missing out on important information and topics your audience needs, this is the time to make up for that.

You’ll be able to realize what’s lacking in your website to create more useful content in the future.

How to Perform a Content Audit

A content audit might at first glance seem like simply reading through your website’s content, but there’s much more to it than that.

For an effective content audit, you’ll need to rely heavily on online tools to get the data you need.

So, before you get started with a content audit, it’s important to know exactly what you’ll need to be doing beforehand.

1. Know Your Reason

If you’re going through the effort of performing a content audit, you’re not doing it for nothing.

There must be some goal that’s driving you to do this.

Not everyone has the same reason for running a content audit, and even when the reasons seem similar, what you’ll want to look for can vary.

2. Use Screaming Frog to Index Your URLs

One tool that you should always use during an audit is Screaming Frog.

This tool will allow you to create an inventory of the content you have on your website by gathering URLs from your sitemap.

If you have fewer than 500 pages to audit, you can even get away with using the free version.

This is one of the easiest ways of getting all of your content together to begin your content audit.

3. Incorporate Google Analytics Data

After you’ve made an inventory of your website’s content, you’ll need to see how it’s performing.

For this, Google Analytics can give you all the information you need.

This can give you valuable insights as to how people feel about your content, such as how long they stick around for it and how many pages they’re viewing per session.

4. Examine Your Findings

The data you get from Google Analytics will make it easier for you to figure out what your next move will be.

After reviewing your findings, it might be clear what’s holding your content down.

The solution may not be obvious, but by looking closely at what your data tells you and researching, you can figure it out with a little bit of effort.

For example, if you have one great, high-quality piece of content that doesn’t get many views, it might just need to be updated slightly and reshared.

5. Make a Plan

Finally, you should figure out what the necessary changes will be and how you’ll go about making them.

If you have a long list of changes that need to be implemented, consider which ones are a priority and which ones can be fixed over time.

Planning for the future might include not just the changes to be made on existing content, but the arrangements for creating new content in the future.

Final Thoughts

Content audits might seem intimidating, but they are key to making sure all of the content on your website is working for you and not against you.

Performing a content audit doesn’t mean that you’ve been making huge mistakes with your content.

Such an audit is simply maintenance that even websites with the best content need to do.

Getting into this can seem overwhelming, but with the right help, an audit will leave you feeling more confident in your content and will help guide your next steps.


Exploring Data Visualization In Altair: An Interesting Alternative To Seaborn

This article was published as a part of the Data Science Blogathon

Data Visualization is important to uncover the hidden trends and patterns in the data by converting them to visuals. For visualizing any form of data, we all might have used pivot tables and charts like bar charts, histograms, pie charts, scatter plots, line charts, map-based charts, etc., at some point in time. These are easy to understand and help us convey the exact information. Based on a detailed data analysis, we can decide how to best make use of the data at hand. This helps us to make informed decisions.

Now, if you are a Data Science or Machine Learning beginner, you surely must have tried Matplotlib and Seaborn for your data visualizations. Undoubtedly these are the two most commonly used powerful open-source Python data visualization libraries for Data Analysis.

Seaborn is based on Matplotlib and provides a high-level interface for building informative statistical visualizations. However, there is an alternative to Seaborn. This library is called ‘Altair’, an open-source Python library built for statistical data visualization. According to the official documentation, it is based on the Vega and Vega-lite language. Using Altair we can create interactive data visualizations through bar chart, histogram, scatter plot and bubble chart, grid plot and error chart, etc. similar to the Seaborn plots.

While the Matplotlib library is imperative in syntax, meaning the user has to spell out each step of the plot, Altair takes a declarative approach: the user only specifies the what part and the machine decides the how part of it. This gives the user freedom to focus on interpreting the data rather than being caught up in writing the correct syntax. The only downside of this declarative approach is that the user has less control over customizing the visualization, which is fine for most users unfamiliar with the coding part.
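To make the contrast concrete, here is a minimal sketch of the same scatter plot written in both styles. It assumes the libraries are installed (see the next section) and a DataFrame df with ‘horsepower’ and ‘mpg’ columns like the one loaded later in this article.

import matplotlib.pyplot as plt
import altair as alt

# Imperative (Matplotlib): spell out each step of the figure yourself
plt.scatter(df['horsepower'], df['mpg'])
plt.xlabel('horsepower')
plt.ylabel('mpg')
plt.show()

# Declarative (Altair): state what to encode and let the library decide how
alt.Chart(df).mark_point().encode(x='horsepower', y='mpg')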

Installing Seaborn and Altair

To install these libraries from PyPi, use the following commands

pip install altair
pip install seaborn

Importing Basic libraries and dataset

As always, we import Pandas and NumPy libraries to handle the dataset, Matplotlib and Seaborn along with the newly installed library Altair for building the visualizations.

# importing required libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import altair as alt

We will use the ‘mpg’ or the ‘miles per gallon’ dataset from the seaborn dataset library to generate these different plots. This famous dataset contains 398 samples and 9 attributes for automotive models of various brands. Let us explore the dataset more.

# importing dataset
df = sns.load_dataset('mpg')
df.shape

# dataset column names
df.keys()

Output

Index(['mpg', 'cylinders', 'displacement', 'horsepower', 'weight',
       'acceleration', 'model_year', 'origin', 'name'],
      dtype='object')

# checking datatypes
df.dtypes

# checking dataset
df.head()

This dataset is simple and has a nice blend of both categorical and numerical features. We can now plot our charts for comparison.

Scatter & Bubble plots in Seaborn and Altair

We will start with simple scatter and bubble plots. We will use the ‘mpg’ and ‘horsepower’ variables for these.

For a Seaborn scatter plot, we can either use the relplot command and pass ‘scatter’ as the kind of plot

sns.relplot(y='mpg',x='horsepower',data=df,kind='scatter',size='displacement',hue='origin',aspect=1.2);

or we can directly use the scatterplot command.

sns.scatterplot(data=df, x="horsepower", y="mpg", size="displacement", hue='origin',legend=True)

whereas for Altair, we use the following syntax

alt.Chart(df).mark_point().encode(
    alt.Y('mpg'),
    alt.X('horsepower'),
    alt.Color('origin'),
    alt.OpacityValue(0.7),
    size='displacement'
)

We color the points using another attribute ‘origin’ and control the size of the points using an additional variable ‘displacement’ for both libraries. In Seaborn, we can control the aspect ratio of the plot using the ‘aspect’ setting. However, in Altair, we can also control the opacity value of the points by passing a value between 0 and 1 (1 being perfectly opaque). To convert a scatter plot in Seaborn to a bubble plot, simply pass a value for ‘sizes’, which denotes the smallest and biggest size of the bubbles in the chart. For Altair, we simply pass (filled=True) to generate the bubble plot.

sns.scatterplot(data=df, x="horsepower", y="mpg", size="displacement", hue='origin', legend=True, sizes=(10, 500))

alt.Chart(df).mark_point(filled=True).encode(
    x='horsepower',
    y='mpg',
    size='displacement',
    color='origin'
)

With the above scatter plots, we can understand the relationship between ‘horsepower’ and ‘mpg’ variables i.e., lower ‘horsepower’ vehicles seem to have a higher ‘mpg’. The syntax for both plots is similar and can be customized to display the values.

Line plots in Seaborn and Altair

Now, we plot line charts for ‘acceleration’ vs ‘horsepower’ attributes. The syntax for the line plots is quite simple for both. We pass DataFrame as data, the above two variables as x and y while the ‘origin’ as the legend color.

Seaborn-

sns.lineplot(data=df, x='horsepower', y='acceleration',hue='origin')

Altair-

alt.Chart(df).mark_line().encode(
    alt.X('horsepower'),
    alt.Y('acceleration'),
    alt.Color('origin')
)

Here we can understand that ‘usa’ vehicles have a higher range of ‘horsepower’ whereas the other two ‘japan’ and ‘europe’ have a narrower range of ‘horsepower’. Again, both graphs provide the same information nicely and look equally good. Let us move to the next one.

Bar plots & Count plots in Seaborn and Altair

In the next set of visualizations, we will plot a basic bar plot and count plot. This time, we will add a chart title as well. We will use the ‘cylinders’ and ‘mpg’ attributes as x and y for the plot.

For the Seaborn plot, we pass the above two features along with the Dataframe. To customize the color, we choose a palette=’magma_r’ from Seaborn’s predefined color palette.

sns.catplot(x='cylinders', y='mpg', hue="origin", kind="bar", data=df, palette='magma_r')

In the Altair bar plot, we pass df, x and y and specify the color based on the ‘origin’ feature. Here we can customize the size of the bars by passing a value in the ‘mark_bar’ command as shown below.

plot = alt.Chart(df).mark_bar(size=40).encode(
    alt.X('cylinders'),
    alt.Y('mpg'),
    alt.Color('origin')
)
plot.properties(title='cylinders vs mpg')

From the above bar plots, we can see that vehicles with 4 cylinders seem to be the most efficient for ‘mpg’ values.

Here is the syntax for count plots,

Seaborn- We use the FacetGrid command to display multiple plots on a grid based on the variable ‘origin’.

g = sns.FacetGrid(df, col="cylinders", height=4, aspect=.5, hue='origin', palette='magma_r')
g.map(sns.countplot, "origin", order=df['origin'].value_counts().index)

Altair- We use the ‘mark_bar’ command again but pass the ‘count()’ for cylinders column as y to generate the count plot.

alt.Chart(df).mark_bar().encode(
    x='origin',
    y='count()',
    column='cylinders:Q',
    color=alt.Color('origin')
).properties(
    width=100,
    height=100
)

From these two count plots, we can easily understand that ‘japan’ has (3,4,6) cylinder vehicles, ‘europe’ has (4,5,6) cylinder vehicles and ‘usa’ has (4,6,8) cylinder vehicles. From a syntax point of view, the libraries require inputs for the data source, x, y to plot. The output looks equally pleasing for both the libraries. Let us try a couple of more plots and compare them.

Histogram

In this set of visualizations, we will plot the basic histogram plots. In Seaborn, we use the displot command and pass the dataframe and the name of the column to be plotted. We can also adjust the height and width of the plot using the ‘aspect’ setting, which is the ratio of width to height.

Seaborn-

sns.displot(df, x='model_year', aspect=1.2)

Altair-

alt.Chart(df).mark_bar().encode(
    alt.X("model_year:Q", bin=True),
    y='count()',
).configure_mark(
    opacity=0.7,
    color='cyan'
)

In this set of visualizations, the selected default bins are different for both libraries, and hence the plots look slightly different. We can get the same plot in Seaborn by adjusting the bin sizes.

sns.displot(df, x='model_year',bins=[70,72,74,76,78,80,82], aspect=1.2)

Now the plots look similar. However, in both the plots we can see that the maximum number of vehicles was after ’76 and prominently in the year ’82. Additionally, we used a configure command to modify the color and opacity of the bars, which sort of acts like a theme in the case of the Altair plot.

Strip plots using both Libraries

The next set of visualizations are the strip plots.

For Seaborn, we will use the stripplot command and pass the entire DataFrame and variables ‘cylinders’, ‘horsepower’ to x and y respectively.

ax = sns.stripplot(data=df, y='horsepower', x='cylinders')

For the Altair plot, we use the mark_tick command to generate the strip plot with the same variables.

alt.Chart(df).mark_tick(filled=True).encode(
    x='horsepower:Q',
    y='cylinders:O',
    color='origin'
)

From the above plots, we can clearly see the scatter of the categorical variable ‘cylinders’ for different ‘origin’. Both the charts seem to be equally effective in conveying the relationship between the number of cylinders. For the Altair plot, you will find that the x and y columns have been interchanged in the syntax to avoid a taller and narrower-looking plot.

Interactive plots

We now come to the final set of visualization in this comparison. These are the interactive plots. Altair scores when it comes to interactive plots. The syntax is simpler as compared to Bokeh, Plotly, and Dash libraries. Seaborn, on the other hand, does not provide interactivity to any charts. This might be a letdown if you want to filter out data inside the plot itself and focus on a region/area of interest in the plot. To set up an interactive chart in Altair, we define a selection with an ‘interval’ kind of selection i.e. between two values on the chart. Then we define the active points for columns using the earlier defined selection. Next, we specify the type of chart to be shown for the selection (plotted below the main chart) and pass the ‘select’ as the filter for the displayed values.

select = alt.selection(type='interval')

values = alt.Chart(df).mark_point().encode(
    x='horsepower:Q',
    y='mpg:Q',
    color=alt.condition(select, 'origin:N', alt.value('lightgray'))
).add_selection(
    select
)

bars = alt.Chart(df).mark_bar().encode(
    y='origin:N',
    color='origin:N',
    x='count(origin):Q'
).transform_filter(
    select
)

values & bars

For the interactive plot, we can easily visualize the count of samples for the selected area. This is useful when there are too many samples/points in one area of the chart and we want to visualize their details to understand the underlying data better.

Additional points to consider while using Altair

Pie Chart & Donut Chart

Unfortunately, Altair does not support pie charts. Here is where Seaborn gets an edge, i.e., you can fall back on Matplotlib functionality to generate a pie chart alongside the Seaborn library.
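For instance, a minimal sketch for the ‘origin’ column could look like the following; note that plt.pie is plain Matplotlib, with a Seaborn palette only supplying the colors.

import matplotlib.pyplot as plt
import seaborn as sns

# count the vehicles per origin and plot the shares as a pie chart
origin_counts = df['origin'].value_counts()
plt.pie(origin_counts, labels=origin_counts.index, autopct='%1.1f%%',
        colors=sns.color_palette('magma_r', len(origin_counts)))
plt.title('Share of vehicles by origin')
plt.show()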

Plotting grids, themes, and customizing plot sizes

Both these libraries also allow customizing of the plots in terms of generating multiple plots, manipulating the aspect ratio or the size of the figure as well as support different themes to be set for colors and backgrounds to modify the look and feel of the charts.
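As a small sketch of what such customization might look like (the theme names ‘darkgrid’ and ‘dark’ are just two of the built-in options, and the sizes are arbitrary):

import seaborn as sns
import altair as alt

# Seaborn: set a global theme and control the figure size via the figure-level API
sns.set_theme(style='darkgrid')
sns.displot(df, x='model_year', height=4, aspect=1.5)

# Altair: enable a registered theme and set the chart size explicitly
alt.themes.enable('dark')
alt.Chart(df).mark_point().encode(x='horsepower', y='mpg').properties(width=400, height=300)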

Conclusion

I hope you enjoyed reading this comparison. If you have not tried Altair before, do give it a try for building some beautiful plots in your next data visualization project!

Author Bio

Devashree holds a degree in Information Technology from Germany and has a Data Science background. As an Engineer, she enjoys working with numbers and uncovering hidden insights in diverse datasets from different sectors to build beautiful visualizations to try and solve interesting real-world machine learning problems.

In her spare time, she loves to cook, read & write, discover new Python-Machine Learning libraries or participate in coding competitions.

You can follow her on LinkedIn, GitHub, Kaggle, Medium, Twitter.

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.

