A Simple Guide To Perform A Comprehensive Content Audit

Updated in March 2024.

No matter why you have a website, it needs to be filled with great content.

Without good content, you might as well not have a website at all.

But how exactly do you know when you have good content?

You might read through a piece of content and think it’s perfectly fine, but there’s a more reliable way of figuring it out.

If you’re wondering if your content is performing well, there’s a good chance it’s time for a content audit to check for sure.

By following the right steps, knowing what to look for, and what you’re hoping to get out of your content audit, you can look forward to creating a better website.

What Is a Content Audit?

At some point, every website will need a content audit.

A content audit gives you the opportunity to review closely all of the content on your website and evaluate how it’s working for you and your current goals.

This helps show you:

What content is good.

What needs to be improved.

What should just be tossed away.

What your content goals for the future should look like.

There are also some types of websites that are more in need of content audits than others.

If you have a relatively new website where all of your content is still fresh, you won’t really be in need of a content audit for a while.

Older sites have a lot more to gain from having a content audit done, as well as websites that have a large amount of content.

This makes websites like a news site a great contender for audits. The size of a website will also affect how often a content audit is necessary.

What Is the Purpose of Content Audits?

Content is known for being a great digital marketing investment because it will continue to work for you long into the future, but that doesn’t mean that it doesn’t require some upkeep from time to time.

What worked for your website at one point might not anymore, so it only makes sense to go back and review it.

Improve Organic Ranking

If you aren’t ranking highly, it could be a problem with your content.

Some of the content you have might not be SEO-friendly, and although it might be valuable content to have, there’s no way for it to rank highly.

If the content you have is already good, optimizing it to be more SEO-friendly can be a simple change that makes a big difference in your rankings.

Revitalize Older Content

Even the best content gets old at some point.

After a while, you might end up missing out on important keywords, having content with broken links, outdated information, among other issues.

If older content isn’t performing well, that doesn’t mean it can’t serve a new purpose for your website.

Giving new life to some of your older content can give you the same effect as having something totally brand new, without requiring you to put in the amount of work that an entirely new piece of content would need.

Get Rid of Irrelevant Content

Some content that’s great at the moment only benefits you for a short while.

While you might find older content on your website that can be updated to be more useful, sometimes it has just become irrelevant.

When this is the case, you don’t have to keep it if it’s only taking up space.

Eliminate Similar & Duplicate Content

In addition to unimportant pages, you can also find duplicate content to get rid of during a content audit.

Duplicate content can often occur by accident and wasn’t created to try and cheat the system, but regardless of why you have it, you can be penalized by search engines for it.

If you do find that you have extremely similar or duplicate content, but you can’t get rid of it, you can fix the problem by canonicalizing your preferred URL.

Plan for the Future

When you go through the content you currently have, you might end up seeing some gaps that need to be filled.

When you realize you’re missing out on important information and topics your audience needs, this is the time to make up for that.

You’ll be able to realize what’s lacking in your website to create more useful content in the future.

How to Perform a Content Audit

At first glance, a content audit might seem like simply reading through your website’s content, but there’s much more to it than that.

For an effective content audit, you’ll need to rely heavily on online tools to get the data you need.

So, before you get started with a content audit, it’s important to know exactly what you’ll need to do.

1. Know Your Reason

If you’re going through the effort of performing a content audit, you’re not doing it for nothing.

There must be some goal that’s driving you to do this.

Not everyone will have the same reason for having a content audit, although many of the reasons might seem similar, so what you’ll want to look for might vary.

2. Use Screaming Frog to Index Your URLs

One tool that you should always use during an audit is Screaming Frog.

This tool will allow you to create an inventory of the content you have on your website by gathering URLs from your sitemap.

If you have fewer than 500 pages to audit, you can even get away with using the free version.

This is one of the easiest ways of getting all of your content together to begin your content audit.

3. Incorporate Google Analytics Data

After you’ve made an inventory of your website’s content, you’ll need to see how it’s performing.

For this, Google Analytics can give you all the information you need.

This can give you valuable insights as to how people feel about your content, such as how long they stick around for it and how many pages they’re viewing per session.

4. Examine Your Findings

The data you get from Google Analytics will make it easier for you to figure out what your next move will be.

After reviewing your findings, it might be clear what’s holding your content down.

The solution may not be obvious, but by looking closely at what your data tells you and researching, you can figure it out with a little bit of effort.

For example, if you have one great, high-quality piece of content that doesn’t get many views, it might just need to be updated slightly and reshared.
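To make the review step a little more concrete, here is a minimal Python sketch of how an inventory with performance data might be sorted into actions. Everything in it is an assumption for illustration: the PageStats fields, the triage function, and the thresholds are hypothetical and do not come from Screaming Frog or Google Analytics; you would fill the values from your own exports and tune the cut-offs to your goals.

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    """One row of a (hypothetical) content inventory joined with analytics data."""
    url: str
    pageviews: int
    avg_seconds_on_page: float
    months_since_update: int


def triage(page: PageStats,
           min_views: int = 100,
           min_seconds: float = 30.0,
           stale_after_months: int = 12) -> str:
    """Suggest an action for a page. The thresholds are illustrative assumptions."""
    popular = page.pageviews >= min_views
    engaging = page.avg_seconds_on_page >= min_seconds
    stale = page.months_since_update >= stale_after_months

    if popular and engaging and not stale:
        return "keep"
    if popular or engaging:
        # Something is working: refresh and reshare instead of deleting.
        return "update"
    return "review for removal"


pages = [
    PageStats("/guide", 5000, 95.0, 3),
    PageStats("/old-but-good", 40, 120.0, 30),
    PageStats("/expired-promo", 5, 4.0, 48),
]
actions = {page.url: triage(page) for page in pages}
for url, action in actions.items():
    print(f"{url}: {action}")
```

The high-quality page with few views lands in the "update" bucket, which matches the example above: it may only need a refresh and a reshare.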

5. Make a Plan

Finally, you should figure out what the necessary changes will be and how you’ll go about making them.

If you have a long list of changes that need to be implemented, consider which ones are a priority and which ones can be fixed over time.

Planning for the future might include not just the changes to be made on existing content, but the arrangements for creating new content in the future.

Final Thoughts

Content audits might seem intimidating, but they are key to making sure all of the content on your website is working for you and not against you.

Performing a content audit doesn’t mean that you’ve been making huge mistakes with your content.

Such an audit is simply maintenance that even websites with the best content need to do.

Getting into this can seem overwhelming, but with the right help, an audit will leave you feeling more confident in your content and will help guide your next steps.


A Comprehensive Guide On Kubernetes

This article was published as a part of the Data Science Blogathon.



In this guide, we will learn about Kubernetes and use it to deploy and manage containers at scale.

Container and microservice architectures are increasingly used to build modern apps. Kubernetes is open-source software that allows you to deploy and manage containers at scale. It groups containers into logical units to make your application’s management, discovery, and scaling easier.

The main goal of this guide is to provide a complete overview of the Kubernetes ecosystem while keeping it basic and straightforward. It covers Kubernetes’ core ideas before applying them to a real-world scenario.

Even if you have no prior experience with Kubernetes, this article will serve as an excellent starting point for your journey.

So, without further ado, let’s get this learning started.

Why Kubernetes?

Before we go into the technical ideas, let us start with why a developer should use Kubernetes in the first place. Here are a few reasons.


With Kubernetes, moving containerized applications from development to production becomes a straightforward process. Kubernetes enables developers to orchestrate containers in various environments, including on-premises infrastructure and public and hybrid clouds.


Kubernetes simplifies the process of defining complex containerized applications and deploying them globally across multiple clusters of servers by scaling resources based on your desired state. Kubernetes automatically checks and maintains container health when horizontally scaling applications.


Kubernetes has a vast and ever-expanding collection of extensions and plugins created by developers and businesses that make it simple to add unique capabilities to your clusters such as security, monitoring, or management.


Using Kubernetes necessitates an understanding of the various abstractions it employs to represent the state of the system. That is the focus of this section. We get acquainted with the essential concepts and provide you with a clearer picture of the overall architecture.


A Pod is a group of one or more application containers that share storage, a unique cluster IP address, and instructions for running them (e.g. ports, container image, and restart and failure policies).

Pods are the foundation of the Kubernetes platform. When you create a deployment, Kubernetes creates Pods with the containers inside.

Each pod runs on the node where it is scheduled and remains there until it is terminated or deleted. If the node fails or stops, Kubernetes will automatically schedule identical Pods on the cluster’s other available Nodes.



A node is a worker machine in a Kubernetes cluster that can be virtual or physical depending on the cluster type. Each node is managed by the master. The master automatically schedules pods across all nodes in the cluster, based on their available resources and current configuration.

Each node is required to run at least two services:

The kubelet, a process that handles communication between the Kubernetes master and the node.

A container runtime (e.g. Docker), which is in charge of pulling and running container images.
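The idea of placing pods on nodes according to their available resources can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how the real Kubernetes scheduler works (which filters and scores nodes on many more criteria): the function names pick_node and schedule are hypothetical, and CPU is the only resource considered.

```python
from typing import Optional


def pick_node(free_cpu: dict[str, float], pod_cpu: float) -> Optional[str]:
    """Return the node with the most free CPU that can still fit the pod."""
    candidates = {name: cpu for name, cpu in free_cpu.items() if cpu >= pod_cpu}
    if not candidates:
        return None  # no node fits, so the pod would stay pending
    return max(candidates, key=candidates.get)


def schedule(pods: dict[str, float],
             free_cpu: dict[str, float]) -> dict[str, Optional[str]]:
    """Assign each pod to a node, reducing the remaining capacity as we go."""
    remaining = dict(free_cpu)  # copy so the caller's data is not modified
    placement: dict[str, Optional[str]] = {}
    for pod, cpu in pods.items():
        node = pick_node(remaining, cpu)
        placement[pod] = node
        if node is not None:
            remaining[node] -= cpu
    return placement


placement = schedule(
    pods={"web-1": 1.0, "web-2": 1.0, "batch": 4.0},
    free_cpu={"node-a": 2.0, "node-b": 1.5},
)
print(placement)
```

In this sketch the two small pods are spread over both nodes, while the large pod fits nowhere and stays unassigned.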



A Service is an abstraction that describes a logical set of Pods and the policies for accessing them. Services allow for the loose coupling of dependent Pods.

Even though each pod has a distinct IP address, those addresses are not visible outside the cluster. A service therefore enables your deployment to receive traffic from external sources.

We can expose services in a variety of ways:

ClusterIP (default) – Expose the service only inside the cluster.

NodePort – Use NAT to expose the service on the same port on every node in the cluster.

LoadBalancer – Create an external load balancer to expose the service on a specified IP address.



Deployments include a description of your application’s desired state. The deployment controller works to ensure that the application’s current state matches that description.

A deployment automatically runs multiple replicas of your program and replaces any instances that fail or become unresponsive. In this way, deployments help ensure that your program is ready to serve user requests.
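The desired-state idea can be illustrated with a small Python sketch. It is only a simplified model under my own assumptions: the reconcile function and the pod naming scheme are hypothetical, and the real deployment controller does considerably more (rolling updates, health checks, and so on).

```python
def reconcile(desired_replicas: int, running: list[str], name: str = "nginx") -> list[str]:
    """Return the pod list after one pass that moves the current state to the desired state."""
    pods = list(running)  # copy so the caller's list is not modified
    suffix = 0
    # Too few pods: start replacements with names that are not in use yet.
    while len(pods) < desired_replicas:
        candidate = f"{name}-{suffix}"
        if candidate not in pods:
            pods.append(candidate)
        suffix += 1
    # Too many pods: drop the surplus.
    del pods[desired_replicas:]
    return pods


pods = reconcile(3, [])
print(pods)

pods.remove("nginx-1")          # simulate one instance failing
pods = reconcile(3, pods)       # the missing replica is replaced
print(pods)

pods = reconcile(1, pods)       # scaling down removes surplus replicas
print(pods)
```

The point of the sketch is that you only state how many replicas you want; the loop, not the user, works out which pods to add or remove.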



Before we dive into building our cluster, we must first install Kubernetes on our local workstation.

Docker Desktop

If you’re using Docker Desktop on Windows or Mac, you can install Kubernetes directly from the settings pane of the user interface.


If you are not using Docker Desktop, I recommend that you follow the official installation procedure for kubectl and Minikube.


Now that we’ve covered the fundamental ideas, let’s move on to the practical side of Kubernetes. This chapter will walk you through the fundamentals required to deploy apps in a cluster.

Creating cluster

When you launch Minikube, it immediately forms a cluster.

minikube start

After installation, Docker Desktop should also create a cluster automatically. You can use the following commands to check that your cluster is up and running:

# Get information about the cluster
kubectl cluster-info

# Get all nodes of the cluster
kubectl get nodes

Deploying an application:

Now that we’ve completed the installation and established our first cluster, we’re ready to deploy an application to Kubernetes.

kubectl create deployment nginx --image=nginx:latest

We use the create deployment command, passing the deployment name and the container image as arguments. This example deploys Nginx with one container and one replica.

Using the get deployments command, you may view your active deployments.

kubectl get deployments

Information about deployments

Here are a few commands you may use to learn more about your Kubernetes deployments and pods.

Obtaining all of the pods

Using the kubectl get pods command, you can get a list of all running pods:

kubectl get pods

Detailed description of a pod

Use the describe command to get more detailed information about a pod.

kubectl describe pods

Logs of a pod

The data that your application would transmit to STDOUT becomes container logs. The following command will provide you access to those logs.

kubectl logs $POD_NAME

Note: You may find out the name of your pod by using the get pods or describe pods commands.

Execute command in Container

The kubectl exec command, which takes the pod name and the command to run as arguments, allows us to run commands directly in our container.

kubectl exec $POD_NAME -- command

As an example, let’s start a bash terminal in the container.

kubectl exec -it $POD_NAME -- bash

Exposing app publicly

A service, as mentioned earlier, defines a policy by which the deployment can be accessed. In this section we’ll look at how to do this and at the alternatives you have when exposing your services to the public.

Creating a service:

We can create a service with the create service command, which takes the service type, the service name, and the port mapping as parameters.

kubectl create service nodeport nginx --tcp=80:80

This generates a service for our Nginx deployment and exposes our container’s port 80 on a port of our host machine.

On the host system, use the kubectl get services command to obtain the port:

kubectl get services

In the author’s output, port 80 of the container was routed to port 31041 of the host machine. Once you have the port, you can test your deployment by accessing localhost on that port.

Deleting a service

kubectl delete service nginx

Scale up the app

Scaling your application up and down is a breeze with Kubernetes. By using this command, you may alter the number of replicas, and Kubernetes will generate and maintain everything for you.

kubectl scale deployments/nginx --replicas=5

This command scales our Nginx deployment to five replicas.

This way of deploying applications works well for tiny one-container apps but lacks the overview and reusability required for larger applications. YAML files are helpful in this situation.

YAML files allow you to specify your deployments, services, and pods using a markup language, making them more reusable and scalable. The following sections go over YAML files in detail.

Kubernetes object in YAML

Every object in Kubernetes is expressed as a declarative YAML object that specifies what should run and how. These files are frequently used to promote the reusability of resource configurations such as deployments, services, and volumes, among others.

This section walks you through the fundamentals of YAML and how to get a list of all available parameters and attributes for a Kubernetes object. We also look at the deployment and service files to understand their syntax and how they are deployed.

Parameters of different objects

There are numerous Kubernetes objects, and it is difficult to remember every setting. That’s where the explain command comes in.

It prints the documentation for an object type or, using the following syntax, for a specific field:

kubectl explain deployment.spec.replicas

Deployment file

For ease of reusability and changeability, more sophisticated deployments are typically written in YAML.

The basic file structure is as follows:

apiVersion: apps/v1
kind: Deployment
metadata:
  # The name and label of your deployment
  name: mongodb-deployment
  labels:
    app: mongo
spec:
  # How many copies of each pod do you want
  replicas: 3
  # Which pods are managed by this deployment
  selector:
    matchLabels:
      app: mongo
  # Regular pod configuration / Defines containers, volumes and environment variable
  template:
    metadata:
      # label the pod
      labels:
        app: mongo
    spec:
      containers:
        - name: mongo
          image: mongo:4.2
          ports:
            - containerPort: 27017

There are several crucial sections in the YAML file:

apiVersion – Specifies the API version.

kind – The Kubernetes object type defined in the file (e.g. deployment, service, persistent volume, …)

metadata – A description of your YAML component that includes the component’s name, labels, and other information.

spec – Specifies the attributes of your deployment, such as replicas and resource constraints.

template – The deployment file’s pod configuration.

Now that you understand the basic format, you can use the apply command to deploy the file.
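If you prefer to generate such a file from code, the same structure can be built as a plain Python dictionary and written out as JSON, which kubectl apply -f accepts as well as YAML. The sketch below mirrors the deployment shown above; the helper name make_deployment and its parameters are hypothetical and only serve the illustration.

```python
import json


def make_deployment(name: str, app_label: str, image: str,
                    port: int, replicas: int) -> dict:
    """Build a Deployment manifest as a dictionary (illustrative helper)."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": {"app": app_label}},
        "spec": {
            "replicas": replicas,
            # The selector must match the labels given to the pod template.
            "selector": {"matchLabels": {"app": app_label}},
            "template": {
                "metadata": {"labels": {"app": app_label}},
                "spec": {
                    "containers": [
                        {
                            "name": app_label,
                            "image": image,
                            "ports": [{"containerPort": port}],
                        }
                    ]
                },
            },
        },
    }


manifest = make_deployment("mongodb-deployment", "mongo", "mongo:4.2", 27017, 3)
manifest_json = json.dumps(manifest, indent=2)
print(manifest_json)
```

Deriving the selector and the template labels from the same argument avoids a common mistake, a deployment whose selector does not match its own pods.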

Service file

Service files are structured similarly to deployments, with slight variations in the parameters.

apiVersion: v1
kind: Service
metadata:
  name: mongo
spec:
  selector:
    app: mongo
  ports:
    - port: 27017
      targetPort: 27017
  type: LoadBalancer

Storage

When a container restarts or its pod is deleted, its entire file system is deleted. This is usually a good thing, since it keeps your stateless application from getting clogged up with unnecessary data. In other circumstances, persisting your file system’s data is critical for your application.

There are several types of storage available:

The container file system stores the data of a single container for as long as the container exists.

Volumes allow you to save data and share it between containers as long as the pod is active.

Persistent volumes keep data even if the pod is deleted or restarted. They’re your Kubernetes cluster’s long-term storage.


Volumes allow you to save, exchange, and preserve data among the containers of a pod. This is helpful if you have pods with several containers that share data.

In Kubernetes, there are two steps to using a volume:

The pod defines the volume.

The container uses volume mounts to add the volume at a given filesystem path.

You can add a volume to your pod by using the syntax:

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
    - name: nginx
      image: nginx
      volumeMounts:
        - name: nginx-storage
          mountPath: /etc/nginx
  volumes:
    - name: nginx-storage
      emptyDir: {}

Here, the volumes section defines the volume, and volumeMounts mounts it at a particular directory of the container filesystem (in this case, /etc/nginx).

Persistent Volumes

These are nearly identical to conventional volumes, with one difference: the data is preserved even if the pod is deleted. That is why they are used for long-term data storage needs, such as a database.

A Persistent Volume Claim (PVC) object, which connects to backend storage volumes via a series of abstractions, is the most typical way to define a persistent volume.

Example of YAML Configuration file.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pv-claim
  labels:
    app: sampleAppName
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi

There are more options to save your data in Kubernetes, and you may automate as much of the process as feasible. Here’s a list of a few interesting subjects to look into.

Compute Resources

In container orchestration, managing compute resources for your containers and applications is critical.

When your containers declare how many resources they need, the scheduler can make better decisions about which node to place the pod on. You will also have fewer resource contention issues across deployments.

In the following two parts, we will go through two types of resource definitions in depth.




Secrets in Kubernetes allow you to securely store and manage sensitive data such as passwords, API tokens, and SSH keys.

To use a secret in your pod, you must first reference it. This can happen in several ways:

As an environment variable or as a file in a volume mounted to a container.

When the kubelet pulls an image from a private registry.

Creating a secret

Secrets are created using either the kubectl command-line tool or by declaring a Secret object in YAML.

Using kubectl

kubectl allows you to create secrets with a create command that requires only the data and the secret name. The data is provided as a file or a literal.

kubectl create secret generic admin-credentials --from-literal=user=poweruser --from-literal=password='test123'

Using files, the same command would look like this.

kubectl create secret generic admin-credentials --from-file=./username.txt --from-file=./password.txt

Making use of definition files

Secrets, like other Kubernetes objects, can be declared in a YAML file.

apiVersion: v1
kind: Secret
metadata:
  name: secret-apikey
data:
  apikey: YWRtaW4=

Your sensitive information is stored in the secret as a key-value pair, with apikey as the key and YWRtaW4= as the base64-encoded value.

Using the apply command, you can now generate the secret.

kubectl apply -f secret.yaml

Use the stringData attribute instead if you wish to give plain data and let Kubernetes handle the encoding.

apiVersion: v1
kind: Secret
metadata:
  name: plaintext-secret
stringData:
  password: test
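The difference between data and stringData comes down to base64 encoding, which you can reproduce with the Python standard library. The helper names below are hypothetical and only serve as an illustration. Note that base64 is an encoding, not encryption: anyone who can read the value can decode it.

```python
import base64


def encode_secret_value(plain: str) -> str:
    """Encode a plain string the way values under 'data' are expected."""
    return base64.b64encode(plain.encode("utf-8")).decode("ascii")


def decode_secret_value(encoded: str) -> str:
    """Decode a base64 value taken from the 'data' section of a Secret."""
    return base64.b64decode(encoded).decode("utf-8")


encoded = encode_secret_value("admin")
decoded = decode_secret_value("YWRtaW4=")
print(encoded)   # YWRtaW4=
print(decoded)   # admin
```

So the apikey value in the earlier example is simply the string admin in encoded form.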


If you’re pulling an image from a private registry, you may need to authenticate first. When all of your nodes need to pull a specific image, an imagePullSecrets entry references the secret that holds the authentication info and makes it available to them.

apiVersion: v1
kind: Pod
metadata:
  name: private-image
spec:
  containers:
    - name: privateapp
      image: gabrieltanner/graphqltesting
  imagePullSecrets:
    - name: authentification-secret

Namespaces

Namespaces are virtual clusters used to manage large projects and allocate cluster resources among many users. They provide a scope for names and cannot be nested within one another.

Managing and using namespaces with kubectl is simple. This section will walk you through the most common namespace actions and commands.

Look at the existing Namespaces

You can use the kubectl get namespaces command to see all of your cluster’s presently accessible namespaces.

kubectl get namespaces
# Output
NAME          STATUS   AGE
default       Active   32d
docker        Active   32d
kube-public   Active   32d
kube-system   Active   32d

Creating Namespace

Namespaces can be created with the kubectl CLI or by using YAML to create a Kubernetes object.

kubectl create namespace testnamespace
# Output
namespace/testnamespace created

The same functionality may be achieved with a YAML file.

apiVersion: v1
kind: Namespace
metadata:
  name: testnamespace

The kubectl apply command can then be used to apply the configuration file.

kubectl apply -f testNamespace.yaml

Namespace Filtering

When a new object is created in Kubernetes without a custom namespace property, it is added to the default namespace.

To create your object in a different namespace, pass the namespace flag:

kubectl create deployment --image=nginx nginx --namespace=testnamespace

You may now use the get command to filter for your deployment.

kubectl get deployment --namespace=testnamespace

Change Namespace

You’ve now learned how to create objects in a namespace other than the default. However, adding the namespace to each command you want to run is time-consuming and error-prone.

Instead, you can use the set-context command to change the default namespace that commands are applied to.

kubectl config set-context $(kubectl config current-context) --namespace=testnamespace

The get-contexts command can be used to verify the change.

kubectl config get-contexts
# Output
CURRENT   NAME      CLUSTER   AUTHINFO   NAMESPACE
*         Default   Default   Default    testnamespace

Kubernetes with Docker Compose

For people coming from the Docker community, writing Docker Compose files may be easier than writing Kubernetes objects. This is where Kompose comes into play. It uses a simple CLI (command-line interface) to convert or deploy your docker-compose file to Kubernetes.

How to Install Kompose

Kompose is quick and easy to install on all three major operating systems.

To install Kompose on Linux or Mac, curl the binaries.

# Linux

# macOS

chmod +x kompose
sudo mv ./kompose /usr/local/bin/kompose

Deploying using Kompose

Kompose deploys your existing Docker Compose files on Kubernetes. Consider the following compose file as an example.

version: "2"
services:
  redis-master:
    image:
    ports:
      - "6379"
  redis-slave:
    image:
    ports:
      - "6379"
    environment:
      - GET_HOSTS_FROM=dns
  frontend:
    image:
    ports:
      - "80:80"
    environment:
      - GET_HOSTS_FROM=dns
    labels:
      kompose.service.type: LoadBalancer

Kompose, like Docker Compose, lets us deploy our setup with a single command.

kompose up

You should now be able to see the resources that were created.

kubectl get deployment,svc,pods,pvc

Converting Kompose

Kompose can also turn your existing Docker Compose file into the Kubernetes object you need.

kompose convert

The apply command is then used to deploy your application.

kubectl apply -f filenames

Application Deployment

Now that you’ve mastered the theory and all of Kubernetes’ core ideas, it’s time to put what you’ve learned into practice. This chapter will show you how to use Kubernetes to deploy a backend application.

The application used in this tutorial is a GraphQL boilerplate for the NestJS backend framework.

First, let’s clone the repository.

Images to a Registry

We must first push the images to a publicly accessible Image Registry before starting the construction of Kubernetes objects. It can be a public registry like DockerHub or a private registry of your own.

Visit this post for additional information on creating your own private Docker Image.

To push the image, include the image tag in your Compose file along with the registry you want to push to.

version: '3'
services:
  nodejs:
    build:
      context: ./
      dockerfile: Dockerfile
    image:
    restart: always
    environment:
      - DATABASE_HOST=mongo
      - PORT=3000
    ports:
      - '3000:3000'
    depends_on: [mongo]
  mongo:
    image: mongo
    ports:
      - '27017:27017'
    volumes:
      - mongo_data:/data/db
volumes:
  mongo_data: {}

I used a private registry that I had previously set up, but DockerHub would work just as well.

Creating Kubernetes objects

Now that you’ve published your image to a registry, we’ll write our Kubernetes objects.

To begin, create a new directory in which to save the deployments.

mkdir deployments
cd deployments
touch mongo.yaml
touch nestjs.yaml

This is how the MongoDB service and deployment will look.

apiVersion: v1
kind: Service
metadata:
  name: mongo
spec:
  selector:
    app: mongo
  ports:
    - port: 27017
      targetPort: 27017
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mongo
spec:
  selector:
    matchLabels:
      app: mongo
  template:
    metadata:
      labels:
        app: mongo
    spec:
      containers:
        - name: mongo
          image: mongo
          ports:
            - containerPort: 27017

The file contains a deployment object with a single MongoDB container called mongo. It also comes with a service that makes port 27017 available inside the Kubernetes network.

Because the container requires some additional settings, such as environment variables and imagePullSecrets, the NestJS Kubernetes object is a little more complicated.

Its service uses a load balancer to make the port available on the host machine.

Deploy the application

Now that the Kubernetes object files are ready, let us use kubectl to deploy them.

kubectl apply -f mongo.yaml
kubectl apply -f nestjs.yaml

On localhost/graphql, you should now see the GraphQL playground.

Congratulations, you’ve just deployed your first Kubernetes application.


You persevered to the end! I hope this guide has given you a better understanding of Kubernetes and how to use it to improve your development process with production-grade solutions.

Kubernetes was created using Google’s ten years of expertise running containerized apps at scale. It has already been adopted by the top public cloud suppliers and technology providers and is now being adopted by the majority of software manufacturers and companies. It even led to the formation of the Cloud Native Computing Foundation (CNCF) in 2015, became the first project to graduate under the CNCF, and began streamlining the container ecosystem alongside other container-related projects like CNI, containerd, Envoy, Fluentd, gRPC, Jaeger, Linkerd, and Prometheus. Its design, its cooperation with industry leaders, its open-source model, and its openness to ideas and contributions may be the main reasons for its popularity and endorsement at such a high level.

Share this with other developers, if you find it useful.



A Comprehensive Guide To Time Series Analysis And Forecasting

This article was published as a part of the Data Science Blogathon.

Time Series Analysis and Forecasting is a prominent and powerful field of study in data science, data analytics and Artificial Intelligence. It helps us analyse and forecast data that changes with time. For example, suppose you have visited a clinic due to some chest pain and want to get an ECG test done to see if your heart is functioning healthily. The ECG graph produced is time-series data in which your Heart Rate Variability (HRV) is plotted with respect to time; by analysing it, the doctor can suggest crucial measures to take care of your heart and reduce the risk of stroke or heart attacks. Time series analysis is used widely in healthcare analytics, geospatial analysis, and weather forecasting, and more generally to forecast the future of data that changes continuously with time!

What is Time Series Analysis in Machine Learning?

Time-series analysis is the process of extracting useful information from time-series data to forecast and gain insights from it. Such data consists of a series of values that vary with time, so it is continuous and non-static in nature. The interval between observations may range from hours to minutes and even seconds or less (milliseconds to microseconds). Due to this non-static and continuous nature, working with time-series data remains difficult even today!

As time-series data consists of a series of observations taken in sequences of time, it is entirely non-static in nature.

Time Series – Analysis Vs. Forecasting

Time series data analysis is the scientific extraction of useful information from time-series data to gather insights from it. It consists of a series of data that varies with time. It is non-static in nature. Likewise, it may vary from hours to minutes and even seconds (milliseconds to microseconds). Due to its continuous and non-static nature, working with time-series data is challenging!


Time Series Analysis and Time Series Forecasting are two studies whose names are often used interchangeably, although there is a thin line between the two. Which name applies depends on whether you are analysing and summarizing existing time-series data or predicting future trends from it.

Thus, it’s a descriptive Vs. predictive strategy based on your time-series problem statement.

In a nutshell, time series analysis is the study of patterns and trends in a time-series data frame by descriptive and inferential statistical methods. Whereas, time series forecasting involves forecasting and extrapolating future trends or values based on old data points (supervised time-series forecasting), clustering them into groups, and predicting future patterns (unsupervised time-series forecasting).

The Time Series Integrants

Any time-series problem or data can be broken down or decomposed into several integrants, which can be useful for performing analysis and forecasting. Transforming time series into a series of integrants is called Time Series Decomposition.

A quick thing worth mentioning is that the integrants are broken further into 2 types-

1. Systematic — components that can be used for predictive modelling and occur recurrently. Level, Trend, and Seasonality come under this category.

2. Non-systematic — components that cannot be used for predictive modelling directly. Noise comes under this category.

The original time series data is hence split or decomposed into 5 parts-

1. Level — The most common integrant in every time series data is the level. It is nothing but the mean or average value of the time series, so it has zero variance when plotted against itself.

2. Trend — The linear movement or drift of the time series which may be increasing, decreasing or neutral. Trends are observable over positive(increasing) and negative(decreasing) and even linear slopes over the entire range of time.

3. Seasonality — Seasonality is something that repeats over a lapse of time, say a year. An easy way to get an idea about seasonality- seasons, like summer, winter, spring, and monsoon, which come and go in cycles throughout a specified period of time. However, in terms of data science, seasonality is the integrant that repeats at a similar frequency.

Note — If seasonality doesn’t occur at the same frequency, we call it a cycle. A cycle does not have a predefined, fixed frequency, so its timing is very uncertain in terms of probability. It may sometimes be random, which poses a great challenge in forecasting.

4. Noise — An irregularity or noise is a randomly occurring integrant. It is optional and comes under observation if and only if the features are not correlated with each other and, most importantly, the variance is similar across the series. Noise can lead to dirty and messy data and hinder forecasting, hence noise removal, or at least reduction, is a very important part of the time series data pre-processing stage.

5. Cyclicity — A particular time-series pattern that repeats itself after a large gap or interval of time, like months, years, or even decades.
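
The integrants listed above can be made concrete with a naive additive decomposition. The sketch below is only an illustration under stated assumptions: the monthly series is synthetic, the 12-step period and every variable name are made up for the example, and the moving-average approach is a simple stand-in for a proper decomposition routine.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series (assumption): level + linear trend + yearly seasonality + noise
rng = np.random.default_rng(0)
t = np.arange(120)
index = pd.date_range("2000-01-01", periods=120, freq="MS")
series = pd.Series(
    50 + 0.5 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, 120),
    index=index,
)

# Trend (together with the level): moving average over one full season of 12 steps
trend = series.rolling(window=12, center=True).mean()

# Seasonality: average detrended value for each calendar month
detrended = series - trend
seasonal_profile = detrended.groupby(detrended.index.month).mean()
seasonal = pd.Series(
    seasonal_profile.loc[series.index.month].to_numpy(), index=series.index
)

# Noise: whatever the trend and the seasonality do not explain
residual = series - trend - seasonal

print(seasonal_profile.round(2))
print("residual std:", round(float(residual.std()), 2))
```

The seasonal profile should roughly trace the sine wave that was put into the data, and the residual should be about as noisy as the random term that was added. The first and last few trend values are missing because the moving average needs a full window.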

The Time Series Forecasting Applications

Time series analysis and forecasting are used to automate a variety of tasks, such as-

Weather Forecasting

Anomaly Forecasting

Sales Forecasting

Stock Market Analysis

ECG Analysis

Risk Analysis

and many more!

Time Series Components Combinatorics

A time-series model can be represented by 2 methodologies-

The Additive Methodology — 

When the time series trend is a linear relationship between integrants, i.e., the frequency (width) and amplitude(height) of the series are the same, the additive rule is applied.

Additive methodology is used when we have a time series where seasonal variation is linear or constant over timestamps.

It can be represented as follows-

y(t) or x(t) = level + trend + seasonality + noise

where the model y(multivariate) or x(univariate) is a function of time t.

The Multiplicative Methodology — 

When the time series is not a linear relationship between integrants, then modelling is done following the multiplicative rule.

The multiplicative methodology is used when we have a time series where seasonal variation increases with time — which may be exponential or quadratic.

It is represented as-

y(t) or x(t)= Level * Trend * Seasonality * Noise
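
To see the difference between the two methodologies, the sketch below builds one additive and one multiplicative series from the same kind of components. All numbers are illustrative assumptions, and noise is left out so that the comparison stays exact.

```python
import numpy as np

t = np.arange(48)          # four years of monthly steps (assumption)
level = 100.0

# Additive: components are summed, so the seasonal swing keeps a constant size
trend_add = 2.0 * t
season_add = 10.0 * np.sin(2 * np.pi * t / 12)
additive = level + trend_add + season_add

# Multiplicative: components are multiplied, so the swing grows with the trend
trend_mul = 1.02 ** t
season_mul = 1 + 0.1 * np.sin(2 * np.pi * t / 12)
multiplicative = level * trend_mul * season_mul

# Size of the seasonal swing (peak to trough) in the first and in the last year
add_dev = additive - (level + trend_add)
mul_dev = multiplicative - level * trend_mul
print("additive swing:      ", np.ptp(add_dev[:12]), np.ptp(add_dev[-12:]))
print("multiplicative swing:", np.ptp(mul_dev[:12]), np.ptp(mul_dev[-12:]))

# Taking logs turns the product into a sum, i.e. back into an additive model
log_sum = np.log(level) + np.log(trend_mul) + np.log(season_mul)
print(np.allclose(np.log(multiplicative), log_sum))
```

The last check is the reason a log transform is a common first step for multiplicative series: after it, the additive tools apply again.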

Deep-Dive into Supervised Time-Series Forecasting

Supervised learning is the most used domain-specific machine learning, and hence we will focus on supervised time series forecasting.

This will contain various detailed topics to ensure that readers at the end will know how to-

Load time series data and use descriptive statistics to explore it

Scale and normalize time series data for further modelling

Extract useful features from time-series data (Feature Engineering)

Check the stationarity of the time series

Use ARIMA and Grid-search ARIMA models for time-series forecasting

Move on to deep learning methods for more complex time-series forecasting (LSTM and bi-LSTMs)

So without further ado, let’s begin!

Load Time Series Data and Use Descriptive Statistics to Explore it

For the easy and quick understanding and analysis of time-series data, we will work on the famous toy dataset named ‘Daily Female Births Dataset’.

Get the dataset downloaded from here.

Importing necessary libraries and loading the data –

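import numpy as np
import pandas as pd
import statsmodels
import matplotlib.pyplot as plt
import seaborn as sns

# index_col=0 uses the date column as the index; squeeze("columns") turns the single
# remaining column into a Series (read_csv's squeeze=True was removed in pandas 2.0)
data = pd.read_csv('daily-total-female-births-in-cal.csv', parse_dates=True, header=0, index_col=0).squeeze("columns")
data.head()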

This is the output we get-

1959-01-01    35
1959-01-02    32
1959-01-03    30
1959-01-04    31
1959-01-05    44
Name: Daily total female births in California, 1959, dtype: int64

Note — Remember, ‘parse_dates’ is required because it converts the dates to datetime objects that can be parsed, header=0 ensures the column names are stored for easy reference, and squeezing (squeeze=True in older pandas, .squeeze("columns") in pandas 2.0 and later) converts a single-column data frame into a Series.

Exploring the Time-Series Data –

print(data.size) #output-365

(a) Carry out some descriptive statistics —

print(data.describe())


Output —

count    365.000000
mean      41.980822
std        7.348257
min       23.000000
25%       37.000000
50%       42.000000
75%       46.000000
max       73.000000

(b) A look at the time-series distribution plot —

plt.plot(data)
plt.show()

Scale and Normalize Time Series Data for Further Modelling

Normalization scales the numeric features in the training data to the range 0 to 1 so that gradient descent and loss optimization are fast and efficient and converge quickly to a local minimum. Often referred to as feature scaling, it is crucial for most ML problem statements.

Let’s see how we can achieve normalization in time-series data.

For this purpose, let’s pick a highly fluctuating time-series data — the minimum daily temperatures data. Grab it here!

Let’s have a look at the extreme fluctuating nature of the data —


To normalize a feature, Scikit-learn’s MinMaxScaler is very handy! If you want to recover the original data points after prediction, the scaler also provides an inverse_transform() function!

Here goes the normalization code —

# import necessary libraries
from pandas import read_csv
from sklearn.preprocessing import MinMaxScaler

# load and sanity check the data
data = read_csv('daily-minimum-temperatures-in-me.csv', parse_dates=True, header=0, index_col=0).squeeze("columns")
print(data.head())

# convert data into a matrix of row-col vectors
values = data.values
values = values.reshape((len(values), 1))

# feature scaling
scaler = MinMaxScaler(feature_range=(0, 1))

# fit the scaler with the train data to get min-max values
scaler = scaler.fit(values)
print('Min: %f, Max: %f' % (scaler.data_min_[0], scaler.data_max_[0]))

# normalize the data and sanity check
normalized = scaler.transform(values)
for i in range(5):
    print(normalized[i])

# inverse transform to obtain original values
original_matrix = scaler.inverse_transform(normalized)
for i in range(5):
    print(original_matrix[i])

Let’s have a look at what we got –

See how the values have scaled!

Note — In our case, the data has no outliers, so MinMaxScaler serves the purpose well. If you have an unsupervised learning approach and your data contains outliers, it is usually better to go for standardization. Normalization maps the minimum and maximum to 0 and 1, so a single extreme value squeezes all the other values into a narrow band, which can lead to a poor model. Standardization, on the other hand, rescales the data to a mean of 0 and a standard deviation of 1 without fixed bounds, so it generally copes better with outliers, although the mean and standard deviation are still influenced by them.
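
The effect of an outlier on the two scaling approaches can be checked with a few lines of NumPy. This is a minimal sketch on made-up numbers; `min_max` and `standardize` are hypothetical helper names written out by hand rather than calls into Scikit-learn.

```python
import numpy as np

# Toy values (assumption): five ordinary readings and one outlier at the end
values = np.array([10.0, 12.0, 11.0, 13.0, 12.0, 95.0])

def min_max(x):
    """Normalization: map the minimum to 0 and the maximum to 1."""
    return (x - x.min()) / (x.max() - x.min())

def standardize(x):
    """Standardization: rescale to mean 0 and standard deviation 1."""
    return (x - x.mean()) / x.std()

print("min-max:     ", min_max(values).round(3))
print("standardized:", standardize(values).round(3))
```

With min-max scaling, the five ordinary readings all end up very close to 0 because the outlier alone defines the top of the range. Standardization is not bounded in this way, but the outlier still pulls the mean and the standard deviation, so neither method is a substitute for handling outliers explicitly.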

More on that here!

Extracting Useful Features from Time-Series Data (Feature Engineering)

Framing data into a supervised learning problem simply deals with the task of handling and extracting useful features and discarding irrelevant features to make the model robust and cost-efficient.

We already know that supervised learning problems have 2 types of features — the independents (x) and the dependent/target (y). Hence, how well the target value is predicted depends on how well we choose and engineer the independent features.

You must know by now that time-series data has two columns, timestamp, and its respective value. So, it is very self-explanatory that in the time series problem, the independent feature is time and the dependent feature is value.

Now let us look at what are the features that need to be engineered into these input and output values so that the inherent relationship between these two variables is established to make the forecasting as good as possible.

The features which are extremely important to model the relationship between the input and output variables in a time series are —

1. Descriptive Statistical Features — As straightforward as it sounds, calculating the statistical details and summary of any data is extremely important: mean, median, standard deviation, quantiles, and min-max values. These come in extremely handy in tasks such as outlier detection, scaling and normalization, recognizing the distribution, etc.

2. Window Statistic Features — Window features are a statistical summary of different statistical operations upon a fixed window size of previous timestamps. There are, in general, 2 ways to extract descriptive statistics from windows. They are

(a) Rolling Window Statistics: The rolling window focuses on calculating rolling means or what we conventionally call Moving Average, and often other statistical operations. This calculates summary statistics (mostly mean) across values within a specific sliding window, and then we can assign these as features in our dataset.

Let the value at timestamp t-1 be x and the value at t-2 be y; we take the average of x and y as a feature for predicting the value at the next timestamp. The rolling window hence takes the mean of 2 values to predict the 3rd value. After that is done, the window shifts to the next set of values, and so the mean is calculated for each window consisting of 2 values. We use rolling window statistics more often when recent data is more important for forecasting than older data.

Let’s see how we can calculate moving or rolling average with a rolling window —

from pandas import DataFrame
from pandas import concat

df = DataFrame(data.values)
tshifts = df.shift(1)
rwin = tshifts.rolling(window=2)
moving_avg = rwin.mean()
joined_df = concat([moving_avg, df], axis=1)
joined_df.columns = ['mean(t-2,t-1)', 't+1']
print(joined_df.head(5))

Let’s have a look at what we got —

(b) Expanding Window Statistics: Much like the rolling window, the expanding window computes summary statistics, but instead of a fixed size it includes all the previous observations and grows by one each time it moves forward. This is beneficial when older data is just as important for forecasting as recent data.

Let’s have a quick look at expanding window code-

window = tshifts.expanding()
# use the expanding window here, not the rolling window from the previous example
joined_df2 = concat([window.mean(), df.shift(-1)], axis=1)
joined_df2.columns = ['mean', 't+1']
print(joined_df2.head(5))

Let’s have a look at what we got -

3. Lag Features — A lag feature is simply the value at a previous timestamp, say t, used as an input for predicting the value at a later timestamp, say t+1. The lag is the distance between the two timestamps.

4. Datetime Features — This is simply the conversion of time into its specific components like a month, or day, along with the value of temperature for better forecasting. By doing this, we can gather specific information about the month and day at a particular timestamp for each record.

5. Timestamp Decomposition — Timestamp decomposition includes breaking down the timestamp into subset columns of timestamp for storing unique and special timestamps. Before Diwali or, say, Christmas, the sale of crackers and Santa-caps, fruit-cakes increases exponentially more than at other times of the year. So storing such a special timestamp by decomposing the original timestamp into subsets is useful for forecasting.
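
Lag, datetime, and special-timestamp features from the list above can all be derived from the index with pandas. The sketch below is a small illustration that reuses the first five birth counts printed earlier; the column names and the `is_new_year` flag are assumptions made up for the example.

```python
import pandas as pd

# First five values of the births series shown earlier, rebuilt by hand
index = pd.date_range("1959-01-01", periods=5, freq="D")
births = pd.Series([35, 32, 30, 31, 44], index=index, name="births")

features = births.to_frame(name="target")

# Lag features: the values one and two steps back
features["lag_1"] = births.shift(1)
features["lag_2"] = births.shift(2)

# Datetime features: components of the timestamp
features["month"] = features.index.month
features["day"] = features.index.day
features["dayofweek"] = features.index.dayofweek

# Timestamp decomposition: flag a special date (hypothetical example flag)
features["is_new_year"] = (features.index.month == 1) & (features.index.day == 1)

# The first rows have no lagged values yet, so they are dropped
features = features.dropna()
print(features)
```

Each remaining row now carries its own inputs (the lags and the calendar fields) next to the target, which is the shape a supervised model expects.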

Time-series Data Stationary Checks

So, let’s first digest what stationary time-series data is!

Stationary, as the term suggests, means consistent. In time series, data whose statistical properties, such as mean and variance, do not change over time is termed stationary; in practice, this means it contains no trend or seasonality. Any time-series data that has a specific trend or seasonality is, thus, non-stationary.

Recall that, of the two time-series datasets we worked on, the childbirths data had no trend or seasonality and is stationary, whereas the minimum daily temperatures data has a seasonality factor and drifts, and hence it’s non-stationary and harder to model!

Stationarity in time-series is noticeable in 3 types —

(a) Trend Stationary — This kind of time-series data possesses no trend.

(b) Seasonality Stationary — This kind of time-series data possesses no seasonality factor.

(c) Strictly Stationary — The joint distribution of the time-series data does not change when it is shifted in time, so the data stays consistent with no drift in mean or variance.

Now that we know what stationarity in time series is, how can we check for the same?

Vision is everything. A quick visualization of your time-series data at hand can give a quick eye review of whether the data can be stationary or not. Next in the line comes the statistical summary. A clear look into the summary statistics of the data like min, max, variance, deviation, mean, quantiles, etc. can be very helpful to recognize drifts or shifts in data.

Let’s POC this!

So, we take stationary data, which is the handy childbirths data we worked on earlier. However, for the non-stationary data, let’s take the famous airline-passenger data, which is simply the number of airline passengers per month, and prove how they are stationary and non-stationary.

Case 1 — Stationary Proof

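import pandas as pd
import matplotlib.pyplot as plt

# index_col=0 keeps the dates as the index so that only the counts are analysed
data = pd.read_csv('daily-total-female-births.csv', parse_dates=True, header=0, index_col=0).squeeze("columns")
data.hist()
plt.show()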

Output —

As I said, vision! The histogram looks roughly like a Gaussian distribution, which is a first hint that the data is stationary!

More curious? Let’s get solid math proof!

X = data.values
seq = round(len(X) / 2)
x1, x2 = X[0:seq], X[seq:]
meanx1, meanx2 = x1.mean(), x2.mean()
varx1, varx2 = x1.var(), x2.var()
print('meanx1=%f, meanx2=%f' % (meanx1, meanx2))
print('variancex1=%f, variancex2=%f' % (varx1, varx2))

Output —

meanx1=39.763736, meanx2=44.185792
variancex1=49.213410, variancex2=48.708651

The means and variances of the two halves are close to each other, which suggests the data is invariant and hence stationary! Great.

Case 2— Non-Stationary Proof

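import pandas as pd
import matplotlib.pyplot as plt

# index_col=0 keeps the months as the index so that only the passenger counts are analysed
data = pd.read_csv('international-airline-passengers.csv', parse_dates=True, header=0, index_col=0).squeeze("columns")
data.hist()
plt.show()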

Output —

The graph pretty much gives a seasonal taste. Moreover, it is too distorted for a Gaussian tag. Let’s now quickly get the mean-variance gaps.

X = data.values
seq = round(len(X) / 2)
x1, x2 = X[0:seq], X[seq:]
meanx1, meanx2 = x1.mean(), x2.mean()
varx1, varx2 = x1.var(), x2.var()
print('meanx1=%f, meanx2=%f' % (meanx1, meanx2))
print('variancex1=%f, variancex2=%f' % (varx1, varx2))

Output —

meanx1=182.902778, meanx2=377.694444
variancex1=2244.087770, variancex2=7367.962191

Alright, the gaps between the means and between the variances of the two halves are pretty self-explanatory: this is the non-stationary kind.

ARMA, ARIMA, and SARIMAX Models for Time-Series Forecasting

A very traditional yet remarkable ‘machine-learning’ way of forecasting a time series is to use the statistical models ARMA (Auto-Regressive Moving Average) and ARIMA (Auto-Regressive Integrated Moving Average).

Other than these 2 traditional approaches, we have SARIMA (Seasonal Auto-Regressive Integrated Moving Average) and Grid-Search ARIMA, which we will see too!

So, let’s explore the models, one by one!


The ARMA model is an assembly of 2 statistical models — the AR or Auto-Regressive model and Moving Average.

The Auto-Regressive Model estimates any dependent variable value y(t) at a given timestamp t on the basis of lags. With a single lag, it takes the following form —

y(t) = α + β * y(t-1) + ε(t)

Here, y(t) = predicted value at timestamp t, α = intercept term, β = coefficient of lag, y(t-1) = time-series lag at timestamp t-1, and ε(t) = error term at timestamp t.

So α and β are the model estimators that estimate y(t).
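
How α and β are estimated can be illustrated on simulated data. The sketch below assumes a single-lag model of the form y(t) = α + β * y(t-1) + error, generates a series from known values of α and β, and then recovers them with ordinary least squares; the chosen numbers and variable names are assumptions for the example, and a real ARMA fit would use a dedicated library.

```python
import numpy as np

rng = np.random.default_rng(42)
alpha, beta, n = 2.0, 0.7, 2000     # true parameters (assumption)

# Simulate y(t) = alpha + beta * y(t-1) + error
y = np.zeros(n)
y[0] = alpha / (1 - beta)           # start at the long-run mean of the process
for t in range(1, n):
    y[t] = alpha + beta * y[t - 1] + rng.normal()

# Regress y(t) on a constant and y(t-1) to estimate alpha and beta
X = np.column_stack([np.ones(n - 1), y[:-1]])
alpha_hat, beta_hat = np.linalg.lstsq(X, y[1:], rcond=None)[0]
print(f"alpha estimate: {alpha_hat:.2f}, beta estimate: {beta_hat:.2f}")

# One-step-ahead forecast from the last observed value
forecast = alpha_hat + beta_hat * y[-1]
print(f"forecast for the next step: {forecast:.2f}")
```

The estimates should land close to the true values because the simulated series is long and really does follow the assumed model; on real data the fit is only as good as that assumption.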

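The Moving Average Model plays a similar role, but it does not regress on past values of the series, and it is not the same thing as the rolling average discussed earlier. It rather uses the lagged forecast errors of previously predicted values to predict future values. With a single lagged error, it can be written as y(t) = μ + ε(t) + θ * ε(t-1), where μ is the mean of the series, ε(t) is the forecast error at timestamp t, and θ is the coefficient of the lagged error.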

Let’s see how both the AR and MA models perform on the International-Airline-Passengers data.

AR model

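from statsmodels.tsa.arima.model import ARIMA
# Assumption: datasetLogDiffShifting (not defined in this article) is the first difference
# of the log-scaled series, NaN dropped; d=1 is therefore already applied and we fit with d=0.
diff = datasetLogDiffShifting['#Passengers']
AR_model = ARIMA(diff, order=(2, 0, 0))
AR_results = AR_model.fit()
plt.plot(diff)
plt.plot(AR_results.fittedvalues, color='red')
plt.title('RSS: %.4f' % sum((AR_results.fittedvalues - diff) ** 2))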

The RSS, or residual sum of squares, is 1.5023 in the case of the AR model, which is rather unsatisfactory, as AR doesn’t capture non-stationarity well enough.

MA Model

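from statsmodels.tsa.arima.model import ARIMA
# Assumption: datasetLogDiffShifting (not defined in this article) is the first difference
# of the log-scaled series, NaN dropped; d=1 is therefore already applied and we fit with d=0.
diff = datasetLogDiffShifting['#Passengers']
MA_model = ARIMA(diff, order=(0, 0, 2))
MA_results = MA_model.fit()
plt.plot(diff)
plt.plot(MA_results.fittedvalues, color='red')
plt.title('RSS: %.4f' % sum((MA_results.fittedvalues - diff) ** 2))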

The MA model shows similar results to AR, differing by a very small amount. We know our data is non-stationary, so let’s make this RSS score better by the non-stationarity handler AR+I+MA!


Along with the combined use of the AR and MA models seen earlier, ARIMA uses a special concept of Integration (I), which differences the observations in order to make non-stationary data stationary for better forecasting. So, it’s better suited to such data than its predecessor ARMA, which can only handle stationary data.

What the differencing step does is take the difference between the observed values at two consecutive timestamps (t and t+1, for example). Doing this helps in achieving a constant mean rather than a highly fluctuating ‘non-stationary’ mean.
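
The effect of differencing on the mean can be checked on a small synthetic example. This is a sketch under stated assumptions (a made-up series with a linear upward trend plus noise), not the airline data.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(200)
series = 5.0 + 0.8 * t + rng.normal(0, 1, 200)   # trending, so the mean drifts

# First-order differencing: diff[i] = series[i + 1] - series[i]
diff = np.diff(series)

# Compare the mean of the first and second half, before and after differencing
half = len(series) // 2
half_d = len(diff) // 2
print("original means:   ", series[:half].mean(), series[half:].mean())
print("differenced means:", diff[:half_d].mean(), diff[half_d:].mean())

# Differencing is reversible: a cumulative sum restores the original series
restored = np.concatenate([[series[0]], series[0] + np.cumsum(diff)])
print(np.allclose(restored, series))
```

The two halves of the original series have very different means, while the two halves of the differenced series both sit near the slope of the trend. The cumulative sum at the end is what lets a forecast made on the differenced scale be converted back to the original scale.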

Let’s fit the same data with ARIMA and see how well it performs!

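from statsmodels.tsa.arima.model import ARIMA
# Assumption: datasetLogDiffShifting (not defined in this article) is the first difference
# of the log-scaled series, NaN dropped; d=1 is therefore already applied and we fit with d=0.
diff = datasetLogDiffShifting['#Passengers']
ARIMA_model = ARIMA(diff, order=(2, 0, 2))
ARIMA_results = ARIMA_model.fit()
plt.plot(diff)
plt.plot(ARIMA_results.fittedvalues, color='red')
plt.title('RSS: %.4f' % sum((ARIMA_results.fittedvalues - diff) ** 2))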

Great! The graph itself shows how ARIMA fits our data in a better and more generalized fashion than ARMA! Also, observe how the RSS has dropped to 1.0292 from 1.5023 or 1.4721.


Designed and developed as a beautiful extension to ARIMA, SARIMAX, or Seasonal Auto-Regressive Integrated Moving Average with eXogenous factors, is a better player than ARIMA in the case of highly seasonal time series. There are 4 seasonal components that SARIMAX takes into account.

They are -

 1. Seasonal Autoregressive Component

 2. Seasonal Moving Average Component

 3. Seasonal Integration (Differencing) Order Component

 4. Seasonal Periodicity
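
Two of these components, the seasonal periodicity and the seasonal integration order, can be illustrated without fitting a model. The sketch below assumes a synthetic monthly series with a period of 12 and applies one round of seasonal differencing, i.e. subtracting the value from one full season earlier; all numbers are made up for the example.

```python
import numpy as np

m = 12                                            # seasonal periodicity (assumption)
t = np.arange(72)
series = 100 + 1.5 * t + 20 * np.sin(2 * np.pi * t / m)

# Seasonal differencing of order 1: y(t) - y(t - m)
seasonal_diff = series[m:] - series[:-m]
print(seasonal_diff.min(), seasonal_diff.max())   # the seasonal pattern cancels out

# What remains is the trend's contribution over one season (1.5 * 12);
# an ordinary first difference removes that as well
both = np.diff(seasonal_diff)
print(np.allclose(both, 0.0))
```

Because the seasonal pattern repeats exactly every 12 steps, subtracting the value from 12 steps earlier removes it completely. In a SARIMAX specification, the last entry of the seasonal order is this periodicity and the second entry is how many times the seasonal difference is taken.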


If you are more of a theory conscious person like me, do read more on this here, as getting into the details of the formula is beyond the scope of this article!

Now, let’s see how well SARIMAX performs on seasonal time-series data like the International-Airline-Passengers data.

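from statsmodels.tsa.statespace.sarimax import SARIMAX

# Assumption: train and test are consecutive splits of the passenger data frame
# (the split is not shown in this article); start and end cover the test period.
SARIMAX_model = SARIMAX(train['#Passengers'], order=(1,1,1), seasonal_order=(1,0,0,12))
SARIMAX_results = SARIMAX_model.fit()
start, end = len(train), len(train) + len(test) - 1
preds = SARIMAX_results.predict(start, end).rename('SARIMAX Predictions')
test['#Passengers'].plot(legend=True, figsize=(8,5))
preds.plot(legend=True)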

Look how beautifully SARIMAX handles seasonal time series!

Heading to DL Methods for Complex Time-Series Forecasting

One very common feature of time-series data is the long-term dependency factor. Much time-series forecasting obviously works on previous records (the future is forecast based on previous records, which may lie far in the past). Traditional models like ARIMA, ARMA, or SARIMAX are not good at capturing such long-term dependencies, which makes them poor performers in sequence-dependent time series problems.

To address this issue, a robust neural network architecture was proposed that handles sequence dependence remarkably well. It is known as the Recurrent Neural Network, or RNN.


RNN was designed to work on sequential data like time series. However, a notable pitfall of RNN is that it cannot handle long-term dependencies. For a problem where you want to forecast a time series based on a huge number of previous records, RNN forgets most of the records that occurred much earlier and only learns the sequences of recent data fed to its neural network. So, RNN was observed to not be up to the mark for NSP (Next Sequence Prediction) tasks in NLP and time series.

To address this issue of not capturing long-term dependencies, a powerful variant of RNN was developed, known as LSTM (Long Short Term Memory) Networks. Unlike RNN, which could only capture short-term sequences/dependencies, LSTM, as its name suggests was observed to learn long as well as short term dependencies. Hence, it was a great success for modelling and forecasting time series data!

Note — Since explaining the architecture of LSTM will be beyond the size of this blog, I recommend you to head over to my article where I explained LSTM in detail!

Let us now take our Airline Passengers’ data and see how well RNN and LSTM work on it!

Imports —

import numpy as np
import pandas as pd
import tensorflow as tf
import matplotlib.pyplot as plt
import sklearn.preprocessing
from sklearn.metrics import r2_score
from keras.layers import Dense, Dropout, SimpleRNN, LSTM
from keras.models import Sequential

Scaling the data for better forecasting — 

minmax_scaler = sklearn.preprocessing.MinMaxScaler()
data['Passengers'] = minmax_scaler.fit_transform(data['Passengers'].values.reshape(-1,1))
data.head()

Scaled data — 

Train, test splits (80–20 ratio) — 

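# x and y are assumed to hold sliding windows of 20 scaled values and the value
# that follows each window; the step that builds them is not shown in this article
split = int(len(data['Passengers'])*0.8)
x_train, y_train = np.array(x[:split]), np.array(y[:split])
x_test, y_test = np.array(x[split:]), np.array(y[split:])

# reshape to (samples, timesteps, features) as expected by the recurrent layers
x_train = np.reshape(x_train, (split, 20, 1))
x_test = np.reshape(x_test, (x_test.shape[0], 20, 1))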
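
The split above works on arrays named x and y whose construction is not shown. One plausible way to build them is a sliding window, sketched below; `make_windows`, the window length of 20 (taken from the reshape call), and the stand-in data are assumptions, and unlike the article's code the 80–20 split here is taken over the number of windows.

```python
import numpy as np

def make_windows(values, window=20):
    """Turn a 1-D series into (inputs, targets): each input holds `window`
    consecutive values and the target is the value that follows them."""
    values = np.asarray(values, dtype=float)
    x, y = [], []
    for i in range(window, len(values)):
        x.append(values[i - window:i])
        y.append(values[i])
    return np.array(x), np.array(y)

# Stand-in for 144 scaled monthly passenger values (assumption)
scaled = np.linspace(0.0, 1.0, 144)
x, y = make_windows(scaled, window=20)

# 80-20 split over the windows, keeping the time order intact
split = int(len(x) * 0.8)
x_train, x_test = x[:split], x[split:]
y_train, y_test = y[:split], y[split:]

# Recurrent layers expect (samples, timesteps, features)
x_train = x_train.reshape((-1, 20, 1))
x_test = x_test.reshape((-1, 20, 1))
print(x_train.shape, x_test.shape)
```

Keeping the split in time order matters: shuffling before splitting would let the model see values from the future of the test period during training.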

RNN Model — 

model = Sequential()
model.add(SimpleRNN(40, activation="tanh", return_sequences=True, input_shape=(x_train.shape[1],1)))
model.add(Dropout(0.15))
model.add(SimpleRNN(50, return_sequences=True, activation="tanh"))
model.add(Dropout(0.1))  # reduce overfitting
model.add(SimpleRNN(10, activation="tanh"))
model.add(Dense(1))
model.summary()

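Compile it, fit it and predict —

# the optimizer and loss are assumptions; the original compile call is missing from this article
model.compile(optimizer="adam", loss="mean_squared_error")
model.fit(x_train, y_train, epochs=15, batch_size=50)
preds = model.predict(x_test)

Let me show you a picture of how well the model predicts —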

Pretty much accurate!

LSTM Model — 

model = Sequential()
model.add(LSTM(100, activation="relu", return_sequences=True, input_shape=(x_train.shape[1], 1)))
model.add(Dropout(0.2))
model.add(LSTM(80, activation="relu", return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(50, activation="relu", return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(30, activation="relu"))
model.add(Dense(1))
model.summary()

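Compile it, fit it and predict —

# the optimizer and loss are assumptions; the original compile call is missing from this article
model.compile(optimizer="adam", loss="mean_squared_error")
model.fit(x_train, y_train, epochs=15, batch_size=50)
preds = model.predict(x_test)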

Let me show you a picture of how well the model predicts —

Here, we can easily observe that RNN does the job better than LSTM. It is clearly seen that LSTM works great on the training data but badly on the validation/test data, which is a sign of overfitting!

Hence, try to use LSTM only where there is a need for long-term dependency learning; otherwise, RNN works well enough.


Cheers on reaching the end of the guide and learning some pretty interesting things about Time Series. From this guide, you learned the basics of time series, got a brief idea of the difference between the Time Series Analysis and Forecasting subdomains, gained a crisp mathematical intuition for time series analysis and forecasting techniques, and explored how to work on time series problems in Machine Learning and Deep Learning to solve complex problems.

Hope you had fun exploring Time Series with Machine Learning and Deep Learning along with intuition! If you are a curious learner and want to “not” stop learning more, head over to this awesome notebook on time series provided by TensorFlow!

Feel free to follow me on Medium and GitHub for more articles and notebooks on Machine & Deep Learning! Connect with me on LinkedIn if you want to discuss anything regarding this article!

Happy Learning!

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.


A Comprehensive Guide On Data Visualization In Python

This article was published as a part of the Data Science Blogathon

Data visualization is the process of finding, interpreting, and comparing data so that complex ideas can be communicated more clearly, making it easier to identify logical patterns during analysis.

Data visualization is important for many analytical tasks, including data summarization, exploratory data analysis, and model output analysis. One of the easiest ways to communicate with other people is through good visualization.

Fortunately, Python has many libraries that provide useful tools for extracting insights from data. The most popular of these are Matplotlib, Seaborn, Bokeh, Altair, etc.


The ways we describe and visualize information change quickly and become more and more complex with each passing day. Due to the proliferation of social media, the availability of mobile devices, and the digitalization of services, data is available for any human activity that uses technology. The information produced is very valuable and enables us to analyze trends and patterns and to use big data to draw connections between events. Therefore, data visualization can be an effective way to present the end-user with comprehensible information in real time.

Image 1

Data visualization can be important for strategic communication: it helps us interpret available data; identify patterns, trends, and inconsistencies; make decisions; and analyze existing processes. All told, it can have a profound effect on the business world. Every company has data, be it for contacting customers and senior management or for helping to manage the organization itself. Only through research and interpretation can this data acquire meaning and be converted into knowledge. This article seeks to guide students through a series of basic references to help them understand data visualization and its components, and to equip them with the tools and platforms they need to create interactive views and analyze data. It seeks to provide students with basic terminology and crash courses on the design principles that govern data visualization so that they can create and analyze market research reports.

Table of Contents

What is Data Visualization?

Importance of data visualization

Data Visualization Process

Basic principles for data visualization

Data visualization formats

Data Visualization in Python

Color Schemes for Visualization of Data in Python

Other tools for data visualization


End Notes

Data visualization is the practice of translating data into a visual context, such as a map or graph, to make data easier for the human brain to understand and draw insights from. The main goal of data visualization is to make it easier to identify patterns, trends, and outliers in large data sets. The term is often used interchangeably with others, including information graphics, information visualization, and statistical graphics.

Image 2

Data visualization is one of the steps in the data science process, which states that once data has been collected, processed, and modeled, it must be visualized for conclusions to be drawn. Data visualization is also an element of the broader data presentation architecture (DPA) discipline, which aims to identify, locate, manipulate, format, and deliver data in the most efficient way possible.

Data visualization is important for almost every career. It can be used by teachers to display student test results, by computer scientists exploring advances in artificial intelligence (AI), or by executives looking to share information with stakeholders. It also plays an important role in big data projects. As businesses accumulated massive collections of data during the early years of big data, they needed a way to quickly and easily get an overview of their data. Visualization tools were a natural fit.

Importance of Data Visualization

We live in a time of visual information, and visual content plays an important role in every moment of our lives. Research conducted by SHIFT Disruptive Learning has shown that we usually process images 60,000 times faster than a table or text and that our brains do a better job of remembering them later. The research found that after three days, participants retained between 10% and 20% of written or spoken information, compared to 65% of visual information.

The human brain can perceive imagery in just 13 milliseconds and store information, as long as it is associated with the concept. Our eyes can capture 36,000 visual messages per hour.  

40% of nerve fibers are connected to the retina.

All of this shows that people are better at processing visual information, which is embedded in our long-term memory. As a result, in reports and statements, visual representation using images is a more effective way of communicating information than text or a table, and it takes up very little space. This means that data visualization is more attractive, easier to interact with, and easier to remember.

Data Visualization Process

Several different fields are involved in the data visualization process, which aims to simplify or reveal existing relationships or to discover something new in a dataset.

1. Filtering and processing.

Filtering and refining the data transforms it into information through analysis, interpretation, summarization, comparison, and research.

2. Translation & visual representation.

Creating the visual representation by defining graphic resources, language, context, and the tone of the introduction, all oriented toward the recipient.

3. Visualization and interpretation.

Finally, a visualization is effective if it has a cognitive impact on knowledge construction.

Basic principles for data visualization

The purpose of seeing data is to help us understand 

something they do not represent. It is a way of telling stories and research results, too as data analysis and testing platform. So, you have a good understanding of how to create data recognition will help us to create meaning as well as easy-to-remember reports, infographics, and dashboards. Creating the right perspective helps us to solve problems and analyze subject material in detail. The first step in representing the information is trying to understand that data perception.

1. Preview: This ensures that viewers have more data comprehension, as their starting point for checking. This means giving them a visual summary of different types of data, describing their relationship at the same time. This strategy helps us to visualize the process of data, in all its different levels, simultaneously.

2. Zoom in and filter: The second step supplements the first so that viewers can understand the underlying structure of the data. Zooming in and out lets us select subsets of the data that meet certain criteria while maintaining a sense of position and context.

Data visualization formats

1. Bar Charts

Bar charts are one of the most popular ways to visualize data because they quickly present a data set in an understandable format that lets viewers identify highs and lows at a glance.

They are very versatile and are often used for comparing different categories, analyzing changes over time, or comparing parts of a whole. The three variations on the bar chart are:

Vertical column:

Used for chronological data; the data should be laid out from left to right.

Horizontal column:

It is used to visualize categories

Full stacked column:

Used to visualize the categories that together add up to 100%
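The three variations above can be sketched with Matplotlib, the library used later in this article. This is only a minimal illustration: the product names and numbers are invented, and the output file name is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch also runs without a display
import matplotlib.pyplot as plt

# Hypothetical data, invented for illustration only
years = ["2020", "2021", "2022"]
product_a = [30, 45, 60]
product_b = [70, 55, 40]

fig, (ax_v, ax_h, ax_s) = plt.subplots(1, 3, figsize=(12, 3))

# Vertical columns: chronological data, read left to right
ax_v.bar(years, product_a)
ax_v.set_title("Vertical")

# Horizontal columns: comparing categories
ax_h.barh(["Product A", "Product B"], [sum(product_a), sum(product_b)])
ax_h.set_title("Horizontal")

# Full stacked columns: convert each year to shares that add up to 100%
totals = [a + b for a, b in zip(product_a, product_b)]
share_a = [100 * a / t for a, t in zip(product_a, totals)]
share_b = [100 * b / t for b, t in zip(product_b, totals)]
ax_s.bar(years, share_a, label="Product A")
ax_s.bar(years, share_b, bottom=share_a, label="Product B")
ax_s.set_title("Full stacked (100%)")
ax_s.legend()

fig.tight_layout()
fig.savefig("bar_chart_variations.png")
```

The `bottom=` argument is what stacks the second series on top of the first; converting to percentages beforehand is what makes every column reach 100.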

Source: Netquest- A Comprehensive Guide to Data Visualization (Melisa Matias)

2. Histograms

Histograms represent a variable in the form of bars, where the area of each bar is proportional to the number of values represented. They offer an overview of how a population or sample is distributed with respect to a particular characteristic. The two variations of the histogram are:

Standing columns

Horizontal columns

Source: Netquest- A Comprehensive Guide to Data Visualization (Melisa Matias)

3. Pie charts

The pie chart consists of a circle divided into sectors, each representing a portion of the whole. It should be divided into no more than five data groups. Pie charts can be useful for comparing discrete or continuous data.

The two variations of the pie chart are:

Standard: Used to show relationships between components.

Donut: A variation of style that facilitates the inclusion of a whole value or design element in the center.

Source: Netquest- A Comprehensive Guide to Data Visualization (Melisa Matias)

4. Scatter Plot

Scatter plots use points spread over a Cartesian coordinate plane to show the relationship between two variables. They also help us determine whether different groups of data are correlated or not.

Source: Netquest- A Comprehensive Guide to Data Visualization (Melisa Matias)

5. Heat Maps

Source: Netquest- A Comprehensive Guide to Data Visualization (Melisa Matias)

6. Line Plot

This is used to display changes or trends in data over time. Line plots are especially useful for showing relationships, acceleration, deceleration, and volatility in a data set.

Source: Netquest- A Comprehensive Guide to Data Visualization (Melisa Matias)

Color Schemes for Data Visualization in Python

Color is one of the most powerful resources in data visualization, and using it well is important if readers are to understand the details correctly. Color can be used to separate elements, to balance or represent values, and to interact with the cultural symbolism associated with a particular color. It shapes our understanding, so before we can analyze it we must first understand its three characteristics:

Hue: This is what we usually think of when we picture a color. Hues have no inherent order; they can only be distinguished by their characteristics (blue, red, yellow, etc.).

Brightness: This is a relative measure that describes how much light one object reflects compared with another. Brightness is measured on a scale, so we can talk about light and dark values of a single hue.


Saturation: This refers to the intensity of a given color. It varies with brightness. Dark colors are less saturated, and as a color becomes less saturated it approaches gray, that is, a neutral (hueless) color. The following diagram provides a summary of the color application.

Source: Netquest- A Comprehensive Guide to Data Visualization (Melisa Matias)
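To make the three color properties concrete, here is a small sketch using Python's standard `colorsys` module. Note that `colorsys` works in the HLS model, whose "lightness" is only a rough stand-in for the brightness described above; the `describe` helper is a hypothetical name introduced for this example.

```python
import colorsys

def describe(rgb):
    """Return (hue in degrees, lightness, saturation) for an RGB triple in 0..1."""
    h, l, s = colorsys.rgb_to_hls(*rgb)
    return round(h * 360), round(l, 2), round(s, 2)

pure_blue = (0.0, 0.0, 1.0)
print(describe(pure_blue))

# Lower the saturation step by step while keeping hue and lightness fixed
h, l, s = colorsys.rgb_to_hls(*pure_blue)
for new_s in (1.0, 0.5, 0.0):
    r, g, b = colorsys.hls_to_rgb(h, l, new_s)
    print(new_s, (round(r, 2), round(g, 2), round(b, 2)))
```

At zero saturation the three RGB channels become equal, which is exactly the neutral gray mentioned in the text.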

Data Visualization in Python

We’ll start with a basic look at the data, then move on to plotting charts, and finally we’ll create interactive charts.

We will work with two datasets that match the visualizations shown in the article; the datasets can be downloaded here.

The data describes the popularity of Internet searches for three terms related to artificial intelligence (data science, machine learning, and deep learning). It was extracted from a popular search engine.

There are two data files. The first one, which we will use in most of the examples, includes data on the popularity of the three terms over time (from 2004 to now, 2023). In addition, I have added a categorical variable (ones and zeros) to demonstrate charts that vary by category.

The second file contains popularity data broken down by country. We will use it in the final section of the article when working with maps.

Before we move on to the more sophisticated methods, let’s start with the most basic way of visualizing data. We will simply use pandas to look at the data and get an idea of how it is distributed.

The first thing we have to do is look at a few example rows to see which columns there are, what information they contain, and how the numbers are written.

With the describe command, we will see how the data is distributed: the count, minimum, mean, and so on.
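The inspection commands discussed in this part of the article (head, describe, and the info command covered next) can be sketched as follows. The DataFrame here is a tiny hypothetical stand-in with invented values; it only borrows the column names that the later plotting snippets use (`Mes`, `data science`, `machine learning`, `deep learning`, `categorical`).

```python
import pandas as pd

# Hypothetical stand-in for the article's first dataset (values invented)
df = pd.DataFrame({
    "Mes": ["2004-01-01", "2004-02-01", "2004-03-01", "2004-04-01"],
    "data science": [12, 12, 9, 10],
    "machine learning": [18, 21, 21, 16],
    "deep learning": [4, 2, 2, 4],
    "categorical": [1, 1, 0, 0],
})

print(df.head())      # first rows: column names and how values are written
print(df.describe())  # count, mean, std, min, quartiles, max for numeric columns
df.info()             # dtype and non-null count of each column

# 'Mes' looks like a date but is stored as text until it is parsed
df["Mes"] = pd.to_datetime(df["Mes"])
print(df.dtypes)
```

Parsing the date column up front is a design choice for this sketch: it lets plotting libraries treat the x-axis as real dates rather than as unordered labels.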


With the info command, we will see what kind of data each column holds. We may find a column that looks numeric when viewed with the head command but whose values are actually stored as strings, in which case the variable is treated as text.

Data Visualization in Python using Matplotlib

Matplotlib is the most basic library for visualizing data graphically. It includes most of the graphs we can think of. Just because it is basic does not mean that it is weak; many of the other visualization libraries we will be talking about are based on it.

Matplotlib charts are made up of two main elements: the axes (the lines that delimit the chart area) and the figure (where we draw the axes, titles, and anything that falls outside the axes area). Now let’s build the simplest graph:

import matplotlib.pyplot as plt
plt.plot(df['Mes'], df['data science'], label='data science')

We can plot several variables on the same graph and compare them.

plt.plot(df['Mes'], df['data science'], label='data science')
plt.plot(df['Mes'], df['machine learning'], label='machine learning')
plt.plot(df['Mes'], df['deep learning'], label='deep learning')
plt.xlabel('Date')
plt.ylabel('Popularity')
plt.title('Popularity of AI terms by date')
plt.grid(True)
plt.legend()

If you are working with Python from a terminal or a script, call plt.show() after defining the graph with the functions above. If you are working in a Jupyter notebook, add %matplotlib inline at the beginning of the file and run it before creating a chart.

We can draw several graphics in one figure. This is useful for comparing charts or for sharing information from several types of charts in a single image.

fig, axes = plt.subplots(2, 2)
axes[0, 0].hist(df['data science'])
axes[0, 1].scatter(df['Mes'], df['data science'])
axes[1, 0].plot(df['Mes'], df['machine learning'])
axes[1, 1].plot(df['Mes'], df['deep learning'])

We can draw a graph with a different style of line or marker for each series:

plt.plot(df['Mes'], df['data science'], 'r-')
plt.plot(df['Mes'], df['data science']*2, 'bs')
plt.plot(df['Mes'], df['data science']*3, 'g^')

Now let’s look at a few examples of different graphics we can make with Matplotlib. We start with the scatterplot:

plt.scatter(df['data science'], df['machine learning'])

Bar chart:

plt.bar(df['Mes'], df['machine learning'], width=20)

Histogram:

plt.hist(df['deep learning'], bins=15)

Data Visualization in Python using Seaborn

Seaborn is a library based on Matplotlib. Basically, what it offers are more attractive graphics and functions that create complex types of plots with just one line of code.

We import the library and initialize the plotting style with sns.set(); without this command the graphics would keep the default Matplotlib style. We start with one of the simplest graphics, a scatterplot.

import seaborn as sns
sns.set()
sns.scatterplot(x=df['Mes'], y=df['data science'])

We can add information on more than two variables to the same graph. In this case, we use colors and sizes. We also create a separate graph for each value of the category column:

sns.relplot(x='Mes', y='deep learning', hue='data science', size='machine learning', col='categorical', data=df)

One of the most popular graphics provided by Seaborn is the heatmap. It is very commonly used to show all the correlations between variables in a dataset:

sns.heatmap(df.corr(numeric_only=True), annot=True, fmt='.2f')

Another favorite is the pair plot, which shows the relationships between all the variables. Be careful with this function if you have a large dataset, as it has to draw every data point once for each pair of columns, so processing time grows quickly as the data grows.


Now let’s make a pair plot with the charts segmented by the values of a categorical variable:

sns.pairplot(df, hue='categorical')

A very informative graph is the joint plot, which lets us see a scatter plot together with the histograms of the two variables and how they are distributed:

sns.jointplot(x='data science', y='machine learning', data=df)

Another interesting graphic is the violin plot:

sns.catplot(x='categorical', y='data science', kind='violin', data=df)

Data Visualization in Python using Bokeh

Bokeh is a library that allows you to produce interactive graphics. We can export them to an HTML document that we can share with anyone who has a web browser.

It is a very useful library when we want to explore the data in a chart, zooming in and moving around the graphic, or when we want to share it and let someone else explore the data.

We start by importing the library and defining the file in which to save the graph:

from bokeh.plotting import figure, output_file, save
output_file('data_science_popularity.html')

We draw what we want and save it to a file:

p = figure(title='data science', x_axis_label='Mes', y_axis_label='data science')
p.line(df['Mes'], df['data science'], legend_label='popularity', line_width=2)
save(p)

Other Tools for Data Visualization

Some data visualization tools help you visualize data effectively and faster than writing Python code by hand. Here are some examples:


Databox

Databox is a data visualization tool used by more than 15,000 businesses and marketing agencies. Databox pulls your data into one place so you can track real-time performance with attractive displays.

Databox is ideal for marketing teams that want to get set up with dashboards quickly. With 70+ one-click integrations and no need to code, it is a very easy tool to use.

Zoho Analytics

Zoho Analytics is probably one of the most popular BI tools on this list. One thing you can be sure of is that with Zoho analytics, you can upload your data securely. Additionally, you can use a variety of charts, tables, and objects to transform your data concisely.


Tableau

If you want to visualize and explore data easily, then Tableau is a tool worth considering. It helps you create charts, maps, and other kinds of professional graphics. To improve your visual presentations, you can also get the desktop app.

Additionally, if you are experiencing a problem with the installation of any third-party application, then it provides a “lock server” solution to help visualize online and mobile messaging applications.

You can check out my article on Analytics Vidhya for more information on trending Data Visualization Tools. Top 10 Data Visualization Tools.


With all these different libraries you may be wondering which library is right for your project. The quick answer is a library that lets you easily create the image you want.

In the initial stages of the project, we can make a quick visualization with pandas and pandas profiling to understand the data. If we need to visualize more details, we can use simple graphs such as scatterplots or histograms.

End Notes

In this article, we discussed data visualization: some of its basic formats, and practical implementations using Python libraries. Finally, we concluded with some tools that can perform data visualization effectively.

Thanks For Reading!

About Me:

Hey, I am Sharvari Raut. I love to write!


The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.


Demystifying BERT: A Comprehensive Guide To The Groundbreaking NLP Framework


Google’s BERT has transformed the Natural Language Processing (NLP) landscape

Learn what BERT is, how it works, the seismic impact it has made, among other things

We’ll also implement BERT in Python to give you a hands-on learning experience

Introduction to the World of BERT

Picture this – you’re working on a really cool data science project and have applied the latest state-of-the-art library to get a pretty good result. And boom! A few days later, there’s a new state-of-the-art framework in town that has the potential to further improve your model.

That is not a hypothetical scenario – it’s the reality (and thrill) of working in the field of Natural Language Processing (NLP)! The last two years have been mind-blowing in terms of breakthroughs. I get to grips with one framework and another one, potentially even better, comes along.

Google’s BERT is one such NLP framework. I’d stick my neck out and say it’s perhaps the most influential one in recent times (and we’ll see why pretty soon).

It’s not an exaggeration to say that BERT has significantly altered the NLP landscape. Imagine using a single model that is trained on a large unlabelled dataset to achieve State-of-the-Art results on 11 individual NLP tasks. And all of this with little fine-tuning. That’s BERT! It’s a tectonic shift in how we design NLP models.

BERT has inspired many recent NLP architectures, training approaches and language models, such as Google’s TransformerXL, OpenAI’s GPT-2, XLNet, ERNIE2.0, RoBERTa, etc.

I aim to give you a comprehensive guide to not only BERT but also what impact it has had and how this is going to affect the future of NLP research. And yes, there’s a lot of Python code to work on, too!

Note: In this article, we are going to talk a lot about Transformers. If you aren’t familiar with it, feel free to read this article first – How do Transformers Work in NLP? A Guide to the Latest State-of-the-Art Models.

Table of Contents

What is BERT?

From Word2vec to BERT: NLP’s quest for learning language representations

How Does BERT Work? A Look Under the Hood

Using BERT for Text Classification (Python Code)

Beyond BERT: Current State-of-the-Art in NLP

What is BERT?

You’ve heard about BERT, you’ve read about how incredible it is, and how it’s potentially changing the NLP landscape. But what is BERT in the first place?

Here’s how the research team behind BERT describes the NLP framework:

“BERT stands for Bidirectional Encoder Representations from Transformers. It is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of NLP tasks.”

That sounds way too complex as a starting point. But it does summarize what BERT does pretty well so let’s break it down.

First, it’s easy to get that BERT stands for Bidirectional Encoder Representations from Transformers. Each word here has a meaning to it and we will encounter that one by one in this article. For now, the key takeaway from this line is – BERT is based on the Transformer architecture.

Second, BERT is pre-trained on a large corpus of unlabelled text including the entire Wikipedia (that’s 2,500 million words!) and Book Corpus (800 million words).

This pre-training step is half the magic behind BERT’s success. This is because as we train a model on a large text corpus, our model starts to pick up the deeper and intimate understandings of how the language works. This knowledge is the swiss army knife that is useful for almost any NLP task.

Third, BERT is a “deeply bidirectional” model. Bidirectional means that BERT learns information from both the left and the right side of a token’s context during the training phase.

The bidirectionality of a model is important for truly understanding the meaning of a language. Let’s see an example to illustrate this. There are two sentences in this example and both of them involve the word “bank”:

BERT captures both the left and right context

If we try to predict the nature of the word “bank” by only taking either the left or the right context, then we will be making an error in at least one of the two given examples.

One way to deal with this is to consider both the left and the right context before making a prediction. That’s exactly what BERT does! We will see later in the article how this is achieved.

And finally, the most impressive aspect of BERT. We can fine-tune it by adding just a couple of additional output layers to create state-of-the-art models for a variety of NLP tasks.

From Word2Vec to BERT: NLP’s Quest for Learning Language Representations

“One of the biggest challenges in natural language processing is the shortage of training data. Because NLP is a diversified field with many distinct tasks, most task-specific datasets contain only a few thousand or a few hundred thousand human-labelled training examples.” – Google AI

Word2Vec and GloVe

The quest for learning language representations by pre-training models on large unlabelled text data started from word embeddings like Word2Vec and GloVe. These embeddings changed the way we performed NLP tasks. We now had embeddings that could capture contextual relationships among words.

These embeddings were used to train models on downstream NLP tasks and make better predictions. This could be done even with less task-specific data by utilizing the additional information from the embeddings itself.

One limitation of these embeddings was the use of very shallow Language Models. This meant there was a limit to the amount of information they could capture and this motivated the use of deeper and more complex language models (layers of LSTMs and GRUs).

Another key limitation was that these models did not take the context of the word into account. Let’s take the above “bank” example. The same word has different meanings in different contexts, right? However, an embedding like Word2Vec will give the same vector for “bank” in both the contexts.

That’s valuable information we are losing.

Enter ELMo and ULMFiT

ELMo was the NLP community’s response to the problem of polysemy, where the same word has different meanings based on its context. From training shallow feed-forward networks (Word2Vec), we graduated to training word embeddings using layers of complex bi-directional LSTM architectures. This meant that the same word can have multiple ELMo embeddings based on the context it is in.

ULMFiT took this a step further. This framework could train language models that could be fine-tuned to provide excellent results even with fewer data (less than 100 examples) on a variety of document classification tasks. It is safe to say that ULMFiT cracked the code to transfer learning in NLP.

This is when we established the golden formula for transfer learning in NLP:

Transfer Learning in NLP = Pre-Training and Fine-Tuning

Most of the NLP breakthroughs that followed ULMFiT tweaked components of the above equation and achieved state-of-the-art results on benchmarks.

OpenAI’s GPT

OpenAI’s GPT extended the methods of pre-training and fine-tuning that were introduced by ULMFiT and ELMo. GPT essentially replaced the LSTM-based architecture for Language Modeling with a Transformer-based architecture.

The GPT model could be fine-tuned to multiple NLP tasks beyond document classification, such as common sense reasoning, semantic similarity, and reading comprehension.

GPT also emphasized the importance of the Transformer framework, which has a simpler architecture and can train faster than an LSTM-based model. It is also able to learn complex patterns in the data by using the Attention mechanism.

OpenAI’s GPT validated the robustness and usefulness of the Transformer architecture by achieving multiple State-of-the-Arts.

And this is how Transformer inspired BERT and all the following breakthroughs in NLP.

Now, there were some other crucial breakthroughs and research outcomes that we haven’t mentioned yet, such as semi-supervised sequence learning. This is because they are slightly out of the scope of this article but feel free to read the linked paper to know more about it.

Moving onto BERT

So, the new approach to solving NLP tasks became a 2-step process:

Train a language model on a large unlabelled text corpus (unsupervised or semi-supervised)

Fine-tune this large model to specific NLP tasks to utilize the large repository of knowledge this model has gained (supervised)

With that context, let’s understand how BERT takes over from here to build a model that will become a benchmark of excellence in NLP for a long time.

How Does BERT Work? A Look Under the Hood

Let’s look a bit more closely at BERT and understand why it is such an effective method to model language. We’ve already seen what BERT can do, but how does it do it? We’ll answer this pertinent question in this section.

1. BERT’s Architecture

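The BERT architecture builds on top of Transformer. We currently have two variants available:

BERT Base: 12 layers (Transformer blocks), 12 attention heads, and 110 million parameters

BERT Large: 24 layers (Transformer blocks), 16 attention heads, and 340 million parameters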


The BERT Base architecture has the same model size as OpenAI’s GPT for comparison purposes. All of these Transformer layers are Encoder-only blocks.

If your understanding of the underlying architecture of the Transformer is hazy, I will recommend that you read about it here.

Now that we know the overall architecture of BERT, let’s see what kind of text processing steps are required before we get to the model building phase.

2. Text Preprocessing

The developers behind BERT have added a specific set of rules to represent the input text for the model. Many of these are creative design choices that make the model even better.

For starters, every input embedding is a combination of 3 embeddings:

Position Embeddings: BERT learns and uses positional embeddings to express the position of words in a sentence. These are added to overcome the limitation of Transformer which, unlike an RNN, is not able to capture “sequence” or “order” information

Segment Embeddings: BERT can also take sentence pairs as inputs for tasks (Question-Answering). That’s why it learns a unique embedding for the first and the second sentences to help the model distinguish between them. In the above example, all the tokens marked as EA belong to sentence A (and similarly for EB)

Token Embeddings: These are the embeddings learned for the specific token from the WordPiece token vocabulary

For a given token, its input representation is constructed by summing the corresponding token, segment, and position embeddings.
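The summation described above can be sketched with NumPy. This is only an illustration under stated assumptions: the table sizes are toy values, the token ids are hypothetical, and the tables are filled with random numbers, whereas in BERT they are learned during pre-training.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, max_len, hidden = 30, 8, 4   # toy sizes; BERT Base uses hidden = 768

# Three lookup tables, randomly initialised here (learned during training in BERT)
token_table = rng.normal(size=(vocab_size, hidden))
segment_table = rng.normal(size=(2, hidden))        # sentence A = 0, sentence B = 1
position_table = rng.normal(size=(max_len, hidden))

token_ids = np.array([1, 7, 9, 2, 5, 11, 2])        # hypothetical ids: [CLS] a b [SEP] c d [SEP]
segment_ids = np.array([0, 0, 0, 0, 1, 1, 1])
position_ids = np.arange(len(token_ids))

# Input representation = token + segment + position embedding, element-wise
inputs = token_table[token_ids] + segment_table[segment_ids] + position_table[position_ids]
print(inputs.shape)  # (7, 4)
```

Because the three embeddings are added rather than concatenated, the input keeps the same width as the hidden size regardless of how many kinds of information are mixed in.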

Such a comprehensive embedding scheme contains a lot of useful information for the model.

These combinations of preprocessing steps make BERT so versatile. This implies that without making any major change in the model’s architecture, we can easily train it on multiple kinds of NLP tasks.

3. Pre-training Tasks

BERT is pre-trained on two NLP tasks:

Masked Language Modeling

Next Sentence Prediction

Let’s understand both of these tasks in a little more detail!

a. Masked Language Modeling (Bi-directionality)

Need for Bi-directionality

BERT is designed as a deeply bidirectional model. The network effectively captures information from both the right and left context of a token from the first layer itself and all the way through to the last layer.

Traditionally, we had language models either trained to predict the next word in a sentence (the left-to-right context used in GPT) or language models that were trained on a right-to-left context. This made our models susceptible to errors due to loss in information.

Predicting the word in a sequence

ELMo tried to deal with this problem by training two LSTM language models on left-to-right and right-to-left contexts and shallowly concatenating them. Even though it greatly improved upon existing techniques, it wasn’t enough.

“Intuitively, it is reasonable to believe that a deep bidirectional model is strictly more powerful than either a left-to-right model or the shallow concatenation of a left-to-right and a right-to-left model.” – BERT

That’s where BERT greatly improves upon both GPT and ELMo. Look at the below image:

The arrows indicate the information flow from one layer to the next. The green boxes at the top indicate the final contextualized representation of each input word.

It’s evident from the above image: BERT is bi-directional, GPT is unidirectional (information flows only from left-to-right), and ELMo is shallowly bidirectional.

This is where the Masked Language Model comes into the picture.

About Masked Language Models

Let’s say we have a sentence – “I love to read data science blogs on Analytics Vidhya”. We want to train a bi-directional language model. Instead of trying to predict the next word in the sequence, we can build a model to predict a missing word from within the sequence itself.

Let’s replace “Analytics” with “[MASK]”. This is a token to denote that the token is missing. We’ll then train the model in such a way that it should be able to predict “Analytics” as the missing token: “I love to read data science blogs on [MASK] Vidhya.”

This is the crux of a Masked Language Model. The authors of BERT also include some caveats to further improve this technique:

To prevent the model from focusing too much on a particular position or tokens that are masked, the researchers randomly masked 15% of the words

The masked words were not always replaced by the masked tokens [MASK] because the [MASK] token would never appear during fine-tuning

So, the researchers used the below technique:

80% of the time the words were replaced with the masked token [MASK]

10% of the time the words were replaced with random words

10% of the time the words were left unchanged
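The 15% selection and the 80/10/10 replacement rule above can be sketched in plain Python. This is a simplified illustration, not BERT's actual preprocessing code: it works on whitespace-separated words rather than WordPiece tokens, selects each position independently with probability 0.15, and `mask_tokens` and the toy vocabulary are hypothetical names introduced here.

```python
import random

def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    """Return (corrupted tokens, labels); labels hold the original token at selected positions, else None."""
    rng = random.Random(seed)
    corrupted, labels = list(tokens), [None] * len(tokens)
    for i, token in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue                          # position not selected for prediction
        labels[i] = token                     # the model must recover this token
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = "[MASK]"           # 80%: replace with the mask token
        elif roll < 0.9:
            corrupted[i] = rng.choice(vocab)  # 10%: replace with a random word
        # remaining 10%: leave the word unchanged
    return corrupted, labels

sentence = "I love to read data science blogs on Analytics Vidhya".split()
vocab = ["book", "river", "python", "music"]   # toy vocabulary for random replacements
corrupted, labels = mask_tokens(sentence, vocab, seed=1)
print(corrupted)
print(labels)
```

Only the positions with a non-empty label contribute to the training loss, which is why the unchanged 10% still matter: the model cannot tell them apart from untouched words and has to build a useful representation for every position.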

I have shown how to implement a Masked Language Model in Python in one of my previous articles here:

b. Next Sentence Prediction

Masked Language Models (MLMs) learn to understand the relationship between words. Additionally, BERT is also trained on the task of Next Sentence Prediction for tasks that require an understanding of the relationship between sentences.

A good example of such a task would be question answering systems.

The task is simple. Given two sentences – A and B, is B the actual next sentence that comes after A in the corpus, or just a random sentence?

Since it is a binary classification task, the data can be easily generated from any corpus by splitting it into sentence pairs. Just like MLMs, the authors have added some caveats here too. Let’s take this with an example:

Consider that we have a text dataset of 100,000 sentences. So, there will be 50,000 training examples or pairs of sentences as the training data.

For 50% of the pairs, the second sentence would actually be the next sentence to the first sentence

For the remaining 50% of the pairs, the second sentence would be a random sentence from the corpus

The labels for the first case would be ‘IsNext’ and ‘NotNext’ for the second case
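The pair-generation scheme above can be sketched as follows. It is a minimal illustration under the article's description (split the corpus into consecutive pairs, keep the true next sentence about half of the time); `make_nsp_pairs` and the toy corpus are hypothetical, and real pipelines work at the document level rather than on a flat sentence list.

```python
import random

def make_nsp_pairs(sentences, seed=None):
    """Split consecutive sentences into pairs; keep the true next sentence for about half of them."""
    if len(sentences) < 3:
        raise ValueError("need at least three sentences to draw a random alternative")
    rng = random.Random(seed)
    pairs = []
    for i in range(0, len(sentences) - 1, 2):
        first, true_next = sentences[i], sentences[i + 1]
        if rng.random() < 0.5:
            pairs.append((first, true_next, "IsNext"))
        else:
            # Draw a random sentence that is neither the first nor the true next one
            j = rng.randrange(len(sentences))
            while j in (i, i + 1):
                j = rng.randrange(len(sentences))
            pairs.append((first, sentences[j], "NotNext"))
    return pairs

corpus = [f"Sentence number {n}." for n in range(100)]   # toy corpus of 100 sentences
pairs = make_nsp_pairs(corpus, seed=0)
print(len(pairs))   # 50 pairs from 100 sentences, mirroring the 100,000 -> 50,000 ratio above
print(pairs[0])
```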

And this is how BERT is able to become a true task-agnostic model. It combines both the Masked Language Model (MLM) and the Next Sentence Prediction (NSP) pre-training tasks.

Implementing BERT for Text Classification in Python

One of the most potent ways to use BERT would be fine-tuning it on your own task and task-specific data. We can also use the embeddings from BERT as embeddings for our text documents.

In this section, we will learn how to use BERT’s embeddings for our NLP task. We’ll take up the concept of fine-tuning an entire BERT model in one of the future articles.

For extracting embeddings from BERT, we will use a really useful open source project called Bert-as-Service:

Running BERT can be a painstaking process since it requires a lot of code and installing multiple packages. That’s why this open-source project is so helpful because it lets us use BERT to extract encodings for each sentence in just two lines of code.

Installing BERT-As-Service

BERT-As-Service works in a simple way. It creates a BERT server which we can access using the Python code in our notebook. Every time we send it a sentence as a list, it will send the embeddings for all the sentences.

We can install the server and client via pip. They can be installed separately or even on different machines:

pip install bert-serving-server  # server
pip install bert-serving-client  # client, independent of `bert-serving-server`

Also, since running BERT is a GPU intensive task, I’d suggest installing the bert-serving-server on a cloud-based GPU or some other machine that has high compute capacity.

Now, go back to your terminal and download a model listed below. Then, uncompress the zip file into some folder, say /tmp/english_L-12_H-768_A-12/.

Here’s a list of the released pre-trained BERT models:

We’ll download BERT Uncased and then decompress the zip file:

Once we have all the files extracted in a folder, it’s time to start the BERT service:

bert-serving-start -model_dir uncased_L-12_H-768_A-12/ -num_worker=2 -max_seq_len 50

You can now simply call the BERT-As-Service from your Python code (using the client library). Let’s just jump into code!

Open a new Jupyter notebook and try to fetch embeddings for the sentence: “I love data science and analytics vidhya”.

View the code on Gist.

Here, the IP address is the IP of your server or cloud. This field is not required if used on the same computer.

The shape of the returned embedding would be (1,768) as there is only a single sentence which is represented by 768 hidden units in BERT’s architecture.
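Since the original snippet lives on Gist, here is a hedged sketch of what such a call might look like. To keep it runnable without a server, it uses `FakeBertClient`, a hypothetical offline stand-in that returns random 768-dimensional vectors; the commented lines show roughly how the real `BertClient` from bert-serving-client would be created, with the IP left as a placeholder.

```python
import numpy as np

class FakeBertClient:
    """Offline stand-in that mimics the output shape of a BERT Base sentence encoder."""
    def encode(self, sentences):
        rng = np.random.default_rng(0)
        return rng.normal(size=(len(sentences), 768))

def embed(client, sentences):
    """Encode a list of sentences with any client exposing encode(list_of_str)."""
    vectors = np.asarray(client.encode(list(sentences)))
    if vectors.shape[0] != len(sentences):
        raise ValueError("expected one embedding per sentence")
    return vectors

# With a running server, the real client would be created roughly like this (not run here):
#   from bert_serving.client import BertClient
#   client = BertClient(ip="SERVER_IP_HERE")   # ip can be omitted on the same machine
client = FakeBertClient()
embedding = embed(client, ["I love data science and analytics vidhya"])
print(embedding.shape)  # (1, 768)
```

Passing the client in as an argument is a design choice for this sketch: the same `embed` helper works with the stand-in during testing and with the real client once the server is up.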

Problem Statement: Classifying Hate Speech on Twitter

Let’s take up a real-world dataset and see how effective BERT is. We’ll be working with a dataset consisting of a collection of tweets that are classified as being “hate speech” or not.

For the sake of simplicity, we say a tweet contains hate speech if it has a racist or sexist sentiment associated with it. So, the task is to classify racist or sexist tweets from other tweets.

You can download the dataset and read more about the problem statement on the DataHack platform.

We will use BERT to extract embeddings from each tweet in the dataset and then use these embeddings to train a text classification model.

Here is what the overall structure of the project looks like:

Let’s look at the code now:

You’ll be familiar with how most people tweet. There are many random symbols and numbers (aka chat language!). Our dataset is no different. We need to preprocess it before passing it through BERT:

Python Code:
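A minimal cleaning sketch is shown below. It is not the author's original preprocessing, which is not reproduced in this text; `clean_text` and its specific rules (dropping handles, links, digits, and punctuation) are assumptions chosen to illustrate the idea.

```python
import re

def clean_text(text):
    """Lowercase a tweet and strip handles, links, and non-letter characters."""
    text = text.lower()
    text = re.sub(r"@\w+", " ", text)            # user handles
    text = re.sub(r"https?://\S+", " ", text)    # links
    text = re.sub(r"[^a-z\s]", " ", text)        # digits, punctuation, emoji
    return re.sub(r"\s+", " ", text).strip()     # collapse repeated whitespace

print(clean_text("@user Loving #DataScience!!! 100% :) http://example.com"))
# loving datascience
```

How aggressive the cleaning should be is a judgment call: BERT's own tokenizer already handles casing and punctuation, so heavy cleaning may remove signal as well as noise.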

Now that the dataset is clean, it’s time to split it into training and validation set:

View the code on Gist.

Let’s get the embeddings for all the tweets in the training and validation sets:

View the code on Gist.

It’s model building time! Let’s train the classification model:

View the code on Gist.

Check the classification accuracy:

View the code on Gist.
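The split, train, and evaluate steps above can be sketched end to end with scikit-learn. Everything in this block is an assumption made for illustration: the "embeddings" are random 768-dimensional vectors rather than real BERT output, the labels are synthetic, and logistic regression is simply a convenient choice of classifier, not necessarily the one used in the original Gists.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Stand-in data: random 768-dimensional vectors instead of real BERT embeddings
rng = np.random.default_rng(42)
labels = rng.integers(0, 2, size=400)
embeddings = rng.normal(size=(400, 768))
embeddings += 0.5 * labels[:, None]    # shift one class so the toy problem is learnable

# Hold out a validation set, keeping the class balance in both parts
X_train, X_val, y_train, y_val = train_test_split(
    embeddings, labels, test_size=0.25, random_state=42, stratify=labels
)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

accuracy = accuracy_score(y_val, clf.predict(X_val))
print(f"validation accuracy: {accuracy:.2f}")
```

With real data, `embeddings` would be the array returned by the BERT server for the cleaned tweets. The accuracy printed here says nothing about the hate speech task, since the data is synthetic.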

Even with such a small dataset, we easily get a classification accuracy of around 95%. That’s damn impressive.

In the next article, I plan to take a BERT model and fine-tune it fully on a new dataset and compare its performance.

Beyond BERT: Current State-of-the-Art in NLP

BERT has inspired great interest in the field of NLP, especially the application of the Transformer for NLP tasks. This has led to a spurt in the number of research labs and organizations that started experimenting with different aspects of pre-training, transformers and fine-tuning.

Many of these projects outperformed BERT on multiple NLP tasks. Some of the most interesting developments were RoBERTa, which was Facebook AI’s improvement over BERT, and DistilBERT, which is a compact and faster version of BERT.

You can read more about these amazing developments regarding State-of-the-Art NLP in this article.


A Comprehensive Guide To Understanding Image Steganography Techniques And Types


In today’s digital world, privacy and data protection have become increasingly important. With sensitive information being transmitted online every day, ensuring that it stays out of the wrong hands is crucial.

Understanding Steganography And Image Steganography

Steganography is the practice of hiding information within other non-secret media, and image steganography specifically involves concealing data within images using various techniques.

Definition And History Of Steganography

The word ‘steganography’ is derived from the Greek words “steganos” (covered) and “graphein” (writing). It is the art and practice of hiding information within other data or media files so that it remains undetected. In contrast to cryptography, which encrypts messages to make them unreadable without a decryption key, steganography aims to conceal the very existence of secret communication by embedding it within ordinary-looking carrier files.

Purpose And Applications Of Image Steganography

Types And Techniques Of Image Steganography

There are various types and techniques of image steganography, including spatial domain steganography, transform domain steganography, compressed domain steganography, least significant bit (LSB) technique, pixel value differencing (PVD) technique, spread spectrum technique, and randomized embedding technique.

Spatial Domain Steganography

Alters pixel values in images to embed hidden data, commonly using Least Significant Bit (LSB) substitution. It operates directly on the raw bits of a digital image without applying mathematical transforms. Visual cryptography can also be employed for hiding messages within images.

Transform Domain Steganography

Manipulates frequency information in images, providing a more robust system for embedding secret data that resists steganalysis techniques. Examples include Discrete Cosine Transform (DCT), Discrete Fourier Transform (DFT), and Wavelet-based steganography, with DCT often used in JPEG compression and Wavelet-based steganography providing better performance in adapting to different signal types.
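To make the transform-domain idea concrete, here is a minimal sketch that hides one bit in a single DCT coefficient of an 8x8 block by forcing the parity of its quantized value. The step size `STEP`, the coefficient position `POS`, and the function names are assumptions chosen for illustration, and the stego block is left as floating-point values; a real scheme would also have to survive rounding and clipping to 8-bit pixels.

```python
import math

N = 8          # block size, as used by JPEG
STEP = 16.0    # hypothetical quantization step for the carrier coefficient
POS = (3, 2)   # hypothetical mid-frequency coefficient that carries the bit

def _basis(k, n):
    """Orthonormal DCT-II basis value for frequency k at sample n."""
    scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    return scale * math.cos((2 * n + 1) * k * math.pi / (2 * N))

def dct2(block):
    """Orthonormal 2-D DCT-II of an N x N block."""
    return [[sum(block[x][y] * _basis(u, x) * _basis(v, y)
                 for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]

def idct2(coeffs):
    """Inverse of dct2."""
    return [[sum(coeffs[u][v] * _basis(u, x) * _basis(v, y)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]

def embed_bit(block, bit):
    """Return a block whose quantized POS coefficient has parity `bit`."""
    coeffs = dct2(block)
    u, v = POS
    q = round(coeffs[u][v] / STEP)
    if q % 2 != bit:
        q += 1                      # move to the neighboring quantization level
    coeffs[u][v] = q * STEP
    return idct2(coeffs)

def extract_bit(block):
    """Read the hidden bit back from the parity of the quantized coefficient."""
    u, v = POS
    return round(dct2(block)[u][v] / STEP) % 2

# Made-up 8x8 grayscale block
block = [[(x * 8 + y) * 3 % 256 for y in range(N)] for x in range(N)]
print(extract_bit(embed_bit(block, 1)), extract_bit(embed_bit(block, 0)))
```

A larger step makes the hidden bit more tolerant of small changes to the image but also makes the modification more visible, which is the usual robustness trade-off in this family of methods.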

Compressed Domain Steganography

Hides information within the compressed data of an image file, which keeps the file size small and makes detection more difficult. It involves embedding the covert message in the least significant bits or reserved areas of the compressed data. The challenge lies in preserving image quality and avoiding degradation caused by repeated compression.

Least Significant Bit (LSB) Technique

Changes the least significant bits of an image’s color channels to hide information without significantly altering the image’s appearance. It is easy to implement and the changes are imperceptible to the human eye, but it has limited capacity for hiding information. Variations include randomizing which pixels carry hidden data or using multiple color channels.
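A minimal sketch of LSB substitution follows. It treats the image as a flat list of 8-bit channel values rather than loading a real file; the function names and the cover values are made up for illustration.

```python
def embed_lsb(pixels, message):
    """Hide `message` (bytes) in the least significant bits of `pixels`."""
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("message too long for this cover image")
    stego = list(pixels)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit   # clear the LSB, then set it to the message bit
    return stego

def extract_lsb(pixels, n_bytes):
    """Rebuild `n_bytes` bytes from the least significant bits of `pixels`."""
    out = bytearray()
    for i in range(n_bytes):
        byte = 0
        for value in pixels[8 * i: 8 * i + 8]:
            byte = (byte << 1) | (value & 1)
        out.append(byte)
    return bytes(out)

cover = [(37 * i) % 256 for i in range(64)]   # made-up 8-bit channel values
stego = embed_lsb(cover, b"hi")
print(extract_lsb(stego, 2))                           # b'hi'
print(max(abs(a - b) for a, b in zip(cover, stego)))   # never more than 1
```

Each channel value changes by at most 1 out of 255, which is why the result looks unchanged. The receiver needs to know the message length, which this sketch passes explicitly.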

Pixel Value Differencing (PVD) Technique

Encodes information in the differences between the values of neighboring pixels, in both grayscale and color images; pairs with larger differences, such as edges, can typically carry more bits than smooth regions. It requires precise changes to pixel values, and using it on highly compressed or low-quality images may result in artifacts or distortion that reveal the presence of hidden data.

Spread Spectrum Technique

Randomized Embedding Technique

Uses randomization to hide secret data in images, which makes detection more difficult. Algorithms such as F5 combine frequency-domain embedding with randomized embedding positions. The technique scatters the message bits across shuffled positions within an image, creating a modified version of the original image that contains the hidden information. It is useful in various applications, including forensic investigations.

Evaluations, Trends, And Future Research

This section discusses the current state of image steganography research, emerging trends and developments in the field, and potential future applications, and it provides examples of image steganography techniques.

Current State Of Image Steganography Research

Image steganography research focuses on developing new techniques for concealing and extracting information from digital images, improving capacity and robustness against detection. Areas of interest include deep learning algorithms for steganalysis and examining security risks posed by image steganography on social media and other online platforms. Challenges remain, such as embedding larger amounts of data without degrading image quality.

Emerging Trends And Developments

Advanced algorithms − Researchers are developing complex mathematical models to hide data in ways difficult for unauthorized individuals to detect.

AI-powered steganography − AI tools have proven effective at hiding information without detection, holding promise for future cybersecurity applications.

Steganalysis − Researchers are developing sophisticated software programs to identify hidden data within images, enhancing detection capabilities.

Potential Future Applications

Data protection in industries − Image steganography techniques may be used to protect sensitive data in finance, healthcare, government agencies, and legal offices.

Social media security − Users can share confidential information with trusted contacts on social media platforms without drawing unwanted attention using steganographic techniques.

Intellectual property protection − Image recognition software could benefit from steganographic algorithms by embedding metadata in digital images to prevent theft and verify ownership rights.

Examples Of Image Steganography And Their Techniques

Image steganography techniques can be used to conceal information in a variety of ways. Here are some examples of image steganography and the techniques used:

Embedded Text − This technique involves hiding text within an image by changing individual pixels’ color values. The least significant bit (LSB) method is commonly used to embed text, as it allows small amounts of data to be hidden without altering the overall appearance of the image.

Image Steganography Tools − There are various tools available online that employ steganography techniques for hiding images or other data within other files’ metadata.

Video Steganography − The process of embedding a message within a digital video file is known as video steganography. Videos frequently have messages embedded using methods like Frame Differencing and Discrete Cosine Transform (DCT).

Spatial Domain Techniques − In spatial domain techniques, the confidential message is embedded into an image pixel’s color value, either by manipulating its least significant bit (LSB) or by using pixel value differencing (PVD).

Compressed Domain Techniques − In compressed domain techniques, data is hidden within the compression process itself by inserting additional data into the quantization tables of JPEG compression.


In conclusion, image steganography is a vital tool for ensuring data privacy and security in today’s digital world. This comprehensive guide has provided insights into the different types and techniques of this practice, ranging from spatial to compressed domain steganography.

The LSB technique, PVD technique, spread spectrum technique, and randomized embedding technique were also explored in depth. Steganography will continue to be essential in protecting sensitive information from hackers as technology develops at an unparalleled rate.

With the knowledge you’ve gained from this guide, you’re now equipped with the necessary tools to understand how covert channels can be used for secret communication through digital media using image processing algorithms such as DCT and frequency domain analysis. By understanding these concepts and applying them effectively in your work or personal life, you can ensure that your data stays protected while online!
