# Trending February 2024 # Comprehensive Guide To Matlab Count # Suggested March 2024 # Top 7 Popular

You are reading the article Comprehensive Guide To Matlab Count updated in February 2024 on the website Minhminhbmm.com. We hope that the information we have shared is helpful to you. If you find the content interesting and meaningful, please share it with your friends and continue to follow and support us for the latest updates. Suggested March 2024 Comprehensive Guide To Matlab Count

Introduction to Matlab Count

To find the presence of a particular event in the Matlab program count command is used. If an input is a string then by using this command we can find out how many times a specific character occurs in the input string. We can apply count command on array also. If the input is set of numbers in the form of the array then by using count command we can find out how much time a particular number present in the array. The count command is used in two ways depending upon the parameter list.

Start Your Free Data Science Course

Hadoop, Data Science, Statistics & others

How does Count work in Matlab?

The count command is used in two ways .one is case sensitive and the other one is not. Mostly this command is used in strings operations where we need to find out the occurrence of characters. But input string one character may be present multiple times and in both the cases upper case and lower case. Therefore the second way is used to ignore the case of alphabets.

Step 1: Accept the input string.

Step 2: Declare variable to store count and apply the command count.

Step 3: Display the result.

Syntax:

Variable name = count (input, ‘event’, ‘ignoreCase’, true)

Examples of Matlab Count

The example of the following are given below:

Example #1

Let us consider input string as “blueberry is blue” and we need to find out the occurrence of “blue” in a given string. Table 1illustrate the Matlab code, for example, chúng tôi output is 2 because blue occurs two times one in a single string and second is along with other string (blueberry).

Code:

c= count( input , ‘blue’)

Output:

Example #2

Let us consider input string as “Blueberry is blue” and we need to find out the occurrence of “blue” in the given string. Here the output is 1 because blue occurs two times one in a single string and second is along with other string (blueberry). But in uppercase therefore if we use the second method to write count command then the output will be 2. Table 2(a) illustrates the Matlab code for example 2 by using the first approach and .table 2(b) illustrates the Matlab code for example 2 by using the second approach.

c= count( input , ‘blue’)

Output:

Code:

c= count( input , ‘blue’ ,IgnoreCase’ ,true)

Output:

Example #3

Let us consider input string in multidimensional array as “ one, five, eleven, five, four, ten, one, four, three ”; “ one, zero, ten, zero, ten, one, two, ten, one, eight ” and we need to find out occurrence of “ one ” in given string. Here the output is different for different rows. The elements of two rows separated by “; ”. Table 3 illustrates the Matlab code for example 3.

Code:

Input = [“ one, five, eleven, five, four, ten, one, four, three ”; “ one, zero, ten, zero, ten, one, two, ten, one, eight ”]

Output:

Example #4

Let us consider input string in multidimensional array as “ one, five, eleven, five, four, ten, one, four, three ”; “ one, zero, ten, zero, ten, one, two, ten, one, eight ” and we need to find out occurrence of two events simultaneously .“ one ”,” two” in given string. Here the output is different for different rows. The elements of two rows separated by “; ”. Tables 4 illustrate the Matlab code, for example, chúng tôi output is 2 and 4 because in the first string ‘one’ occurs two times and ‘two’ occurs zero times. And in the second string ‘one’ occurs three times and ‘ two ’ occurs one time.

Code:

Input = [“ one, five, eleven, five, four, ten, one, four, three ”; “ one, zero, ten, zero, ten, one, two, ten, one, eight ”] c=count(input , [“one”,“two”]

Output:

Conclusion

As we have seen in the above example count command is used in multiple ways .we can apply this command one dimensional as well as multidimensional arrays .mostly count command is used in string operation to count the occurrence of alphabets in any manner.

Recommended Articles

This is a guide to Matlab Count. Here we discuss the introduction, How does Count work in Matlab and Examples along with the Syntax and the codes & outputs. You can also go through our other related articles to learn more–

You're reading Comprehensive Guide To Matlab Count

## A Comprehensive Guide On Kubernetes

This article was published as a part of the Data Science Blogathon.

Image-1

Introduction

Today, In this guide, we will dive in to learn about Kubernetes and use it to deploy and manage containers at scale.

Container and microservice architecture had used more to create modern apps. Kubernetes is open-source software that allows you to deploy and manage containers at scale. It divides containers into logical parts to make your application’s management, discovery, and scaling easier.

The main goal of this guide is to provide a complete overview of the Kubernetes ecosystem while keeping it basic and straightforward. It covers Kubernetes’ core ideas before applying them to a real-world scenario.

Even if you have no prior experience with Kubernetes, this article will serve as an excellent starting point for your journey.

So, without further ado, let’s get this learning started.

Why Kubernetes?

Before we go into the technical ideas, let us start with why a developer should use Kubernetes in the first place. Here are a few reasons why developers should use Kubernetes in their projects.

Portability

When using Kubernetes, moving containerized applications from development to production appears to be an easy process. Kubernetes enables developers to orchestrate containers in various environments, including on-premises infrastructure, public and hybrid clouds.

Scalability

Kubernetes simplifies the process of defining complex containerized applications and deploying them globally across multiple clusters of servers by reducing resources based on your desired state. Kubernetes automatically checks and maintains container health when horizontally scaling applications.

Extensibility

Kubernetes has a vast and ever-expanding collection of extensions and plugins created by developers and businesses that make it simple to add unique capabilities to your clusters such as security, monitoring, or management.

Concepts

Using Kubernetes necessitates an understanding of the various abstractions it employs to represent the state of the system. That is the focus of this section. We get acquainted with the essential concepts and provide you with a clearer picture of the overall architecture.

Pods

A Pod is a collection of multiple containers of application that share storage, a unique cluster IP address, and instructions for running them (e.g. ports, restart, container image, and failure policies).

They are the foundation of the Kubernetes platform. While creating a service or a deployment, Kubernetes creates a Pod with the container inside.

Each pod runs on the node where it is scheduled and remains there until it is terminated or deleted. If the node fails or stops, Kubernetes will automatically schedule identical Pods on the cluster’s other available Nodes.

Image-2

Node

A node is a worker machine in a Kubernetes cluster that can be virtual or physical depending on the cluster type. The master is in charge of each node. The master involuntary schedules pods across all nodes in the cluster, based on their available resources and current configuration.

Each node is required to run at least two services:

Kubelet is a process that communicates between the Kubernetes master and the node.

Image-3

Services

A Service is an abstraction that describes a logical set of Pods and the policies for accessing them. Services allow for the loose coupling of dependent Pods.

Even though each pod has a distinct IP-Address, those addresses are not visible to the outside world. As a result, a service enables your deployment to receive traffic from external sources.

We can expose services in a variety of ways:

ClusterIP (standard) – Only expose the port to the cluster’s internals.

NodePort – Use NAT to reveal the service on the same port on every node in the cluster

Image-4

Deployments

Deployments include a description of your application’s desired state. The deployment controller will process to ensure that the application’s current state matches that description.

A deployment automatically runs many replicates of your program and replaces any instances that fail or become unresponsive. Deployments help to know that your program is ready to serve user requests in this fashion.

Image-5

Installation

Before we dive into building our cluster, we must first install Kubernetes on our local workstation.

Docker Desktop

If you’re using Docker desktop on Windows or Mac, you may install Kubernetes directly from the user interface’s settings pane.

Others

If you are not using the Docker desktop, I recommend that you follow the official installation procedure for Kubectl and Minikube.

Basics

Now that we’ve covered the fundamental ideas. Let’s move on to the practical side of Kubernetes. This chapter will walk you through the fundamentals required to deploy apps in a cluster.

Creating cluster

When you launch Minikube, it immediately forms a cluster.

minikube start

After installation, the Docker desktop should also automatically construct a cluster. You may use the following commands to see if your cluster is up and running:

# Get information about the cluster kubectl cluster-info # Get all nodes of the cluster kubectl get nodes

Deploying an application:

Now that we’ve completed the installation and established our first cluster, we’re ready to deploy an application to Kubernetes.

kubectl create deployment nginx --image=nginx:latest

We use the create deployment command, passing inputs as the deployment name and the container image. This example deploys Nginx with one container and one replica.

Using the get deployments command, you may view your active deployments.

kubectl get deployments Information about deployments

Obtaining all of the pods

Using the kubectl get pods command, you can get a list of all running pods:

kubectl get pods

Detail description of a pod

Use describe command to get more detailed information about a pod.

kubectl describe pods

Logs of a pod

The data that your application would transmit to STDOUT becomes container logs. The following command will provide you access to those logs.

kubectl logs \$POD_NAME

Note: You may find out the name of your pod by using the get pods or describe pods commands.

Execute command in Container

The kubectl exec command, which takes the pod name and the term to run as arguments, allows us to perform commands directly in our container.

kubectl exec \$POD_NAME command

Let’s look at an example where we start a bash terminal in the container to see what I mean.

kubectl exec -it \$POD_NAME bash Exposing app publicly

A service, as previously said, establishes a policy by which the deployment can be accessible. We’ll look at how this is achieved in this section and other alternatives you have when exposing your services to the public.

Developing a service:

We can build a service with the create-service command, which takes the port we wish to expose and the kind of port as parameters.

kubectl create service nodeport nginx --tcp=80:80

It will generate service for our Nginx deployment and expose our container’s port 80 to a port on our host computer.

On the host system, use the kubectl get services command to obtain the port:

Image By Author

As you can see, port 80 of the container had routed to port 31041 of my host machine. When you have the port, you may test your deployment by accessing your localhost on that port.

Deleting a service

kubectl delete service nginx

Scale up the app

Scaling your application up and down is a breeze with Kubernetes. By using this command, you may alter the number of replicas, and Kubernetes will generate and maintain everything for you.

kubectl scale deployments/nginx --replicas=5

This command will replicate our Nginx service to a maximum of five replicas.

This way of application deployment works well for tiny one-container apps but lacks the overview and reusability required for larger applications. YAML files are helpful in this situation.

YAML files allow you to specify your deployment, services, and pods using a markup language, making them more reusable and scaleable. The following chapters will go over Yaml files in detail.

Kubernetes object in YAML

Every object in Kubernetes had expressed as a declarative YAML object that specifies what and how it should run. These files had used frequently to promote the reusability of resource configurations such as deployments, services, and volumes, among others.

This section will walk you through the fundamentals of YAML and how to acquire a list of all available parameters and characteristics for a Kubernetes object. We glance through the deployment and service files to understand the syntax and how it had deployed.

Parameters of different objects

There are numerous Kubernetes objects, and it is difficult to remember every setting. That’s where the explain command comes in.

You can also acquire documentation for a specific field by using the syntax:

kubectl explain deployment.spec.replicas

Deployment file

For ease of reusability and changeability, more sophisticated deployments are typically written in YAML.

The basic file structure is as follows:

apiVersion: apps/v1 kind: Deployment metadata: # The name and label of your deployment name: mongodb-deployment labels: app: mongo spec: # How many copies of each pod do you want replicas: 3 # Which pods are managed by this deployment selector: matchLabels: app: mongo # Regular pod configuration / Defines containers, volumes and environment variable template: metadata: # label the pod labels: app: mongo spec: containers: - name: mongo image: mongo:4.2 ports: - containerPort: 27017

There are several crucial sections in the YAML file:

apiVersion – Specifies the API version.

kind – The Kubernetes object type defined in the file (e.g. deployment, service, persistent volume, …)

metadata – A description of your YAML component that includes the component’s name, labels, and other information.

spec – Specifies the attributes of your deployment, such as replicas and resource constraints.

template – The deployment file’s pod configuration.

Now that you understand the basic format, you can use the apply command to deploy the file.

Service file

Service files are structured similarly to deployments, with slight variations in the parameters.

apiVersion: v1 kind: Service metadata: name: mongo spec: selector: app: mongo ports: - port: 27017 targetPort: 27017 type: LoadBalancer Storage

When the container restarts or pod deletion, its entire file system gets deleted. It is a good sign since it keeps your stateless application from getting clogged up with unnecessary data. In other circumstances, persisting your file system’s data is critical for your application.

There are several types of storage available:

The container file system stores the data of a single container till its existence.

Volumes allow you to save data and share it between containers as long as the pod is active.

Data had saved even if the pod gets erased or restarted using persistent volumes. They’re your Kubernetes cluster’s long-term storage.

Volumes

Volumes allow you to save, exchange, and preserve data amongst numerous containers throughout the pod. It is helpful if you have pods with many containers that communicate data.

In Kubernetes, there are two phases to using a volume:

The volume had defined by the pod.

The container use volume mounts to add the volume to a given filesystem path.

You can add a volume to your pod by using the syntax:

apiVersion: v1 kind: Pod metadata: name: nginx spec: containers: - name: nginx image: nginx volumeMounts: - name: nginx-storage mountPath: /etc/nginx volumes: - name: nginx-storage emptyDir: {}

Here volumes tag is used to provide a volume mounted to a particular directory of the container filesystem (in this case, /etc/nginx).

Persistent Volumes

These are nearly identical to conventional volumes, with unique difference data had preserved even if the pod gets erased. That is why they are employed for long-term data storing needs, such as a database.

A Persistent Volume Claim (PVC) object, which connects to backend storage volumes via a series of abstractions, is the most typical way to define a persistent volume.

Example of YAML Configuration file.

apiVersion: v1 kind: PersistentVolumeClaim metadata: name: pv-claim labels: app: sampleAppName spec: accessModes: - ReadWriteOnce resources: requests: storage: 20Gi

There are more options to save your data in Kubernetes, and you may automate as much of the process as feasible. Here’s a list of a few interesting subjects to look into.

Compute Resources

In consideration of container orchestration, managing computes resources for your containers and applications is critical.

When your containers have a set number of resources, the scheduler can make wise decisions about which node to place the pod. You will also have fewer resource contention issues with diverse deployments.

In the following two parts, we will go through two types of resource definitions in depth.

Requests

Limits

Secrets

Secrets in Kubernetes allow you to securely store and manage sensitive data such as passwords, API tokens, and SSH keys.

To use a secret in your pod, you must first refer to it. It can happen in many different ways:

Using an environment variable and as a file on a drive mounted to a container.

When kubelet pulls a picture from a private registry.

Creating a secret

Secrets had created using either the kubelet command tool or by declaring a secret Kubernetes object in YAML.

Using the kubelet

Kubelet allows you to create secrets with a create command that requires only the data and the secret name. The data gets entered using a file or a literal.

Using a file, the same functionality would look like this.

Making use of definition files

Secrets, like other Kubernetes objects, can be declared in a YAML file.

apiVersion: v1 kind: Secret metadata: name: secret-apikey data: apikey: YWRtaW4=

Your sensitive information is stored in the secret as a key-value pair, with apiKey as the key and YWRtaW4= as the base decoded value.

Using the apply command, you can now generate the secret.

kubectl apply -f secret.yaml

Use the stringData attribute instead if you wish to give plain data and let Kubernetes handle the encoding.

ImagePullSecrets

If you’re pulling an image from a private registry, you may need to authenticate first. When all of your nodes need to pull a specific picture, an ImagePullSecrets file maintains the authentication info and makes it available to them.

apiVersion: v1 kind: Pod metadata: name: private-image spec: containers: - name: privateapp image: gabrieltanner/graphqltesting imagePullSecrets: - name: authentification-secret Namespaces

Namespaces are virtual clusters had used to manage large projects and allocate cluster resources to many users. They offer a variety of names and can be nested within one another.

Managing and using namespaces with kubectl is simple. This section will walk you through the most common namespace actions and commands.

Look at the existing Namespaces

You can use the kubectl get namespaces command to see all of your cluster’s presently accessible namespaces.

kubectl get namespaces # Output NAME STATUS AGE default Active 32d docker Active 32d kube-public Active 32d kube-system Active 32d

Creating Namespace

Namespaces can be created with the kubectl CLI or by using YAML to create a Kubernetes object.

kubectl create namespace testnamespace # Output namespace/testnamespace created

The same functionality may be achieved with a YAML file.

apiVersion: v1 kind: Namespace metadata: name: testnamespace

The kubectl apply command can then be used to apply the configuration file.

kubectl apply -f testNamespace.yaml

Namespace Filtering

When a new object had created in Kubernetes without a custom namespace property, it adds to the default namespace.

You can do this if you want to construct your item in a different workspace.

kubectl create deployment --image=nginx nginx --namespace=testnamespace

You may now use the get command to filter for your deployment.

kubectl get deployment --namespace=testnamespace

Change Namespace

You’ve now learned how to construct objects in a namespace other than the default. However, adding the namespace to each command you want to run takes time and returns an error.

As a result, you can use the set-context command to change the default context to which instructions had applied.

kubectl config set-context \$(kubectl config current-context) --namespace=testnamespace

The get-context command can be used to validate the modifications.

kubectl config get-contexts # Output CURRENT NAME CLUSTER AUTHINFO NAMESPACE * Default Default Default testnamespace Kubernetes with Docker Compose

For individuals coming from the Docker community, writing Docker Compose files rather than Kubernetes objects may be simple. Kompose comes into play in this situation. It uses a simple CLI to convert or deploy your docker-compose file to Kubernetes (command-line interface).

How to Install Kompose

It is easy and quickly deployed on all three mature operating systems.

To install Kompose on Linux or Mac, curl the binaries.

# Linux # macOS chmod +x kompose sudo mv ./kompose /usr/local/bin/kompose

Deploying using Kompose

Kompose deploys Docker Compose files on Kubernetes using existing Docker Compose files. Consider the following compose file as an example.

version: "2" services: redis-master: image: chúng tôi ports: - "6379" redis-slave: image: gcr.io/google_samples/gb-redisslave:v1 ports: - "6379" environment: - GET_HOSTS_FROM=dns frontend: image: gcr.io/google-samples/gb-frontend:v4 ports: - "80:80" environment: - GET_HOSTS_FROM=dns labels: kompose.service.type: LoadBalancer

Kompose, like Docker Compose, lets us deploy our setup with a single command.

kompose up

You should now be able to see the resources that had produced.

kubectl get deployment,svc,pods,pvc

Converting Kompose

Kompose can also turn your existing Docker Compose file into the Kubernetes object you need.

kompose convert

kubectl apply -f filenames Application Deployment

Now that you’ve mastered the theory and all of Kubernetes’ core ideas, it’s time to put what you’ve learned into practice. This chapter will show you how to use Kubernetes to deploy a backend application.

This tutorial’s specific application is a GraphQL boilerplate for the chúng tôi backend framework.

First, let’s clone the repository.

Images to a Registry

We must first push the images to a publicly accessible Image Registry before starting the construction of Kubernetes objects. It can be a public registry like DockerHub or a private registry of your own.

Visit this post for additional information on creating your own private Docker Image.

To push the image, include the image tag in your Compose file along with the registry you want to move.

version: '3' services: nodejs: build: context: ./ dockerfile: Dockerfile image: gabrieltanner.dev/nestgraphql restart: always environment: - DATABASE_HOST=mongo - PORT=3000 ports: - '3000:3000' depends_on: [mongo] mongo: image: mongo ports: - '27017:27017' volumes: - mongo_data:/data/db volumes: mongo_data: {}

I used a private registry that I had previously set up, but DockerHub would work just as well.

Creating Kubernetes objects

Now that you’ve published your image to a registry, we’ll write our Kubernetes objects.

To begin, create a new directory in which to save the deployments.

mkdir deployments cd deployments touch mongo.yaml touch nestjs.yaml

It is how the MongoDB service and deployment will look.

apiVersion: v1 kind: Service metadata: name: mongo spec: selector: app: mongo ports: - port: 27017 targetPort: 27017 --- apiVersion: apps/v1 kind: Deployment metadata: name: mongo spec: selector: matchLabels: app: mongo template: metadata: labels: app: mongo spec: containers: - name: mongo image: mongo ports: - containerPort: 27017

A deployment object with a single MongoDB container called mongo had included in the file. It also comes with a service that allows the Kubernetes network to use port 27017.

Because the container requires some additional settings, such as environment variables and imagePullSecrets, the chúng tôi Kubernetes object is a little more complicated.

A load balancer helps the service that makes the port available on the host machine.

Deploy the application

Now that the Kubernetes object files are ready. Let us use kubectl to deploy them.

kubectl apply -f mongo.yaml kubectl apply -f nestjs.yaml

On localhost/graphql, you should now view the GraphQL playground.

Congratulations, you’ve just deployed your first Kubernetes application.

Image-6

You persevered to the end! I hope this guide has given you a better understanding of Kubernetes and the way to use it to improve your developer process, with better production-grade solutions.

Kubernetes was created using Google’s ten years of expertise running containerized apps at scale. It has already been adopted by the top public cloud suppliers and technology providers and is now being adopted by the majority of software manufacturers and companies. It even resulted in the formation of the Cloud Native Computing Foundation (CNCF) in 2024, which was the first project to graduate under CNCF and began streamlining the container ecosystem alongside other container-related projects like CNI, Containers, Envoy, Fluentd, gRPC, Jagger, Linkerd, and Prometheus. Its immaculate design, cooperation with industry leaders, making it open source, and always being open to ideas and contributions may be the main reasons for its popularity and endorsement at such a high level.

Learn basic tenets from our blog.

References

Image-1 – Photo by  Ian Taylor On Unsplash

Image-6 – Photo by  Xan Griffin On Unsplash

Related

## Comprehensive Guide On Tcp/Ip Model

What is TCP/IP Model?

Start Your Free Software Development Course

Web development, programming languages, Software testing & others

Understanding TCP/IP Model?

The United States Defense Department initially developed the Internet Protocol Suite during the 1970s. It connects heterogeneous systems and contains a popular set of communication protocols. TCP and IP are the most widely used protocols and differ from the OSI model.

How does TCP/IP work?

Below are a few points explaining the working of TCP/IP:

1. Network Access Layer

Here, the OSI model’s physical layer and data link layer combine to form the network access layer. It allows the transmission of data, physically, through the protocols and hardware elements of the layer. The availability of ARP is measured at layer 3 and summed up by layer 2 protocols.

2. Internet Layer

IP: Stands for “Internet Protocol,” and it is in charge of packet distribution. This distribution is achieved between the source and the destination through the IP addresses in the packet headers. IPv4 and IPv6 are the most widely used versions. All current websites use IPv4, while IPv6 is largely growing in numbers.

ICMP: Stands for “Internet Control Message Protocol.” All information regarding the network program is scripted here; it is measured to sum up with IP datagrams.

ARP: Stands for “Address Resolution Protocol.” ARP determines the hardware address from the specified internet protocol address. The major classifications of ARP are: Reverse ARP, Gratuitous ARP, Proxy ARP, and Inverse ARP.

3. Host-to-Host Layer

It is much equivalent to the OSI model transport layer. All the complexities of data are shielded from the upper layers. The key protocols here are:

User Datagram Protocol (UDP): It is very cost-effective and can be used when security is not a major factor. UDP is a connectionless protocol.

4. Process Layer

All the functions of the top three layers of the OSI model, i.e., Application Layer, Session Layer, and Presentation layer, execute herein. It is responsible for controlling user interface specifications and node-to-node communication. The most commonly used protocols are: HTTP SNMP, NTP,NFS, HTTPS, FTP, TFTP, Telnet, SSH, SMTP, DNS, DHCP, NFS, X Window, and LPD.

HTTP and HTTPS: Stand for Hypertext Transfer Protocol. These HTTP and HTTPS protocols manage server and browser communication. SSL and HTTP intermix herein. It is a systematic protocol for browser forms (sign in, validation, bank transactions).

SSH: Stands for Secure Shell and is very much similar to Telnet. It is terminal emulation software. The primary reason for preferring SSH is its encrypted connection. Moreover, it is a highly secure network.

NTP: Stands for Network Time Protocol. It harmonizes the clocks in the computer systems into a single standard time zone. NTP plays a crucial role in bank-oriented transactions.

Deployable for network-oriented problems.

The model allows communication between heterogeneous networks.

It is an open network protocol suite, which makes it available to an individual or an institute.

A scalable client-oriented architecture allows network additions without current services.

For every system on the network, there is an IP value.

Scope of TCP/IP Model:

In the communication world, the base unit is packets. These packets are built using TCP/IP protocols. Every operating system has several sole ethics coded keen on its functioning of the TCP/IP stack. OS fingerprinting works on this basis, By swot up these exclusive ethics, values like MTU and MSS. It has been whispered previously to identify the irregular; there is a need first to recognize what is usual.

Thus, TCP/IP is a powerful network communication protocol and program that allows access to remote terminals and computers through internet systems.

Recommended Articles

## A Simple Guide To Perform A Comprehensive Content Audit

No matter why you have a website, it needs to be filled with great content.

Without good content, you might as well not have a website at all.

But how exactly do you know when you have good content?

You might read through a piece of content and think it’s perfectly fine, but there’s a more reliable way of figuring it out.

If you’re wondering if your content is performing well, there’s a good chance it’s time for a content audit to check for sure.

By following the right steps, knowing what to look for, and what you’re hoping to get out of your content audit, you can look forward to creating a better website.

What Is a Content Audit?

At some point, every website will need a content audit.

A content audit gives you the opportunity to review closely all of the content on your website and evaluate how it’s working for you and your current goals.

This helps show you:

What content is good.

What needs to be improved.

What should just be tossed away.

What your content goals for the future should look like.

There are also some types of websites that are more in need of content audits than others.

If you have a relatively new website where all of your content is still fresh, you won’t really be in need of a content audit for a while.

Older sites have a lot more to gain from having a content audit done, as well as websites that have a large amount of content.

This makes websites like a news site a great contender for audits. The size of a website will also affect how often a content audit is necessary.

What Is the Purpose of Content Audits?

Content is known for being a great digital marketing investment because it will continue to work for you long into the future, but that doesn’t mean that it doesn’t require some upkeep from time to time.

What worked for your website at one point might not anymore, so it only makes sense to go back and review it.

Improve Organic Ranking

If you aren’t ranking highly, it could be a problem with your content.

Some of the content you have might not be SEO-friendly, and although it might be valuable content to have, there’s no way for it to rank highly.

If the content you have is already good, optimizing it to be more SEO-friendly can be a simple change that makes a big difference in your rankings.

Revitalize Older Content

Even the best content gets old at some point.

After a while, you might end up missing out on important keywords, having content with broken links, outdated information, among other issues.

If older content isn’t performing well, that doesn’t mean it can’t serve a new purpose for your website.

Giving new life to some of your older content can give you the same effect as having something totally brand new, without requiring you to put in the amount of work that an entirely new piece of content would need.

Get Rid of Irrelevant Content

Some content that’s great at the moment only benefits you for a short while.

While you might find older content on your website that can be updated to be more useful, sometimes it has just become irrelevant.

When this is the case, you don’t have to keep it if it’s only taking up space.

Eliminate Similar & Duplicate Content

In addition to unimportant pages, you can also find duplicate content to get rid of during a content audit.

Duplicate content can often occur by accident and wasn’t created to try and cheat the system, but regardless of why you have it, you can be penalized by search engines for it.

If you do find that you have extremely similar or duplicate content, but you can’t get rid of it, you can fix the problem by canonicalizing your preferred URL.

Plan for the Future

When you go through the content you currently have, you might end up seeing some gaps that need to be filled.

When you realize you’re missing out on important information and topics your audience needs, this is the time to make up for that.

You’ll be able to realize what’s lacking in your website to create more useful content in the future.

How to Perform a Content Audit

A content audit at first glance might seem likely simply reading through your website’s content, but there’s much more to it than that.

For an effective content audit, you’ll need to rely heavily on online tools to get the data you need.

So, before you get started with a content audit, it’s important to know exactly what you’ll need to be doing beforehand.

If you’re going through the effort of performing a content audit, you’re not doing it for nothing.

There must be some goal that’s driving you to do this.

Not everyone will have the same reason for having a content audit, although many of the reasons might seem similar, so what you’ll want to look for might vary.

2. Use Screaming Frog to Index Your URLs

One tool that you should always use during an audit is Screaming Frog.

This tool will allow you to create an inventory of the content you have on your website by gathering URLs from your sitemap.

If you have fewer than 500 pages to audit, you can even get away with using the free version.

This is one of the easiest ways of getting all of your content together to begin your content audit.

After you’ve made an inventory of your website’s content, you’ll need to see how it’s performing.

For this, Google Analytics can give you all the information you need.

This can give you valuable insights as to how people feel about your content, such as how long they stick around for it and how many pages they’re viewing per session.

The data you get from Google Analytics will make it easier for you to figure out what your next move will be.

After reviewing your findings, it might be clear what’s holding your content down.

The solution may not be obvious, but by looking closely at what your data tells you and researching, you can figure it out with a little bit of effort.

For example, if you have one great, high-quality piece of content that doesn’t get many views, it might just need to be updated slightly and reshared.

5. Make a Plan

Finally, you should figure out what the necessary changes will be and how you’ll go about making them.

If you have a long list of changes that need to be implemented, consider which ones are a priority and which ones can be fixed over time.

Planning for the future might include not just the changes to be made on existing content, but the arrangements for creating new content in the future.

Finals Thoughts

Content audits might seem intimidating, but they are key to making sure all of the content on your website is working for you and not against you.

Performing a content audit doesn’t mean that you’ve been making huge mistakes with your content.

Such an audit is simply maintenance that even websites with the best content need to do.

Getting into this can seem overwhelming, but with the right help, an audit will leave you feeling more confident in your content and will help guide your next steps.

More Resources:

## A Comprehensive Guide To Time Series Analysis And Forecasting

This article was published as a part of the Data Science Blogathon.

Time Series Analysis and Forecasting is a very pronounced and powerful study in data science, data analytics and Artificial Intelligence. It helps u changing time. For example, let us suppose you have visited a clinic due to some chest pain and want to get an (ECG) test done to see if your heart is healthy functioning. The ECG graph produced is a time-series data where your Heart Rate Variability (HRV) with respect to time is plotted, analysing which the doctor can suggest crucial measures to take care of your heart and reduce the risk of stroke or heart attacks. Time Series is used widely in healthcare analytics, geospatial analysis, weather forecasting, and to forecast the future of data that changes continuously with time!

What is Ti ries Analysis in Machine Learning?

Time-series analysis is the process of extracting useful information from time-series data to forecast and gain insights from it. It consists of a series of data that varies with time, hence continuous and non-static in nature. It may vary from hours to minutes and even seconds (milliseconds to microseconds). Due to its non-static and continuous nature, working with time-series data is indeed difficult even today!

As time-series data consists of a series of observations taken in sequences of time, it is entirely non-static in nature.

Time Series – Analysis Vs. Forecasting

Time series data analysis is the scientific extraction of useful information from time-series data to gather insights from it. It consists of a series of data that varies with time. It is non-static in nature. Likewise, it may vary from hours to minutes and even seconds (milliseconds to microseconds). Due to its continuous and non-static nature, working with time-series data is challenging!

As time-series data consists of a series of observations taken in sequences of time, it is entirely non-static in nature.

Time Series Analysis and Time Series Forecasting are the two studies that, most of the time, are used interchangeably. Although, there is a very thin line between this two. The naming to be given is based on analysing and summarizing reports from existing time-series data or predicting the future trends from it.

Thus, it’s a descriptive Vs. predictive strategy based on your time-series problem statement.

In a nutshell, time series analysis is the study of patterns and trends in a time-series data frame by descriptive and inferential statistical methods. Whereas, time series forecasting involves forecasting and extrapolating future trends or values based on old data points (supervised time-series forecasting), clustering them into groups, and predicting future patterns (unsupervised time-series forecasting).

The Time Series Integrants

Any time-series problem or data can be broken down or decomposed into several integrants, which can be useful for performing analysis and forecasting. Transforming time series into a series of integrants is called Time Series Decomposition.

A quick thing worth mentioning is that the integrants are broken further into 2 types-

1. Systematic — components that can be used for predictive modelling and occur recurrently. Level, Trend, and Seasonality come under this category.

2. Non-systematic — components that cannot be used for predictive modelling directly. Noise comes under this category.

The original time series data is hence split or decomposed into 5 parts-

1. Level — The most common integrant in every time series data is the level. It is nothing but the mean or average value in the time series. It has 0 variances when plotted against itself.

2. Trend — The linear movement or drift of the time series which may be increasing, decreasing or neutral. Trends are observable over positive(increasing) and negative(decreasing) and even linear slopes over the entire range of time.

3. Seasonality — Seasonality is something that repeats over a lapse of time, say a year. An easy way to get an idea about seasonality- seasons, like summer, winter, spring, and monsoon, which come and go in cycles throughout a specified period of time. However, in terms of data science, seasonality is the integrant that repeats at a similar frequency.

Note — If seasonality doesn’t occur at the same frequency, we call it a cycle. A cycle does not have any predefined and fixed signal or frequency is very uncertain, in terms of probability. It may sometimes be random, which poses a great challenge in forecasting.

4. Noise — A irregularity or noise is a randomly occurring integrant, and it’s optional and arrives under observation if and only if the features are not correlated with each other and, most importantly, variance is the similar across the series. Noise can lead to dirty and messy data and hinder forecasting, hence noise removal or at least reduction is a very important part of the time series data pre-processing stage.

5. Cyclicity — A particular time-series pattern that repeats itself after a large gap or interval of time, like months, years, or even decades.

The Time Series Forecasting Applications

Time series analysis and forecasting are done on automating a variety of tasks, such as-

Weather Forecasting

Anomaly Forecasting

Sales Forecasting

Stock Market Analysis

ECG Analysis

Risk Analysis

and many more!

Time Series Components Combinatorics

A time-series model can be represented by 2 methodologies-

When the time series trend is a linear relationship between integrants, i.e., the frequency (width) and amplitude(height) of the series are the same, the additive rule is applied.

Additive methodology is used when we have a time series where seasonal variation is linear or constant over timestamps.

It can be represented as follows-

y(t) or x(t) = level + trend + seasonality + noise

where the model y(multivariate) or x(univariate) is a function of time t.

The Multiplicative Methodology —

When the time series is not a linear relationship between integrants, then modelling is done following the multiplicative rule.

The multiplicative methodology is used when we have a time series where seasonal variation increases with time — which may be exponential or quadratic.

It is represented as-

y(t) or x(t)= Level * Trend * Seasonality * Noise

Deep-Dive into Supervised Time-Series Forecasting

Supervised learning is the most used domain-specific machine learning, and hence we will focus on supervised time series forecasting.

This will contain various detailed topics to ensure that readers at the end will know how to-

Load time series data and use descriptive statistics to explore it

Scale and normalize time series data for further modelling

Extracting useful features from time-series data (Feature Engineering)

Checking the stationarity of the time series to reduce it

ARIMA and Grid-search ARIMA models for time-series forecasting

Heading to deep learning methods for more complex time-series forecasting (LSTM and bi-LSTMs)

So without further ado, let’s begin!

Load Time Series Data and Use Descriptive Statistics to Explore it

For the easy and quick understanding and analysis of time-series data, we will work on the famous toy dataset named ‘Daily Female Births Dataset’.

import numpy import pandas import statmodels import matplotlib.pyplot as plt import seaborn as sns data = pd.read_csv(‘daily-total-female-births-in-cal.csv’, parse_dates = True, header = 0, squeeze=True) data.head()

This is the output we get-

1959–01–01 35 1959–01–02 32 1959–01–03 30 1959–01–04 31 1959–01–05 44 Name: Daily total female births in California, 1959, dtype: int64

Note —Remember, it is required to use ‘parse_dates’ because it converts dates to datetime objects that can be parsed, header=0 which ensures the column named is stored for easy reference, and squeeze=True which converts the data frame of single object elements into a scalar.

Exploring the Time-Series Data –

print(data.size) #output-365

(a) Carry out some descriptive statistics —

print(data.describe())

Output —

count 365.000000 mean 41.980822 std 7.348257 min 23.000000 25% 37.000000 50% 42.000000 75% 46.000000 max 73.000000

(b) A look at the time-series distribution plot —

pyplot.plot(series) pyplot.show() Scale and Normalize Time Series Data for Further Modelling

A normalized data scales the numeric features in the training data in the range of 0 and 1 so that gradient descent and loss optimization is fast and efficient and converges quickly to the local minima. Interchangeably known as feature scaling, it is crucial for any ML problem statement.

Let’s see how we can achieve normalization in time-series data.

For this purpose, let’s pick a highly fluctuating time-series data — the minimum daily temperatures data. Grab it here!

Let’s have a look at the extreme fluctuating nature of the data —

Source-DataMarket

To normalize a feature, Scikit-learn’s MinMaxScaler is too handy! If you want to generate original data points after prediction, an inverse_transform() function is also provided by this awesome built-in function!

Here goes the normalization code —

# import necessary libraries import pandas from sklearn.preprocessing import MinMaxScaler # load and sanity check the data data = read_csv(‘daily-minimum-temperatures-in-me.csv’, parse_dates = True, header = 0, squeeze=True, index_col=0) print(data.head()) #convert data into matrix of row-col vectors values = data.values values = values.reshape((len(values), 1)) # feature scaling scaler = MinMaxScaler(feature_range=(0, 1)) #fit the scaler with the train data to get min-max values scaler = scalar.fit(values) print(‘Min: %f, Max: %f’ % (scaler.data_min_, scaler.data_max_)) # normalize the data and sanity check normalized = scaler.transform(values) for i in range(5): print(normalized[i]) # inverse transform to obtain original values original_matrix= scaler.inverse_transform(normalized) for i in range(5): print(original_matrix[i])

Let’s have a look at what we got –

See how the values have scaled!

Note — In our case, our data does not have outliers present and hence a MinMaxScaler solves the purpose well. In the case where you have an unsupervised learning approach, and your data contains outliers, it is better to go for standardization, which is more robust than normalization, as normalization scales the data close to the mean which doesn’t handle or include outliers leading to a poor model. Standardization, on the other hand, takes large intervals with a standard deviation value of 1 and a mean of 0, thus outlier handling is robust.

More on that here!

Extracting Useful Features from Time-Series Data (Feature Engineering)

Framing data into a supervised learning problem simply deals with the task of handling and extracting useful features and discarding irrelevant features to make the model robust and cost-efficient.

We already know that supervised learning problems have 2 types of features — the independents (x) and dependent/target(y). Hence, how better the target value is achieved depends on how well we choose and engineer the independent features.

You must know by now that time-series data has two columns, timestamp, and its respective value. So, it is very self-explanatory that in the time series problem, the independent feature is time and the dependent feature is value.

Now let us look at what are the features that need to be engineered into these input and output values so that the inherent relationship between these two variables is established to make the forecasting as good as possible.

The features which are extremely important to model the relationship between the input and output variables in a time series are —

1. Descriptive Statistical Features — Quite straightforward as it sounds, calculating the statistical details and summary of any data is extremely important. Mean, Median, Standard Deviation, Quantiles, and min-max values. These come extremely handy while in tasks such as outlier detection, scaling and normalization, recognizing the distribution, etc.

2. Window Statistic Features — Window features are a statistical summary of different statistical operations upon a fixed window size of previous timestamps. There are, in general, 2 ways to extract descriptive statistics from windows. They are

(a) Rolling Window Statistics: The rolling window focuses on calculating rolling means or what we conventionally call Moving Average, and often other statistical operations. This calculates summary statistics (mostly mean) across values within a specific sliding window, and then we can assign these as features in our dataset.

Let, the mean at timestamp t-1 is x and t-2 be y, so we find the average of x and y to predict the value at timestamp t+1. The rolling window hence takes a mean of 2 values to predict the 3rd value. After that is done, the window shifts to the next set of values, and hence the mean is calculated for each window consisting of 2 values. We use rolling window statistics more often when the recent data is more important for forecasting and not previous data.

Let’s see how we can calculate moving or rolling average with a rolling window —

from pandas import DataFrame from pandas import concat df = DataFrame(data.values) tshifts = df.shift(1) rwin = tshifts.rolling(window=2) moving_avg = rwin.mean() joined_df = concat([moving_avg, df], axis=1) joined_df.columns = [‘mean(t-2,t-1)’, ‘t+1’] print(joined_df.head(5))

Let’s have a look at what we got —

(b) Expanding Window Statistics: Almost similar to the rolling window, expanding windows takes into account an extra habit of extracting the predicted value as well as all the previous observations, each time it expands. This is beneficial when the previous data is equally important for forecasting as well as the recent data.

Let’s have a quick look at expanding window code-

window = tshifts.expanding() joined_df2 = concat([rwin.mean(),df.shift(-1)], axis=1) joined_df2.columns = ['mean', 't+1'] print(joined_df2.head(5))

Let’s have a look at what we got -

3. Lag Features — Lag is simply predicting the value at timestamp t+1, provided we know the value at the previous timestamp, say, t-1. It’s simply distance or lag between two values at 2 different timestamps.

4. Datetime Features — This is simply the conversion of time into its specific components like a month, or day, along with the value of temperature for better forecasting. By doing this, we can gather specific information about the month and day at a particular timestamp for each record.

5. Timestamp Decomposition — Timestamp decomposition includes breaking down the timestamp into subset columns of timestamp for storing unique and special timestamps. Before Diwali or, say, Christmas, the sale of crackers and Santa-caps, fruit-cakes increases exponentially more than at other times of the year. So storing such a special timestamp by decomposing the original timestamp into subsets is useful for forecasting.

Time-series Data Stationary Checks

So, let’s first digest what stationary time-series data is!

Stationary, as the term suggests, is consistent. In time-series, the data if it does not contain seasonality or trends is termed stationary. Any other time-series data that has a specific trend or seasonality, are, thus, non-stationary.

Can you recall, that amongst the two time-series data we worked on, the childbirths data had no trend or seasonality and is stationary. Whereas, the average daily temperatures data, has a seasonality factor and drifts, and hence, it’s non-stationary and hard to model!

Stationarity in time-series is noticeable in 3 types —

(a) Trend Stationary — This kind of time-series data possesses no trend.

(b) Seasonality Stationary — This kind of time-series data possesses no seasonality factor.

(c) Strictly Stationary — The time-series data is strictly consistent with almost no variance to drifts.

Now that we know what stationarity in time series is, how can we check for the same?

Vision is everything. A quick visualization of your time-series data at hand can give a quick eye review of whether the data can be stationary or not. Next in the line comes the statistical summary. A clear look into the summary statistics of the data like min, max, variance, deviation, mean, quantiles, etc. can be very helpful to recognize drifts or shifts in data.

Lets POC this!

So, we take stationary data, which is the handy childbirths data we worked on earlier. However, for the non-stationary data, let’s take the famous airline-passenger data, which is simply the number of airline passengers per month, and prove how they are stationary and non-stationary.

Case 1 — Stationary Proof

import pandas as pd import matplotlib.pyplot as plt data = pd.read_csv(‘daily-total-female-births.csv’, parse_dates = True, header = 0, squeeze=True) data.hist() plt.show()

Output —

As I said, vision! Look how the visualization itself speaks that it’s a Gaussian Distribution. Hence, stationary!

More curious? Let’s get solid math proof!

X = data.values seq = round(len(X) / 2) x1, x2 = X[0:seq], X[seq:] meanx1, meanx2 = x1.mean(), x2.mean() varx1, varx2 = x1.var(), x2.var() print(‘meanx1=%f, meanx2=%f’ % (meanx1, meanx2)) print(‘variancex1=%f, variancex2=%f’ % (varx1, varx2))

Output —

meanx1=39.763736, meanx2=44.185792 variancex1=49.213410, variancex2=48.708651

The mean and variances linger around each other, which clearly shows the data is invariant and hence, stationary! Great.

Case 2— Non-Stationary Proof

import pandas as pd import matplotlib.pyplot as plt data = pd.read_csv(‘international-airline-passengers.csv’, parse_dates = True, header = 0, squeeze=True) data.hist() plt.show()

Output —

The graph pretty much gives a seasonal taste. Moreover, it is too distorted for a Gaussian tag. Let’s now quickly get the mean-variance gaps.

X = data.values seq = round(len(X) / 2) x1, x2 = X[0:seq], X[seq:] meanx1, meanx2 = x1.mean(), x2.mean() varx1, varx2 = x1.var(), x2.var() print(‘meanx1=%f, meanx2=%f’ % (meanx1, meanx2)) print(‘variancex1=%f, variancex2=%f’ % (varx1, varx2))

Output —

meanx1=182.902778, meanx2=377.694444 variancex1=2244.087770, variancex2=7367.962191

Alright, the value gap between mean and variances are pretty self-explanatory to pick the non-stationary kind.

ARMA, ARIMA, and SARIMAX Models for Time-Series Forecasting

A very traditional yet remarkable ‘machine-learning’ way of forecasting a time series is the ARMA (Auto-Regressive Moving Average) and Auto Regressive Integrated Moving Average Model commonly called ARIMA statistical models.

Other than these 2 traditional approaches, we have SARIMA (Seasonal Auto-Regressive Integrated Moving Average) and Grid-Search ARIMA, which we will see too!

So, let’s explore the models, one by one!

ARMA

The ARMA model is an assembly of 2 statistical models — the AR or Auto-Regressive model and Moving Average.

The Auto-Regressive Model estimates any dependent variable value y(t) at a given timestamp t on the basis of lags. Look at the formula below for a better understanding —

Here, y(t) = predicted value at timestamp t, α = intercept term, β = coefficient of lag, and, y(t-1) = time-series lag at timestamp t-1.

So α and β are the model estimators that estimate y(t).

The Moving Average Model plays a similar role, but it does not take the past predicted forecasts into account, as said earlier in rolling average. It rather uses the lagged forecast errors in previously predicted values to predict the future values, as shown in the formula below.

Let’s see how both the AR and MA models perform on the International-Airline-Passengers data.

AR model

AR_model = ARIMA(indexedDataset_logScale, order=(2,1,0)) AR_results = AR_model.fit(disp=-1) plt.plot(datasetLogDiffShifting) plt.plot(AR_results.fittedvalues, color='red') plt.title('RSS: %.4f'%sum((AR_results.fittedvalues - datasetLogDiffShifting['#Passengers'])**2))

The RSS or sum of squares residual is 1.5023 in the case of the AR model, which is kind of dissatisfactory as AR doesn’t capture non-stationarity well enough.

MA Model

MA_model = ARIMA(indexedDataset_logScale, order=(0,1,2)) MA_results = MA_model.fit(disp=-1) plt.plot(datasetLogDiffShifting) plt.plot(MA_results.fittedvalues, color='red') plt.title('RSS: %.4f'%sum((MA_results.fittedvalues - datasetLogDiffShifting['#Passengers'])**2))

The MA model shows similar results to AR, differing by a very small amount. We know our data is non-stationary, so let’s make this RSS score better by the non-stationarity handler AR+I+MA!

ARIMA

Along with the squashed use of the AR and MA model used earlier, ARIMA uses a special concept of Integration(I) with the purpose of differentiating some observations in order to make non-stationary data stationary, for better forecasting. So, it’s obviously better than its predecessor ARMA which could only handle stationary data.

What the differencing factor does is, that it takes into account the difference in predicted values between two timestamps (t and t+1, for example). Doing this helps in achieving a constant mean rather than a highly fluctuating ‘non-stationary’ mean.

Let’s fit the same data with ARIMA and see how well it performs!

ARIMA_model = ARIMA(indexedDataset_logScale, order=(2,1,2)) ARIMA_results = ARIMA_model.fit(disp=-1) plt.plot(datasetLogDiffShifting) plt.plot(ARIMA_results.fittedvalues, color='red') plt.title('RSS: %.4f'%sum((ARIMA_results.fittedvalues - datasetLogDiffShifting['#Passengers'])**2))

Great! The graph itself speaks how ARIMA fits our data in a well and generalized fashion compared to the ARMA! Also, observe how the RSS has dropped to 1.0292 from 1.5023 or 1.4721.

SARIMAX

Designed and developed as a beautiful extension to the ARIMA, SARIMAX or, Seasonal Auto-Regressive Integrated Moving Average with eXogenous factors is a better player than ARIMA in case of highly seasonal time series. There are 4 seasonal components that SARIMAX takes into account.

They are -

1. Seasonal Autoregressive Component

2. Seasonal Moving Average Component

3. Seasonal Integrity Order Component

4. Seasonal Periodicity

Source-Wikipedia

If you are more of a theory conscious person like me, do read more on this here, as getting into the details of the formula is beyond the scope of this article!

Now, let’s see how well SARIMAX performs on seasonal time-series data like the International-Airline-Passengers data.

from statsmodels.tsa.statespace.sarimax import SARIMAX SARIMAX_model=SARIMAX(train['#Passengers'],order=(1,1,1),seasonal_order=(1,0,0,12)) SARIMAX_results=SARIMAX_model.fit() preds=SARIMAX_results.predict(start,end,typ='levels').rename('SARIMAX Predictions') test['#Passengers'].plot(legend=True,figsize=(8,5)) preds.plot(legend=True)

Look how beautifully SARIMAX handles seasonal time series!

Heading to DL Methods for Complex Time-Series Forecasting

One of the very common features of time-series data is the long-term dependency factor. It is obvious that many time-series forecasting works on previous records (the future is forecasted based on previous records, which may be far behind). Hence, ordinary traditional machine learning models like ARIMA, ARMA, or SARIMAX are not capable of capturing long-term dependencies, which makes them poor guys in sequence-dependent time series problems.

To address such an issue, a massively intelligent and robust neural network architecture was proposed which can extraordinarily handle sequence dependence. It was known as Recurrent Neural Networks or RNN.

Source-Medium

RNN was designed to work on sequential data like time series. However, a very remarkable pitfall of RNN was that it couldn’t handle long-term dependencies. For a problem where you want to forecast a time series based on a huge number of previous records, RNN forgets the maximum of the previous records which occurred much earlier, and only learns sequences of recent data fed to its neural network. So, RNN was observed to not be up to the mark for NSP (Next Sequence Prediction) tasks in NLP and time series.

To address this issue of not capturing long-term dependencies, a powerful variant of RNN was developed, known as LSTM (Long Short Term Memory) Networks. Unlike RNN, which could only capture short-term sequences/dependencies, LSTM, as its name suggests was observed to learn long as well as short term dependencies. Hence, it was a great success for modelling and forecasting time series data!

Note — Since explaining the architecture of LSTM will be beyond the size of this blog, I recommend you to head over to my article where I explained LSTM in detail!

Let us now take our Airline Passengers’ data and see how well RNN and LSTM work on it!

Imports —

import numpy as np import pandas as pd import tensorflow as tf import matplotlib.pyplot as plt import sklearn.preprocessing from sklearn.metrics import r2_score from keras.layers import Dense, Dropout, SimpleRNN, LSTM from keras.models import Sequential

Scaling the data to make it stationary for better forecasting —

minmax_scaler = sklearn.preprocessing.MinMaxScaler() data['Passengers'] = minmax_scaler.fit_transform(data['Passengers'].values.reshape(-1,1)) data.head()

Scaled data —

Train, test splits (80–20 ratio) —

split = int(len(data[‘Passengers’])*0.8) x_train,y_train,x_test,y_test = np.array(x[:split]),np.array(y[:split]), np.array(x[split:]), np.array(y[split:]) #reshaping data to original shape x_train = np.reshape(x_train, (split, 20, 1)) x_test = np.reshape(x_test, (x_test.shape[0], 20, 1))

RNN Model —

Complie it, fit it and predict—

model.fit(x_train, y_train, epochs=15, batch_size=50) preds = model.predict(x_test)

Let me

Pretty much accurate!

LSTM Model —

Complie it, fit it and predict—

model.fit(x_train, y_train, epochs=15, batch_size=50) preds = model.predict(x_test)

Let me show you a picture of how well the model predicts —

Here, we can easily observe that RNN does the job better than LSTMs. As it is clearly seen that LSTM works great in training data but bad invalidation/test data, which shows a sign of overfitting!

Hence, try to use LSTM only where there is a need for long-term dependency learning otherwise RNN works good enough.

Conclusion

Cheers on reaching the end of the guide and learning pretty interesting kinds of stuff about Time Series. From this guide, you successfully learned the basics of time series, got a brief idea of the difference between Time Series Analysis and Forecasting subdomains of Time Series, a crisp mathematical intuition on Time Series analysis and forecasting techniques and explored how to work on Time Series problems in Machine Learning and Deep Learning to solve complex problems.

Hope you had fun exploring Time Series with Machine Learning and Deep Learning along with intuition! If you are a curious learner and want to “not” stop learning more, head over to this awesome notebook on time series provided by TensorFlow!

Feel free to follow me on Medium and GitHub for more articles and notebooks on Machine & Deep Learning! Connect with me on LinkedIn if you want to discuss anything regarding this article!

Happy Learning!

Related

## Demystifying Bert: A Comprehensive Guide To The Groundbreaking Nlp Framework

Overview

Google’s BERT has transformed the Natural Language Processing (NLP) landscape

Learn what BERT is, how it works, the seismic impact it has made, among other things

We’ll also implement BERT in Python to give you a hands-on learning experience

Introduction to the World of BERT

Picture this – you’re working on a really cool data science project and have applied the latest state-of-the-art library to get a pretty good result. And boom! A few days later, there’s a new state-of-the-art framework in town that has the potential to further improve your model.

That is not a hypothetical scenario – it’s the reality (and thrill) of working in the field of Natural Language Processing (NLP)! The last two years have been mind-blowing in terms of breakthroughs. I get to grips with one framework and another one, potentially even better, comes along.

Google’s BERT is one such NLP framework. I’d stick my neck out and say it’s perhaps the most influential one in recent times (and we’ll see why pretty soon).

It’s not an exaggeration to say that BERT has significantly altered the NLP landscape. Imagine using a single model that is trained on a large unlabelled dataset to achieve State-of-the-Art results on 11 individual NLP tasks. And all of this with little fine-tuning. That’s BERT! It’s a tectonic shift in how we design NLP models.

BERT has inspired many recent NLP architectures, training approaches and language models, such as Google’s TransformerXL, OpenAI’s GPT-2, XLNet, ERNIE2.0, RoBERTa, etc.

I aim to give you a comprehensive guide to not only BERT but also what impact it has had and how this is going to affect the future of NLP research. And yes, there’s a lot of Python code to work on, too!

Note: In this article, we are going to talk a lot about Transformers. If you aren’t familiar with it, feel free to read this article first – How do Transformers Work in NLP? A Guide to the Latest State-of-the-Art Models.

What is BERT?

From Word2vec to BERT: NLP’s quest for learning language representations

How Does BERT Work? A Look Under the Hood

Using BERT for Text Classification (Python Code)

Beyond BERT: Current State-of-the-Art in NLP

What is BERT?

You’ve heard about BERT, you’ve read about how incredible it is, and how it’s potentially changing the NLP landscape. But what is BERT in the first place?

Here’s how the research team behind BERT describes the NLP framework:

“BERT stands for Bidirectional Encoder Representations from Transformers. It is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of NLP tasks.”

That sounds way too complex as a starting point. But it does summarize what BERT does pretty well so let’s break it down.

First, it’s easy to get that BERT stands for Bidirectional Encoder Representations from Transformers. Each word here has a meaning to it and we will encounter that one by one in this article. For now, the key takeaway from this line is – BERT is based on the Transformer architecture.

Second, BERT is pre-trained on a large corpus of unlabelled text including the entire Wikipedia(that’s 2,500 million words!) and Book Corpus (800 million words).

This pre-training step is half the magic behind BERT’s success. This is because as we train a model on a large text corpus, our model starts to pick up the deeper and intimate understandings of how the language works. This knowledge is the swiss army knife that is useful for almost any NLP task.

Third, BERT is a “deeply bidirectional” model. Bidirectional means that BERT learns information from both the left and the right side of a token’s context during the training phase.

The bidirectionality of a model is important for truly understanding the meaning of a language. Let’s see an example to illustrate this. There are two sentences in this example and both of them involve the word “bank”:

BERT captures both the left and right context

If we try to predict the nature of the word “bank” by only taking either the left or the right context, then we will be making an error in at least one of the two given examples.

One way to deal with this is to consider both the left and the right context before making a prediction. That’s exactly what BERT does! We will see later in the article how this is achieved.

And finally, the most impressive aspect of BERT. We can fine-tune it by adding just a couple of additional output layers to create state-of-the-art models for a variety of NLP tasks.

From Word2Vec to BERT: NLP’s Quest for Learning Language Representations

“One of the biggest challenges in natural language processing is the shortage of training data. Because NLP is a diversified field with many distinct tasks, most task-specific datasets contain only a few thousand or a few hundred thousand human-labelled training examples.” – Google AI

Word2Vec and GloVe

The quest for learning language representations by pre-training models on large unlabelled text data started from word embeddings like Word2Vec and GloVe. These embeddings changed the way we performed NLP tasks. We now had embeddings that could capture contextual relationships among words.

These embeddings were used to train models on downstream NLP tasks and make better predictions. This could be done even with less task-specific data by utilizing the additional information from the embeddings itself.

One limitation of these embeddings was the use of very shallow Language Models. This meant there was a limit to the amount of information they could capture and this motivated the use of deeper and more complex language models (layers of LSTMs and GRUs).

Another key limitation was that these models did not take the context of the word into account. Let’s take the above “bank” example. The same word has different meanings in different contexts, right? However, an embedding like Word2Vec will give the same vector for “bank” in both the contexts.

That’s valuable information we are losing.

Enter ELMO and ULMFiT

ELMo was the NLP community’s response to the problem of Polysemy – same words having different meanings based on their context. From training shallow feed-forward networks (Word2vec), we graduated to training word embeddings using layers of complex Bi-directional LSTM architectures. This meant that the same word can have multiple ELMO embeddings based on the context it is in.

ULMFiT took this a step further. This framework could train language models that could be fine-tuned to provide excellent results even with fewer data (less than 100 examples) on a variety of document classification tasks. It is safe to say that ULMFiT cracked the code to transfer learning in NLP.

This is when we established the golden formula for transfer learning in NLP:

Transfer Learning in NLP = Pre-Training and Fine-Tuning

Most of the NLP breakthroughs that followed ULMFIT tweaked components of the above equation and gained state-of-the-art benchmarks.

OpenAI’s GPT

OpenAI’s GPT extended the methods of pre-training and fine-tuning that were introduced by ULMFiT and ELMo. GPT essentially replaced the LSTM-based architecture for Language Modeling with a Transformer-based architecture.

The GPT model could be fine-tuned to multiple NLP tasks beyond document classification, such as common sense reasoning, semantic similarity, and reading comprehension.

GPT also emphasized the importance of the Transformer framework, which has a simpler architecture and can train faster than an LSTM-based model. It is also able to learn complex patterns in the data by using the Attention mechanism.

OpenAI’s GPT validated the robustness and usefulness of the Transformer architecture by achieving multiple State-of-the-Arts.

And this is how Transformer inspired BERT and all the following breakthroughs in NLP.

Now, there were some other crucial breakthroughs and research outcomes that we haven’t mentioned yet, such as semi-supervised sequence learning. This is because they are slightly out of the scope of this article but feel free to read the linked paper to know more about it.

Moving onto BERT

So, the new approach to solving NLP tasks became a 2-step process:

Train a language model on a large unlabelled text corpus (unsupervised or semi-supervised)

Fine-tune this large model to specific NLP tasks to utilize the large repository of knowledge this model has gained (supervised)

With that context, let’s understand how BERT takes over from here to build a model that will become a benchmark of excellence in NLP for a long time.

How Does BERT Work? A Look Under the Hood

Let’s look a bit closely at BERT and understand why it is such an effective method to model language. We’ve already seen what BERT can do earlier – but how does it do it? We’ll answer this pertinent question in this section.

1. BERT’s Architecture

The BERT architecture builds on top of Transformer. We currently have two variants available:

Source

The BERT Base architecture has the same model size as OpenAI’s GPT for comparison purposes. All of these Transformer layers are Encoder-only blocks.

If your understanding of the underlying architecture of the Transformer is hazy, I will recommend that you read about it here.

Now that we know the overall architecture of BERT, let’s see what kind of text processing steps are required before we get to the model building phase.

2. Text Preprocessing

The developers behind BERT have added a specific set of rules to represent the input text for the model. Many of these are creative design choices that make the model even better.

For starters, every input embedding is a combination of 3 embeddings:

Position Embeddings: BERT learns and uses positional embeddings to express the position of words in a sentence. These are added to overcome the limitation of Transformer which, unlike an RNN, is not able to capture “sequence” or “order” information

Segment Embeddings: BERT can also take sentence pairs as inputs for tasks (Question-Answering). That’s why it learns a unique embedding for the first and the second sentences to help the model distinguish between them. In the above example, all the tokens marked as EA belong to sentence A (and similarly for EB)

Token Embeddings: These are the embeddings learned for the specific token from the WordPiece token vocabulary

For a given token, its input representation is constructed by summing the corresponding token, segment, and position embeddings.

Such a comprehensive embedding scheme contains a lot of useful information for the model.

These combinations of preprocessing steps make BERT so versatile. This implies that without making any major change in the model’s architecture, we can easily train it on multiple kinds of NLP tasks.

BERT is pre-trained on two NLP tasks:

Next Sentence Prediction

Let’s understand both of these tasks in a little more detail!

Need for Bi-directionality

BERT is designed as a deeply bidirectional model. The network effectively captures information from both the right and left context of a token from the first layer itself and all the way through to the last layer.

Traditionally, we had language models either trained to predict the next word in a sentence (right-to-left context used in GPT) or language models that were trained on a left-to-right context. This made our models susceptible to errors due to loss in information.

Predicting the word in a sequence

ELMo tried to deal with this problem by training two LSTM language models on left-to-right and right-to-left contexts and shallowly concatenating them. Even though it greatly improved upon existing techniques, it wasn’t enough.

“Intuitively, it is reasonable to believe that a deep bidirectional model is strictly more powerful than either a left-to-right model or the shallow concatenation of a left-to-right and a right-to-left model.” – BERT

That’s where BERT greatly improves upon both GPT and ELMo. Look at the below image:

The arrows indicate the information flow from one layer to the next. The green boxes at the top indicate the final contextualized representation of each input word.

It’s evident from the above image: BERT is bi-directional, GPT is unidirectional (information flows only from left-to-right), and ELMO is shallowly bidirectional.

This is where the Masked Language Model comes into the picture.

Let’s say we have a sentence – “I love to read data science blogs on Analytics Vidhya”. We want to train a bi-directional language model. Instead of trying to predict the next word in the sequence, we can build a model to predict a missing word from within the sequence itself.

Let’s replace “Analytics” with “[MASK]”. This is a token to denote that the token is missing. We’ll then train the model in such a way that it should be able to predict “Analytics” as the missing token: “I love to read data science blogs on [MASK] Vidhya.”

This is the crux of a Masked Language Model. The authors of BERT also include some caveats to further improve this technique:

To prevent the model from focusing too much on a particular position or tokens that are masked, the researchers randomly masked 15% of the words

So, the researchers used the below technique:

80% of the time the words were replaced with the masked token [MASK]

10% of the time the words were replaced with random words

10% of the time the words were left unchanged

I have shown how to implement a Masked Language Model in Python in one of my previous articles here:

b. Next Sentence Prediction

Masked Language Models (MLMs) learn to understand the relationship between words. Additionally, BERT is also trained on the task of Next Sentence Prediction for tasks that require an understanding of the relationship between sentences.

A good example of such a task would be question answering systems.

The task is simple. Given two sentences – A and B, is B the actual next sentence that comes after A in the corpus, or just a random sentence?

Since it is a binary classification task, the data can be easily generated from any corpus by splitting it into sentence pairs. Just like MLMs, the authors have added some caveats here too. Let’s take this with an example:

Consider that we have a text dataset of 100,000 sentences. So, there will be 50,000 training examples or pairs of sentences as the training data.

For 50% of the pairs, the second sentence would actually be the next sentence to the first sentence

For the remaining 50% of the pairs, the second sentence would be a random sentence from the corpus

The labels for the first case would be ‘IsNext’ and ‘NotNext’ for the second case

And this is how BERT is able to become a true task-agnostic model. It combines both the Masked Language Model (MLM) and the Next Sentence Prediction (NSP) pre-training tasks.

Implementing BERT for Text Classification in Python

One of the most potent ways would be fine-tuning it on your own task and task-specific data. We can then use the embeddings from BERT as embeddings for our text documents.

In this section, we will learn how to use BERT’s embeddings for our NLP task. We’ll take up the concept of fine-tuning an entire BERT model in one of the future articles.

For extracting embeddings from BERT, we will use a really useful open source project called Bert-as-Service:

Running BERT can be a painstaking process since it requires a lot of code and installing multiple packages. That’s why this open-source project is so helpful because it lets us use BERT to extract encodings for each sentence in just two lines of code.

Installing BERT-As-Service

BERT-As-Service works in a simple way. It creates a BERT server which we can access using the Python code in our notebook. Every time we send it a sentence as a list, it will send the embeddings for all the sentences.

We can install the server and client via pip. They can be installed separately or even on different machines:

pip install bert-serving-server

# server

pip install bert-serving-client

# client, independent of `bert-serving-server`

Also, since running BERT is a GPU intensive task, I’d suggest installing the bert-serving-server on a cloud-based GPU or some other machine that has high compute capacity.

Now, go back to your terminal and download a model listed below. Then, uncompress the zip file into some folder, say /tmp/english_L-12_H-768_A-12/.

Here’s a list of the released pre-trained BERT models:

Once we have all the files extracted in a folder, it’s time to start the BERT service:

bert-serving-start -model_dir uncased_L-12_H-768_A-12/ -num_worker=2 -max_seq_len 50

You can now simply call the BERT-As-Service from your Python code (using the client library). Let’s just jump into code!

Open a new Jupyter notebook and try to fetch embeddings for the sentence: “I love data science and analytics vidhya”.

View the code on Gist.

Here, the IP address is the IP of your server or cloud. This field is not required if used on the same computer.

The shape of the returned embedding would be (1,768) as there is only a single sentence which is represented by 768 hidden units in BERT’s architecture.

Problem Statement: Classifying Hate Speech on Twitter

Let’s take up a real-world dataset and see how effective BERT is. We’ll be working with a dataset consisting of a collection of tweets that are classified as being “hate speech” or not.

For the sake of simplicity, we say a tweet contains hate speech if it has a racist or sexist sentiment associated with it. So, the task is to classify racist or sexist tweets from other tweets.

We will use BERT to extract embeddings from each tweet in the dataset and then use these embeddings to train a text classification model.

Here is how the overall structure of the project looks like:

Let’s look at the code now:

You’ll be familiar with how most people tweet. There are many random symbols and numbers (aka chat language!). Our dataset is no different. We need to preprocess it before passing it through BERT:

Python Code:

﻿

Now that the dataset is clean, it’s time to split it into training and validation set:

View the code on Gist.

Let’s get the embeddings for all the tweets in the training and validation sets:

View the code on Gist.

It’s model building time! Let’s train the classification model:

View the code on Gist.

Check the classification accuracy:

View the code on Gist.

Even with such a small dataset, we easily get a classification accuracy of around 95%. That’s damn impressive.

In the next article, I plan to take a BERT model and fine-tune it fully on a new dataset and compare its performance.

Beyond BERT: Current State-of-the-Art in NLP

BERT has inspired great interest in the field of NLP, especially the application of the Transformer for NLP tasks. This has led to a spurt in the number of research labs and organizations that started experimenting with different aspects of pre-training, transformers and fine-tuning.

Many of these projects outperformed BERT on multiple NLP tasks. Some of the most interesting developments were RoBERTa, which was Facebook AI’s improvement over BERT and DistilBERT, which is a compact and faster version of BERT.