# Intuition Behind Perceptron For Deep Learning


Introduction

Perceptron is one of the most fundamental concepts of deep learning which every data scientist is expected to master. It is a supervised learning algorithm specifically for binary classifiers.


In this article, we will develop a solid intuition about Perceptron with the help of an example. Without any further delay, let’s begin!

Before you continue, I recommend you check out the following article-

Deep Learning 101: Beginners Guide to Neural Network

Understanding Perceptron with an Example

Let’s start with a simple example of a classification problem. Our aim here is to predict whether the loan should be approved or not, depending on the salary of a person.

To do that, we will need to build a model that takes the salary of the person as an input and predicts whether the loan should be approved for the person or not.

Suppose your bank wants to reduce the risk of loan default and hence decides to roll out loans to only such individuals who have a monthly salary of 50000 and above.

In this case, we want our model to learn to check whether the salary input which is represented as X here, is greater than 50000 or not. Here are the tasks we want our model to perform-

First, it should take in the salary as input.

Next, it has to check whether the given salary is greater than 50,000 or not.

Finally, it should give the output “Yes” only if the salary is more than 50,000.

Effectively, this model takes in some input, processes it, and generates an output. This is similar to what happens in a biological neuron:

It takes the input to the dendrite, processes the provided information, and generates the output. You can see the similarity right? Thus the model that we’re talking about can also be called a Neuron.

Now coming back to our loan example. Let’s have a closer look at each of these tasks, starting from the input. We have a single input which is salary but in general, we can have multiple features just like the applicant’s salary, his/her father’s salary, spouse’s salary that can be deciding factors to approve the Loan. And our neurons will take in all of these features as input in order to make decisions. This is to say that the neuron will have multiple inputs. This is similar to the multiple dendrites that we saw in the biological neuron.

Now that there are multiple salary features about the person we’ll take all of them into account as they represent the total income of the household. We can sum them up and check if the total income of the household crosses the threshold or not.

So, the Total Income = Applicant Salary(X1) + Father Salary(X2) + Spouse Salary(X3)

We need to compare this Total Income with the Threshold. Here is the equation representing the same:

X1 + X2 + X3 > Threshold

We have X1, X2, and X3 as input features and we want to check whether their sum crosses a particular threshold, which is 50000 in our case. If you bring this threshold to the left side of the inequality, it becomes:

X1 + X2 + X3 - Threshold > 0

If we represent the quantity “- Threshold” with a new term, Bias, the updated inequality looks like this:

X1 + X2 + X3 + Bias > 0

This is the sum of four quantities, X1, X2, X3, and the Bias, where the Bias is actually “- Threshold”.

Although we have selected the Bias arbitrarily here, it is actually something that the neuron learns from the underlying data. If the sum of the inputs exceeds the magnitude of the bias, we want the neuron to give the output “YES”, meaning the loan can be approved for this person. This event is known as the firing of a neuron. If we want to write this relationship using equations, we can use the following:

Output = 1 if X1 + X2 + X3 + Bias > 0

Output = 0 if X1 + X2 + X3 + Bias <= 0

We will say that output should be 1 when this equation is true and output should be 0 in all other cases. These two equations can be represented in the form of a function. Let’s see how:

So here we have the sum of the features X1, X2, X3, and bias represented as Z and we want our output to be 1 if the Z is greater than 0, otherwise 0.

So we can use a Step Function here and this is the graph of the step function:

It gives the output 1 for any value greater than zero and the output 0 otherwise. So, to find the output, we apply the step function to Z:

Output = Step(Z)

In this case, the step function is used to scale the output of the neuron. In Deep Learning, we can choose which such function to apply to the output of the neurons; these functions are known as Activation Functions. When we use the step function as the activation function for a neuron, the neuron is called a Perceptron.
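The loan example can be sketched in a few lines of Python. This is a minimal illustration rather than a trained model: the function names `step` and `perceptron` and the household salaries are made up for the example, and the bias is fixed at the negative of the 50000 threshold instead of being learned from data. It follows the strict "greater than zero" rule used for Z above.

```python
def step(z):
    """Step activation: 1 when z is greater than 0, otherwise 0."""
    return 1 if z > 0 else 0


def perceptron(inputs, bias):
    """Sum the inputs, add the bias, and apply the step function."""
    z = sum(inputs) + bias
    return step(z)


# The bias is the negative of the threshold from the loan example.
THRESHOLD = 50000
BIAS = -THRESHOLD

# Hypothetical households: (applicant, father, spouse) monthly salaries.
approved = perceptron([30000, 15000, 10000], BIAS)  # total income 55000
rejected = perceptron([20000, 10000, 5000], BIAS)   # total income 35000

print("Loan approved" if approved else "Loan rejected")
print("Loan approved" if rejected else "Loan rejected")
```

The first household crosses the threshold, so the neuron fires; the second does not.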

End Notes

In this article, we saw how a perceptron model works. It takes in multiple features, such as applicant salary and father salary, as input and checks whether their sum, the total income of the household, exceeds the threshold. Only if it does will the step function give an output of one, meaning the loan should be approved; otherwise it gives an output of zero.



## Learning The Basics Of Deep Learning, ChatGPT, And Bard AI

Introduction

Artificial Intelligence is the ability of a computer to work or think like a human. Many Artificial Intelligence applications have been developed and are available for public use, and ChatGPT is a recent one by OpenAI.

ChatGPT is an artificial intelligence model that uses deep learning to produce human-like text. It predicts the next word in a text based on the patterns it has learned from a large amount of data during its training process. Bard AI is another AI chatbot, launched by Google, that draws on recent information and so can answer questions about current events.

We will discuss ChatGPT and Bard AI and the differences between them.

Learning Objectives

1. Understanding the Deep Learning Model and chatGPT.

2. To understand the difference between chatGPT and Bard.

This article was published as a part of the Data Science Blogathon.

Understanding the Deep Learning Model

Artificial Intelligence is a broad term for machines that perform tasks and behave like humans. When we talk about algorithms that learn from data, we are talking about a subset of Artificial Intelligence called Machine Learning.

Machine learning algorithms look at past behavior and make predictions based on it. Going deeper, some algorithms learn patterns by themselves and adapt when the situation changes. “Deep Learning” refers to these deeper algorithms, which are built on neural networks.

Deep learning algorithms are classified into two groups: supervised and unsupervised. Supervised deep learning models include Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs).

In supervised learning, the input data is labeled. In unsupervised learning, the data is unlabeled, and the algorithm works by finding patterns and similarities.

Artificial Neural Network (ANN)

Like the human brain, artificial neural networks (ANNs) are made up of layers of nodes: an input layer, one or more hidden layers, and an output layer. Each artificial neuron, or node, has an associated weight and threshold. When a node’s output exceeds its threshold, the node is activated and sends data to the next layer. Otherwise, no data reaches the next layer.

Each input is assigned a weight. Inputs with larger weights contribute more to the output than other inputs. Each input is multiplied by its weight, and the results are added up. The sum is then passed through the activation function, which decides what to do with it. If the output exceeds a certain threshold, the node is activated and transmits data to the next layer. As a result, the output of one layer becomes the input of the next, which is why such a network is called feed-forward.

Let’s say that three factors influence our decision. The first question is whether tomorrow will be a rainy day: if the answer is yes, the input is 1, and if the answer is no, it is 0.

Another question: will there be more traffic tomorrow? Yes - 1, No - 0.

The last question is whether the beachside will be good for a picnic. Yes - 1, No - 0.

We get the following responses.

where

– X1 – 0,

– X2 – 1,

– X3 – 1

Once the inputs are assigned, we apply the weights. As the day is not rainy, we give that input a weight of 5. For traffic we give a weight of 2, and for the picnic a weight of 4.

W1 – 5

W2 – 2

W3 – 4

The weight signifies importance: the larger the weight, the more important the input. Now we take the threshold as 3. The bias is the negative of the threshold, -3.

y = (5*0) + (2*1) + (4*1) - 3 = 3.

Output is more than zero, so the result will be one on activation. Changes in the weights or threshold can result in different returns. Similarly, neural networks make changes depending on the results of past layers.
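The arithmetic above can be reproduced with a short Python sketch. The helper name `neuron_output` is made up for this illustration; the inputs, weights, and threshold are the ones from the example.

```python
def neuron_output(inputs, weights, bias):
    """Return the weighted sum plus bias, and the 0/1 activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    fired = 1 if z > 0 else 0
    return z, fired


inputs = [0, 1, 1]    # rainy day? more traffic? good beach for a picnic?
weights = [5, 2, 4]   # importance of each answer
threshold = 3

z, fired = neuron_output(inputs, weights, bias=-threshold)
print(z, fired)  # 3 1
```

Raising the threshold to 6 or more in this sketch would leave the weighted sum at or below zero, so the node would no longer fire, which is one way to see how changes in the threshold alter the result.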

For example, you want to classify images of cats and dogs.

The image of a cat or dog is the input to the neural network’s first layer.

After that, the input data pass through one or more hidden layers of many neurons. After receiving inputs from the layer before it, each neuron calculates and sends the result to the next layer. When determining which characteristics, the shape of the ears or the patterns, set apart cats from dogs, the neurons in the hidden layers apply weights and biases to the inputs.

The final layer returns a probability distribution over the two possible classes, cat and dog, and the class with the higher probability is the prediction.

Updating the weights and biases based on the error is termed backpropagation, and over time it improves pattern recognition and prediction accuracy.

Facial Recognition by Deep Learning

We will train a neural network to classify images of animal faces.

    import tensorflow as tf
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense, Flatten
    from tensorflow.keras.preprocessing.image import ImageDataGenerator
    import patoolib

    # Unpack the image archive
    patoolib.extract_archive('animals.zip')

    # Rescale pixel values from [0, 255] to [0, 1]
    train_data = ImageDataGenerator(rescale=1./255)
    test_data = ImageDataGenerator(rescale=1./255)

    train_dir = "C://Users//ss529/Anaconda3//Animals//train"
    val_dir = "C://Users//ss529/Anaconda3//Animals//val"

    # class_mode='sparse' yields integer class labels; this assumes three
    # animal classes, matching the 3-unit softmax output layer below
    train_generator = train_data.flow_from_directory(
        train_dir, target_size=(150, 150), batch_size=20, class_mode='sparse')
    test_generator = test_data.flow_from_directory(
        val_dir, target_size=(150, 150), batch_size=20, class_mode='sparse')

    # A small fully connected network on the flattened image
    model = Sequential()
    model.add(Flatten(input_shape=(150, 150, 3)))
    model.add(Dense(4, activation='sigmoid'))
    model.add(Dense(5, activation='relu'))
    model.add(Dense(3, activation='softmax'))
    model.summary()

    # The model must be compiled with an optimizer and a loss before fitting
    opt = tf.keras.optimizers.Adam(0.001)
    model.compile(optimizer=opt, loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit(train_generator, epochs=5, validation_data=test_generator)

What is ChatGPT?

ChatGPT is an up-to-date Artificial Intelligence chatbot, trained by OpenAI on Azure infrastructure, that answers your queries, admits its mistakes, corrects code, and can reject inappropriate requests. It is built on GPT-3.5, a Generative Pre-trained Transformer, which uses deep learning to understand and generate text.

ChatGPT, which stands for chat-based Generative Pre-trained Transformer, is a potent tool that works in different ways to increase output in several distinct areas.

ChatGPT is intelligent enough to solve simple math problems, answer technical queries, and even tell jokes.

For example, the image below shows some funny jokes generated by AI.

In another example, the image below shows how to find the area of a triangle with the help of AI.

How to Use ChatGPT?

Here we are going to answer some questions related to chatGPT.

“ChatGPT Plus” is a paid subscription plan. It gives priority access to new features, faster response times, and reliable availability when demand is high.

For example, I asked some business and idea tips on Data Science, and here is the response provided by chatGPT in the below image.

Why Should we Use chatGPT?

ChatGPT can serve you well, depending on how you want to use a chatbot for your benefit.

It can write documents or reports for you.

It can save time and help you communicate directly and professionally by generating personalized and engaging responses.

It can help generate new business ideas that assist business leaders and entrepreneurs with original and creative concepts for new projects, schemes, and services.

ChatGPT can come in handy for detecting and correcting errors in existing code.

Limitations Of ChatGPT

ChatGPT is not yet 100% accurate.

For example, for the question about Male Rao Holkar’s death, the response from ChatGPT does not match the historical record.

Edward Tian, a 22-year-old student at Princeton University, developed the GPTZero application, which can detect whether content was written by AI. So far it is intended for educational use, and the beta version is available to the public.

What is Bard AI?

LaMDA (Language Model for Dialogue Applications) powers Bard, an experimental conversational AI service. It draws on information from the Internet to give fresh, high-quality responses to queries.

How does Bard function?

Bard is powered by LaMDA, a large language model created by Google. Google made Bard available in 2023 on a lightweight version of LaMDA, which requires less computing power to run, allowing it to reach more users.

The Difference Between ChatGPT and Bard

Google Bard and ChatGPT are both chatbots that use AI to hold a conversation.

ChatGPT is available and open to the public. Bard is limited to beta testers and not yet open to the public.

ChatGPT has paid and free options. Bard is available for free.

Bard uses LaMDA, the language model developed by Google, while ChatGPT uses a generative pre-trained transformer.

ChatGPT has a GPT-2 Output Detector that detects plagiarism; Bard does not.

ChatGPT answers from the texts and sources available in its training data, while Bard draws on recent sources and can fetch more data. The Google search engine will be adapted to let Bard AI answer queries.

Q1. What algorithm does the ChatGPT use?

A. ChatGPT is built on the GPT-3.5 architecture, which utilizes a transformer-based deep learning algorithm. The algorithm leverages a large pre-trained language model that learns from vast amounts of text data to generate human-like responses. It employs attention mechanisms to capture contextual relationships between words and generate coherent and contextually relevant responses.

Q2. How is ChatGPT programmed?

A. ChatGPT is programmed using a combination of techniques. It is built upon a deep learning architecture called GPT-3.5, which employs transformer-based models. The programming involves training the model on a massive amount of text data, fine-tuning it for specific tasks, and implementing methods for input processing, context management, and response generation. The underlying programming techniques involve natural language processing, deep learning frameworks, and efficient training and inference pipelines.

Conclusion

ChatGPT is a new AI chatbot that surprised the world with its ability to answer questions, solve problems, and detect mistakes.

Some of the key points we learned here:

ChatGPT, a new chatbot developed by OpenAI, is being called the new Google. Answers we used to search for on Google can now be found with ChatGPT, but its accuracy is still less than 100%.

ChatGPT works on deep learning models.

Bard AI, developed by Google in competition with ChatGPT, will soon reach the public.

We trained a simple neural network to classify images of animal faces.


## Music Genres Classification Using Deep Learning Techniques

This article was published as a part of the Data Science Blogathon

Introduction:

In this blog, we will discuss the classification of music files based on the genres. Generally, people carry their favorite songs on smartphones. Songs can be of various genres. With the help of deep learning techniques, we can provide a classified list of songs to the smartphone user. We will apply deep learning algorithms to create models, which can classify audio files into various genres. After training the model, we will also analyze the performance of our trained model.

Dataset:

We will use the GTZAN dataset, which contains 1000 music files. The dataset has ten genres with a uniform distribution: blues, classical, country, disco, hiphop, jazz, reggae, rock, metal, and pop. Each music file is 30 seconds long.

Process Flow:

Figure 01 represents the overview of our methodology for the genre classification task. We will discuss each phase in detail. We train three types of deep learning models to explore and gain insights from the data.

Fig. 01

First, we need to convert the audio signals into a deep learning model compatible format. We use two types of formats, which are as follows:

1. Spectrogram generation:

A spectrogram is a visual representation of the spectrum of frequencies of a signal as it varies with time. We use the librosa library to transform each audio file into a spectrogram. Figure 02 shows spectrogram images for each type of music genre.

Fig. 02

2. Wavelet generation:

The Wavelet Transform is a transformation that can be used to analyze the spectral and temporal properties of non-stationary signals like audio. We use the librosa library to generate wavelets of each audio file. Figure 03 shows wavelets of each type of music genre.

Fig. 03

3, 4. Spectrogram and Wavelet preprocessing

From Figure 02 and 03, it is clear that we treat our data as image data. After generating spectrograms and wavelets, we apply general image preprocessing steps to generate training and testing data. Each image is of size (256, 256, 3).

5. Basic CNN model training:

After preprocessing the data, we create our first deep learning model. We construct a Convolutional Neural Network model with the required input and output units. The final architecture of our CNN model is shown in Figure 04. We use only spectrogram data for training and testing.

Fig. 04

We train our CNN model for 500 epochs with Adam optimizer at a learning rate of 0.0001. We use categorical cross-entropy as the loss function. Figure 05 shows the training and validation losses and model performance in terms of accuracy.

Fig. 05

6. Transfer learning-based model training

We have only 60 samples of each genre for training. In this case, transfer learning could be a useful option to improve the performance of our CNN model. We now use the pre-trained MobileNet model as the basis for training the CNN model. A schematic architecture is shown in Figure 06.

Fig. 06

The transfer learning-based model is trained with the same settings as the previous model. Figure 07 shows the training and validation loss and model performance in terms of accuracy. Here too, we use only spectrogram data for training and testing.

Fig. 07

7. Multimodal training

We will pass both spectrogram and wavelet data into the CNN model for the training in this experiment. We are using the late-fusion technique in this multi-modal training. Figure 08 represents the architecture of our multi-modal CNN model. Figure 09 shows the loss and performance scores of the model with respect to epochs.

Fig. 08 Fig. 09

Comparison:

Figure 10 shows a comparative analysis of the loss and performance of all three models. Analyzing the training behavior of all three models, we find that the basic CNN model has large fluctuations in its loss values and performance scores for training and testing data. The multimodal model shows the least variance in performance. The transfer learning model's performance increases gradually compared to the multimodal and basic CNN models. Its validation loss shot up suddenly after 30 epochs. On the other hand, validation loss decreases continuously for the other two models.

Fig. 10

Testing the models

After training our models, we test each model on the 40% test data. We calculate precision, recall, and F-score for each music genre (class). Our dataset is balanced; therefore, the macro average and weighted average of precision, recall, and F-score are the same.
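The claim that macro and weighted averages coincide on a balanced dataset can be checked with a small Python sketch. The per-genre F-scores below are hypothetical numbers chosen for illustration, not results from this experiment, and the helper names are made up; the support of 40 test samples per genre is an assumption based on a 40% split of 100 files per genre.

```python
def macro_average(scores):
    """Plain mean of the per-class scores."""
    return sum(scores) / len(scores)


def weighted_average(scores, supports):
    """Mean of the per-class scores weighted by each class's sample count."""
    total = sum(supports)
    return sum(s * n for s, n in zip(scores, supports)) / total


# Hypothetical per-genre F-scores (illustrative only).
f_scores = [0.80, 0.55, 0.70, 0.65]

balanced = [40, 40, 40, 40]      # same number of test samples per genre
imbalanced = [100, 20, 20, 20]   # one genre dominates the test set

print(macro_average(f_scores))
print(weighted_average(f_scores, balanced))
print(weighted_average(f_scores, imbalanced))
```

With equal supports the weights cancel and both averages agree; with unequal supports the weighted average is pulled toward the score of the dominant class.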

1. Basic CNN model

Figure 11 presents the results of our CNN model on the test data. The CNN model was able to classify “classical” genre music with the highest F1-score. It performed worst for “rock” and “reggae” genre music. Figure 12 shows the confusion matrix of the CNN model on the test data.

Fig. 11

Fig. 12

2. Transfer learning based model

We used the transfer learning technique to improve the performance of genre classification. Figure 13 presents the results of the transfer learning-based model on test data. F1-score for “hiphop”, “jazz”, and “pop” genres increased due to transfer learning. If we look at overall results, we have achieved only a minor improvement after applying transfer learning. Figure 14 shows the confusion matrix for the transfer learning model on the test data.

Fig. 13

Fig. 14

3. Multimodal-based model

We used both spectrogram and wavelet data to train the multimodal-based model, and we test it in the same way. The results were surprising: instead of improving, performance dropped drastically. We achieved an F1-score of only 38% with the multi-modal approach. Figure 16 shows the confusion matrix of the multimodal-based model on the test data.

Fig. 15

Fig. 16

Conclusion:

In this post, we performed music genre classification using deep learning techniques. The transfer learning-based model performed best among all three models. We used the Keras framework for the implementation on the Google Colaboratory platform. Source code is available at the following GitHub link along with spectrogram and wavelet data on Google Drive, so you don’t need to generate spectrograms and wavelets from the audio files yourself.

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.


## A Basic Introduction To Activation Function In Deep Learning

This article was published as a part of the Data Science Blogathon.

Introduction

The activation function is defined as follows: The activation function calculates a weighted total and then adds bias to it to decide whether a neuron should be activated or not. The Activation Function’s goal is to introduce non-linearity into a neuron’s output.

A Neural Network without an activation function is basically a linear regression model. Activation functions perform non-linear computations on the input of a Neural Network, enabling it to learn and do more complex tasks. Therefore, studying the derivatives and applications of activation functions, as well as analysing the pros and cons of each, is essential for selecting the proper type of activation function that can provide non-linearity and accuracy in a particular Neural Network model.

We know that neurons in a Neural Network work following their weight, bias, and activation function. We would change the weights and biases of the neurons in a Neural Network based on the output error. Back-propagation is the term for this process. Because the gradients are supplied simultaneously with the error to update the weights and biases, activation functions, therefore, enable back-propagation.

Why Do We Need Activation Functions in CNN?

Variants Of Activation Function

Python Code Implementation

Conclusion

Why do we need it?

Non-linear activation functions: Without an activation function, a Neural Network is just a linear regression model. The activation function transforms the input in a non-linear way, allowing the network to learn and accomplish more complex tasks.

Mathematical proof:-

The diagram’s elements include:

Hidden layer, i.e. layer 1:

$$z^{(1)} = W^{(1)} X + b^{(1)}, \qquad a^{(1)} = z^{(1)} \tag{1}$$

Here,

$z^{(1)}$ is the vectorized output of layer 1.

$W^{(1)}$ denotes the vectorized weights (w1, w2, w3, and w4) applied to hidden layer neurons, $X$ denotes the vectorized input features (i1 and i2), and $b^{(1)}$ denotes the vectorized bias (b1 and b2).

$a^{(1)}$ is the vectorized form of any linear function.

(Note that the activation function is not taken into consideration here.)

The output layer, or layer 2, takes the output of layer 1 as its input:

$$z^{(2)} = W^{(2)} a^{(1)} + b^{(2)}, \qquad a^{(2)} = z^{(2)} \tag{2}$$

Output layer calculation, substituting the value of $z^{(1)}$:

$$z^{(2)} = W^{(2)} \left[ W^{(1)} X + b^{(1)} \right] + b^{(2)}$$

$$z^{(2)} = \left[ W^{(2)} W^{(1)} \right] X + \left[ W^{(2)} b^{(1)} + b^{(2)} \right]$$

Let,

$$W = W^{(2)} W^{(1)}$$

$$b = W^{(2)} b^{(1)} + b^{(2)}$$

The final result is

$$z^{(2)} = W X + b$$

Which is a linear function once again.

Even after applying a hidden layer, this observation yields a linear function, hence we can deduce that no matter how many hidden layers we add to a Neural Network, all layers will behave the same way because the combination of two linear functions yields a linear function.
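The collapse of two linear layers into one can be verified numerically. The sketch below uses plain Python lists as matrices; the helper names `matmul` and `add` and all weight, bias, and input values are hypothetical, chosen only to illustrate the derivation above.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def add(A, B):
    """Add two matrices of the same shape element by element."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


# Hypothetical parameters: 2 inputs -> 2 hidden neurons -> 1 output.
W1 = [[1.0, 2.0], [3.0, 4.0]]   # hidden layer weights (w1..w4)
b1 = [[0.5], [-1.0]]            # hidden layer biases (b1, b2)
W2 = [[2.0, -1.0]]              # output layer weights
b2 = [[0.25]]                   # output layer bias
X = [[3.0], [-2.0]]             # input features (i1, i2)

# Forward pass through both layers with no activation function.
a1 = add(matmul(W1, X), b1)
z2_two_layers = add(matmul(W2, a1), b2)

# The same result from a single linear layer with combined W and b.
W = matmul(W2, W1)
b = add(matmul(W2, b1), b2)
z2_one_layer = add(matmul(W, X), b)

print(z2_two_layers, z2_one_layer)
```

Both computations give the same output, so without a non-linear activation the hidden layer adds no expressive power.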

1). Linear Function:

• Equation: The equation for a linear function is y = ax, which is the equation of a straight line through the origin.

• Range: -inf to +inf

• Applications: The linear activation function is used in just one place, the output layer.

Source: V7labs

• Problems: The derivative of a linear function is a constant that no longer depends on the input “x”, so differentiating it gives the network no information to learn from, and it cannot introduce non-linearity.

For example, determining the price of a home is a regression problem. Because the price of an apartment might be a large or small number, we can employ linear activation at the output layer. Even in this case, a non-linear function is required at the hidden layers of the Neural Network.

2) The sigmoid function:

• It’s a function whose plot has an ‘S’ shape.

• Formula: $A = \frac{1}{1 + e^{-x}}$

• Non-linear in nature. For X values between -2 and 2, the curve is very steep, so slight changes in X result in large changes in the value of Y.

• Range of values: 0 to 1
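A minimal Python sketch of the sigmoid function is shown below, together with its derivative, which is a standard result (sigmoid(x) * (1 - sigmoid(x))) and is included here to illustrate the steepness near the centre of the curve. The function names are chosen for this illustration.

```python
import math


def sigmoid(x):
    """Sigmoid activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative(x):
    """Slope of the sigmoid at x, largest at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)


for x in (-6, -2, 0, 2, 6):
    print(x, round(sigmoid(x), 4), round(sigmoid_derivative(x), 4))
```

The slope is largest around zero and shrinks toward the tails, which is why the output changes quickly for inputs near the centre and barely at all for very large or very small inputs.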

3). Tanh Function: The tanh function, also known as the hyperbolic tangent function, is an activation that almost always works better than the sigmoid function. It’s simply a sigmoid function that has been shifted and rescaled. The two are related, and each can be derived from the other.

• Equation: $f(x) = \tanh(x) = \frac{2}{1 + e^{-2x}} - 1$, or equivalently $\tanh(x) = 2 \cdot \mathrm{sigmoid}(2x) - 1$

• Range of values: -1 to +1

• Uses: Usually employed in the hidden layers of a Neural Network. Because its values range from -1 to 1, the mean of the hidden layer’s output is 0 or very close to it, which helps centre the data. This makes learning for the next layer much easier.
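The relationship between tanh and sigmoid can be checked numerically. This short sketch compares the rescaled sigmoid against Python's built-in `math.tanh`; the helper name `tanh_from_sigmoid` is made up for the example.

```python
import math


def sigmoid(x):
    """Sigmoid activation: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh_from_sigmoid(x):
    """tanh expressed as a shifted and rescaled sigmoid."""
    return 2.0 * sigmoid(2.0 * x) - 1.0


for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
    difference = abs(tanh_from_sigmoid(x) - math.tanh(x))
    print(x, round(math.tanh(x), 6), difference < 1e-12)
```

The two expressions agree to within floating-point rounding, and the output is centred on zero, unlike the sigmoid's output.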

4). ReLU Function: ReLU stands for Rectified Linear Unit. It’s the most widely used activation function and is applied mainly in the hidden layers of neural networks.

• Formula: A(x) = maximum (0,x). If x is positive, it returns x; else, it returns 0.

• Value Range: [0, inf)

• Non-linear in nature, which means errors can be backpropagated easily and multiple layers of neurons can be activated by the ReLU function.

• Applications: Because it includes fewer mathematical operations, ReLU is less computationally expensive than tanh and sigmoid. Only a few neurons are active at a time, making the network sparse and efficient for computation.

Simply put, the RELU function learns much faster than the sigmoid and Tanh functions.
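The ReLU formula and the sparsity it produces can be illustrated with a few lines of Python. The pre-activation values below are hypothetical, and the function names are chosen for this sketch.

```python
def relu(x):
    """ReLU activation: returns x when positive, otherwise 0."""
    return max(0.0, x)


def relu_derivative(x):
    """Slope of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


# Hypothetical pre-activation values for five neurons in one layer.
pre_activations = [-2.0, -0.5, 0.0, 1.5, 3.0]
activations = [relu(z) for z in pre_activations]

# Fraction of neurons that are active (non-zero output).
active_fraction = sum(1 for a in activations if a > 0) / len(activations)

print(activations)
print(active_fraction)
```

Neurons with non-positive input output exactly zero, so only part of the layer is active for a given input. Computing the function needs just a comparison, with no exponentials as in sigmoid or tanh.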

5). Softmax Function: The softmax function is a type of sigmoid function that comes in handy when dealing with classification problems.

• Non-linear in nature

• Uses: Typically utilised when dealing with many classes. The softmax function divides by the sum of the outputs and squeezes the outputs for each class into the range 0 to 1.

• Output: The softmax function is best used in the classifier’s output layer, where we’re trying to define the class of each input using probabilities.
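A minimal softmax sketch in Python follows. The input scores are hypothetical, and subtracting the maximum score before exponentiating is a common numerical-stability step added for this example; it does not change the result.

```python
import math


def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    highest = max(scores)
    # Subtracting the maximum avoids overflow and leaves the result unchanged.
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw outputs of a three-class classifier.
probs = softmax([2.0, 1.0, 0.1])
predicted_class = probs.index(max(probs))

print([round(p, 3) for p in probs])
print(predicted_class)
```

Each output lies between 0 and 1 and the outputs sum to 1, so they can be read as class probabilities, with the largest one giving the predicted class.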

Selecting The Right Activation Function

If one is unsure about which activation function to use, just select ReLU, a general-purpose activation function that is used in most circumstances these days. If our output layer is meant for binary classification, the sigmoid function is an obvious choice.

    import numpy as np
    import matplotlib.pyplot as plt

    # Designing the function for sigmoid
    # (the definition and the x range were missing; these are assumed)
    def sigmoid(x):
        return 1 / (1 + np.exp(-x))

    x = np.linspace(-10, 10, 200)

    fig, ax = plt.subplots(figsize=(9, 5))

    # Axis spines are the lines that confine the plot area
    ax.spines['left'].set_position('center')
    ax.spines['right'].set_color('none')
    ax.spines['top'].set_color('none')
    ax.xaxis.set_ticks_position('bottom')
    ax.yaxis.set_ticks_position('left')

    # Create and show the sigmoid plot
    ax.plot(x, sigmoid(x), color="#9621E2", linewidth=3, label="sigmoid")

    # Place the legend in the upper right corner of the axes
    ax.legend(loc="upper right", frameon=False)
    fig.show()

Output, Source: Author

Conclusion


My name is Pranshu Sharma and I am a Data Science Enthusiast


## A Guide On Deep Learning: From Basics To Advanced Concepts

This article was published as a part of the Data Science Blogathon

Why do we need Deep Learning?

The problem

In traditional programming, the programmer formulates rules and conditions in their code that the program then uses to predict or classify correctly. For example, to build a classifier, a programmer normally comes up with a set of rules and programs those rules. Then, when you want to classify things, you give it a piece of data, and the rules are used to select the category. This approach is successful for a variety of problems, but not all of them. Let’s see why.

Image classification is one such problem that cannot be solved using the traditional programming method. Imagine how you would write the code for image classification. How could a programmer write rules to classify images they have never seen before?

Solution Found! What is it?

Deep learning is the solution to the above problem. It is a very efficient technique for pattern recognition by trial and error. We train a deep neural network by providing a huge amount of data along with feedback on its performance. Through a huge number of iterations, the network identifies its own set of conditions by which it can act in the correct way.

A SIMPLE MODEL: WITH ONE NEURON

Let’s begin understanding the neural network with the simplest model that can be made. The neuron is usually represented by circles with one arrow pointing outward and one inward.

In the above diagram, ‘m’ is the slope of the line, ‘x’ is the input value, and ‘b’ is the constant (the intercept) in the equation y = m·x + b.

Note the difference between the ‘y_hat’ and ‘y’ variables. ‘y_hat’ is the prediction made using the equation, also called the estimated value, whereas ‘y’ is the actual or true value.

Let’s see how this simple model works! We will start by assigning random values to ‘m’ (usually between -1 and 1) and ‘b’, and then calculate ‘y_hat’ using the ‘x’ values as input.

In the above figure, (x, y_hat) is plotted and a regression line is drawn. Since ‘m’ and ‘b’ were given random values, the estimated values are expected to differ from the true values. The error is calculated as the average of the squared differences between the estimated values and the true values, i.e. MSE = (1/n) · Σ (y_hat - y)², where n is the number of data points.

Here MSE (mean squared error) is used, but one can also use RMSE (root mean squared error), which is simply the square root of MSE.
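To make the error calculation concrete, here is a minimal sketch with NumPy. The data points and the starting values of m and b are made up for illustration and are not taken from the figures in this guide.

```python
import numpy as np

# Toy data (made up for illustration): the true values follow y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

# Arbitrary starting guesses for the slope and the constant.
m, b = 0.5, 0.0

y_hat = m * x + b                  # predictions of the one-neuron model
mse = np.mean((y_hat - y) ** 2)    # mean squared error
rmse = np.sqrt(mse)                # root mean squared error

print(f"MSE = {mse:.3f}, RMSE = {rmse:.3f}")
```

Because m and b are poor guesses here, both error values are large. Training is the process of adjusting m and b so that these numbers shrink.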

MSE indicates how far the estimated values are from the target (true) values. If MSE is plotted on a 3D graph, it looks like the image given below:

The aim of any model is to minimize the error, and the minimum error lies at the bottom of this curve. To reach it, we use calculus. Gradient descent is the method for finding the minimum of this loss curve. It looks like the following graphs:

So how does it work? Unlike a human, the computer cannot eyeball the surface and guesstimate a path from the current position to the minimum. Instead, it calculates the gradient, a fancy word for a multivariate slope, which gives the direction of the slope as some ratio of the change in ‘b’ to the change in ‘m’.

Now the direction to travel is known. The next step is to determine how far to travel in that direction. This step size is controlled by the ‘learning rate’. The learning rate is usually not set to a large value, because a step that is too big can overshoot the minimum and move away from the target. A learning rate that is too small is also a problem, since the computation takes much longer. There are a few ways to measure the progress of the algorithm. An epoch is one pass of model updates over the full dataset. Rather than updating once per full dataset, people nowadays use a more efficient technique, the batch, which is a sample of the full dataset. In either case, each step takes the full dataset or a batch, calculates the gradient, and updates the parameters using the gradient and the learning rate. The process continues until the target is reached. This is how gradient descent works!
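The loop described above can be sketched in a few lines of NumPy. This is only an illustration under simple assumptions: the data, the learning rate of 0.05, and the 2000 epochs are made-up values, and every update uses the full dataset rather than a batch.

```python
import numpy as np

# Toy data (made up for illustration): the true relationship is y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

m, b = 0.0, 0.0          # starting guesses
learning_rate = 0.05     # how far to move on each step
epochs = 2000            # one full-dataset update per epoch

for _ in range(epochs):
    error = (m * x + b) - y
    # Partial derivatives of MSE with respect to m and b.
    grad_m = 2.0 * np.mean(error * x)
    grad_b = 2.0 * np.mean(error)
    # Step against the gradient, scaled by the learning rate.
    m -= learning_rate * grad_m
    b -= learning_rate * grad_b

final_mse = np.mean((m * x + b - y) ** 2)
print(f"m = {m:.4f}, b = {b:.4f}, MSE = {final_mse:.2e}")
```

With these settings, m and b settle very close to 2 and 1. Raising the learning rate far enough makes the updates overshoot and diverge, which is the problem with large steps mentioned above.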

BUILDING A NETWORK: FROM NEURONS TO NETWORK

In the earlier example, only one input was considered for the neuron. Now, more inputs can be considered for one neuron. From now on, instead of a single input x with slope ‘m’, multiple inputs (x1, x2, x3, and so on) will be considered, each with its own weight (w1, w2, w3, and so on). Not only this, there is one more interesting thing! The output of one neuron can be used as input to another neuron, as long as the connections do not form a loop. When gradient descent computes the error used to update the weights, the error calculated at a later neuron is used as part of the error of the earlier neuron it is connected to.
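As a rough sketch of this idea, the snippet below computes a neuron as a weighted sum of several inputs and then feeds its output into a second neuron. The helper name neuron and all the numbers are hypothetical, chosen only for illustration.

```python
import numpy as np

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias (no activation applied yet)."""
    return float(np.dot(inputs, weights) + bias)

# First neuron: three inputs x1, x2, x3 with weights w1, w2, w3.
x = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -1.0, 0.25])
hidden = neuron(x, w, bias=0.1)

# Second neuron: takes the first neuron's output as its only input.
output = neuron(np.array([hidden]), np.array([2.0]), bias=0.5)

print(hidden, output)
```

Chaining neurons like this is what turns a single line equation into a network.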

Gotcha! Now is the right time to introduce the concept of the Activation Function here.

ACTIVATION FUNCTIONS

This guide covers three common activation functions: Linear, ReLU, and Sigmoid.

The linear activation function is one of the most basic activation functions. It is widely used because it is fast: it is linear in nature and hence easy to differentiate. The graph and equation of this function look like the images shown below:

One easy way to add non-linearity is to feed the equation of a line into a non-linear function. The most popular such function is ReLU, which stands for Rectified Linear Unit. It is a fancy way of saying that whenever the output of the line is negative, it is set to 0. It can be represented as follows:

Another activation function is the sigmoid function. This is an ‘S’-shaped curve that goes from 0 to 1. It is a prominent function and is widely used in logistic regression. Its output can be read as a probability, which has to lie between 0 and 1.
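The three functions described above can be written in a few lines of NumPy. This is a minimal sketch, and the function names are simply chosen for readability.

```python
import numpy as np

def linear(z):
    """Passes the value through unchanged."""
    return z

def relu(z):
    """Sets negative values to 0 and keeps positive values."""
    return np.maximum(0.0, z)

def sigmoid(z):
    """Squashes any value into the range between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, 0.0, 2.0])
print("linear :", linear(z))
print("relu   :", relu(z))
print("sigmoid:", sigmoid(z))
```

Note how ReLU zeroes out the negative input, while sigmoid maps 0 to exactly 0.5 and keeps every output strictly between 0 and 1.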

OVERFITTING

Why shouldn’t the programmer make a big neural network and solve every problem?  Can one think of the pitfalls of having a large model with lots of neurons?

One problem is that there will be a lot of wastage of computational resources, and the other could be the longer time to train the model. Ugh! There is one more point.

The problem is related to classical statistics. Let’s say, two models have been built, as shown below:

Which of the two models is better? At first glance, the one on the left. Why? Look at the RMSE value: the left model has a better RMSE than the right one. But that conclusion is not correct! Why? Let’s see.

What if we get new sample data that the model has not seen before? The left model would give a very poor result, i.e. its RMSE would be much higher than that of the right model. This is because the left model has memorized the training points and thus produced an excellent RMSE during training. The right model, on the other hand, is more generalized. The left model is therefore said to be overfitted. It is up to the data scientist to choose the type of model that suits their needs.
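One way to see this effect is to fit a simple and a very flexible model to the same small, noisy dataset and compare their RMSE on the training points and on fresh points. The sketch below uses polynomial fits as a stand-in for the two models in the figure. The data, noise level, and polynomial degrees are all assumptions made for illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def make_data(n):
    """Noisy samples of a straight line (made-up data)."""
    x = rng.uniform(-1.0, 1.0, n)
    y = 2.0 * x + 1.0 + rng.normal(0.0, 0.3, n)
    return x, y

def rmse(model, x, y):
    return float(np.sqrt(np.mean((model(x) - y) ** 2)))

x_seen, y_seen = make_data(12)    # data used for fitting
x_new, y_new = make_data(200)     # data the models have never seen

simple = Polynomial.fit(x_seen, y_seen, deg=1)     # a straight line
flexible = Polynomial.fit(x_seen, y_seen, deg=7)   # can bend toward every point

for name, model in [("simple", simple), ("flexible", flexible)]:
    print(f"{name:8s} train RMSE = {rmse(model, x_seen, y_seen):.3f}, "
          f"new-data RMSE = {rmse(model, x_new, y_new):.3f}")
```

The flexible model always matches the training points at least as well as the straight line, since a straight line is a special case of it. On new data it typically does worse, which is the signature of overfitting.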

Now let’s create one project to understand the concepts much better:

Project Name: American Sign Language Dataset

Objectives:

1. Prepare the image data and use it for training the model

2. Create and compile the basic model for image classification

Project begins:

The American Sign Language dataset is a very common practice dataset and can be found on Kaggle. It covers the letters of the alphabet, of which ‘j’ and ‘z’ are not included in training because signing them requires movement. Classifying them is beyond the scope of this guide.

Reading in the Data: The sign dataset, downloaded from Kaggle, is in CSV (comma-separated values) format. A CSV file has rows and columns, with the column labels listed in the first row. One can see the difference between a CSV file and an XLSX file by opening each in a plain-text editor such as Notepad: in the CSV file, the values in a row are separated by commas.

import pandas as pd

train_df = pd.read_csv("sign_mnist_train.csv")
valid_df = pd.read_csv("sign_mnist_valid.csv")

Exploring the Data

Now it’s time to look at the data. It can be inspected using the head method of the pandas DataFrame. Each row holds integer values, which are the pixel intensities of one image. The data also has a column named label, which holds the true value of each image.
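For readers without the dataset at hand, the sketch below builds a tiny stand-in CSV in memory and inspects it the same way. The column names and pixel values are hypothetical and only four pixel columns are included, whereas each real row has 784.

```python
import io
import pandas as pd

# A tiny stand-in for the real file: a header row, then one image per row.
# The values are made up and only four pixel columns are shown.
toy_csv = io.StringIO(
    "label,pixel1,pixel2,pixel3,pixel4\n"
    "3,107,118,127,134\n"
    "6,155,157,156,156\n"
    "2,187,188,188,187\n"
)

toy_df = pd.read_csv(toy_csv)
print(toy_df.head())    # first rows, including the 'label' column
print(toy_df.shape)     # (number of rows, number of columns)
```

Calling head on the real train_df works the same way and shows the label column followed by the pixel columns.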

Output:

Extracting the Labels

Now, the training and validation labels will be stored in variables called y_train and y_valid. The code for this is shown below:

y_train = train_df['label']
y_valid = valid_df['label']
del train_df['label']
del valid_df['label']

Extracting the Images

The labels are now stored in their own variables. Next, the training and validation images will be stored in variables called x_train and x_valid respectively.

x_train = train_df.values
x_valid = valid_df.values

Summarizing the Training and Validation Data

Now, the program has 27,455 images with 784 pixels each for training…

x_train.shape

…as well as their corresponding labels:

y_train.shape

For validation, it has 7,172 images…

x_valid.shape

…and their corresponding labels:

y_valid.shape

Visualizing the Data

To visualize the images, use the matplotlib library. There is no need to worry about the details of this visualization.

Note that data will have to be reshaped from its current 1D shape of 784 pixels to a 2D shape of 28×28 pixels to make sense of the image:

import matplotlib.pyplot as plt

plt.figure(figsize=(40, 40))

num_images = 20
for i in range(num_images):
    row = x_train[i]
    label = y_train[i]
    # Reshape the flat row of 784 pixels into a 28x28 image
    image = row.reshape(28, 28)
    plt.subplot(1, num_images, i + 1)
    plt.title(label, fontdict={'fontsize': 30})
    plt.axis('off')
    plt.imshow(image, cmap='gray')

Normalize the Image Data

Deep learning models are much better at dealing with floating-point numbers between 0 and 1. Converting integer values to floating-point values between 0 and 1 is called normalization. Normalization is an essential concept that one should be aware of.
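As a small illustration of what normalization does, the sketch below divides a handful of made-up pixel values by 255, the largest value an 8-bit pixel can take.

```python
import numpy as np

# Made-up 8-bit pixel intensities, from black (0) to white (255).
pixels = np.array([0, 64, 128, 255])

normalized = pixels / 255   # integer array becomes floating point

print(normalized.dtype)
print(normalized.min(), normalized.max())
```

The relative brightness of the pixels is unchanged; only the scale moves from 0 to 255 down to 0 to 1.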

Now it’s time to normalize the image data, meaning that their pixel values, instead of being between 0 and 255 as they are currently:

x_train.min()

x_train.max()

…should be floating-point values between 0 and 1. It is coded as follows:

x_train = x_train / 255
x_valid = x_valid / 255

Categorize the Labels

What is categorical encoding?

Consider the case where someone asks, what is 7 - 2? Answering 4 is much closer to the correct answer than answering 9. Neural networks used for image classification should not reason this way. The network should simply select the correct category, and it should treat guessing 4 for an image that is actually a 5 as just as bad as guessing 9.

In this dataset, the images are labeled with integers. Because integers form a numerical range, a model could draw conclusions from how numerically close its guess is to the correct label. What we want instead is for the model to sort the images into 24 distinct categories, one per letter.

Because the labels are stored as numerical values, they need to be converted into categories. For this, categorical encoding is used, which transforms the data so that each label is represented as an entry in the set of all possible categories. The keras.utils.to_categorical method will be used to accomplish this.

import tensorflow.keras as keras

num_classes = 24
y_train = keras.utils.to_categorical(y_train, num_classes)
y_valid = keras.utils.to_categorical(y_valid, num_classes)
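To show what categorical (one-hot) encoding produces, here is a small NumPy sketch of the same idea. The one_hot helper is a hypothetical stand-in written for illustration, not the Keras implementation, and it uses 3 classes instead of 24 to keep the output readable.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return one row per label, with a 1 in the column of that label."""
    encoded = np.zeros((len(labels), num_classes), dtype="float32")
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

toy_labels = np.array([0, 2, 1])
encoded = one_hot(toy_labels, num_classes=3)
print(encoded)
```

Each label becomes a row containing a single 1, so no category is numerically "closer" to another.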

Build the Model

The data is all prepared: we have normalized images for training and validation, as well as categorically encoded labels for training and validation.

With the data prepared, it is now time to create the model that will train on it. This first basic model will be made up of several layers and will comprise 3 main parts:

An input layer, the layer which receives all data as input

Several hidden layers, each made up of many neurons. Each neuron has weights associated with it, which are adjusted during training and affect the performance and accuracy of the model.

An output layer, which will depict the network’s guess for a given image

The units argument specifies the number of neurons in the layer. The relu activation was introduced above. The softmax activation on the output layer turns that layer’s outputs into probabilities that sum to 1.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
model.add(Dense(units=512, activation='relu', input_shape=(784,)))
model.add(Dense(units=512, activation='relu'))
model.add(Dense(units=num_classes, activation='softmax'))
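The output layer above uses softmax, which this guide has not covered in detail. As a rough NumPy sketch of the idea (not the Keras implementation), softmax turns a vector of raw scores into probabilities that sum to 1. The scores below are made up.

```python
import numpy as np

def softmax(z):
    """Convert raw scores into probabilities that sum to 1."""
    shifted = z - np.max(z)      # subtracting the max avoids overflow
    exp = np.exp(shifted)
    return exp / np.sum(exp)

scores = np.array([2.0, 1.0, 0.1])   # made-up raw outputs for 3 classes
probs = softmax(scores)

print(probs, probs.sum())
```

The class with the largest score receives the largest probability, and that class is the network’s guess.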

Summarizing the Model

The code below prints a summary of the model:

model.summary()

Compiling the Model

Time to compile our model, using categorical cross-entropy as the loss to reflect that each image should fit into one of many categories, and tracking the accuracy of the model.
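The compile call itself does not appear in this text. In Keras, this combination is commonly requested with model.compile(loss='categorical_crossentropy', metrics=['accuracy']), but treat that line as an assumption rather than the author's exact code. To show what the two quantities measure, here is a small NumPy sketch with made-up predictions for three classes.

```python
import numpy as np

def categorical_cross_entropy(y_true, y_prob, eps=1e-12):
    """Average negative log-probability assigned to the correct class."""
    y_prob = np.clip(y_prob, eps, 1.0)   # avoid log(0)
    return float(-np.mean(np.sum(y_true * np.log(y_prob), axis=1)))

def accuracy(y_true, y_prob):
    """Fraction of samples where the most probable class is the true one."""
    return float(np.mean(np.argmax(y_true, axis=1) == np.argmax(y_prob, axis=1)))

# Two made-up samples: one-hot true labels and predicted probabilities.
y_true = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
y_prob = np.array([[0.7, 0.2, 0.1],    # correct class gets 0.7
                   [0.1, 0.3, 0.6]])   # correct class gets only 0.3

loss = categorical_cross_entropy(y_true, y_prob)
acc = accuracy(y_true, y_prob)
print(f"loss = {loss:.4f}, accuracy = {acc:.2f}")
```

The loss rewards confident correct predictions and penalizes low probability on the true class, while accuracy only counts whether the top guess was right.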

Train the Model

Use the model’s fit method to train it for 20 epochs using the training and validation images and labels created above:

model.fit(x_train, y_train, epochs=20, verbose=1, validation_data=(x_valid, y_valid))

Now one can see the model train for 20 epochs. The accuracy comes out to approximately 97%, which is quite good.

So, this was all about deep learning techniques. I hope you enjoyed the guide and that it has given you a head start with deep learning.

All the best! Stay tuned for more articles and guides on Analytics Vidhya!


What is the point of Twitter? The 11-year-old microblogging platform is a social network, a broadcasting tool, a public relations platform, a joke incubator, and a news aggregator. It’s a daunting medium, but with the help of a little AI, it doesn’t have to be. At least, that’s the premise of Post Intelligence, a social media assistant tool launched this week by a pair of former Google executives.

To this end, a user signs into Post Intelligence with a social media account, and within two minutes it scoops up a user’s data, and then suggests what sorts of things a user might want to share online. In my conversation with Reddy and in my own experience, I focused primarily on Twitter, but the tool is also configured for Facebook, and could be expanded to other social networks.

I was intrigued. So, in preparation for the launch, I decided to turn my Twitter over to Post Intelligence, and see how, exactly, an AI could help me tweet. I set a few rules for myself: for the two-day trial period, I would only use PI to tweet during work hours. I would keep this up for the two days, and I couldn’t let anyone besides my editor know that this was what I was doing. I’ve been on Twitter for a long time, and have developed what I like to think of as a somewhat distinct voice, so I was curious to see what changed.

Screenshot of a draft tweet in the Post Intelligence console.

In the Post Intelligence console, users can draft tweets, add media, schedule a time, and then see a prediction score ranking for how well that tweet will do. This tweet is just a 2 out of 10.

The short version of the experiment is that Post Intelligence told me to tweet less. The day before I started the experiment, I sent 44 tweets. The first day of the experiment, PI recommended I tweet just 4 times (I ultimately tweeted 7, adding a few others through the tool). It recommended I tweet at 2:30, 3:30, 7:00, and 10:00pm, and when I asked it to schedule a fifth tweet, it put it at 3:00. One of the neat tools in PI is a prediction score, where it looks at the words and attached images or links to a tweet, and gives a score from 1 to 10 on how well it thinks that tweet will do. PI preferred the straightforward description for a story about a comet to my dated meme description for a tweet about anchors.

The second day, I leaned more into the suggestions. A couple gaps in PI’s processing were immediately apparent. It recommended I share tweets from a couple different accounts that I’d muted, and even let me schedule a retweet of a post from an account that I knew had me blocked. (That tweet did not go through, so it looks like Twitter’s own blocking tools caught it before it went live). Instead, I shared suggested tweets from people outside my normal feed, which I might not have seen otherwise, and had about the same level of engagement as if I’d shared from within my normal timeline.

For my second day, too, PI recommended I tweet just four times a day, which was a frequency I matched back when I was posting tweets via text message from a flip-phone. In that respect, the scheduling was a nice break: I felt like I was broadcasting observations on the world, rather than living and breathing with the pulse of a social network every second that news happened.

Which brought me to the first major understanding of what Post Intelligence does in practice. It’s a tool for those new to Twitter, and those with limited time to spend on tweets, to broadcast thoughts into the general news stream as it happens. But it’s not a great tool for interacting with others. Whenever someone replied to one of my tweets, there was no way to see that through the PI interface, and so no way to respond directly.

A graph plotting tweets by success and sentiment

Tweets in green are those evaluated as positive, red as negative, and blue as somewhere in-between. On the x-axis is engagement with the tweet, measured by retweets, likes, and replies.

When I asked Reddy about mentions and notifications in our call before my trial, she suggested it as a possible future feature for PI. Without notifications, PI offers feedback on a few different metrics: first, there are the likes and retweets of sent tweets themselves, displayed below each published tweet in a column in PI, just like they are on the Twitter app itself. And then there’s a whole analytics section, tracking Follower Growth, a Word Cloud, a Relationship Graph, Posting Patterns, and Sentiment. Sentiment is by far the most interesting, as it breaks tweets down into either “positive” or “negative” (with some falling in-between) and then displays a graph of how well tweets of each type perform.

“’Trump is a very funny guy, haha.’ Is that a negative sentiment or a positive one?,” says Reddy. To tackle sentiment, Post Intelligence has their own API to try and infer context. It’s a task that’s hard for AI and for people, too. “That’s something that social media struggles with, when I’m being sarcastic, people think I’m being literal. If you’re being tongue-in-cheek, people take it literally.”

In my brief trial, it wasn’t sentiment that tripped me up, but just the lack of interaction with followers. A joke made in a moment loses potency the next day, and “I’m sorry, it was funny, but I was testing a tool for work” isn’t the greatest excuse for answering a question a day late.

Still, I think there’s value to a tool like PI, especially for people who aren’t glued to the internet for over eight hours every day. The freedom to plan a day’s tweets in five minutes, with automatically supplied topical content, meant I could focus my attention elsewhere, confident that my online presence was intact.

“Twitter is very addicting, and it is very important, even as a company it may be only worth a few billion dollars,” says Reddy, “but it’s really important to the culture of humanity, in some way I know that’s a strong way to say, it’s proven itself as recently as November 9th, it can change the world. I think more people want to do well on it but don’t, because it’s just so difficult to do well on it.”

Viewed as the only way to experience Twitter, Post Intelligence is a little underwhelming, but as a tool to get into Twitter, without needing to spend hours a day following the news looking for good enough jokes and news to share, Post Intelligence makes a pretty good set of training wheels.
