Trending February 2024 # Music Genres Classification Using Deep Learning Techniques # Suggested March 2024 # Top 6 Popular

You are reading the article Music Genres Classification Using Deep Learning Techniques updated in February 2024 on the website We hope that the information we have shared is helpful to you. If you find the content interesting and meaningful, please share it with your friends and continue to follow and support us for the latest updates. Suggested March 2024 Music Genres Classification Using Deep Learning Techniques

This article was published as a part of the Data Science Blogathon


In this blog, we will discuss the classification of music files based on the genres. Generally, people carry their favorite songs on smartphones. Songs can be of various genres. With the help of deep learning techniques, we can provide a classified list of songs to the smartphone user. We will apply deep learning algorithms to create models, which can classify audio files into various genres. After training the model, we will also analyze the performance of our trained model.


We will use GITZAN dataset, which contains 1000 music files. Dataset has ten types of genres with uniform distribution. Dataset has the following genres: blues, classical, country, disco, hiphop, jazz, reggae, rock, metal, and pop. Each music file is 30 seconds long.

Process Flow:

Figure 01 represents the overview of our methodology for the genre classification task. We will discuss each phase in detail. We train three types of deep learning models to explore and gain insights from the data.

Fig. 01

First, we need to convert the audio signals into a deep learning model compatible format. We use two types of formats, which are as follows:

1. Spectrogram generation:

A spectrogram is a visual representation of the spectrum signal frequencies as it varies with time. We use librosa library to transform each audio file into a spectrogram. Figure 02 shows spectrogram images for each type of music genre.

Fig. 02

2. Wavelet generation: –

The Wavelet Transform is a transformation that can be used to analyze the spectral and temporal properties of non-stationary signals like audio. We use librosa library to generate wavelets of each audio file. Figure 03 shows wavelets of each type of music genre.

Fig. 03

3, 4. Spectrogram and Wavelet preprocessing

From Figure 02 and 03, it is clear that we treat our data as image data. After generating spectrograms and wavelets, we apply general image preprocessing steps to generate training and testing data. Each image is of size (256, 256, 3).

5. Basic CNN model training:

 After preprocessing the data, we create our first deep learning model. We construct a Convolution Neural Network model with required input and out units. The final architecture of our CNN model is shown in Figure 04. We use only spectrogram data for the training and testing.

Fig. 04

We train our CNN model for 500 epochs with Adam optimizer at a learning rate of 0.0001. We use categorical cross-entropy as the loss function. Figure 05 shows the training and validation losses and model performance in terms of accuracy.

Fig. 05

6. Transfer learning-based model training

We have only 60 samples of each genre for training. In this case, transfer learning could be a useful option to improve the performance of our CNN model. Now, we use the pre-trained mobilenet model to train the CNN model. A schematic architecture is shown in Figure 06.

Fig. 06

The transfer learning-based model is trained with the same settings as used in the previous model. Figure 07 shows the training and validation loss and model performance in terms of accuracy. Here, also we use only spectrogram data for the training and testing.

Fig. 07

7. Multimodal training

We will pass both spectrogram and wavelet data into the CNN model for the training in this experiment. We are using the late-fusion technique in this multi-modal training. Figure 08 represents the architecture of our multi-modal CNN model. Figure 09 shows the loss and performance scores of the model with respect to epochs.

Fig. 08 Fig. 09


Figure 10 shows a comparative analysis of the loss and performance of all three models. If we analyze the training behavior of all three models, we found that the basic CNN model has large fluctuations in its loss values and performance scores for training and testing data. The multimodal model has shown the least variance in performance. Transfer learning model performance increases gradually compared to multimodal and basic CNN models. Validation loss value shot up suddenly after the 30 epochs. On the other side, validation loss decreases continuously for the other two models.

Fig. 10

Testing the models

 After training our models, we test each model on the 40% test data. We calculate precision, recall, and F-score for each music genre (class). Our dataset is balanced; therefore, the macro average and weighted average of precision, recall, and F-score are the same.

1. Basic CNN model

 Figure 11 presents the results of our CNN model on the test data. CNN model was able to classify “classical” genre music with the highest F1-score. CNN performed worst for “Rock” and “reggae” genre music. Figure 12 shows the confusion matrix of the CNN model on the test data.

Fig. 11

Fig. 12

2. Transfer learning based model

We used the transfer learning technique to improve the performance of genre classification. Figure 13 presents the results of the transfer learning-based model on test data. F1-score for “hiphop”, “jazz”, and “pop” genres increased due to transfer learning. If we look at overall results, we have achieved only a minor improvement after applying transfer learning. Figure 14 shows the confusion matrix for the transfer learning model on the test data.

Fig. 13

Fig. 14

3. Multimodal-based model: We have used both spectrogram and wavelet data to train the multimodal-based model. In the same way, we perform the testing. We have found very surprising results. Instead of improvement, our performance reduced drastically. We have achieved only 38% of F1-score while using a multi-modal approach. Figure 16 shows the confusion matrix of the multimodal-based model on the test data.

Fig. 15 Fig. 16


In this post, we have performed music genre classification using Deep learning techniques. The transfer learning-based model has performed best among all three models. We have used the Keras framework for the implementation on the google Collaboratory platform. Source code is available at the following GitHub link along with spectrogram and wavelet data on google drive. You don’t need to generate spectrograms and wavelets from the audio files.

GitHub Link. . Spectrogram and wavelets data link.

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.


You're reading Music Genres Classification Using Deep Learning Techniques

Intuition Behind Perceptron For Deep Learning


Perceptron is one of the most fundamental concepts of deep learning which every data scientist is expected to master. It is a supervised learning algorithm specifically for binary classifiers.

Note: If you are more interested in learning concepts in an Audio-Visual format, We have this entire article explained in the video below. If not, you may continue reading.

In this article, we will develop a solid intuition about Perceptron with the help of an example. Without any further delay, let’s begin!

Before you continue, I recommend you check out the following article-

Deep Learning 101: Beginners Guide to Neural Network

Understanding Perceptron with an Example

Let’s start with a simple example of a classification problem. Our aim here is to predict whether the loan should be approved or not, depending on the salary of a person.

In order to do that will need to build a model that takes the salary of the person as an input and predicts whether the loan should be approved for the person or not.

Suppose your bank wants to reduce the risk of loan default and hence decides to roll out loans to only such individuals who have a monthly salary of 50000 and above.

In this case, we want our model to learn to check whether the salary input which is represented as X here, is greater than 50000 or not. Here are the tasks we want our model to perform-

The first thing is it should take in the salary as input.

Next, it has to check whether the given salaries are greater than 50,000 or not.

And if the salary’s more than 50,000 only then give output as “Yes”.

Effectively, this model takes in some input, processes it, and generates an output. This is similar to what happens in a biological neuron:

It takes the input to the dendrite, processes the provided information, and generates the output. You can see the similarity right? Thus the model that we’re talking about can also be called a Neuron.

Now coming back to our loan example. Let’s have a closer look at each of these tasks, starting from the input. We have a single input which is salary but in general, we can have multiple features just like the applicant’s salary, his/her father’s salary, spouse’s salary that can be deciding factors to approve the Loan. And our neurons will take in all of these features as input in order to make decisions. This is to say that the neuron will have multiple inputs. This is similar to the multiple dendrites that we saw in the biological neuron.

Now that there are multiple salary features about the person we’ll take all of them into account as they represent the total income of the household. We can sum them up and check if the total income of the household crosses the threshold or not.

So, the Total Income = Applicant Salary(X1) + Father Salary(X2) + Spouse Salary(X3)

We need to compare this Total Income with the Threshold. Here is the equation representing the same:

We have X1, X2, and X3 as input features and we want to check if their sum crosses the particular threshold, which is 50000 in our case. Now if you bring this threshold to the left side of the equation it will become something like this:

and if we represent this whole quantity which is “- threshold” with a new term Bias, the updated equation would look something like:

this will have the sum of four quantities which are X1, X2, and X3, and note that the bias is actually “- threshold”.

Now this quantity which is Bias, although we have selected it arbitrarily here, it is actually something that neuron learn from the underlined data. If the input exceeds the magnitude of the bias, we want the neuron to give the output as “YES”. That means the loan can be approved by this person. This event is known as the firing of a neuron. If want to write this relationship using equations, we can use the  following equations:

We will say that output should be 1 when this equation is true and output should be 0 in all other cases. These two equations can be represented in the form of a function. Let’s see how:

So here we have the sum of the features X1, X2, X3, and bias represented as Z and we want our output to be 1 if the Z is greater than 0, otherwise 0.

So we can use a Step Function here and this is the graph of the step function:

It basically gives us the output 1 for any value greater than zero and gives an output 0 for any value less than zero. So in order to find output, we will apply the step function on Z here. We have denoted this step function as following;

This step function in this case is used to scale the output of the neuron and in Deep Learning we have an option of choosing such functions to apply to the output of the neurons. They are known as Activation Functions. So when we use the step function as the activation function for a neuron it is called a Perceptron.

End Notes

In this article, we saw how a perceptron model works. It takes in multiple features like applicant salary, father salary, etc. as input and checks whether the sum of these, which is the total income of the household, exceeds the threshold or not. If it does only then it will give an output as one, which means the loan should be approved, otherwise the step function will give an output as zero.

If you are looking to kick start your Data Science Journey and want every topic under one roof, your search stops here. Check out Analytics Vidhya’s Certified AI & ML BlackBelt Plus Program.


Learning The Basics Of Deep Learning, Chatgpt, And Bard Ai


Artificial Intelligence is the ability of a computer to work or think like humans. So many Artificial Intelligence applications have been developed and are available for public use, and chatGPT is a recent one by Open AI.

ChatGPT is an artificial intelligence model that uses the deep model to produce human-like text. It predicts the next word in a text based on the patterns it has learned from a large amount of data during its training process. Bard AI is too AI chatbot launched by google and uses recent work so can work to answer real-time questions.

We will discuss chatGPT and Bard AI and the difference between them.

Learning Objectives

1. Understanding the Deep Learning Model and chatGPT.

2. To understand the difference between chatGPT and Bard.

This article was published as a part of the Data Science Blogathon.

Understanding the Deep Learning Model

Artificial Intelligence is a broad term in today’s world to do everything and behave like a human. When we talk about the algorithm, we are, in other words, talking about a subset of Artificial Intelligence, Machine learning.

Machine learning algorithms look at the past behavior of humans and predict it based on past behavior. When we go further deep, some patterns are adapted or learned themselves when the situation is different. “Deep Learning” further deep algorithms, following the footsteps of neural networks.

“Deep Learning Algorithm” is classified into two Supervised and Unsupervised. “Supervised Learning” is divided into Convolutional Neural Network (CNN) and Recurrent neural networks.

In supervised learning, the data given in input is labeled data. In Unsupervised learning, the data is unlabeled and works by finding patterns and similarities.

Artificial Neural Network (ANN)

Similarly, like a human brain, an input layer, one or more hidden layers, and an output layer make up the node layers of artificial neural networks (ANNs). There is a weight and threshold associated with each artificial neuron or node. When a node’s output exceeds a predetermined threshold, it is activated and sends data to the next layer. Otherwise, no data reaches the next layer.

After an input layer, weights get added. Larger weights contribute more to the output than other inputs. The mass of the input layer gets multiplied, and then the results are added up. After that, the output result is by the activation function, which decides what to do with it. The node is activated if that output exceeds a certain threshold, transmitting data to the next layer. As a result, the input layer of the next layer consists of the output return of the past one and is thus named feed-forward.

Let’s say that three factors influence our decision, and one of the questions is if there is a rainy day tomorrow, and if the answer is Yes, it is one, and if the response is no is 0.

Another question will there be more traffic tomorrow? Yes-1, No -0.

The last question is if the beachside will be good for a picnic. Yes-1, No-0.

We get the following responses.


– X1 – 0,

– X2 – 1,

– X3 – 1

Once the input is assigned, we look forward to applying weight. As the day is not rainy, we give the mass as 5. For traffic, we gave it as 2, and for a picnic as 4.

W1 – 5

W2 – 2

W3 – 4

The weight signifies the importance. If the weight is more, it is of the most importance. Now we take the threshold as 3. The bias will be the opposite value of the threshold -3.

y= (5*0)+(1*2)+(1*4)-3 = 3.

Output is more than zero, so the result will be one on activation. Changes in the weights or threshold can result in different returns. Similarly, neural networks make changes depending on the results of past layers.

For example, you want to classify images of cats and dogs.

The image of a cat or dog is the input to the neural network’s first layer.

After that, the input data pass through one or more hidden layers of many neurons. After receiving inputs from the layer before it, each neuron calculates and sends the result to the next layer. When determining which characteristics, the shape of the ears or the patterns, set apart cats from dogs, the neurons in the hidden layers apply weights and biases to the inputs.

The probability distribution of the two possible classes, cat and dog, is the return for final layers, and prediction ranks higher than probability.

Updating weights and biases is termed backpropagation, and it improves at the time in pattern recognition and prediction accuracy.

Facial Recognization by Deep Learning

We will use animal faces to detect digitally based on a convolutional.

from tensorflow.keras.models import Sequential from tensorflow.keras.layers import * from tensorflow.keras.models import Model from tensorflow.keras.applications import InceptionV3 from tensorflow.keras.layers import Dropout, Flatten, Dense, Input from tensorflow.keras.preprocessing.image import ImageDataGenerator import numpy import pandas import matplotlib.pyplot as plt import matplotlib.image as mpimg import pickle from sklearn.model_selection import train_test_split from sklearn.metrics import classification_report import patoolib patoolib.extract_archive('') plt.imshow(image) train_data = ImageDataGenerator(rescale = 1./255) test_data = ImageDataGenerator(rescale = 1./255) train_dir= ("C://Users//ss529/Anaconda3//Animals//train") val_dir = ("C://Users//ss529/Anaconda3//Animals//val") train_generator = train_data.flow_from_directory( train_dir, target_size =(150, 150), batch_size = 20, class_mode ='binary') test_generator = test_data.flow_from_directory( val_dir, target_size =(150, 150), batch_size = 20, class_mode ='binary') from tensorflow.keras.models import Sequential from tensorflow.keras.layers import Dense,Flatten model = Sequential() model.add(Flatten(input_shape=(150, 150,3))) model.add(Dense(4,activation='sigmoid')) model.add(Dense(5,activation='relu')) model.add(Dense(3,activation='softmax')) model.summary() opt = tf.keras.optimizers.Adam(0.001),epochs=5, validation_data= test_generator) What is ChatGPT?

An up-to-date Artificial Intelligence chatbot, trained by Open AI, developed on Azure that answers your queries, except for mistakes, corrects the code, and can reject unnecessary demands. It depends on a Generative pre-trained transformer equipment GPT 3.5, which uses Artificial or complex work to approach and make out with words.

ChatGPT, which stands for chat-based Generative Pre-trained transformer, is a potent tool that works in different ways to increase output in several distinct areas.

ChatGPT is intelligent to solve simple math problems and answer query-related technical or even some jokes.

For example, the image below shows some funny jokes generated by AI.

In another example, the image below shows to find the area of a triangle with the help of AI.

How to Use ChatGPT?

Here we are going to answer some questions related to chatGPT.

Anyone can use ChatGPT for free. One can sign up and log in using google or email. The free version of ChatGPT is open to the general as of the writing date of February 2023.

“ChatGPT Plus” is a paid subscription plan. It gives priority access to new features, faster response times, and reliable availability when demand is high.

For example, I asked some business and idea tips on Data Science, and here is the response provided by chatGPT in the below image.

Why Should we Use chatGPT?

chatGPT can give you the best services based on how you want to use a chatbot for your benefit.

It can write for your document or reports.

It is possible to save time and allow messages straight given and professionally by using ChatGPT to generate personalized and engaging responses.

It can help generate new business ideas that assist business leaders and entrepreneurs with original and creative concepts for new projects, schemes, and services.

ChatGPT can come in handy for detection and correction in existing code.

Limitations Of ChatGPT

ChatGPT does not so far shows 100% accuracy.

For example,  for the question about Male Rao Holkar’s death, the response from chatGPT is not similar to the history.

Edward Tiann, a 22 years old student from Princeton University, developed the GPTZero application that can detect plagiarism with the contents texted by AI. It is so far for educational use, and the beta version is ready for public use.

What is Bard AI?

LaMDA (Language Model for Dialogue Applications) powers Bard, an experimental conversation AI service. To respond to queries in a new and high-quality way, it uses data from the Internet.

How does Bard function?

LaMDA, a large language model created by Google and released in 2023, powers Bard. Bard is made available by Google on a thin-weight version of LaMDA, which requires less computing power to run, allowing it to reach a maximum number of users.

The Difference Between ChatGPT and Bard

Google Bard AI and chatGPT are the chatbots that use AI for a chat.

ChatGPT is available and open to the public. Bard is limited to beta testers and not for public use.

For chatGPT service has paid and free options. Bard service is available for free.

Bard uses the langauge model developed by google in 2023 and that of chatGPT, a pre-trained transformer.

ChatGPT has a GPT -2 Output detector that detects plagiarism, and Bard has not.

ChatGPT will search for texts and sources that did exist in 2023. Bard on recent sources that can fetch more data. The Google search engine will undergo some settings to let Bard AI answer.

Frequently Asked Questions

Q1. What algorithm does the ChatGPT use?

A. ChatGPT is built on the GPT-3.5 architecture, which utilizes a transformer-based deep learning algorithm. The algorithm leverages a large pre-trained language model that learns from vast amounts of text data to generate human-like responses. It employs attention mechanisms to capture contextual relationships between words and generate coherent and contextually relevant responses.

Q2. How is ChatGPT programmed?

A. ChatGPT is programmed using a combination of techniques. It is built upon a deep learning architecture called GPT-3.5, which employs transformer-based models. The programming involves training the model on a massive amount of text data, fine-tuning it for specific tasks, and implementing methods for input processing, context management, and response generation. The underlying programming techniques involve natural language processing, deep learning frameworks, and efficient training and inference pipelines.


ChatGPT is a new chatbot AI that surprised the world with its unique features to answer, solve problems, and detect mistakes.

Some of the key points we learned here

ChatGPT, a new chatbot developed by Open AI, is the new google. For the question’s answers, we usually searched on google to find the answer can be done now on chatGPT, but still, it has less than 100% accuracy.

ChatGPT works on deep learning models.

Brad AI, developed by google in competition with chatGPT, will soon reach the public.

We will use animal faces to detect digitally based on a convolutional.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.


Image Classification Model Trained Using Google Colab

 This article was published as a part of the Data Science Blogathon.



Using labeled sample photos, ing is image classification. Raw pixel data was the only input for early computer vision algorithms. However, pixel data alone does not provide a sufficiently consistent representation to encompass the many oscillations of an item as represented in an image. The placement of the object, the background behind it, ambient lighting, the camera angle, and the camera focus can all affect the raw pixel data.

Traditional computer vision models added new components derived from pixel data, such as textures, colour histograms, and shapes, to model objects more flexibly. The drawback of this approach was that feature engineering became very time-consuming because of the enormous number of inputs that needed to be changed. Which hues were crucial for categorizing cats? How flexible should the definitions of shapes be? Because characteristics had to be adjusted so precisely, it was difficult to create robust models.

Train Image Classification Model

A fundamental machine learning workflow is used in this tutorial:

Analyze dataset

Create an Input pipeline

Build the model

Train the model

Analyze the model

Setup And Import TensorFlow and other libraries import itertools import os import matplotlib.pylab as plt import numpy as np import tensorflow as tf import tensorflow_hub as hub print("TF version:", tf.__version__) print("Hub version:", hub.__version__) print("GPU is", "available" if tf.config.list_physical_devices('GPU') else "NOT AVAILABLE")

The output looks like this:

Select the TF2 Saved Model Module to Use

More TF2 models that produce feature vectors for images may be found here. (Note that TF1 Hub-format models won’t function here.)

There are numerous models that could work. Simply choose a different option from the list in the cell below, then proceed with the notebook. Here, I selected  Inception_v3 and automatically, it chose the Image size from the below list as 299 x 299. 

model_name = "resnet_v1_50" # @param ['efficientnetv2-s', 'efficientnetv2-m', 'efficientnetv2-l', 'efficientnetv2-s-21k', 'efficientnetv2-m-21k', 'efficientnetv2-l-21k', 'efficientnetv2-xl-21k', 'efficientnetv2-b0-21k', 'efficientnetv2-b1-21k', 'efficientnetv2-b2-21k', 'efficientnetv2-b3-21k', 'efficientnetv2-s-21k-ft1k', 'efficientnetv2-m-21k-ft1k', 'efficientnetv2-l-21k-ft1k', 'efficientnetv2-xl-21k-ft1k', 'efficientnetv2-b0-21k-ft1k', 'efficientnetv2-b1-21k-ft1k', 'efficientnetv2-b2-21k-ft1k', 'efficientnetv2-b3-21k-ft1k', 'efficientnetv2-b0', 'efficientnetv2-b1', 'efficientnetv2-b2', 'efficientnetv2-b3', 'efficientnet_b0', 'efficientnet_b1', 'efficientnet_b2', 'efficientnet_b3', 'efficientnet_b4', 'efficientnet_b5', 'efficientnet_b6', 'efficientnet_b7', 'bit_s-r50x1', 'inception_v3', 'inception_resnet_v2', 'resnet_v1_50', 'resnet_v1_101', 'resnet_v1_152', 'resnet_v2_50', 'resnet_v2_101', 'resnet_v2_152', 'nasnet_large', 'nasnet_mobile', 'pnasnet_large', 'mobilenet_v2_100_224', 'mobilenet_v2_130_224', 'mobilenet_v2_140_224', 'mobilenet_v3_small_100_224', 'mobilenet_v3_small_075_224', 'mobilenet_v3_large_100_224', 'mobilenet_v3_large_075_224'] model_handle_map = { } model_image_size_map = { "efficientnetv2-s": 384, "efficientnetv2-m": 480, "efficientnetv2-l": 480, "efficientnetv2-b0": 224, "efficientnetv2-b1": 240, "efficientnetv2-b2": 260, "efficientnetv2-b3": 300, "efficientnetv2-s-21k": 384, "efficientnetv2-m-21k": 480, "efficientnetv2-l-21k": 480, "efficientnetv2-xl-21k": 512, "efficientnetv2-b0-21k": 224, "efficientnetv2-b1-21k": 240, "efficientnetv2-b2-21k": 260, "efficientnetv2-b3-21k": 300, "efficientnetv2-s-21k-ft1k": 384, "efficientnetv2-m-21k-ft1k": 480, "efficientnetv2-l-21k-ft1k": 480, "efficientnetv2-xl-21k-ft1k": 512, "efficientnetv2-b0-21k-ft1k": 224, "efficientnetv2-b1-21k-ft1k": 240, "efficientnetv2-b2-21k-ft1k": 260, "efficientnetv2-b3-21k-ft1k": 300, "efficientnet_b0": 224, "efficientnet_b1": 240, "efficientnet_b2": 260, "efficientnet_b3": 300, "efficientnet_b4": 380, "efficientnet_b5": 456, "efficientnet_b6": 528, "efficientnet_b7": 600, "inception_v3": 299, "inception_resnet_v2": 299, "nasnet_large": 331, "pnasnet_large": 331, } model_handle = model_handle_map.get(model_name) pixels = model_image_size_map.get(model_name, 224) print(f"Selected model: {model_name} : {model_handle}") IMAGE_SIZE = (pixels, pixels) print(f"Input size {IMAGE_SIZE}") BATCH_SIZE = 16#@param {type:"integer"}

The inputs are scaled correctly for the selected module. A larger dataset helps with training, especially when fine-tuning (i.e., random distortions of an image each time it is read).

Our unique dataset should be organized as shown in the figure below.

Our customized dataset must now be uploaded to Drive. We must set the data augmentation parameter to true once our dataset needs augmentation.

data_dir = "/content/Images" def build_dataset(subset): return tf.keras.preprocessing.image_dataset_from_directory(data_dir,validation_split=.10,subset=subset,label_mode="categorical",seed=123,image_size=IMAGE_SIZE,batch_size=1) train_ds = build_dataset("training") class_names = tuple(train_ds.class_names) train_size = train_ds.cardinality().numpy() train_ds = train_ds.unbatch().batch(BATCH_SIZE) train_ds = train_ds.repeat() normalization_layer = tf.keras.layers.Rescaling(1. / 255) preprocessing_model = tf.keras.Sequential([normalization_layer]) do_data_augmentation = False #@param {type:"boolean"} if do_data_augmentation: preprocessing_model.add(tf.keras.layers.RandomRotation(40)) preprocessing_model.add(tf.keras.layers.RandomTranslation(0, 0.2)) preprocessing_model.add(tf.keras.layers.RandomTranslation(0.2, 0)) # Like the old tf.keras.preprocessing.image.ImageDataGenerator(), # image sizes are fixed when reading, and then a random zoom is applied. # RandomCrop with a batch size of 1 and rebatch later. preprocessing_model.add(tf.keras.layers.RandomZoom(0.2, 0.2)) preprocessing_model.add(tf.keras.layers.RandomFlip(mode="horizontal")) train_ds = images, labels:(preprocessing_model(images), labels)) val_ds = build_dataset("validation") valid_size = val_ds.cardinality().numpy() val_ds = val_ds.unbatch().batch(BATCH_SIZE) val_ds = images, labels:(normalization_layer(images), labels)) Output: Defining the Model

All that is required is to use the Hub module to layer a linear classifier on top of the feature extractor layer.

We initially use a non-trainable feature extractor layer for speed, but you can alternatively enable fine-tuning for better precision.

do_fine_tuning = True print("Building model with", model_handle) model = tf.keras.Sequential([ # Explicitly define the input shape so the model can be properly # loaded by the TFLiteConverter tf.keras.layers.InputLayer(input_shape=IMAGE_SIZE + (3,)), hub.KerasLayer(model_handle), tf.keras.layers.Dropout(rate=0.2), tf.keras.layers.Dense(len(class_names),activation='sigmoid', kernel_regularizer=tf.keras.regularizers.l2(0.0001)) ]),)+IMAGE_SIZE+(3,)) model.summary()

Output look below

Model Training

optimizer=tf.keras.optimizers.SGD(learning_rate=0.005, momentum=0.9), loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True, label_smoothing=0.1), metrics=[‘accuracy’])

steps_per_epoch = train_si validation_steps = valid_size hist = train_ds, epochs=50, steps_per_epoch=steps_per_epoch, validation_data=val_ds, validation_steps=validation_steps).history

The output looks below:

Once training is completed, we need to save the model by using the following code: ("save_locationmodelname.h5") Conclusion

This blog post categorized pictures using convolutional neural networks (CNNs) based on their visual content. This data set was utilized for testing and training CNN. Its accuracy percentage is greater than 98 per cent. We must employ tiny, grayscale images as our teaching resources. These photos require a pervasive processing time compared to other regular JPEG photos. A model with more layers and more picture data used to train the network on a cluster of GPUs would classify images more accurately. Future development will concentrate on categorizing enormous coloured images that are very useful for the segmentation process of images.

Key Takeaways

Image classification, a branch of computer vision, classifies and labels sets of pixels or vectors inside an image using a set of specified tags or categories that an algorithm has been trained on.

It is possible to differentiate between supervised and unsupervised classification.

In supervised classification, the classification algorithm is trained using a set of images and their associated labels.

Unsupervised classification algorithms only use raw data for training.

You require a sizable diversity of datasets with accurately labelled data to create trustworthy picture classifiers.

Thanks for reading!

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.


Feature Selection Techniques In Machine Learning (Updated 2023)


As a data scientist working with Python, it’s crucial to understand the importance of feature selection when building a machine learning model. In real-life data science problems, it’s almost rare that all the variables in the dataset are useful for building a model. Adding redundant variables reduces the model’s generalization capability and may also reduce the overall accuracy of a classifier. Furthermore, adding more variables to a model increases the overall complexity of the model.

As per the Law of Parsimony of ‘Occam’s Razor’, the best explanation of a problem is that which involves the fewest possible assumptions. Thus, feature selection becomes an indispensable part of building machine learning models.

Learning Objectives:

Understanding the importance of feature selection.

Familiarizing with different feature selection techniques.

Applying feature selection techniques in practice and evaluating performance.

Table of Contents What Is Feature Selection in Machine Learning?

The goal of feature selection techniques in machine learning is to find the best set of features that allows one to build optimized models of studied phenomena.

The techniques for feature selection in machine learning can be broadly classified into the following categories:

Supervised Techniques: These techniques can be used for labeled data and to identify the relevant features for increasing the efficiency of supervised models like classification and regression. For Example- linear regression, decision tree, SVM, etc.

Unsupervised Techniques: These techniques can be used for unlabeled data. For Example- K-Means Clustering, Principal Component Analysis, Hierarchical Clustering, etc.

From a taxonomic point of view, these techniques are classified into filter, wrapper, embedded, and hybrid methods.

Now, let’s discuss some of these popular machine learning feature selection methods in detail.

Types of Feature Selection Methods in ML Filter Methods

Filter methods pick up the intrinsic properties of the features measured via univariate statistics instead of cross-validation performance. These methods are faster and less computationally expensive than wrapper methods. When dealing with high-dimensional data, it is computationally cheaper to use filter methods.

Let’s, discuss some of these techniques:

Information Gain

Information gain calculates the reduction in entropy from the transformation of a dataset. It can be used for feature selection by evaluating the Information gain of each variable in the context of the target variable.

The Chi-square test is used for categorical features in a dataset. We calculate Chi-square between each feature and the target and select the desired number of features with the best Chi-square scores. In order to correctly apply the chi-squared to test the relation between various features in the dataset and the target variable, the following conditions have to be met: the variables have to be categorical, sampled independently, and values should have an expected frequency greater than 5.

Fisher score is one of the most widely used supervised feature selection methods. The algorithm we will use returns the ranks of the variables based on the fisher’s score in descending order. We can then select the variables as per the case.

Correlation is a measure of the linear relationship between 2 or more variables. Through correlation, we can predict one variable from the other. The logic behind using correlation for feature selection is that good variables correlate highly with the target. Furthermore, variables should be correlated with the target but uncorrelated among themselves.

If two variables are correlated, we can predict one from the other. Therefore, if two features are correlated, the model only needs one, as the second does not add additional information. We will use the Pearson Correlation here.

We need to set an absolute value, say 0.5, as the threshold for selecting the variables. If we find that the predictor variables are correlated, we can drop the variable with a lower correlation coefficient value than the target variable. We can also compute multiple correlation coefficients to check whether more than two variables correlate. This phenomenon is known as multicollinearity.

Variance Threshold

The variance threshold is a simple baseline approach to feature selection. It removes all features whose variance doesn’t meet some threshold. By default, it removes all zero-variance features, i.e., features with the same value in all samples. We assume that features with a higher variance may contain more useful information, but note that we are not taking the relationship between feature variables or feature and target variables into account, which is one of the drawbacks of filter methods.

The get_support returns a Boolean vector where True means the variable does not have zero variance.

Mean Absolute Difference (MAD)

‘The mean absolute difference (MAD) computes the absolute difference from the mean value. The main difference between the variance and MAD measures is the absence of the square in the latter. The MAD, like the variance, is also a scaled variant.’ [1] This means that the higher the MAD, the higher the discriminatory power.

‘Another measure of dispersion applies the arithmetic mean (AM) and the geometric mean (GM). For a given (positive) feature Xi on n patterns, the AM and GM are given by

respectively; since AMi ≥ GMi, with equality holding if and only if Xi1 = Xi2 = …. = Xin, then the ratio

Wrapper Methods

Wrappers require some method to search the space of all possible subsets of features, assessing their quality by learning and evaluating a classifier with that feature subset. The feature selection process is based on a specific machine learning algorithm we are trying to fit on a given dataset. It follows a greedy search approach by evaluating all the possible combinations of features against the evaluation criterion. The wrapper methods usually result in better predictive accuracy than filter methods.

Let’s, discuss some of these techniques:

Forward Feature Selection

This is an iterative method wherein we start with the performing features against the target features. Next, we select another variable that gives the best performance in combination with the first selected variable. This process continues until the preset criterion is achieved.

This method works exactly opposite to the Forward Feature Selection method. Here, we start with all the features available and build a model. Next, we the variable from the model, which gives the best evaluation measure value. This process is continued until the preset criterion is achieved.

This method, along with the one discussed above, is also known as the Sequential Feature Selection method.

Exhaustive Feature Selection

This is the most robust feature selection method covered so far. This is a brute-force evaluation of each feature subset. This means it tries every possible combination of the variables and returns the best-performing subset.

Embedded Methods

These methods encompass the benefits of both the wrapper and filter methods by including interactions of features but also maintaining reasonable computational costs. Embedded methods are iterative in the sense that takes care of each iteration of the model training process and carefully extract those features which contribute the most to the training for a particular iteration.

Let’s discuss some of these techniques here:

LASSO Regularization (L1)

Regularization consists of adding a penalty to the different parameters of the machine learning model to reduce the freedom of the model, i.e., to avoid over-fitting. In linear model regularization, the penalty is applied over the coefficients that multiply each predictor. From the different types of regularization, Lasso or L1 has the property that can shrink some of the coefficients to zero. Therefore, that feature can be removed from the model.

Random Forests is a kind of Bagging Algorithm that aggregates a specified number of decision trees. The tree-based strategies used by random forests naturally rank by how well they improve the purity of the node, or in other words, a decrease in the impurity (Gini impurity) over all trees. Nodes with the greatest decrease in impurity happen at the start of the trees, while notes with the least decrease in impurity occur at the end of the trees. Thus, by pruning trees below a particular node, we can create a subset of the most important features.


We have discussed a few techniques for feature selection. We have purposely left the feature extraction techniques like Principal Component Analysis, Singular Value Decomposition, Linear Discriminant Analysis, etc. These methods help to reduce the dimensionality of the data or reduce the number of variables while preserving the variance of the data.

Apart from the methods discussed above, there are many other feature selection methods. There are hybrid methods, too, that use both filtering and wrapping techniques. If you wish to explore more about feature selection techniques, great comprehensive reading material, in my opinion, would be ‘Feature Selection for Data and Pattern Recognition’ by Urszula Stańczyk and Lakhmi C. Jain.

Key Takeaways

Understanding the importance of feature selection and feature engineering in building a machine learning model.

Familiarizing with different feature selection techniques, including supervised techniques (Information Gain, Chi-square Test, Fisher’s Score, Correlation Coefficient), unsupervised techniques (Variance Threshold, Mean Absolute Difference, Dispersion Ratio), and their classifications (Filter methods, Wrapper methods, Embedded methods, Hybrid methods).

Evaluating the performance of feature selection techniques in practice through implementation.

Frequently Asked Questions Related

Spot Faked Photos Using Digital Forensic Techniques

If only all Photoshop jobs were this obvious, recognizing faked photos would be a lot easier. Stan Horaczek

We see hundreds or even thousands of images a day, and almost all of them have been digitally manipulated in some way. Some have gotten basic color corrections or simple Instagram filter effects, while others have received full on Photoshop jobs to completely transform the subject. It turns out humans aren’t very good at recognizing when an image has been manipulated, even if the change is fairly substantial. Hany Farid is a professor of computer science at Dartmouth College who specializes in photo forensics, and while he can’t share all of his fancy software tools for detecting editing trickery, he has shared a few tips for authenticating images on your own.

Try reverse image searching

A reverse image search in Google looks for images that are exact matches, as well as those that are thematically similar. Stan Horaczek

Before you start trying to CSI an image too hard, you can often debunk a faked photo by finding its source using a reverse image search. Google includes this function as part of its Images suite and looks for the exact image, as well as images that are similar in both subject matter and color aesthetics.

Another powerful tool is Tineye, which performs a similar function, but often returns fewer results that are closer to exact matches, which can make them easier to sort through.

“Often if you just do a reverse image search, you’ll find it right away,” says Farid. “You’ll see the original image that someone took from Getty Images and then added a UFO to the sky or something like that.”

Reverse image search can also be a useful tool if you suspect someone is stealing your social media photos and impersonating you. Upload your own photos to the tools and you can see where they appear on the web.

Look for weirdness

Fight the urge to zoom in too far to examine an image. This unedited image shows weirdness and artifacts when you’re up this close. You don’t have the CSI “enhance” tool. Stan Horaczek

The first step in analyzing an image involves a logical analysis, an area in which humans typically perform much better than computers—at least for now. “Computers are very good at measuring this fine grain details like compression artifacts and inconsistencies in geometry,” says Farid. “But if someone created a picture of a boat sailing down the middle of the road, a computer might not see anything wrong with that.”

Look at an image closely and examine objects that may have been inserted, or look for evidence that other objects may have been removed. Farid warns against zooming in too far, however, because that can introduce its own obstacles. “Sometimes if you zoom into an image up to 500%, it’s very easy to look at something that’s perfectly valid, like artifacts from lens distortions or noise, and start attributing that to manipulation,” says Farid. He recommends zooming to 200% or 300% maximum to avoid false positives.

This is also the time to look for errors in scale and perspective, which are some of the trickiest things to fix in a fake. Does one person in a group photo have an abnormally large head? Does an object look like it’s sitting at an odd angle? These are warning signs that warrant an even closer look.

Check the EXIF data

You can learn a lot about a photo by checking out the metadata associated with it. Stan Horaczek

When a digital camera captures an image, it appends a whole array of information called EXIF data to the image file. This data includes all the critical camera settings, as well as other info like GPS data if it’s available (which is typically the case with smartphone photos, unless the person has intentionally turned location settings off).

If you have the location of the photo, you can plug it into Google Maps and use Street View to get a general idea of what the location might actually look like. The Street View scene won’t necessarily be 100 percent accurate and up-to-date, but it can be a good starting point.

This analysis from chúng tôi shows the metadata attached to the JPEG file. Stan Horaczek

You can also sometimes find the original pixel dimensions of the image. This may not sound very useful, but you can easily look up the typical image dimensions of a photo from a particular camera and then compare them to the file you’re currently viewing. If the final version is smaller, that indicates that the photo may have been cropped to exclude information.

Also in the EXIF data is a software tag. “If an image is opened up in Photoshop and then saved, the metadata will then say “Photoshop” and then whatever version they used,” says Farid. He warns, however, that this tag doesn’t necessarily indicate that a photo is trying to trick you. Many photographs go through Photoshop or some other editing program for simple adjustments like color correction, or even just resizing.

Examine the shadows

The image has been edited to flip the man’s face, which creates a clear contradiction in the direction of the shadows. It was part of a study to determine how well people can recognize faked photos. Cognitive Research: Principles and Implications

We know that the shadow cast by an object will appear opposite the light that caused it. Using that information, investigators can actually map lines between shadows, objects, and the corresponding light sources to see if the image is physically possible.

“Out in the physical 3D world, I have a linear constraint on a shadow, an object, and a light source,” says Farid. “That means I can find all the objects that are casting shadows—as long as I can very clearly attribute a point on a shadow to a point on an object in the image.”

One of the original examples in the study about people’s ability to recognized edited photos showed a man whose face had been flipped so the light source was landing on the same side as the shadow. It can be easy to identify once you’re looking for it.

Mess with it in Photoshop

(The comparison above show two versions of the same image. The one on the right has been subjected to the levels adjustments that clearly show brush adjustments over the front license plate)

If you have access to Photoshop yourself, there are a few adjustments you can make to try and draw out artifacts that you might miss with your naked eye.

One tool Farid suggests using is Levels. You can access this by pressing Command + L (Mac) or Control + L (PC). “If you bring the white point all the way down really close to the black point, what’s going to happen is that the narrow range of black will expand out quite a bit,” says Farid. “If somebody has taken the eraser tool and erased something in a dark area, you can see the traces of the tool.” The same effect happens if you drag the black point all the way up to draw more detail out of the image highlights.

You can try a few other Photoshop tricks to shed some light on alterations. Cranking up the contrast or the sharpness will help emphasize hard edges in the photo, which can sometimes occur when an object is pasted in. Farid also suggests inverting the colors on an image (control + I or command + I) to get a new perspective on the photo, which could jolt your brain into drawing out some irregularities.

Look for patterns

There are some patterns you can recognize with your eyeballs. A novice Photoshop user may well leave repeating patterns behind when trying to clone out an object. Zoom out and look at the image from afar to see if your eye can pick up on any patterns, then zoom in closer to see if there might be some repeating objects in the scene.

Researchers also often look for patterns in artifacts left from JPEG image compression. JPEG is a “lossy” format, which means it jettisons some information from the original file to save space and make it readable by a wider array of machines. This causes artifacts, or changes in the data introduced over time—especially when you save it more than once. “Imagine you go out and you buy your brand new iPhone and even the packaging is beautiful. Everything fits just right down to the tape,” says Farid. “Try putting that back together and see what happens. It never works. The same thing is true of a digital file. When you unpack it in Photoshop, and then recompress it, you can’t get it perfectly right. It leaves artifacts that we can recognize.”

Be wary of online tools

A popular image validation tool says my photo of corn has been edited because the EXIF has been stripped and it was exported from Adobe Lightroom. This is, in fact, what the corn looked like. Stan Horaczek

There are places online where you can upload an image to check for warning signs of editing, but results can be very tricky to interpret. For instance, I uploaded this picture of corn to a popular site and it was flagged because it was “not an original camera image.” I exported a JPEG from a DSLR raw file with some color corrections myself, so I know it wasn’t faked, but it’s still flagged. It didn’t claim the photo was doctored, but it also casts doubt where there shouldn’t be any.

There are some websites that can read the software tags, like this one that can tell you exactly what actions were taken in Lightroom when editing a photo. That’s more useful, but you still need an understanding of the software itself to make an accurate interpretation.

There is software out there that can identify these more complex manipulations, but it’s typically only available commercially, for security and law enforcement operations.

“Making that stuff public is tricky because the more I make the information public, the easier it is to circumvent,” says Farid. “We release the details in scientific publications, but to really go back and implement all that technique would be really hard for somebody. That’s the compromise we have right now.

Don’t fall for false positives

The final step is realizing that sometimes things look altered when they aren’t. “Photographs just look weird sometimes,” says Farid.

Update the detailed information about Music Genres Classification Using Deep Learning Techniques on the website. We hope the article's content will meet your needs, and we will regularly update the information to provide you with the fastest and most accurate information. Have a great day!