This article was published as a part of the Data Science Blogathon
Introduction
Principal Component Analysis is one of the most famous Dimensionality Reduction techniques, which helps when we work with datasets that have very large dimensions.
Therefore it becomes necessary for every aspiring Data Scientist and Machine Learning Engineer to have a good knowledge of Dimensionality Reduction.
In this article, we will discuss the most important questions on Dimensionality Reduction, covering everything from the fundamentals to more complex concepts. These questions will help you build a clear understanding of the techniques and also prepare for Data Science interviews.
Let’s get started.
1. What is Dimensionality Reduction?
In Machine Learning, dimension refers to the number of features in a particular dataset.
In simple words, Dimensionality Reduction refers to reducing the number of dimensions or features so that we get a more interpretable model and improve the model’s performance.
2. Explain the significance of Dimensionality Reduction.
There are basically three reasons for Dimensionality Reduction:
Visualization
Interpretability
Time and Space Complexity
Let’s understand this with an example:
Imagine we have worked on an MNIST dataset that contains 28 × 28 images and when we convert images to features we get 784 features.
If we try to think of each feature as one dimension, then how can we think of 784 dimensions in our mind?
We are not able to visualize the scattering of points of 784 dimensions.
That is the first reason why Dimensionality Reduction is Important!
Let’s say you are a data scientist and you have to explain your model to clients who do not understand Machine Learning. How will you make them understand the working of 784 features or dimensions?
In simple language, how do we interpret the model to the clients?
That is the second reason why Dimensionality Reduction is Important!
Let’s say you are working for an internet-based company where the output of something must be returned in milliseconds or less, so time complexity and space complexity matter a lot. More features need more time, which these types of companies can’t afford.
That is the third reason why Dimensionality Reduction is Important!
3. What is PCA? What does a PCA do?
PCA stands for Principal Component Analysis. It is a dimensionality reduction technique that summarizes a large set of correlated variables (basically high-dimensional data) into a smaller number of representative variables, called the Principal Components, that explain most of the variability of the original set, i.e., without losing much of the information.
PCA is a deterministic algorithm: there are no parameters to initialize, and it does not suffer from the problem of local minima that many machine learning algorithms have.
4. List down the steps of a PCA algorithm.
The major steps to be followed while using the PCA algorithm are as follows (a minimal code sketch is shown after the list):
Step-1: Get the dataset.
Step-2: Compute the mean vector (µ).
Step-3: Subtract the means from the given data.
Step-4: Compute the covariance matrix.
Step-5: Determine the eigenvectors and eigenvalues of the covariance matrix.
Step-6: Choosing Principal Components and forming a feature vector.
Step-7: Deriving the new data set by taking the projection on the weight vector.
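To make these steps concrete, here is a minimal NumPy sketch of the procedure above; the toy data and variable names are illustrative and not part of the original article.

import numpy as np

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9],
              [1.9, 2.2], [3.1, 3.0]])        # Step 1: toy dataset (n samples x d features)
mu = X.mean(axis=0)                           # Step 2: mean vector
Xc = X - mu                                   # Step 3: subtract the mean
C = np.cov(Xc, rowvar=False)                  # Step 4: covariance matrix (d x d)
eigvals, eigvecs = np.linalg.eigh(C)          # Step 5: eigenvalues/eigenvectors (ascending order)
order = np.argsort(eigvals)[::-1]             # sort components by decreasing eigenvalue
W = eigvecs[:, order[:1]]                     # Step 6: keep the top principal component
X_reduced = Xc @ W                            # Step 7: project the data onto the component
print(X_reduced.shape)                        # (5, 1)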
5. Is it important to standardize the data before applying PCA?
Usually, the aim of standardization is to assign equal weight to all the variables. PCA finds new axes based on the covariance matrix of the original variables. Since the covariance matrix is sensitive to the scale of the variables, if we use features of different scales we often get misleading directions.
Moreover, if all the variables are on the same scale, then there is no need to standardize the variables.
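As a hedged illustration of why scaling matters, the short sketch below (a toy example of my own, using scikit-learn) compares the first component found with and without standardization when one feature lives on a much larger scale than the other:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
x1 = rng.normal(0, 1, 500)                    # feature on a small scale
x2 = 1000 * (x1 + rng.normal(0, 0.2, 500))    # correlated feature on a much larger scale
X = np.column_stack([x1, x2])

raw = PCA(n_components=1).fit(X)
scaled = PCA(n_components=1).fit(StandardScaler().fit_transform(X))

print(raw.components_)     # roughly [0, 1]: the large-scale feature dominates the direction
print(scaled.components_)  # roughly [0.707, 0.707]: both features contribute after scaling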
6. Is rotation necessary in PCA? If yes, why? Discuss the consequences if we do not rotate the components.
Yes. The idea behind rotation, i.e., orthogonal components, is to capture the maximum variance of the training set.
If we don’t rotate the components, the effect of PCA will diminish and we’ll have to select more Principal Components to explain the maximum variance of the training dataset.
7. What are the assumptions taken into consideration while applying PCA?
The assumptions needed for PCA are as follows:
1. PCA is based on Pearson correlation coefficients. As a result, there needs to be a linear relationship between the variables for applying the PCA algorithm.
2. For getting reliable results by using the PCA algorithm, we require a large enough sample size i.e, we should have sampling adequacy.
3. Your data should be suitable for data reduction i.e., we need to have adequate correlations between the variables to be reduced to a smaller number of components.
4. No significant noisy data or outliers are present in the dataset.
8. What will happen when eigenvalues are roughly equal while applying PCA?
If all the eigenvalues of the covariance matrix are roughly equal, PCA cannot pick a preferred direction: every direction explains about the same amount of variance, so the Principal Components are not unique and PCA will not help in reducing the dimensionality.
9. What are the properties of Principal Components in PCA?
The properties of Principal Components in PCA are as follows:
1. These Principal Components are linear combinations of original variables that result in an axis or a set of axes that explain/s most of the variability in the dataset.
2. All Principal Components are orthogonal to each other.
3. The first Principal Component accounts for most of the possible variability of the original data i.e, maximum possible variance.
4. The number of Principal Components for n-dimensional data should be at most equal to n (= dimension). For example, there can be only two Principal Components for a two-dimensional data set.
10. What does a Principal Component in a PCA signify? How can we represent them mathematically?
The Principal Component represents a line or an axis along which the data varies the most, and it is also the line that is closest to all of the n observations in the dataset.
In mathematical terms, we can say that the first Principal Component is the eigenvector of the covariance matrix corresponding to the maximum eigenvalue.
Accordingly,
Sum of squared distances = Eigenvalue for PC-1
Sqrt of Eigenvalue = Singular value for PC-1
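These relations can be checked numerically. The sketch below is an illustrative check on mean-centred data (not code from the article): the sum of squared projections onto PC-1 equals the largest eigenvalue of XᵀX, and its square root equals the largest singular value of X.

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
X -= X.mean(axis=0)                       # mean-centre the data

eigvals, eigvecs = np.linalg.eigh(X.T @ X)
pc1 = eigvecs[:, np.argmax(eigvals)]      # direction of PC-1
projections = X @ pc1                     # projected distances along PC-1

print(np.sum(projections ** 2))           # sum of squared projected distances
print(eigvals.max())                      # eigenvalue for PC-1 (same number)
print(np.sqrt(eigvals.max()))             # sqrt(eigenvalue) ...
print(np.linalg.svd(X, compute_uv=False)[0])  # ... equals the largest singular value of X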
11. What does the coefficient of a Principal Component signify?
The coefficients (loadings) of a Principal Component tell us how much each original variable contributes to that component. For example, if the coefficient of independent variable 2 is N times that of independent variable 1, then variable 2 is N times as important as variable 1 along that component.
12. Can PCA be used for regression-based problem statements? If yes, then explain the scenario where we can use it.
Yes, we can use Principal Components for regression problem statements.
PCA would perform well in cases when the first few Principal Components are sufficient to capture most of the variation in the independent variables as well as the relationship with the dependent variable.
The only problem with this approach is that the new reduced set of features is derived by ignoring the dependent variable Y. While these features may do a good overall job of explaining the variation in X, the model will perform poorly if those components don’t explain the variation in Y.
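For reference, a minimal sketch of this idea, usually called principal component regression (PCR), could look as follows with scikit-learn; the synthetic dataset and the choice of 10 components are illustrative assumptions, not prescriptions.

from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=300, n_features=50, n_informative=10,
                       noise=5.0, random_state=0)

# Scale, reduce to 10 principal components, then fit a linear regression on them.
pcr = make_pipeline(StandardScaler(), PCA(n_components=10), LinearRegression())
print(cross_val_score(pcr, X, y, cv=5).mean())   # cross-validated R^2 using only 10 components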
13. Can we use PCA for feature selection?
Feature selection refers to choosing a subset of the features from the complete set of features.
No, PCA is not used as a feature selection technique because we know that any Principal Component axis is a linear combination of all the original set of feature variables which defines a new set of axes that explain most of the variations in the data.
Therefore while it performs well in many practical settings, it does not result in the development of a model that relies upon a small set of the original features.
14. Comment on whether PCA can be used to reduce the dimensionality of a non-linear dataset.
PCA does not take the nature of the data (linear or non-linear) into consideration during its run, but it can still reduce the dimensionality of most datasets significantly and can at least get rid of useless dimensions.
However, reducing dimensionality with PCA will lose too much information if there are no useless dimensions.
15. How can you evaluate the performance of a dimensionality reduction algorithm on your dataset?
A dimensionality reduction algorithm is said to work well if it eliminates a significant number of dimensions from the dataset without losing too much information. Moreover, if dimensionality reduction is used as a preprocessing step before training a model, we can measure the performance of that downstream model with and without the reduction.
We can therefore infer that the algorithm performed well if the reduction does not lose too much information.
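As a hedged example, two common checks with scikit-learn are the explained variance ratio and the reconstruction error; the digits dataset and the choice of 10 components below are illustrative assumptions.

import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X = load_digits().data                            # 64-dimensional images of digits
pca = PCA(n_components=10).fit(X)

print(pca.explained_variance_ratio_.sum())        # fraction of variance kept by 10 components
X_rec = pca.inverse_transform(pca.transform(X))   # reconstruct from the reduced representation
print(np.mean((X - X_rec) ** 2))                  # reconstruction error (lower = less information lost)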
Comprehension Type Question (16 – 18): Consider the set of 2D points {(-3,-3), (-1,-1), (1,1), (3,3)}. We want to reduce the dimensionality of these points by 1 using the PCA algorithm. Assume sqrt(2) = 1.414.
Now, answer the following questions:
SOLUTION:
Here the original data resides in R², i.e., two-dimensional space, and our objective is to reduce the dimensionality of the data to 1, i.e., 1-dimensional data ⇒ K = 1.
We will solve this set of problems step by step so that you have a clear understanding of the steps involved in the PCA algorithm:
Step-1: Get the Dataset
Here the data matrix X is given by [ [ -3, -1, 1, 3 ], [ -3, -1, 1, 3 ] ]
Step-2: Compute the mean vector (µ)
Mean Vector: [ {-3+(-1)+1+3}/4, {-3+(-1)+1+3}/4 ] = [ 0, 0 ]
Step-3: Subtract the means from the given data
Since the mean vector is [ 0, 0 ], subtracting it leaves the data points unchanged.
Step-4: Compute the covariance matrix
Since the mean is at the origin, the covariance matrix becomes XXᵀ.
Therefore, XXᵀ becomes [ [ -3, -1, 1, 3 ], [ -3, -1, 1, 3 ] ] ( [ [ -3, -1, 1, 3 ], [ -3, -1, 1, 3 ] ] )ᵀ
= [ [ 20, 20 ], [ 20, 20 ] ]
Step-5: Determine the eigenvectors and eigenvalues of the covariance matrix
det(C − λI) = 0 gives the eigenvalues as 0 and 40.
Now, choose the maximum of the calculated eigenvalues and find the eigenvector corresponding to λ = 40 by using the equation CX = λX:
Accordingly, we get the eigenvector as (1/√ 2 ) [ 1, 1 ]
Therefore, the eigenvalues of the matrix XXᵀ are 0 and 40.
Step-6: Choosing Principal Components and forming a weight vector
Here, U ∈ R^(2×1) and is equal to the eigenvector of XXᵀ corresponding to the largest eigenvalue.
Now, consider the eigenvalue decomposition of C = XXᵀ.
And W (weight matrix) is the transpose of the U matrix and given as a row vector.
Therefore, the weight matrix is given by [1 1]/1.414
Step-7: Deriving the new data set by taking the projection on the weight vector
Now, the reduced-dimensionality data is obtained as xi = UᵀXi = WXi
x1 = WX1 = (1/√2) [ 1, 1 ] [ -3, -3 ]ᵀ = −3√2
x2 = WX2 = (1/√2) [ 1, 1 ] [ -1, -1 ]ᵀ = −√2
x3 = WX3 = (1/√2) [ 1, 1 ] [ 1, 1 ]ᵀ = √2
x4 = WX4 = (1/√2) [ 1, 1 ] [ 3, 3 ]ᵀ = 3√2
Therefore, the reduced-dimensionality data is { -3*1.414, -1.414, 1.414, 3*1.414 }.
This completes our example!
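As a quick sanity check (not part of the original solution), the same numbers can be reproduced with NumPy; note that eigenvectors are only defined up to sign, so the projections may come out with their signs flipped.

import numpy as np

X = np.array([[-3, -1, 1, 3],
              [-3, -1, 1, 3]], dtype=float)   # d x n data matrix (already zero-mean)
C = X @ X.T                                   # [[20, 20], [20, 20]]
eigvals, eigvecs = np.linalg.eigh(C)          # eigenvalues 0 and 40
w = eigvecs[:, np.argmax(eigvals)]            # (1/sqrt(2)) * [1, 1], up to sign
print(eigvals)                                # [ 0. 40.]
print(w @ X)                                  # approx [-4.243 -1.414  1.414  4.243]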
19. What are the Advantages of Dimensionality Reduction?
1. Less misleading data means model accuracy improves.
2. Fewer dimensions mean less computing. Less data means that algorithms train faster.
3. Less data means less storage space required.
4. Removes redundant features and noise.
5. Dimensionality Reduction helps us to visualize the data that is present in higher dimensions in 2D or 3D.
20. What are the Disadvantages of Dimensionality Reduction?
1. Some amount of information is inevitably lost when dimensions are removed.
2. It can be computationally intensive.
3. Transformed features are often hard to interpret.
4. It makes the independent variables less interpretable.
End Notes
Thanks for reading!
Please feel free to contact me on Linkedin, Email.
About the author
Chirag Goyal
Currently, I am pursuing my Bachelor of Technology (B.Tech) in Computer Science and Engineering from the Indian Institute of Technology Jodhpur (IITJ). I am very enthusiastic about Machine Learning, Deep Learning, and Artificial Intelligence.
The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.
20 Essential Skills For Digital Marketers
Many people come into the digital marketing industry with the soft skills they need to do a great job. These skills can make it easier for them to learn the hard skills necessary to perform their job duties.
It’s been said that soft skills can’t really be taught, but I strongly disagree with that.
It takes both hard and soft skills to do your best work in digital marketing. And the good news is, you can work on building both types.
Depending upon your experiences with the world, you may give up easily because you’ve never gotten what you wanted. You may not want to ask questions because you’ve had questions shut down before. With the proper environment, those skills can definitely sharpen.
Here are 20 essential skills to help you succeed in digital marketing.
Soft Skills
Soft skills are skills related to how you work. These are skills many people possess without really thinking about them.
Here are 10 soft skills that are the most important ones from my experience.
1. Curiosity
I love when an employee wants to learn more. There are so many aspects of digital marketing and so many little niche areas. Craving more knowledge about how it all fits together truly makes you better in any role.
2. Tenacity
If you give up easily, digital marketing is probably not the field for you.
You may work to rank a site and an update crushes you, or you may pitch ideas that get rejected. You may be called in to help figure out why a site isn’t doing well.
Every day there’s something new, and that’s what keeps it all interesting.
3. Willingness to Listen and Learn
I have been wrong so many times it’s crazy. My employees (and clients) know to argue their points with me if they think I’m wrong, and I’ve learned to really trust what they say.
I’ve had clients give me instructions that I don’t think will work out but I try them and have been surprised quite often. Thinking you know everything means you don’t have the opportunity to get better.
4. Adaptability
With my team, assignments can vary from month to month depending upon our client roster. They might be working on a finance client for one month then they’ll need to switch up to a travel client.
They may need to pitch in and help someone else out on a client they’ve never worked on.
When I first started out, I got thrown into technical SEO, content writing, and PPC all at the same time. There’s always a chance that you’ll need to do something else or extra so you might as well be prepared for it.
5. Ability to Multitask
There are always a ton of things going on at once in digital marketing. You want to read the latest articles, see the latest relevant tweets, do your job, figure out how to do something in a different way that saves time, do reports, etc.
If you can’t multitask well, you will quickly fall behind.
6. Empathy
Being able to see things from someone else’s point of view is essential to marketing of any kind. It’s important to understand why someone thinks a certain way.
Empathy is so important that I wrote an entire article about it.
7. Taking Your Own Ego out of the Picture
Sometimes we are so caught up in what we think needs to be done, we can’t take a step back and listen to someone else because all we are thinking is that we know what’s best.
We all need to realize that we don’t always get it right, and even if we are right, sometimes it just doesn’t matter.
You can’t take it personally when you think you should bid on certain keywords and the client wants you to bid on different keywords.
I can’t take it personally when I submit an article to this very site and my editor asks me to make a change.
8. Strong Work Ethic
Obviously, you really need this in most careers but with something like link building, you are never going to do well without wanting to work hard as it’s very frustrating and tedious at times.
When marketing fails, it can be extremely difficult to start over. You will encounter lots of roadblocks in some form or another so it’s critical to keep trying and not give up.
9. Honesty and Transparency
One of my pet peeves is when someone can’t admit to a mistake.
You’ll always be found out.
We had a couple of employees who would leave work on the clock and think they wouldn’t be caught, for example. We had someone clock in from another state and pretend that I just hadn’t seen them in the office.
People say completely absurd things. With so many people available to replace you, not being honest is unacceptable.
10. Being Able to Say “I Don’t Know”
I don’t know why this is so difficult but it seems to be. I worked for someone who told me to never admit to not knowing something, and I think that is ridiculous.
You don’t learn unless you admit that you don’t know something. If I don’t know something, I want to dig in and figure it out.
I don’t find it embarrassing to not know everything. I’ve never thought less of anyone who admitted to not knowing something.
With so much information thrown at us constantly, it’s impossible to keep up.
Hard Skills
Hard skills are teachable skills. Here are the 10 hard skills that I think are the most achievable and the ones that can help you forge a broader knowledge of the industry.
11. How to Search Well
People constantly ask questions they could easily query in Google. It can waste a lot of time.
You need to be able to dig for information and get better with your search queries so you aren’t wading through tons of irrelevant information.
We’ve had employees who started to work on a new client and would email to ask me to explain what a certain product was used for, for example.
I’d then spend my own time searching Google and figuring it out, then emailing back. I’d much rather do my own research than ask someone else to do it for me.
13. Conducting Research and Gathering Data
You will most likely need to pull data from various sources at some point. You may have to do a technical audit on a website.
There are so many tools and sources for information that it’s critical you can figure out where to look and how to get what you need.
If you’re creating content, you’ll also need to be able to find and verify information.
14. Using Google Analytics
You can get so much information from Google Analytics that it would be a real missed opportunity not to try and master it.
If Google is giving you information about your site, you absolutely need to use it.
From looking at traffic to tracking conversions, Google Analytics is a must-have tool, and it’s free.
15. Using at Least One Major SEO Tool
Outside of Google Analytics, it’s good to know how to use at least one tool that can give you a different dataset. I use a few because each has its strong points.
It’s amazing to see how much information you can get from these tools and their reports.
16. Analyzing the Effectiveness of Your Efforts
Some people measure progress by increased traffic. Some like conversions.
Whatever your KPIs are, you need to know how to track them reliably.
17. Communication
Whether you communicate better through writing or speaking, good communication skills are absolutely critical.
My employees are remote workers and none of my clients are anywhere near me, so I spend a lot of time emailing back and forth with everyone.
I think good communication skills come naturally to some people. But if they don’t to you, it’s definitely something you can work towards improving.
18. Figuring Out What’s Going On and What’s Gone Wrong
If traffic suddenly drops or your bounce rate drastically increases, it’s important that you know how to start tracking down potential causes.
Not everything is cause for alarm, of course. There may be logical explanations for what you’re seeing.
You simply need to know where to look and how to grab enough information to get an idea of what’s happening and then start to fix it.
19. Using a Crawler
There are several great crawling tools out there and you should familiarize yourself with at least one of them.
Even if you aren’t getting too technical with your work, just being able to get information about redirects or duplicate content can be incredibly helpful.
20. Coding or Understanding Code
I came into SEO from a programming background so I’m a bit biased, but I do think that SEO professionals should at least know basic HTML.
Coding also teaches you how to think very logically and improves your problem-solving skills. Even if you never have a chance to code, you will be better equipped to think through problems.
Do You Really Need to Possess All of These Skills to Be Great at Your Job?
Absolutely not. There are countless SEO pros who don’t know how to code, for example, and they can do their jobs well.
There are people who don’t possess a lot of the soft skills and they’re fine.
With today’s level of remote work situations there is more flexibility than ever to be yourself, work where you like, sometimes work whenever you feel like it, and simply get the job done.
But when it’s time to grow your career and enhance your professional value, you’ll definitely want to work on a few of the key digital marketing skills above.
Top 20 Apache Oozie Interview Questions
This article was published as a part of the Data Science Blogathon.
Introduction
Apache Oozie is a Hadoop workflow scheduler. It is a system that manages the workflow of dependent tasks. Users can design Directed Acyclic Graphs of workflows that can be run in parallel and sequentially in Hadoop.
Apache Oozie is an important topic in Data Engineering, so we shall discuss some Apache Oozie interview questions and answers. These questions and answers will help you prepare for Apache Oozie and Data Engineering Interviews.
Read more about Apache Oozie here.
Interview Questions on Apache Oozie
1. What is Oozie?
Oozie is a Hadoop workflow scheduler. Oozie allows users to design Directed Acyclic Graphs of workflows, which can then be run in Hadoop in parallel or sequentially. It can also execute regular Java classes, Pig operations, and interface with HDFS. It can run jobs both sequentially and concurrently.
2. Why do we need Apache Oozie?
Apache Oozie is an excellent tool for managing many tasks. There are several sorts of jobs that users want to schedule to run later, as well as tasks that must be executed in a specified order. Apache Oozie can make these types of executions much easier. Using Apache Oozie, the administrator or user can execute multiple independent jobs in parallel, run the jobs in a specific sequence, or control them from anywhere, making it extremely helpful.
3. What kind of application is Oozie?
Oozie is a Java Web App that runs in a Java servlet container.
4. What exactly is an application pipeline in Oozie?
It is important to connect workflow jobs that run regularly but at various times. Multiple successive executions of a process become the input to the following workflow. When these procedures are chained together, the outcome is referred to as a data application pipeline.
5. What is a Workflow in Apache Oozie?
Apache Oozie Workflow is a set of actions that include Hadoop MapReduce jobs, Pig jobs, and so on. The activities are organized in a control dependency DAG (Direct Acyclic Graph) that governs how and when they can be executed. hPDL, an XML Process Definition Language, defines Oozie workflows.
6. What are the major elements of the Apache Oozie workflow?
The Apache Oozie workflow has two main components.
Control flow nodes: These nodes are used to define the start and finish of the workflow, as well as to govern the workflow’s execution path.
Action nodes are used to initiate the processing or calculation task. Oozie supports Hadoop MapReduce, Pig, and File system operations and system-specific activities like HTTP, SSH, and email.
7. What are the functions of the Join and Fork nodes in Oozie?
In Oozie, the fork and join nodes are used in tandem. The fork node divides the execution path into multiple concurrent paths. The join node combines two or more concurrent execution paths into one. All of the concurrent paths joined by a join node must be descendants of the same fork node.
8. What are the various control nodes in the Oozie workflow?
The various control nodes are:
Start
End
Kill
Decision
Fork & Join Control nodes
9. How can I set the start, finish, and error nodes for Oozie?
This is done in the workflow XML definition using the start, end, and kill control nodes; the kill node carries a custom error message (e.g., “[A custom message]”).
10. What exactly is an application pipeline in Oozie?
It is important to connect workflow jobs that run regularly but at various times. Multiple successive executions of a process become the input to the following workflow. When these procedures are chained together, the outcome is referred to as a data application pipeline.
11. What are Control Flow Nodes?
The mechanisms that specify the beginning and end of the process are known as control flow nodes (start, end, fail). Furthermore, control flow nodes give way for controlling the workflow’s execution path (decision, fork, and join)
12. What are Action Nodes?
The mechanisms initiating the execution of a computation/processing task are called action nodes. Oozie supports a variety of Hadoop actions out of the box, including Hadoop MapReduce, Hadoop file system, Pig, and others. In addition, Oozie supports system-specific jobs such as SSH, HTTP, email, and so forth.
13. Are Cycles supported by Apache Oozie Workflow?
Apache Oozie Workflow does not support cycles. Workflow definitions in Apache Oozie must be a strict DAG. If Oozie detects a cycle in the workflow specification during workflow application deployment, the deployment is aborted.
14. What is the use of the Oozie Bundle?
The Oozie bundle enables the user to run the work in batches. Oozie bundle jobs are started, halted, suspended, restarted, re-run, or killed in batches, giving you more operational control.
15. How does a pipeline work in Apache Oozie?
The pipeline in Oozie aids in integrating many jobs in a workflow that runs regularly but at different intervals. The output of numerous workflow executions becomes the input of the next planned task in the workflow, which is conducted back to back in the pipeline. The connected chain of workflows forms the Oozie pipeline of jobs.
16. Explain the role of the Coordinator in Apache Oozie?
To resolve trigger-based workflow execution, the Apache Oozie coordinator is employed. It provides a basic framework for providing triggers or predictions, after which it schedules the workflow depending on those established triggers. It enables administrators to monitor and regulate workflow execution in response to cluster conditions and application-specific constraints.
17. What is the decision node’s function in Apache Oozie?
Switch statements are decision nodes that conduct different jobs dependent on the conclusion of another expression.
18. What are the various control flow nodes offered by Apache Oozie workflows for starting and terminating the workflow?
The following control flow nodes are supported by Apache Oozie workflow and start or stop workflow execution.
Start Control Node – The start node is the entry point of a workflow job; every Apache Oozie workflow definition must have one start node.
End Control Node – The end node is the last node to which an Oozie workflow task transfers, which signifies that the workflow job was completed. When a workflow task reaches the end node, it completes, and the job status switches to SUCCEEDED. One end node is required for every Apache Oozie workflow definition.
Kill Control Node – The kill control node allows a workflow job to kill itself. When a workflow task reaches the kill node, it terminates in error, and the job status switches to KILLED.
19. What are the various control flow nodes that Apache Oozie workflows offer for controlling the workflow execution path?
The following control flow nodes are supported by Apache Oozie workflow and control the workflow’s execution path.
Decision Control Node – A decision control node is similar to a switch-case statement because it allows a process to choose which execution path to take.
Fork and Join Control Nodes – The fork and join control nodes work in pairs and function as follows. The fork node divides a single execution path into numerous concurrent execution paths. The join node waits until all concurrent execution paths from the relevant fork node arrive.
20. What is the default database Oozie uses to store job ids and statuses?
Oozie stores job ids and job statuses in the Derby database.
Conclusion
These Apache Oozie interview questions can assist you in becoming interview-ready for your upcoming interview. In Oozie-related interviews, interviewers usually ask the interviewee these kinds of questions.
To sum up:
Apache Oozie is a distributed scheduling system to launch and manage Hadoop tasks.
Oozie allows you to combine numerous complex jobs that execute in a specific order to complete a larger task.
Two or more jobs within a specific set of tasks can be programmed to execute in parallel with Oozie.
The real reason for adopting Oozie is to manage various types of tasks that are being handled in the Hadoop system. The user specifies various dependencies between jobs in the form of a DAG. This information is consumed by Oozie and handled in the order specified in the workflow. This saves the user time when managing the complete workflow. Oozie also determines the frequency at which a job is executed.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.
Top 20 Reactjs Interview Questions And Answers In 2023
ReactJS Interview Questions and Answers
ReactJS is a JavaScript library that is used for building user interfaces. Facebook and a community of individual developers maintain it.
ReactJS is one of the top in-demand skills for web developers, primarily front-end and full-stack developers. As such, a front-end developer earns an average base salary of $129,145 per year. Hence, preparing well for ReactJS interviews can open various job prospects for candidates.
Start Your Free Software Development Course
Web development, programming languages, Software testing & others
Key Highlights
ReactJS interview questions involve core concepts such as JSX, state, props, and component lifecycle.
Experience building real-world applications using ReactJS can help demonstrate practical knowledge and problem-solving skills to the interviewer.
Good knowledge of JavaScript and ES6 features is essential to write clean and efficient code while working with ReactJS.
Excellent communication and collaboration skills and a willingness to learn and adapt to new technologies can help make a good impression on the interviewer.
Part 1 – ReactJS Interview Questions (Basic)
This first part covers basic ReactJS Interview Questions and Answers:
Q1. What is React?
Answer: React is a JavaScript library used for building user interfaces. ReactJS is used as the base of single-page web applications or mobile applications. It deals with the view layer of an application.
Q2. What is JSX?
Answer: JSX is simple JavaScript that allows HTML syntax and other HTML tags in the code. HTML syntax is processed into JavaScript calls of the React framework.
Q3. What is FLUX in ReactJS?
Answer: Flux is an application architecture in React View Library that Facebook designed for creating data layers in an application based on JavaScript.
Q4. What are Props and States in React?
Answer: Props are the properties (arguments) passed into a component, similar to function arguments. A state is used for creating a dynamic and interactive component.
Q5. What are refs in React?
Answer: Refs are used in React for focus management and triggering animations. They are also used when integrating with third-party libraries.
Q6. What is the difference between ReactJS and AngularJS?
Answer:
ReactJS vs. AngularJS:
ReactJS is a JavaScript library for building user interfaces; AngularJS is a full-featured JavaScript framework for building large-scale, complex web applications.
ReactJS uses a virtual DOM to update the actual DOM efficiently; AngularJS uses a two-way data binding approach, where any changes to the model automatically update the view and vice versa.
ReactJS follows a unidirectional data flow, where data flows only in one direction, from parent to child components; AngularJS follows a bidirectional data flow, where changes in the view automatically update the model and changes in the model automatically update the view.
ReactJS provides more flexibility and control, allowing developers to use any other library or framework alongside it; AngularJS provides a complete solution for building web applications, including many built-in features like routing, forms, and animations.
ReactJS requires a good understanding of JavaScript as it relies heavily on it; AngularJS relies more on declarative templates and requires less JavaScript knowledge.
Q7. How is Flux different from Redux?
Answer:
Flux vs. Redux:
Flux is an architectural pattern that Facebook introduced. Redux is a predictable state container that is based on Flux architecture.
Flux’s single dispatcher receives actions and dispatches them to the stores. The store receives dispatched actions directly, as Redux has no dispatcher.
Flux has multiple stores that contain the application state. Redux has a single store that contains the entire application state.
Flux stores can have mutable states and be changed anywhere in the application. Redux stores have an immutable state; the only way to change the state is by dispatching an action.
Flux has more boilerplate code and requires more setup. Redux has less boilerplate code and is easier to set up.
Q8. What do you mean by a functional component in React?
Answer: A functional component is a component written as a plain JavaScript function that returns React elements.
Q9. What is routing?
Answer:
The ability to switch between various pages or views of an application is called routing in React.
The React Router library implements routing in React applications.
Developers can design routes using essential components and properties because it supports declarative routing.
Routing is integral to building complex React applications, as it allows for better organization and separation of concerns between different parts of an application.
Q10. What are the components of Redux?
Answer: Action, Reducer, Store, and View are the components of Redux.
Action: Describes a user’s intent in the form of an object.
Reducer: A pure function that receives the current state and an action and returns a new state.
Store: A centralized place to store the state of an application.
View: The user interface of an application.
Part 2 – ReactJS Interview Questions (Advanced)
Q11. List the characteristics of ReactJS.
Answer:
JSX: ReactJS has JSX. JSX is simple JavaScript that allows HTML syntax and other HTML tags in the code. The React framework processes HTML syntax into JavaScript calls.
React Native: It contains a native library that supports Native iOS and Android applications.
Simplicity: It is straightforward to grasp. Its component-based approach and well-defined lifecycle are simple to use.
Easy to Learn: Anyone with basic programming knowledge can quickly learn ReactJS; to learn ReactJS, one needs to know the basics of HTML and CSS.
Data-Binding: ReactJS uses one-way data binding and application architecture controls data flow via a dispatcher.
Testability: ReactJS application is straightforward to test. Its views are easy to configure and can be treated as an application.
Q12. What are the lifecycle methods of React Components in detail?
Answer: Some of the most important lifecycle methods are given below:
componentWillMount()
componentDidMount()
componentWillReceiveProps()
shouldComponentUpdate()
componentWillUpdate()
Q13. What is the lifecycle of ReactJS?
Answer: A React component goes through three main lifecycle phases: mounting (being inserted into the DOM), updating (re-rendering when props or state change), and unmounting (being removed from the DOM).
Q14. What are the advantages of ReactJS?
Answer:
Increased application performance.
Client and Server side building.
Reliable due to JSX code.
Easy testing.
Q15. Which company developed React? When was it released?
Answer: Facebook developed ReactJS and released it as open source in 2013.
Q16. What is the significance of the virtual DOM in ReactJS?
Answer: In ReactJS, the virtual DOM is a lightweight copy of the actual DOM, which helps to enhance the application’s performance. Whenever there is a change in the state of a React component, the virtual DOM compares the new and previous states and creates a list of minimum necessary changes. It then updates the actual DOM with these changes, resulting in faster rendering and improved user experience.
Q17. What is the basic difference between props and state?
Answer:
Definition: Props (short for “properties”) are passed from a parent component to a child component. State is a component’s internal data, which user interactions or other events can change over time.
Mutability: Props are immutable (they cannot be modified by the component receiving them). State is mutable (it can be adjusted using setState()).
Update trigger: Props can only be updated by the parent component passing in new props. State can be updated by calling setState() or forceUpdate() within the component.
Usage: Props are used to pass data from parent to child components. State manages a component’s internal data and triggers re-renders based on state changes.
Scope: Props can be accessed throughout the component tree. State can only be accessed within the component where it is defined.
Q18. When to use a class component over a functional component?
Answer: Historically, class components were used whenever a component needed state or lifecycle methods, while functional components were used for simple presentational components. Since the introduction of hooks, functional components can also manage state and side effects, so class components are mainly needed for older codebases or features such as error boundaries.
Q19. How does one share the data between components in React?
Answer:
Props: Using props is one method of transferring data from a parent component to a child component. Props are read-only, so the child component cannot alter the data passed through them.
Context: React context offers a mechanism to share data that any component within a specific context can access. It is most beneficial to share data necessary for multiple components, such as user authentication data.
Redux: Redux is a library for state management that offers a universal state store that any component can access. It enables components to dispatch actions to update the store and subscribe to changes in the store.
React Query: By caching and controlling the state of asynchronous data, React Query is a data-fetching library that offers a mechanism to transfer data between components. Additionally, it can be used to manage global server state.
Local Storage: The ability to store data locally in the browser that may be accessed and shared by components is provided by local storage. We should only use local storage for modest amounts of data, not for confidential or sensitive data.
Q20. What are React hooks?
Answer: Hooks are functions (such as useState and useEffect) introduced in React 16.8 that let functional components use state and other React features without writing a class.
Final Thoughts
Many businesses seek developers with experience in ReactJS, as it has become one of the most widely used JavaScript libraries for creating complex user interfaces. If one is preparing for the ReactJS interview, one should also prepare for JavaScript and must have practical hands-on experience. Preparing important concepts using interview questions can help one ace their interview.
Frequently Asked Questions (FAQs)
Q1. How do I prepare for a React interview?
Answer: To prepare for a React interview, it’s essential to review the fundamentals of React, including its core concepts, lifecycle methods, and popular tools and libraries. You should also practice building small React applications and be able to explain your approach and decision-making process. Finally, be sure to research the company you’re interviewing with and familiarize yourself with their React-related projects or initiatives.
2. What is ReactJS used for?
Answer: ReactJS is a JavaScript library used for building user interfaces. It allows developers to create reusable UI components and manage the state of an application in a way that is efficient and easy to understand.
3. What questions are asked in interviews on ReactJS?
What is ReactJS?
What is Flux?
How do you define JSX?
What are Props and State?
What are refs?
4. How do you pass React interview questions?
Answer: To pass React interview questions, it’s essential to have a solid understanding of ReactJS’s core concepts and be able to apply them in practical scenarios. It’s also helpful to be familiar with popular React libraries and tools, such as Redux, React Router, and Jest. Practice building small React applications and be prepared to explain your thought process and decision-making. Finally, be confident, communicate clearly, and demonstrate a willingness to learn and adapt.
Recommended Articles
We hope that this EDUCBA information on “ReactJs Interview Questions” was beneficial to you. You can view EDUCBA’s recommended articles for more information.
Pca(Principal Component Analysis) On Mnist Dataset
This article was published as a part of the Data Science Blogathon.
Hello Learners, Welcome!
In this article, we are going to learn about PCA and implement it on the MNIST dataset from scratch. Before we apply the PCA technique to the MNIST dataset, we will first learn what PCA is, the geometric interpretation of PCA, the mathematical formulation of PCA, and then the implementation of PCA on the MNIST dataset.
The dataset we are going to use in this article is the MNIST dataset, which contains handwritten digits 0 to 9. In this dataset, the information of a single digit is stored as a 784×1 array, where each element of the 784×1 array represents a single pixel of a 28×28 image. The value of a single pixel varies from 0 to 1, where black is represented by 1, white by 0, and intermediate values represent shades of grey.
Geometric Interpretation of PCA
Now let’s take an example. Suppose we have a d×n dimensional dataset called X, where d = 2 and n = 20, and the two features of the dataset are f1 and f2.
Now suppose we make a scatter plot with this data and its distribution looks like the figure shown below.
After seeing the scatter plot, you can easily say that the variance of feature f1 is much more than the variance of feature f2. The variability of f2 is unimportant compared to the variability of f1. If we have to choose one feature between f1 and f2, we can easily select feature f1. Now suppose that you cannot visualize 2d data, and for visualizing the data you have to convert your 2d data into 1d data; then what do you do? The simple answer is that you directly keep those features that have the highest variance and remove those features which have less impact on the overall result. And that’s what PCA internally does.
So first of all we ensure that our data is standardized because performing the PCA on standardized data becomes much easier than original data.
So now again, suppose we have a d*n dimensional dataset called X, where d = 2 and n = 20, and the two features of the dataset are f1 and f2. Remember, we standardized the data; but in this case, the scatter plot looks like this.
In this case, if we have to decrease the dimensions from 2d to 1d, then we can’t clearly select feature f1 or f2, because this time the variance of both features is almost the same and both features seem important. So how does PCA do it?
In this situation, PCA tries to draw the vector or line in the direction where the variance of the data is very high. This means that instead of projecting the data or measuring the variance along the f1 or f2 axis, we quantify the variance in the f1′ or f2′ direction, because measuring the variance in the f1′ or f2′ direction makes much more sense.
So PCA tries to find the direction of vector or line where the variance of data is very high. the direction of vector where the variance of data is highest is called PC1 ( Principal Component 1 ) and second-highest is called PC2 and third is PC3 and so on.
Mathematical Formulation of PCA
So we saw the geometric intuition of PCA and how PCA reduces the dimensions of data. PCA simply finds the direction and draws the vector where the variance of the data is very high, but you might wonder how PCA does it and how it finds the right direction of the vector where the variance of the data is very high: how PCA calculates the angle and gives us the accurate slope. PCA uses two techniques to find the direction of the vector: Variance Maximization and Distance Minimization. Let’s learn about them in brief.
1. Variance Maximization: In this method, we simply project all the data points on the unit vector u1 and find the variance of all projected data points. We select that direction where the variance of projected points is maximum.
So let’s assume that we have two-dimensional datasets and the features of the dataset are f1 and f2, and xi is datapoint and u1 is our unit vector. and if we project the data point xi on u1 the projected point is xi’,
u1 = unit vector
f1 and f2 = features of dataset
xi = data point
xi’ = projection of xi on u1
now assume that D = { xi } (1 to n) is our dataset
and D’ = { xi’ } (1 to n) is our dataset of projected point of xi on u1.
Now x̄′ = u1ᵀ · x̄  ……..(2)  [ x̄ = mean of x ]
So find u1 such that the variance{ projection of xi on u1 } is maximum:
var{ u1ᵀ · xi }, for i = 1 to n
if data is columns standardized then mean = 0 and variance = 1
so x^ = [0, 0, 0… .. . . . .0]
we want to maximize the variance.
2. Distance Minimization: In this technique, PCA tries to minimize the distance of the data points from the line spanned by u1 (a unit vector of length 1).
We want to minimize the sum of all squared distances.
Implementing PCA on the MNIST dataset
We talked about the MNIST dataset earlier and have just completed our understanding of PCA, so it is the best time to perform the dimensionality reduction technique PCA on the MNIST dataset. The implementation will be from scratch, so without wasting any more time, let’s start.
So first of all we import our mandatory python libraries which are required for the implementation of PCA.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv('mnist_train.csv', nrows = 20000)
print("the shape of data is :", df.shape)
df.head()
Extracting label column from the dataset
label = df['label']
df.drop('label', axis = 1, inplace = True)

Plotting a random sample data point from the dataset using the matplotlib imshow() method:

ind = np.random.randint(0, 20000)
plt.figure(figsize = (20, 5))
grid_data = np.array(df.iloc[ind]).reshape(28,28)
plt.imshow(grid_data, interpolation = None, cmap = 'gray')
plt.show()
print(label[ind])
Next, we perform column standardization of our dataset using the StandardScaler class of the sklearn.preprocessing module. After column standardization, the mean of every feature becomes 0 and the variance 1, so we perform PCA from the origin.
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
std_df = scaler.fit_transform(df)
std_df.shape

Now find the co-variance matrix, which is Aᵀ * A, using the NumPy matmul method. After multiplication, the dimensions of our co-variance matrix are 784 * 784, because Aᵀ (784 * 20000) * A (20000 * 784).
covar_mat = np.matmul(std_df.T, std_df)
covar_mat.shape

Finding the top two eigenvalues and corresponding eigenvectors for projecting onto a 2D surface. The parameter ‘eigvals’ is defined (low value to high value); the eigh function returns the eigenvalues in ascending order, and this code generates only the top 2 (782 and 783) eigenvalues.
Converting the eigenvectors into (2, d) form for ease of further computation:
from scipy.linalg import eigh
values, vectors = eigh(covar_mat, eigvals = (782, 783))
print("Dimensions of Eigen vector:", vectors.shape)
vectors = vectors.T
print("Dimensions of Eigen vector:", vectors.shape)

Here vectors[1] represents the eigenvector corresponding to the 1st principal eigenvalue, and vectors[0] represents the eigenvector corresponding to the 2nd principal eigenvalue.
If we multiply the top two eigenvectors with the (transposed) standardized data, we get the projections of the data onto our two principal components, PC1 and PC2.
final_df = np.matmul(vectors, std_df.T)
print("vectors:", vectors.shape, "\n", "std_df:", std_df.T.shape, "\n", "final_df:", final_df.shape)

Now we vertically stack our final_df and label, transpose them to get a NumPy data table, and then use pd.DataFrame to create a data frame of our two components with class labels.
final_dfT = np.vstack((final_df, label)).T
dataFrame = pd.DataFrame(final_dfT, columns = ['pca_1', 'pca_2', 'label'])
dataFrame

Now let’s visualize the final data with the help of the seaborn FacetGrid method.
sns.FacetGrid(dataFrame, hue = 'label', size = 8).map(sns.scatterplot, 'pca_1', 'pca_2').add_legend()
plt.show()

So you can see that we have successfully converted our 20000*785 data to 20000*3 using PCA. This is how PCA is used to reduce a large number of dimensions to a much smaller one.
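As a hedged follow-up, reusing the covar_mat and values variables from the snippets above, we can also check how much of the total variance the two retained components explain (the trace of the covariance matrix equals the sum of all 784 eigenvalues):

total_variance = np.trace(covar_mat)      # sum of all 784 eigenvalues
print(values.sum() / total_variance)      # fraction of variance explained by PC1 + PC2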
What did we learn in this article?
We took a brief intro to PCA and the mathematical intuition behind it. That was all from me; thank you for reading this article. I am currently pursuing a B.Tech in CSE, and I love to write articles on Data Science. Hope you like this article.
Thank you.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.
Pfp Nfts: All Your Questions Answered
CryptoPunks, Bored Apes, and Doodles — these are just a few of the countless influential generative avatar projects that live within the continuously growing non-fungible token (NFT) ecosystem. Commonly referred to as PFPs, an acronym for “profile picture,” these unique NFTs have undoubtedly become frontrunners in the NFT market.
So vast is the dominance of the generative avatar sector that many people outside of the NFT space consider the entirety of the non-fungible ecosystem to be just PFPs. Yet, with a lack of understanding about Web3 and blockchain technology, most are left wondering what PFPs are and why they exist in the first place.
As generative avatar projects continue to grow in popularity, and the debate surrounding whether or not NFTs truly need utility persists, it’s time to shed some light on how PFP NFT collections came to be in the first place. Here’s what you need to know.
What is a PFP NFT?
Generative avatars, often called PFPs, are a unique facet of the NFT space. While most independent artists tend to focus on 1/1 pieces and limited edition collections, PFPs are generally conceived by groups of artists/developers and are released as a collection of thousands of individual tokens.
The history of PFP NFTs begins with CryptoPunks, a collection of 10,000 unique 24×24 pixel art images launched in June 2017 by product studio Larva Labs. Yet, Punks wouldn’t gain any significant traction until 2021, when NFTs first started to gain notoriety and the ecosystem exploded with numerous other PFP projects, solidifying generative avatars in the NFT market.
While PFPs now have their own unique existence in the NFT space, they still live at the intersection of collectibles and generative art because of how they are created. They are collectibles in that they come in large quantities (usually 10,000 or so) and have varying degrees of rarity, somewhat similar to trading cards. And they are generative in the way that (like other forms of generative art) they are created partly through the use of an autonomous system.
PFPs usually resemble, well, social media profile pictures. The subject of a PFP is often captured only from the waist, chest, or sometimes even neck up. This way, generative avatars resemble (and are easily used as) profile pictures like those you would upload to Twitter, Instagram, or Facebook.
Zero trait CryptoPunks
In the NFT space, though, generative avatars have ventured beyond the usual profile-pic constraints. While many PFP collections emulate the Bored Apes chest-up style, others, like CrypToadz or Invisible Friends, have introduced full-body and even animated PFPs. Regardless of form, though, generative avatar collections are distinctive due to their vast quantities and how they are created.
So how exactly are PFP NFTs made? More often than not, PFPs are created by way of a simple plug-and-play method. Users load a variety of traits — like body type, head shape, background color, etc. — into software or an application that will, in turn, randomly compile vast quantities of NFTs, no two being the same. But these traits have to come from somewhere, which is why PFPs always start with art.
If we retrace the steps of popular projects like Bored Apes, Doodles, and Cool Cats, it’s clear that PFP endeavors always begin with an artist (or artists). In the case of the three aforementioned projects, those artists were Seneca, Burnt Toast, and Clon, respectively.
Once an artist has set their sights on a PFP endeavor or joined the cause and creative vision of a project, PFP projects become somewhat of a modular process throughout the planning and development phases.
Putting the pieces together
You’ve likely seen a statement to the effect of “all 10,000 NFTs were generated from over XXX number of traits” as part of the marketing language for many PFP projects. And all of these traits used to make PFPs are first created by an artist.
Once the project team has decided on a subject (ape, cat, duck, etc.) or an artist has settled on which of their signature characters/objects to duplicate, it’s time to figure out how to create the many thousands of NFTs that will form the collection. To do this, the project developers (devs) need to create a sort of jigsaw puzzle that can be assembled many times over via computer, i.e., the aforementioned plug-and-pay method.
Each NFT needs to be broken down into parts before it can be assembled, so the project’s lead artist must create a multitude of traits like body type, head shape, facial expression, accessories, arm position, background color, and more. For reference, check out the NFT ranking website Rarity Tools’ breakdown of all the top Bored Ape traits.
Aside from the simple creation process, artists have to consider the canvas placement of each attribute so that, once assembled, the entire composition will make sense, yielding a single cohesive NFT avatar. This means creating each NFT character in layers to ensure that ears, noses, eyes, and accessories sit right on the head of the avatar, and the head must sit right on the body, and so on.
With these layers separated, devs can load each trait into a program (like Python) or write some code that can mix and match the attributes to create single NFTs quickly. The programs vary depending on the team and whether they hope to generate each avatar beforehand or in real-time during the minting process.
More often than not, though, after a few generative trial runs to work out any potential errors, devs will generate the total supply of 10,000 or so unique, non-duplicated (but sometimes very similar) NFTs. Creating a 10,000 supply PFP project has become incredibly easy thanks to this plug-and-play method.
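As a hedged illustration of this plug-and-play idea, a minimal Python sketch using Pillow might look like the following; the trait folders, file names, and layer order are hypothetical placeholders, and it assumes every trait image is a transparent PNG drawn on the same canvas size.

import random
from pathlib import Path
from PIL import Image

TRAIT_ORDER = ["background", "body", "head", "eyes", "accessory"]  # layer order matters

def generate_avatar(trait_dir="traits", out_path="avatar.png", seed=None):
    rng = random.Random(seed)
    canvas = None
    for trait in TRAIT_ORDER:
        options = list(Path(trait_dir, trait).glob("*.png"))       # one PNG per trait variant
        layer = Image.open(rng.choice(options)).convert("RGBA")
        canvas = layer if canvas is None else Image.alpha_composite(canvas, layer)
    canvas.save(out_path)

generate_avatar(seed=42)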
It’s important to note that not all PFP projects require a generative aspect. While generative avatars and PFPs often go hand in hand, at times, independent artists will take on a PFP project, illustrating or crafting batches of hundreds of NFTs in their style. One such example of this is Ghxsts — a project consisting of over 700 NFTs that were each drawn by hand as 1/1s but altogether seemingly fit the definition of a PFP collection.
Releasing PFPs onto the blockchain
The final step in the PFP process is the most public and likely best understood in the project pipeline: releasing these new PFP NFTs into the world.
Yet, this final step is entirely dependent on whether or not NFTs will be generated during mint or pre-generated offline and loaded into a sort of randomized, first-come-first-serve grab bag. For independent artists, like Jen Stark and her Cosmic Cuties, generating offline then loading the entire supply onto a marketplace like OpenSea and selling them over time usually makes the most sense.
In the case of large-supply projects, though, many other steps must be completed before PFPs go up for sale. Before release, devs focus on creating a website, setting up a smart contract, testing the minting functions of both the website and the smart contract, deciding on a sale method, setting a release date, price, and more.
This is because, above all else, the release of a project is quite possibly the most important event of all. For a high-level summary of the technical side of releasing a PFP collection, read security researcher Harry Denley’s post here.
Smart contracts and file hosting/sharing aside, this final step mostly comes down to whether or not the project team can generate hype, cultivate a following, and deliver a solid product to their potential collectors. We’ve seen collections time and time again (Mekaverse, Pixelmon, etc.) create obscene amounts of hype only to fall short of their potential — with others, seemingly too good to be true, succeeding beyond expectations.
The public sale has a lot to do with the overall perception of the project. Of course, whether or not the generative NFTs look good plays into things, but rollout mechanics, including whitelisting and pre-sales, Dutch auctions vs. releasing in waves, etc., all play a massive role in how a project is perceived.
Suppose this all goes smoothly and the collection sells out quickly. In that case, collectors are likely to be much more pleased with the process and the product — as opposed to when projects like Akutars charge full speed into a sale and hit significant and debilitating roadblocks.
But why would someone want to buy a PFP NFT in the first place? Well, aside from the potential of a generative avatar to increase in value, minting a PFP NFT often comes with perks, including membership into an exclusive holders-only Discord server, access to live and virtual events, first dibs on subsequent collections, and of course, bragging rights and the ability to change your social media profile pic to the unique new NFT you own.
The top PFP projects of all time
As previously mentioned, Punks were launched in June of 2017 by Larva Labs, whose CryptoPunks collection Yuga Labs acquired in 2022. These Punks were some of the first NFTs ever minted on the Ethereum blockchain, making their importance immeasurable in the grand scheme of NFT history. Featuring humans, apes, zombies, and aliens, CryptoPunks pioneered the idea of generative trait combinations that most other PFP projects still draw inspiration from today.
Second only to CryptoPunks in importance, but undoubtedly first in popularity and value, is the Bored Ape Yacht Club. Also a collection of 10,000 NFTs, BAYC launched in April 2021. Although it experienced a slow start, the project exploded in value over the following months, becoming one of, if not the most beloved NFT project of all time.
Doodles, which also consists of 10,000 avatars, launched in October 2021 and has easily become one of the most popular PFP projects in all of NFTs. Featuring a vibrant community, seasoned executive team, and a multifaceted approach to entertainment, Doodles has yet to cease winning over the hearts of countless NFT enthusiasts and veteran NFT collectors.
Moonbirds, another collection of 10,000 NFTs, was created by prominent American internet entrepreneur Kevin Rose as part of his Proof Collective — a private members-only collective of NFT collectors and artists. Only a few days after its April 2022 launch, Moonbirds had already achieved upwards of 100,000 ETH (approximately $300 million at the time) in secondary sales volume, immediately making it one of the highest-grossing NFT collections of all time.
When it comes to community-driven projects, Cool Cats is a tough one to beat. A collection of 9,999 generated and, as the developers say, “stylistically curated” NFTs, the collection launched in June 2021, earning accolades and being propelled by collaborations with Ghxsts and TIME magazine, plus a near-viral milk chug challenge.
These are only a few of the dozens of generative avatar projects that dominate the NFT market charts. Read about more interesting and influential PFP NFT projects here.