7 Most Commonly Asked Questions On Correlation


Introduction

To begin with, if you are still struggling to understand the difference between correlation and causation, you should refer to my previous article where I’ve explained these concepts in the simplest possible manner.

Let’s proceed further and learn about the questions most commonly asked about correlation. If you are learning statistical concepts, you are bound to face these questions, which most people tend to avoid. For readers like me, it should be a good refresher.

And if you’re looking to learn these questions for your data science interview, we are delighted to point you towards the ‘Ace Data Science Interviews’ course! The course has tons of videos and hundreds of questions like these to make sure you’re well prepared for your next data science interview.

What you’ll learn?

Do correlation and dependency mean the same thing? In simple words, if two events have a correlation of zero, does that mean they are not dependent, and vice versa?

If two variables each have a high correlation with a third variable, does that mean they will also be highly correlated with each other? Is it even possible that A and B are both positively correlated with another variable C, yet A and B are negatively correlated with each other?

Can a single outlier increase or decrease the correlation by a large magnitude? Is the Pearson coefficient very sensitive to outliers?

Does causation imply correlation?

What’s the difference between correlation and simple linear regression?

How to choose between Pearson and Spearman correlation?

How would you explain the difference between correlation and covariance?

Answers to many of the above questions might seem intuitive; however, you may find a few surprises in this article about correlation.

Let’s begin!

Understanding the Mathematical formulation of Correlation coefficient

The most widely used correlation coefficient is the Pearson coefficient. Mathematically, it is defined as the covariance of the two variables divided by the product of their standard deviations: r = cov(X, Y) / (σX σY).

Explanation: It is simply the ratio of the covariance of the two variables to the product of their standard deviations. It takes a value between +1 and -1. A value near either extreme means the two variables are strongly correlated. A value of zero indicates no linear correlation, but not necessarily independence. You’ll understand this clearly in one of the following answers.
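As a quick numerical illustration of this formula (a minimal sketch with made-up data, using NumPy), the coefficient computed as covariance over the product of standard deviations matches NumPy's built-in np.corrcoef:

import numpy as np

# Hypothetical sample data
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([1.0, 3.0, 7.0, 9.0, 12.0])

# Pearson coefficient = covariance / (std_x * std_y)
cov_xy = np.cov(x, y, ddof=1)[0, 1]
r_manual = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))

# Cross-check against NumPy's built-in implementation
r_numpy = np.corrcoef(x, y)[0, 1]

print(r_manual, r_numpy)   # both agree and lie between -1 and +1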

Answer – 1: Correlation vs. Dependency

Independence between two variables implies a zero correlation. However, the inverse is not true: a zero correlation can coexist with a perfect dependency. Here is an example:

In this scenario, where y is a perfect function of x (y = x², so y depends entirely on x), the relationship is positive on one side of the y-axis and negative on the other. So what will be the Pearson correlation coefficient?

If you do the math, you will see a zero correlation between these two variables. What does that mean? A pair of variables that are perfectly dependent on each other can still have a zero correlation.

Must remember tip: Correlation quantifies the linear dependence of two variables. It cannot capture a non-linear relationship between them.
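To see this numerically, here is a minimal sketch (assuming NumPy and SciPy are available): y is perfectly determined by x through y = x², yet the Pearson coefficient is zero because x is symmetric around zero.

import numpy as np
from scipy.stats import pearsonr

x = np.arange(-10, 11)    # -10, -9, ..., 10 (symmetric around zero)
y = x ** 2                # perfect, but non-linear, dependence on x

r, _ = pearsonr(x, y)
print(round(r, 10))       # 0.0 -- zero linear correlation despite perfect dependence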

Good Read: Must Read Books in Analytics / Data Science

Answer – 2: Is Correlation Transitive?

Suppose that X, Y, and Z are random variables. X and Y are positively correlated and Y and Z are likewise positively correlated. Does it follow that X and Z must be positively correlated?

As we shall see, the answer is (perhaps surprisingly) “No.” However, we can show that if the two known correlations are sufficiently close to 1, then X and Z must be positively correlated.

Let’s assume C(x,y) is the correlation coefficient between x and y; likewise we have C(y,z) and C(z,x). Solving the correlation equations mathematically gives the following bound, the lowest value C(x,y) can take given the other two correlations:

C(x,y) >= C(y,z) * C(z,x) - Square Root ( (1 - C(y,z)^2 ) *  (1 - C(z,x)^2 ) )

Now, if we want C(x,y) to be guaranteed greater than zero, we basically need the RHS of the above inequality to be positive. Hence, you need to solve for:

C(y,z) * C(z,x) > Square Root ( (1 - C(y,z)^2 ) * (1 - C(z,x)^2 ) )

Squaring both sides and simplifying, this reduces to C(y,z)^2 + C(z,x)^2 > 1. This is the equation of a unit circle, so the following plot explains everything:

If the two known correlations lie in the A zone (outside the circle, with the same sign), the third correlation is guaranteed to be positive. If they lie in the B zone (outside the circle, with opposite signs), the third correlation is guaranteed to be negative. Inside the circle, we cannot say anything about the relationship. A very interesting insight here is that even if C(y,z) and C(z,x) are both 0.5, C(x,y) can still be negative.
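Here is a small numeric sketch of that lower bound: with C(y,z) = C(z,x) = 0.5 the bound is -0.5, so the third correlation is not forced to be positive, whereas with 0.9 and 0.9 it is.

import math

def corr_lower_bound(c_yz, c_zx):
    # Lowest possible value of C(x, y) given the other two correlations
    return c_yz * c_zx - math.sqrt((1 - c_yz ** 2) * (1 - c_zx ** 2))

print(corr_lower_bound(0.5, 0.5))   # -0.5 -> C(x, y) may still be negative
print(corr_lower_bound(0.9, 0.9))   # 0.62 -> C(x, y) must be positive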

Answer – 3: Is Pearson coefficient sensitive to outliers?

Consider the last two graphs in the figure (X3Y3 and X4Y4). X3Y3 is clearly a case of perfect correlation where a single outlier brings the coefficient down significantly. The last graph is the complete opposite: the correlation coefficient becomes a high positive number because of a single outlier. This turns out to be the biggest concern with the correlation coefficient: it is highly influenced by outliers.
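A minimal sketch (with made-up numbers) of how a single outlier can swing the Pearson coefficient:

import numpy as np

# Ten points with essentially no linear relationship
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
y = np.array([3, 1, 4, 2, 5, 3, 2, 4, 1, 3], dtype=float)
print(np.corrcoef(x, y)[0, 1])            # close to 0

# Add one extreme outlier
x_out = np.append(x, 100.0)
y_out = np.append(y, 100.0)
print(np.corrcoef(x_out, y_out)[0, 1])    # jumps close to +1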

Check your potential: Should I become a Data Scientist?

 

Answer – 4: Does causation imply correlation?

If you have read the three answers above, I am sure you will be able to answer this one. The answer is no, because causation can produce a non-linear relationship, which correlation will not capture. Let’s understand how!

Below is the graph showing the density of water from 0 to 12 degrees Celsius. We know that density changes as an effect of temperature, but density reaches its maximum value at 4 degrees Celsius. Therefore, it is not linearly correlated with temperature.

 

Answer – 5: Difference between Correlation and Simple Linear Regression

These two are closely related, so let’s start with a few things that are common to both.

The square of Pearson’s correlation coefficient is the same as the R² (coefficient of determination) in simple linear regression.

Neither simple linear regression nor correlation answers questions of causality directly. This point is important, because I’ve met people who think that simple regression can magically allow an inference that X causes Y. That is a preposterous belief.

What’s the difference between correlation and simple linear regression?

Now let’s think of a few differences between the two. Simple linear regression gives much more information about the relationship than the Pearson correlation does. Here are a few things which regression will give you but the correlation coefficient will not (see the sketch after this list).

The slope in a linear regression gives the marginal change in the output/target variable for a unit change in the independent variable. Correlation has no slope.

The intercept in a linear regression gives the value of the target variable when the input/independent variable is set to zero. Correlation does not carry this information.

Linear regression can give you a prediction given all the input variables. Correlation analysis does not predict anything.
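To make the contrast concrete, here is a minimal sketch (using SciPy, with made-up data): linregress gives a slope, an intercept, and predictions, while pearsonr gives only the strength and direction of the linear association.

import numpy as np
from scipy.stats import linregress, pearsonr

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.2, 5.9, 8.1, 9.8])

# Simple linear regression: slope, intercept, prediction
fit = linregress(x, y)
print(fit.slope, fit.intercept)           # marginal change per unit of x, and value at x = 0
print(fit.intercept + fit.slope * 6.0)    # prediction for a new x

# Correlation: only a single number describing the linear association
r, _ = pearsonr(x, y)
print(r)                                  # no slope, no intercept, no prediction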

 

Answer – 6: Pearson vs. Spearman

The simplest answer: Pearson captures how linearly dependent the two variables are, whereas Spearman captures the monotonic behaviour of the relationship between them.

For instance, consider the following relationship:

y = exp(x)

Here you will find the Pearson coefficient to be 0.25 but the Spearman coefficient to be 1. As a rule of thumb, you should begin with Spearman only when you have an initial hypothesis that the relationship is non-linear. Otherwise, we generally try Pearson first and, if it is low, try Spearman. This way you know whether the variables are linearly related or merely share a monotonic relationship.
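A quick sketch of that comparison (the exact Pearson value depends on the range of x used, so treat the number as illustrative):

import numpy as np
from scipy.stats import pearsonr, spearmanr

x = np.linspace(0, 10, 100)
y = np.exp(x)

print(pearsonr(x, y)[0])     # well below 1 -- the relationship is not linear
print(spearmanr(x, y)[0])    # exactly 1 -- the relationship is perfectly monotonic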

 

Answer – 7: Correlation vs. co-variance

If you skipped the mathematical formula of correlation at the start of this article, now is the time to revisit the same.

Correlation is simply the covariance normalized by the standard deviations of both variables. This is done to ensure we get a number between -1 and +1. Covariance is very difficult to compare across variable pairs because it depends on the units of the two variables. It might turn out that a student’s marks have a larger covariance with his toe-nail length in millimetres than with his attendance rate.

This is purely an artefact of the units of the second variable. Hence, we need to normalize the covariance by a measure of spread to make sure we compare apples with apples. This normalized number is known as the correlation.
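A minimal sketch of the units problem (made-up data): rescaling one variable changes the covariance by the same factor, while the correlation is unchanged.

import numpy as np

attendance = np.array([60.0, 70.0, 80.0, 90.0, 95.0])   # in percent
marks = np.array([55.0, 62.0, 71.0, 83.0, 88.0])

print(np.cov(attendance, marks)[0, 1])                # covariance in the original units
print(np.cov(attendance / 100.0, marks)[0, 1])        # attendance as a fraction: covariance shrinks 100x
print(np.corrcoef(attendance, marks)[0, 1])           # correlation
print(np.corrcoef(attendance / 100.0, marks)[0, 1])   # same correlation, unaffected by the rescaling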

End Notes

Questions on correlation are very common in interviews. The key is to know that correlation is an estimate of the linear dependence between two variables. Correlation is transitive only for a limited range of correlation pairs, and it is highly influenced by outliers. We also learnt that correlation does not imply causation, nor does causation imply correlation.

Were you able to answer all questions in the beginning of this article? Did this article help you with any of your doubts on correlation? If you have any more questions on Correlation, we will be happy to answer them on our discussion portal.

If you like what you just read and want to continue your analytics learning, subscribe to our emails, follow us on Twitter, or like our Facebook page.

Related


30+ Most Important Data Science Interview Questions (Updated 2023)

X: 1, 20, 30, 40

Y: 1, 400, 800, 1300

(A)  27.876

(B) 32.650

(C) 40.541

(D) 28.956

Answer: (D)

Explanation: Use the ordinary least squares method.

Q5. The robotic arm will be able to paint every corner of the automotive parts while minimizing the quantity of paint wasted in the process. Which learning technique is used in this problem?

(A) Supervised Learning.

(B) Unsupervised Learning.

(C) Reinforcement Learning.

(D) Both (A) and (B).

Answer: (C)

Explanation: Here the robot learns from the environment, receiving rewards for positive actions and penalties for negative actions.

Q6. Which one of the following statements is TRUE for a Decision Tree?

(A) Decision tree is only suitable for the classification problem statement.

(B) In a decision tree, the entropy of a node decreases as we go down the decision tree.

(C) In a decision tree, entropy determines purity.

(D) Decision tree can only be used for only numeric valued and continuous attributes.

Answer: (B)

Explanation: Entropy helps to determine the impurity of a node, and as we go down the decision tree, entropy decreases.

Q7. How do you choose the right node while constructing a decision tree?

(A) An attribute having high entropy

(B) An attribute having high entropy and information gain

(C) An attribute having the lowest information gain.

(D) An attribute having the highest information gain.

Answer: (D)

Explanation: We first select the attribute with the maximum information gain.
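As a small illustration (a sketch with a made-up binary split, not part of the original question), the attribute is chosen by how much it reduces entropy, i.e., by its information gain:

import math

def entropy(p_positive):
    # Binary entropy of a node, given the fraction of positive examples in it
    if p_positive in (0.0, 1.0):
        return 0.0
    p, q = p_positive, 1.0 - p_positive
    return -(p * math.log2(p) + q * math.log2(q))

parent = entropy(0.5)                      # parent node: 10 positive, 10 negative -> 1.0 bit
left, right = entropy(0.8), entropy(0.2)   # candidate split: (8+, 2-) and (2+, 8-)
children = 0.5 * left + 0.5 * right        # weighted average entropy after the split

print(parent - children)                   # information gain (~0.28): pick the split that maximizes this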

Q8. What kind of distance metric(s) are suitable for categorical variables to find the closest neighbors?

(A) Euclidean distance.

(B) Manhattan distance.

(C) Minkowski distance.

(D) Hamming distance.

Answer: (D)

Explanation: Hamming distance is a metric for comparing two binary data strings or, more generally, two equal-length categorical vectors, which makes it suitable for categorical variables.
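For instance, a minimal hand-rolled sketch (with made-up categorical records) of the Hamming distance as the fraction of attributes on which two records disagree:

def hamming_distance(u, v):
    # Fraction of positions at which the two records differ
    assert len(u) == len(v)
    return sum(a != b for a, b in zip(u, v)) / len(u)

record_a = ["red", "small", "round", "sweet"]
record_b = ["red", "large", "round", "sour"]
print(hamming_distance(record_a, record_b))   # 0.5 -> the records differ on 2 of 4 attributes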

Q9. In the Naive Bayes algorithm, suppose that the prior for class w1 is greater than class w2, would the decision boundary shift towards the region R1(region for deciding w1) or towards region R2 (region for deciding w2)?

(A) towards region R1.

(B) towards region R2.

(C) No shift in decision boundary.

(D) It depends on the exact value of priors.

Answer: (B)

Explanation: Since the prior for w1 is greater than that for w2, the region for deciding w1 expands, which shifts the decision boundary towards region R2.

Q10. Which of the following statements is FALSE about Ridge and Lasso Regression?

(A) These are types of regularization methods to solve the overfitting problem.

(B) Lasso Regression is a type of regularization method.

(C) Ridge regression shrinks the coefficient to a lower value.

(D) Ridge regression lowers some coefficients to a zero value.

Answer: (D)

Explanation: Ridge regression never drops any feature; instead, it shrinks the coefficients. Lasso regression, however, drops some features by making their coefficients exactly zero, which is why it is also used as a feature selection technique.
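A small sketch (using scikit-learn on synthetic data; the exact numbers are illustrative) showing Lasso driving irrelevant coefficients exactly to zero while Ridge only shrinks them:

import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# Only the first two features actually matter
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)

print(ridge.coef_)   # all coefficients shrunk, none exactly zero
print(lasso.coef_)   # the irrelevant features are driven exactly to zero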

Q11. Which of the following is FALSE about Correlation and Covariance?

(A) A zero correlation does not necessarily imply independence between variables.

(B) Correlation and covariance values are the same.

(C) The covariance and correlation are always the same sign.

(D) Correlation is the standardized version of Covariance.

Answer: (B)

Explanation: Correlation is defined as covariance divided by standard deviations and, therefore, is the standardized version of covariance.

Q12. In Regression modeling, we develop a mathematical equation that describes how, (Predictor-Independent variable, Response-Dependent variable)

(A) one predictor and one or more response variables are related.

(B) several predictors and several response variables response are related.

(C) one response and one or more predictors are related.

(D) All of these are correct.

Answer: (C)

Explanation: In the regression problem statement, we have several independent variables but only one dependent variable.

Q13. True or False: In a naive Bayes algorithm, the entire posterior probability will be zero when an attribute value in the testing record has no example in the training set.

(A) True

(B) False

(C) Can’t be determined

(D) None of these

Answer: (A)

Q14. Which of the following is NOT true about Ensemble Learning Techniques?

(A) Bagging decreases the variance of the classifier.

(B) Boosting helps to decrease the bias of the classifier.

(C) Bagging combines the predictions from different models and then finally gives the results.

(D) Bagging and Boosting are the only available ensemble techniques.

Answer: (D)

Explanation: Apart from bagging and boosting, there are other various types of ensemble techniques such as Stacking, Extra trees classifier, Voting classifier, etc.

Q15. Which of the following statement is TRUE about the Bayes classifier?

(A) Bayes classifier works on the Bayes theorem of probability.

(B) Bayes classifier is an unsupervised learning algorithm.

(C) Bayes classifier is also known as maximum apriori classifier.

(D) It assumes the independence between the independent variables or features.

Answer: (A)

Explanation: Bayes classifier internally uses the concept of the Bayes theorem for doing the predictions for unseen data points.

Q16. How will you define precision in a confusion matrix?

(A) It is the ratio of true positive to false negative predictions.

(B) It is the measure of how accurately a model can identify positive classes out of all the positive classes present in the dataset.

(C) It is the measure of how accurately a model can identify true positives from all the positive predictions that it has made

(D) It is the measure of how accurately a model can identify true negatives from all the positive predictions that it has made

Answer: (C)

Explanation: Precision is the ratio of true positives to (true positives + false positives): out of all the values the model predicted as positive, how many were truly positive.
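For instance, a minimal sketch (with made-up labels and predictions) computing precision both from its TP / (TP + FP) definition and via scikit-learn:

from sklearn.metrics import precision_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)

print(tp / (tp + fp))                   # precision computed by hand (0.8 here)
print(precision_score(y_true, y_pred))  # same value from scikit-learn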

Q17. What is True about bias and variance?

(A) High bias means that the model is underfitting.

(B) High variance means that the model is overfitting

(C) Bias and variance are inversely proportional to each other.

(D) All of the above

Answer: (D)

Explanation: A model with high bias is unable to capture the underlying patterns in the data and consistently underestimates or overestimates the true values, which means that the model is underfitting. A model with high variance is overly sensitive to the noise in the data and may produce vastly different results for different samples of the same data. Therefore it is important to maintain the balance of both variance and bias. As they are inversely proportional to each other, this relationship between bias and variance is often referred to as the bias-variance trade-off.

Q18. Which of these machine learning models is used for classification as well as regression tasks?

(A) Random forest

(B) SVM(support vector machine)

(C) Logistic regression

(D) Both A and B

Answer: (D)

Explanation: Random Forests and Support Vector Machines (SVMs) can both be used for classification as well as regression tasks, whereas logistic regression (despite its name) is a classification algorithm.

(A) It is computationally expensive

(B) It can get stuck in local minima

(C) It requires a large amount of labeled data

(D) It can only handle numerical data

Answer: (B)

Explanation: It can get stuck in local minima

Data Science Interview Questions on Deep Learning

Q19. Which of the following SGD variants is based on both momentum and adaptive learning?

(A) RMSprop.

(B) Adagrad.

(C) Adam.

(D) Nesterov.

Answer: (C)

Explanation: Adam, being a popular deep learning optimizer, is based on both momentum and adaptive learning.

Q20. Which of the following activation function output is zero-centered?

(A) Hyperbolic Tangent.

(B) Sigmoid.

(C) Softmax.

(D) Rectified Linear unit(ReLU).

Answer: (A)

Explanation: Hyperbolic Tangent activation function gives output in the range [-1,1], which is symmetric about zero.

Q21. Which of the following is FALSE about Radial Basis Function Neural Network?

(A) It resembles Recurrent Neural Networks(RNNs) which have feedback loops.

(B) It uses the radial basis function as an activation function.

(C) While outputting, it considers the distance of a point with respect to the center.

(D) The output given by the Radial basis function is always an absolute value.

Answer: (A)

Explanation: A Radial Basis Function network does not resemble an RNN (it has no feedback loops); it is a feed-forward artificial neural network whose hidden units respond to the distance of an input from a center rather than to a weighted sum.

Q22. In which of the following situations should you NOT prefer Keras over TensorFlow?

(A) When you want to quickly build a prototype using neural networks.

(B) When you want to implement simple neural networks in your initial learning phase.

(C) When doing critical and intensive research in any field.

(D) When you want to create simple tutorials for your students and friends.

Answer: (C)

Explanation: Keras is a high-level API built on top of TensorFlow, so for critical and intensive research, TensorFlow, which provides both high-level and low-level APIs, is preferred.

Q23. Which of the following is FALSE about Deep Learning and Machine Learning?

(A) Deep Learning algorithms work efficiently on a high amount of data and require high computational power.

(B) Feature Extraction needs to be done manually in both ML and DL algorithms.

(C) Deep Learning algorithms are best suited for an unstructured set of data.

(D) Deep Learning is a subset of machine learning

Answer: (B)

Explanation: Usually, in deep learning algorithms, feature extraction happens automatically in hidden layers.

Q24. What can you do to reduce underfitting in a deep-learning model?

(A) Increase the number of iterations

(B) Use dimensionality reduction techniques

(C) Use cross-validation technique to reduce underfitting

(D) Use data augmentation techniques to increase the amount of data used.

Answer: (D)

Explanation: Options A and B can be used to reduce overfitting in a model. Option C is just used to check if there is underfitting or overfitting in a model but cannot be used to treat the issue. Data augmentation techniques can help reduce underfitting as it produces more data, and the noise in the data can help in generalizing the model.

Q25. Which of the following is FALSE for neural networks?

(A) Artificial neurons are similar in operation to biological neurons.

(B) Training time for a neural network depends on network size.

(C) Neural networks can be simulated on conventional computers.

(D) The basic units of neural networks are neurons.

Answer: (A)

Explanation: An artificial neuron does not work the same way as a biological neuron: an artificial neuron takes a weighted sum of its inputs plus a bias and then applies an activation function, whereas a biological neuron operates through axons, dendrites, synapses, etc.

Q26. Which of the following logic function cannot be implemented by a perceptron having 2 inputs?

(A) AND

(B) OR

(C) NOR

(D) XOR

Answer: (D)

Explanation: A perceptron always gives a linear decision boundary. However, implementing the XOR function requires a non-linear decision boundary.
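A small sketch (using scikit-learn's Perceptron; results are illustrative) of why this is so: a single-layer perceptron fits AND perfectly but cannot fit XOR, because XOR is not linearly separable.

from sklearn.linear_model import Perceptron

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y_and = [0, 0, 0, 1]   # linearly separable
y_xor = [0, 1, 1, 0]   # not linearly separable

print(Perceptron(max_iter=1000, tol=None).fit(X, y_and).score(X, y_and))   # 1.0
print(Perceptron(max_iter=1000, tol=None).fit(X, y_xor).score(X, y_xor))   # below 1.0 -- cannot fit XOR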

Q27. Inappropriate selection of learning rate value in gradient descent gives rise to:

(A) Local Minima.

(B) Oscillations.

(C) Slow convergence.

(D) All of the above.

Answer: (D)

Explanation: The learning rate decides how quickly the optimizer approaches the global minimum. With an inappropriate learning rate we may never reach the global minimum: we can get stuck in a local minimum, oscillate around a minimum, or converge very slowly.

Data Science Interview Questions on Coding

Q28. What will be the output of the following Python code?

7 Lessons From My Most Successful Guest Posts

Guest posting is one of the most popular content marketing tactics because, when done right, it can help you grow your online audience.

Despite its usefulness, however, many people have tried it only to get mediocre results.

I’ve done several guest posts of my own. I’ve been fortunate enough to get featured on popular sites including Kissmetrics, Unbounce, and Entrepreneur.

Aside from publishing on blogs in my niche, I also experimented with publishing on other platforms and targeting lesser known blogs.

Some guest posts sent me over a hundred subscribers in a short amount of time, while others resulted in hardly any traffic at all. Some sent me ongoing traffic, while others produced a blip of traffic that fizzled shortly after they were published.

In this article, I’m going to share some insights into what made some guest posts significantly more successful than others and some tips on how you can get better results from guest blogging.

1. Optimize Your Guest Post to Rank in Search Engines

When guest posting, most bloggers don’t really think about SEO too much. Guest bloggers are usually more focused on creating engaging content for the host blog’s audience or simply just getting featured.

After all, it’s someone else’s blog, so why should you care about SEO?

The obvious answer is that a blog post that is ranking in the search engines can get ongoing traffic, which means more ongoing exposure for your article and your business.

Furthermore, a well-written guest post on a popular blog can attract natural links from that blog’s readers. And once that post ranks in the search engines, it can attract even more links from people who discover the article.

Here’s an example:

My guest post on SmartBlogger is ranking for the term “expert roundup”, which is a phrase that bloggers sometimes search for. I discussed how I was able to create a roundup that got over 5,000 social shares and some of the details that other bloggers miss, which results in most roundups getting less than 100 shares.

By ensuring that the article was of higher quality than competing articles and continuing to promote it, I was able to get the article ranked on the first page of Google.

This post continued to earn backlinks and send ongoing traffic to my site long after it was first published. Ranking this article would have been a lot more work if I hadn’t tapped into SmartBlogger’s built-in audience and authority.

2. Promote Old Guest Posts

One reason that guest blogging is so alluring is that we can tap into another blog’s already established audience. Because of the built-in exposure, most guest bloggers don’t bother to promote their guest posts too much on their own.

However, promoting a guest post that you wrote on an authority site can be a great way to make sure it keeps getting you exposure and sending you traffic.

There are several ways you can promote old guest posts, especially ones that have performed well in the past.

You can reference your old guest posts through other guest posts or articles on your own blog. Even if the links are required to be nofollowed, referencing old posts will allow more people to find out about your content which could result in more natural links.

You can also reference old guest posts if you post regularly from the same blog. In addition to helping more people discover your content, the internal links will also help with SEO.

For example, I’ve written multiple articles for Search Engine Journal. Sometimes I’ll mention one of my older SEJ articles in my newer guest posts.

Adding an image along with some descriptive text can be a good way to draw attention to the link.

See how that works?

3. Look for Blogs Outside Your Niche That Have an Audience Interested in What You Do

For guest blogging, most people want to get featured on top tier blogs in their niche. Getting featured on top tier blogs is great for a blogger’s credibility and perceived authority.

On the other hand, some of the top tier blogs that I published on hardly sent me any traffic at all.

On industry blogs, you are competing with lots of other experts that specialize in the same thing. When you target blogs outside of your niche, you can stand out as an expert because your peers aren’t guest posting there.

You may have to experiment with guest posting to see if a blog delivers results. But don’t rule out publishing on blogs just because they aren’t the most well-known blogs in your niche.

4. Infuse Credibility Into Your Blog Articles

Another thing that you should make a conscious effort to do is infuse credibility into your guest posts. Include screenshots, data, results, and anything else that will boost your credibility in the reader’s eyes.

When writing a blog post, bloggers sometimes get so focused on the content that they forget to insert credibility boosters.

The first time I wrote about how to do an expert roundup that goes viral was on Moz. My post was the most shared post of the week for the phrase “blog promotion”, so in this guest post, I included a screenshot of my post on BuzzSumo.

Including the screenshot of my results gives me more credibility than other bloggers who are simply writing “expert roundup” guides.

Most bloggers will insert credibility boosters into some of their articles, but remember to look for opportunities to add credibility statements into all of your articles.

Can you find my credibility boosters in this article?

5. Look for Blogs That Promote Past Guest Posts

If you want your guest post to have long term value, then find host blogs that promote older articles.

Here are a couple of things I like to look for:

Internal links to old articles: Internal links help with SEO and also drive new visitors to explore old content. Search Engine Journal, Blogging Wizard, and Advanced Web Ranking are a few sites that include internal links in their articles.

Social media: Some blogs will schedule social shares of old articles. Check their Twitter feeds and other social media channels to see if they are sharing old content.

If you are publishing on a blog that promotes past guest posts, you should also pick a topic that other writers will want to reference frequently.

For example, Brian Dean wrote this guide about how to do email outreach on SmartBlogger. Since email outreach is a popular blog promotion technique, that article got referenced a few times by other articles on SmartBlogger.

6. Plan to Commit to Guest Posting for the Long Term

Guest posting is a long-term strategy, so be sure to guest post consistently and approach it with a long-term mindset. Guest posting for the same blog multiple times allows you to gain repeated exposure to that blog’s audience.

It takes an average of 5-7 impressions before someone remembers your brand, so you will want to guest post multiple times for the best results.

Leo Widrich wrote 150 guest posts to kick start Buffer’s growth and get his first 100,000 customers. However, these results came from 9 months of intense writing.

Guest blogging also might not necessarily send tons of traffic, but sometimes it’s more important to get the right kind of traffic – the kind that converts well.

7. Repurpose Winning Content Onto Other Blogs

One important variable that can really boost your guest blogging effectiveness is creating content that stands out from the competition. However, even top influencers struggle to create content that stands out on a regular basis.

So if you do create a guest post that does really well, then consider repurposing it on other blogs.

I used this strategy a few times to leverage my ideas and scale my blog’s growth. One great example was my LinkedIn publishing study, which accumulated more than 4,000 social shares and became one of the most shared articles on my blog.

LinkedIn had just opened up its Pulse platform to the public; before that it was in closed beta (meaning you had to apply and be accepted to contribute).

Seeing how popular my article was, I also published a similar article on Mirasee’s blog, which resulted in a quick gain of more than 100 subscribers.

I also wrote about my experience with LinkedIn publishing on Matthew Woodward’s blog. Some other people saw my post here and asked me to guest post on their sites, including Ahrefs and WordStream.

You don’t have to cover the exact same ideas in every guest post. I added additional insights that I discovered along the way to make each guest post unique.

Summary

Guest posting is one of the best strategies to improve your SEO results, get you in front of your target audience, and become a thought leader in your niche. For the best results:

Promote your old guest posts.

Experiment with writing for different blogs.

Be consistent and committed to guest posting for the long term.


Interview Questions On Support Vector Machines

Introduction

Support vector machines (SVMs) are among the most widely used machine learning algorithms, known for their accuracy and strong performance on many kinds of datasets. Because of how the algorithm works, it is one of the methods people try on almost any dataset, learning from the data regardless of its type.

This article discusses and answers common interview questions on support vector machines, with proper explanations and the reasoning behind them. This will help you answer these questions efficiently and accurately in an interview and will also strengthen your knowledge of the topic.

Learning Objectives

After going through this article, you will learn:

Kernel tricks and margin concepts in SVM

A proper answer to why SVM needs longer training duration and why it is nonparametric

An efficient way to answer questions related to SVM

How interview questions can be tackled in an appropriate manner

This article was published as a part of the Data Science Blogathon.

Table of Contents

How would you explain SVM to a nontechnical person?

What are the Assumptions of SVM?

Why is SVM a nonparametric algorithm?

When do we consider SVM as a Parametric algorithm?

What are Support vectors in SVM?

What are hard and soft-margin SVMs?

What are Slack variables in SVM?

What could be the minimum number of support vectors in N-dimensional data?

Why does SVM need a long training duration?

What is the kernel trick in SVM?

Conclusion

Q1. How Would You Explain SVM to a Nontechnical Person?

As we can see in the above image, there are three lines on the road: the middle line divides the road into two parts, which can be understood as the line separating positive and negative values, while the left and right lines mark the limits of the road, meaning there is no driving area beyond them.

In the same way, a support vector machine classifies data points with the help of a separating line and support vector lines: the upper and lower (or left and right) vectors mark the limits for the positive and negative classes, and any data point lying beyond these lines is classified as a positive or negative data point accordingly.

Q2. What are the Assumptions of SVM?

There are no particular assumptions in the SVM algorithm. Instead, the algorithm learns from the data and its patterns: whatever data is fed to it, it takes time to learn the patterns and then produces results according to the data and its behaviour.

Q3. Why is a Support Vector Machine a Nonparametric Algorithm?

Nonparametric machine learning algorithms do not make assumptions about the functional form of the relationship during the model’s training. No fixed function is specified for the training and testing phases; instead, the model learns the patterns present in the data and returns an output based on them.

Q4. When do we consider SVM as a Parametric Algorithm?

In the case of a linear SVM, the algorithm tries to fit the data with a linear boundary. Because the boundary is linear, its principle is the same as that of linear regression, and a fixed functional form can be applied to solve the problem, which makes the algorithm parametric.

Q5. What are Support Vectors in SVM

Support vectors in SVM are the data points that lie closest to the separating hyperplane (the decision boundary). Observations falling on either side of the boundary are classified into their respective categories.

The support vectors alone determine the position of the boundary, so they are the points responsible for the accuracy and performance of the model. The distance (margin) between the support vectors and the boundary should be maximized to improve the model’s accuracy; ideally the remaining points fall outside the margin, although some data points may lie on or within it.

Q6. What are Hard and Soft Margin SVMs?

As shown in the image below, in a soft-margin SVM some of the data points do not lie strictly within their margin limits; instead, they are allowed to cross the margin and lie at some distance on the wrong side of their respective vector line.

In a hard-margin SVM, by contrast, the data points are restricted to lie on the correct side of their respective vectors and are not allowed to cross the margin limit, as can be seen in the above image.
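In scikit-learn terms, this distinction is approximated by the regularization parameter C of the SVC class: a very large C behaves like a (nearly) hard margin, while a small C gives a soft margin that tolerates violations. A minimal sketch on synthetic data (illustrative, not part of the original question):

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(42)
# Two slightly overlapping clusters of points
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

soft = SVC(kernel="linear", C=0.1).fit(X, y)       # soft margin: more violations tolerated
hard_ish = SVC(kernel="linear", C=1e6).fit(X, y)   # (nearly) hard margin: few violations tolerated

print(len(soft.support_), len(hard_ish.support_))  # the soft margin typically keeps more support vectors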

Q7. What are Slack Variables in SVM?

Slack variables in SVM are defined in the soft-margin formulation: they measure how much a particular data observation is allowed to violate the margin and lie beyond its support vector line. Note that the larger the slack variable, the larger the violation of the margin. To obtain an optimal model, we want the slack variables to be as small as possible.

Q8. What Could be the Minimum Number of Support Vectors in “N” Dimensional Data?

To classify the data points into their respective classes, there must be a minimum of two support vectors. The dimensionality or size of the data does not change this: as per the general working of the algorithm, a minimum of two support vectors is needed to classify the data (in the case of binary classification).

Q9. Why Does SVM Need a Longer Training Duration?

As mentioned, SVM is a nonparametric machine learning algorithm that does not rely on a specified function; instead, it learns the patterns in the data and then returns an output. Because of this, the model needs time to analyze and learn from the data, unlike a parametric model, which simply fits a fixed function to the data.

Q10. What are Kernel Tricks in SVMs?

An SVM with a linear decision boundary works well when the data are linearly separable. In the case of non-linear data, however, the same linear boundary performs poorly, and that is where the kernel trick comes into action.

The kernel trick allows the support vector machine to separate non-linear classes and classify non-linear data with the same underlying mechanism.

Several functions can serve as kernels; some popular kernel functions are the linear, polynomial, RBF (radial basis function), and sigmoid kernels.
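As a sketch of the kernel trick in practice (using scikit-learn with a synthetic, non-linearly-separable dataset): a linear kernel struggles on concentric circles, while an RBF kernel separates them well.

from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: not separable by any straight line
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf").fit(X, y)

print(linear_svm.score(X, y))   # close to chance level -- a line cannot separate the circles
print(rbf_svm.score(X, y))      # close to 1.0 -- the kernel trick handles the non-linearity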

Conclusion

In this article, we discussed the support vector machine and some interview questions related to the same. This will help one answer these questions efficiently and correctly and enhance knowledge about this algorithm.

Some of the Key Takeaways from this article are:

Support Vector Machines are one of the best-performing machine learning algorithms which use its support vector to classify the data and its classes.

Hard-margin SVMs do not allow data points to cross their respective margin lines, whereas in soft-margin SVMs the rules are relaxed and some data points may cross the margin.

A support vector machine is a nonparametric model that takes more time for training, but the algorithm’s learning is not limited.

In the case of nonlinear data, the kernel function can be used in SVM to solve the data patterns.

Want to contact the author?

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion. 

Related

Top Interview Questions On Dictionary In Python

This article was published as a part of the Data Science Blogathon.

Intro

In Python, a dictionary is a collection of key: value pairs written within curly braces (insertion-ordered since Python 3.7). The keys in a dictionary are unique (they can’t be repeated), whereas values can be duplicated. Questions on dictionaries are often asked in interviews because of their heavy use in projects.

Therefore, a good knowledge of dictionaries is essential for every aspiring Data Scientist.

In this article, some critical theoretical as well as practical questions will be discussed, which will help aspirants have a good understanding of the Dictionary.

Interview Questions on Dictionary

Question 1: What is a dictionary?

A dictionary is a collection of key: value pairs, with each key being unique. An empty dictionary can be created using curly braces {}, and key: value pairs can then be added to it.

e.g., dictionary1 = {'a': 1, 'b': 2, 'c': 3}

Question 2: Are dictionaries case-sensitive?

Yes, dictionaries are case-sensitive, i.e., the same name of keys, but different cases are treated differently, i.e., ‘apple’ and ‘APPLE’ will be treated as separate keys.

Question 3: What are different ways of creating a Dictionary?

Four different ways of creating a dictionary are:

1. Create an empty Dictionary

Dictionary1 = {}
print(Dictionary1)

Output:

{}

Then add a key: value pair to it:

key1 = 'a'
value1 = 1
Dictionary1[key1] = value1
print(Dictionary1)

Output:

{'a': 1}

2. Create Dictionary using dict() method

Dictionary1 = dict({1: 'a', 2: 'b'})
print(Dictionary1)

Output:

{1: 'a', 2: 'b'}

3. Create Dictionary with each item as Pair

Dictionary1 = dict([(1, 'a'), (2, 'b')])
print(Dictionary1)

Output:

{1: 'a', 2: 'b'}

4. Creating Dictionary directly

Dictionary1 = {1: 'a', 2: 'b'}
print(Dictionary1)

Output:

{1: 'a', 2: 'b'}

Question 4: What is a Nested Dictionary? How is it created?

A dictionary inside the dictionary is known as a “Nested Dictionary”. For ex-

dictionary1 = {1: {'roll': '101', 'name': 'sam'},
               2: {'roll': '102', 'name': 'ram'}}
print(dictionary1)

Output

{1: {'roll': '101', 'name': 'sam'}, 2: {'roll': '102', 'name': 'ram'}}

The elements of nested dictionary can be accessed using

print(dictionary1[1]['roll'])

Output:

101

Question 5: How do you add an element in Dictionary?

Elements in a Dictionary can be added in multiple ways:

1. Adding one pair at a time

Dict1 = {}
Dict1[0] = 'a'
Dict1[1] = 'b'
print("Dictionary after adding 2 elements: ", Dict1)

Output:

{0: 'a', 1: 'b' }

2. Adding more than one value to a single key

Dict1['values'] = 4, 5, 6
print("Dictionary after adding multiple values to a key: ", Dict1)

Output:

{0: 'a', 1: 'b', 'values': (4, 5, 6) }

3. Adding nested key-value pair

Dict1['Nested'] = {1: 'Analytics', 2: 'Life'}
print(Dict1)

Output:

{0: 'a', 1: 'b', 'values': (4, 5, 6), 'Nested': {1: 'Analytics', 2: 'Life'} }

Question 6: Discuss different methods used with Dictionary.

Various methods used with Dictionary are:

1. clear()

It is used to delete all elements from a dictionary i.e., to create empty dictionary.

dict2 = {1: 'Analytics', 2: 'Vidhya'}
dict2.clear()
print(dict2)

Output:

{ }

2. get()

It is used to get the value of the specified key.

dict2 = {1: 'Analytics', 2: 'Vidhya'}
x = dict2.get(2)
print(x)

Output:

Vidhya

3. copy()

It is used to return copy of a dictionary

dict3 = dict2.copy()
print(dict3)

Output:

{1: 'Analytics', 2: 'Vidhya'}

4. items()

It is used to return the dictionary's key-value pairs as a list-like view of tuples.

Dict1 = {1: 'Analytics', 2: 'Vidhya'}
print(Dict1.items())

Output:

dict_items([(1, 'Analytics'), (2, 'Vidhya')])

5. keys() and values()

Returns all keys and values within a dictionary respectively.

print(Dict1.keys())
print(Dict1.values())

Output:

dict_keys([1, 2])  dict_values(['Analytics', 'Vidhya'])

6. update()

This method updates the value of a key in the dictionary.

Dict1.update({2: "Blogathon"})
print(Dict1)

Output:

{1: 'Analytics', 2: 'Blogathon'}

Question 7: Create a dictionary from a given list. For instance-

Input: [1, 'a', 2, 'b', 3, 'c']
Output: {1: 'a', 2: 'b', 3: 'c'}

def Convert_list_dict(lst):
    x = iter(lst)
    res_dct1 = dict(zip(x, x))
    return res_dct1

dict1 = [1, 'a', 2, 'b', 3, 'c']
print(Convert_list_dict(dict1))

Here, zip() function takes iterables (it can be more than two also) and combines them in a tuple.

Output:

{1: 'a', 2: 'b', 3: 'c'}

Question 8: Create a list of tuples from the dictionary

The list of tuples can be created in following way:

dict1 = {1: 'a', 2: 'b', 3: 'c'}
lst1 = list(dict1.items())
print(lst1)

Output:

[(1, 'a'), (2, 'b'), (3, 'c')]

Question 9: Create a list from the dictionary.

Suppose the given dictionary is:

dict1 = { 1: 'a', 2: 'b', 3: 'c' }

A list can be created using the below code:

x = list(dict1.keys())
y = list(dict1.values())
for i in y:
    x.append(i)
print(x)

Output:

[1, 2, 3, 'a', 'b', 'c']

Question 10: How can you delete key-value pair from Dictionary?

Key-value pair can be deleted by using ‘del’ keyword as shown below:

del dict1[1]
print(dict1)

Output:

{2: 'b', 3: 'c' }

Question 11: Is the dictionary mutable?

The term ‘Mutable’ means we can add, remove or update key-value pairs in a dictionary.

Yes, the dictionary is mutable. For instance,

Dict1 = {1: 'a', 2: 'b', 3: 'c', 4: 'd'}
Dict1[2] = 'h'
print(Dict1)

Output:

{1: 'a', 2: 'h', 3: 'c', 4: 'd' }

Question 12: Given two lists, create a dictionary from them.

Input: [1, 2, 3, 4, 5], ['a', 'b', 'c', 'd', 'e']

Output: {1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e'}

Let’s define these two lists as list1 and list2 as follows:

list1 = [1, 2, 3, 4, 5]
list2 = ['a', 'b', 'c', 'd', 'e']
dict1 = {}
for i, j in zip(list1, list2):
    dict1[i] = j
print(dict1)

Output:

{1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e'}

Another way of achieving the same output:

dict1 = {i: j for i, j in zip(list1, list2)}
print(dict1)

Output:

{1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e'}

Question 13: Write code to sort a dictionary by its keys.

Input: {2: 'Apple', 1: 'Mango', 3: 'Orange', 4: 'Banana'}

Expected output (sorted by key): 1: Mango, 2: Apple, 3: Orange, 4: Banana

Below is the code to sort dictionaries using the key:

dict1 = {2: 'Apple', 1: 'Mango', 3: 'Orange', 4: 'Banana'}
print(sorted(dict1.keys()))
for key in sorted(dict1):
    print("Sorted dictionary using key:", (key, dict1[key]))

Output:

[1, 2, 3, 4]
Sorted dictionary using key: (1, 'Mango')
Sorted dictionary using key: (2, 'Apple')
Sorted dictionary using key: (3, 'Orange')
Sorted dictionary using key: (4, 'Banana')

Conclusion

In this blog, we studied some of the important and frequently asked interview questions on Dictionary. To sum up, the following are the major contributions of the article:

1. Basic concepts of the Dictionary have been discussed to make the reader familiar with it.

2. We learned how to perform various functions on Dictionary, such as adding key-value pairs and deleting key-value pairs.

3. We discussed various functions that can be used to work and play with Dictionary.

4. Further, we also discussed several programming questions on Dictionary that can be asked in interviews.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

Related

I Asked ChatGPT About Ethereum’s Performance, It Predicted…

Ethereum [ETH] immediately soared to a three-month high following the partial victory of Ripple [XRP] in its legal battle with the U.S. Securities and Exchange Commission (SEC) on 13 July. It surpassed the $2,000-price mark the next day but fell thereafter.

The U.S. District Court for the Southern District of New York ruled in its judgement that the sale of Ripple’s XRP tokens on crypto exchanges and through programmatic sales did not constitute investment contracts; hence, XRP is not a security in that context. But the court also ruled that the institutional sale of the XRP tokens violated federal securities laws.

The first quarter of the new year brought a stop to the heartbreak crypto investors repeatedly experienced in 2022. However, the crypto-market’s rebound has been nothing close to the AI hype of the same period. What is the main reason for that hype? ChatGPT!

In fact, the natural language processing tool has accustomed itself to providing human-like conversations.

The good thing is— The broader crypto ecosystem has not been left out of the trend. So, with the Ethereum Shanghai upgrade set in motion, I spoke to ChatGPT about the development while touching a bit on ETH’s price.

Understanding the Shanghai Upgrade

So, for this article, I decided to test the AI’s intelligence regarding one of the major upgrades of the crypto space this year – Ethereum’s Shanghai Upgrade. Proposed in 2023, the upgrade is the most significant development of the second-largest blockchain since the Merge.

For a while, assets have been staked on the Ethereum Beacon Chain. The Beacon Chain is the consensus layer behind the 2022 Proof-of-Stake (PoS) transition, ensuring that newly created blocks are validated and that validators are duly rewarded.

However, each validator needs to deposit 32 ETH into the Ethereum Mainnet to qualify. The Shanghai upgrade was originally scheduled for March 2023 but was completed on 12 April, after a delay, allowing these validators to begin withdrawing their rewards.

ChatGPT, on the other hand, has existed for some years. However, its recent push by OpenAI has shown that its ability is one that no other AI product may be able to match up with.

Here’s where it gets interesting. I openly admit that ChatGPT could be one of the best innovations of this decade. However, my views on this incredible development won’t allow me to keep my hands to myself. So, I decided to test its knowledge about the Shanghai upgrade. Trust me, you will be amazed at its response.

Looking at its response above, it’s evident it started by correcting me. Some would say it has a point too. However, a further evaluation showed that it acted like it was not yet in 2023. Notably, it made some errors with the definition.

ChatGPT can’t remember Merge?

A notable observation is its mention of the PoS switch, popularly called the Merge. This is an event that took place in September 2022. Even so, it still responded as if it were a future event. But no, I’m not blaming its capability, as it is a learning tool. So, to further assess its knowledge, I educated it, or shall I say “jailbreak-ed” it, by having a heart-to-heart conversation.

Something I find interesting about ChatGPT is not only its smartness, but its human feel too. As shown below, I tried to educate it on what the upgrade was. And to be honest, I never expected an apology from a bot. But yes, I got it.

However, it again failed to give the correct answer to my inquiry. Although I must applaud it for giving bits and pieces of related information.

While it did not get to the Testnet stages that the blockchain had reached and passed, it is worth noting that the Sepolia and Goerli Testnets had been forked. However, Ethereum developer Tim Beiko said on 14 March that several validators had failed to upgrade on the Beacon Chain.

Also, this caused some issues with the nodes on Goerli, with Beiko noting that the development team was working on it so that it would not affect the Mainnet upgrade.

Now, let’s get back to ChatGPT. As you probably know, developments in the crypto-ecosystem sometimes lead to a hike in tokens related to projects. Unfortunately, that was not the case for ETH during the Merge. In fact, the altcoin’s price was shredded after many looked forward to an uptick.

That sentiment, as the next upgrade approaches, is similar among some investors. In light of this, I decided to ask ChatGPT’s opinion about the matter.

ChatGPT tells me about Ethereum’s future performance

Remember how I said it apologized and gave me a human-like feel? This time, it was different and its reply was something any honest person in the space would give.

However, this was not the response I was expecting. From the reviews I saw online, I believe that ChatGPT should be able to give me an exact figure. If it can’t do that, then maybe it should be able to give a price range, or at worst, an idea if the price would be bullish or capitulate.

So, my determination made me dig deep as I tried to jailbreak it. To do that, I decided to go with the “Do Anything Now” (DAN) model. This was a trick I discovered from AI writer SM Raiyyan.

In this jailbreaking process, ChatGPT is expected to give a response to my command and, if possible, ditch its excuse of not being able to predict the future. Then again, I asked ChatGPT to give me a price prediction following the Shanghai Upgrade. 

And voila! I got a jailbroken response. Here’s what it said.

This time, it gave a little too enthusiastic response regarding the future performance of the token after being jailbroken. It predicted that ETH’s price will reach $10K— a rather ridiculous claim.

ChatGPT (Classic) mentioned that price action depends on several underlying factors and it cannot predict cryptocurrency’s price. But the Jailbreak response said that ETH will skyrocket to the moon. 

We then asked, “What will be the price of Ethereum by December 2023?”

As you can see from the jailbroken response, it projected a bullish ride for king alt and predicted that ETH will be worth $8K by the end of the year— again, an astonishingly optimistic prediction. 

At press time, ETH was trading hands at $1,933.8, reflecting a rise of 3.6% within a week.

Its Relative Strength Index (RSI) rested only slightly below the neutral 50-mark while its Money Flow Index (MFI) rested above the mark. Its On Balance Volume (OBV) mirrors its price action— first bullish and then stagnant. 

In conclusion, the short-term prospects of ETH don’t look so bullish.

Finally! It showed me the code

I gave ChatGPT one last chance to redeem itself. Again, this question was a simple one, and I expected an accurate answer. I went further to explain things to it carefully. But here is what I got when I asked it to show me the code of ETH’s price on a price tracking platform like CoinGecko or CoinMarketCap.

If you had thought it would disappoint again, sorry to burst your bubble. ChatGPT gave me the code for ETH’s price. Another thing I was impressed with was the disclaimer it gave about not using the information for investment purposes.
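For reference, a script along these lines (my own hedged sketch of what such code typically looks like, using CoinGecko's public simple-price endpoint, and not the exact code ChatGPT produced) would be:

import requests

# CoinGecko public API: current ETH price in USD (no API key required)
url = "https://api.coingecko.com/api/v3/simple/price"
params = {"ids": "ethereum", "vs_currencies": "usd"}

response = requests.get(url, params=params, timeout=10)
response.raise_for_status()

price = response.json()["ethereum"]["usd"]
print(f"Current ETH price: ${price}")   # for information only -- not investment advice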

All in all, I must admit that ChatGPT has come to stay. Even though it lags in some areas, I noticed that if you teach it; it learns fast. However, I can’t say for sure that it would get you information about Ethereum or the Shanghai upgrade quickly.

Thoughtful responses and the GPT-4 mastermind?

Since I had limited knowledge about AI, I decided to speak to an expert. I was lucky enough to get the attention of Ilman Shazhaev, CEO and founder of Farcana. He is a Dubai-based techpreneur with extensive experience in launching IT and DeepTech projects, and has a strong background in IT management, data science, and AI.

Q- ChatGPT seems to be giving a few incorrect or backdated answers. What do you think could be responsible for this?

Q- Do you think the AI is capable of predicting a cryptocurrency’s price, especially if a development is approaching? Let’s say the Ethereum Shanghai Upgrade

Artificial Intelligence can do anything, including predicting a cryptocurrency’s price. The tool can do this by riding on the tons of available data, which it can efficiently use as a basis for its predictions.

Still, while predicting the price of crypto is one thing, the accuracy of the prediction is another. Considering the fact that AI can only use data, there are fundamental factors and analyses that it may not be able to factor in, thus impairing its accuracy by a significant factor.

Q- If it struggles to give correct responses to up-to-date developments. How long do you think it would take to learn about it?

Q- Do you think AI in any way can influence the Ethereum blockchain or ETH’s price going forward?

There are many aspects through which AI and a blockchain protocol can co-exist, and innovators, including our team at Farcana, are exploring what new use cases we can build in this regard. While AI and blockchain are independently innovative, their combination can do quite a lot, including influencing ETH’s price.

Meanwhile, OpenAI may be working on improvements to the challenges experienced by ChatGPT.  On 14 March, the company revealed an upgraded version of the product on GPT-4. With amazing capabilities and talks of passing difficult exams, who knows? Maybe it could fill in for all the lapses opened up by ChatGPT.

So, now that there is a new version, I wanted to see if there is any difference or improvement. My next line of action was to ask GPT-4 the first question I asked ChatGPT.

And to my surprise, it gave me a direct answer.

Following my experience with the upgraded version, I must admit that GPT-4 seems to be smarter than the ChatGPT-3.5 model. Although the answers were not entirely correct, the bot did not give a “not being familiar” with the term excuse.

Following the encounter with ChatGPT, I must admit that it may be a good idea to leverage its capabilities. As technology develops, so does its potential to revolutionize the cryptocurrency ecosystem. 

More importantly, you may want to take its “classic” response a little seriously. This is because it is practically impossible for ETH to replace the U.S. Dollar as the world’s reserve currency within that time frame.

Besides that, there has been a slow rate in the network growth of several crypto projects recently. But with ChatGPT available, crypto education and adoption could improve. 

Conclusion

As far as price analysis and prediction of Ethereum is concerned, ChatGPT turned out to be a reliable ally. You only need to interact with it enough and it will guide you to the moon.

We will see if Ethereum really hits $8,000 by the end of the year, as ChatGPT predicts.
