## Top Answers to Machine Learning Interview Questions

Machine Learning and Artificial Intelligence are among the most popular technologies in the world today. This comprehensive blog consists of some of the most frequently asked Machine Learning interview questions that aim to help you revise all the necessary concepts and skills to land your dream job. This blog is specifically designed for you to do a thorough Machine Learning interview preparation before going for the interview. Listed below are some of the most frequently asked questions in an ML job interview. Go through them and succeed in your career!

Q1. Explain Machine Learning, Artificial Intelligence and Deep Learning?

Q2. What are Bias and Variance in Machine Learning?

Q3. What is Clustering in Machine Learning?

Q4. What is a Linear Regression in Machine Learning?

Q5. What is a Decision Tree in Machine Learning?

Q6. What is Overfitting in Machine Learning and how can you avoid it?

Q7. What is the Hypothesis in Machine Learning?

Q8. What are the differences between Deep Learning and Machine Learning?

Q9. What are the differences between Supervised And Unsupervised Machine Learning?

Q10. What is Bayes’ Theorem in Machine Learning?

Q11. What is PCA in Machine Learning?

Q12. What is SVM (Support Vector Machines) in Machine Learning?

Q13. What is Cross-Validation in Machine Learning?

Q14. What is Entropy in Machine Learning?

Q15. What is Epoch in Machine Learning?


**1. What are the types of Machine Learning?**

Among all the ML interview questions we will discuss, this is one of the most basic.

So, basically, there are three types of Machine Learning techniques:

**Supervised Learning:** In this type of Machine Learning technique, machines learn under the supervision of labeled data. There is a training dataset on which the machine is trained, and it gives the output according to its training.

**Unsupervised Learning:** Unlike supervised learning, it uses unlabeled data, so there is no supervision under which it works on the data. Basically, unsupervised learning tries to identify patterns in data and group similar entities into clusters. After that, when new input data is fed into the model, it does not identify the entity; rather, it places the entity in a cluster of similar objects.

**Reinforcement Learning:** Reinforcement learning includes models that learn by exploring to find the best possible move. Its algorithms are constructed so that they try to find the best possible sequence of actions on the basis of rewards and punishments.

**2. Differentiate between classification and regression in Machine Learning.**

In Machine Learning, there are various types of prediction problems based on supervised and unsupervised learning. These are classification, regression, clustering, and association. Here, we will discuss classification and regression.

**Classification: **In classification, we try to create a Machine Learning model that assists us in differentiating data into separate categories. The data is labeled and categorized based on the input parameters.

For example, imagine that we want to predict customer churn for a particular product based on some recorded data. Either the customers will churn or they will not, so the labels for this would be ‘Yes’ and ‘No.’

**Regression:** It is the process of creating a model that predicts continuous real values, instead of classes or discrete values. It can also identify the distribution movement depending on the historical data. It is used for predicting the outcome of an event depending on the degree of association between variables.

For example, the prediction of weather conditions depends on factors such as temperature, air pressure, solar radiation, the elevation of the area, and distance from the sea. The relation between these factors assists us in predicting the weather condition.


**3. What is a Linear Regression in Machine Learning?**

Linear Regression is a supervised Machine Learning algorithm. It is used to find the linear relationship between the dependent and the independent variables for predictive analysis.

The equation for Linear Regression is *Y = a + bX*, where:

- *X* is the input or the independent variable
- *Y* is the output or the dependent variable
- *a* is the intercept and *b* is the coefficient of *X*

Below is the **best fit line** that shows the data of weight (*Y*, the dependent variable) and height (*X*, the independent variable) of 21-year-old candidates scattered over the plot. This straight line shows the best linear relationship, which would help in predicting the weight of candidates according to their height.

To get this **best fit line**, we will try to find the best values of *a* and *b*. By adjusting the values of *a* and *b*, we try to reduce the errors in the prediction of *Y*.

This is how linear regression helps in finding the linear relationship and predicting the output.
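As an illustrative sketch (the heights and weights below are made-up values, not taken from the plot above), fitting such a line with scikit-learn looks like this:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical heights (cm) and weights (kg) of candidates
X = np.array([[150], [160], [170], [180], [190]])  # independent variable X
Y = np.array([50, 56, 62, 68, 74])                 # dependent variable Y

model = LinearRegression().fit(X, Y)
# model.intercept_ is 'a' and model.coef_[0] is 'b' in Y = a + bX
print(model.intercept_, model.coef_[0])
print(model.predict([[175]]))  # predicted weight for a height of 175 cm
```

The fitted values of *a* and *b* minimize the squared prediction errors on *Y*, which is exactly the adjustment described above.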

**4. How will you determine the Machine Learning algorithm that is suitable for your problem?**

To identify the Machine Learning algorithm for our problem, we should follow the below steps:

**Step 1: Problem Classification: **Classification of the problem depends on the classification of input and output:

- **Classifying the input:** Classification of the input depends on whether we have the data labeled (supervised learning) or unlabeled (unsupervised learning), or whether we have to create a model that interacts with the environment and improves itself (reinforcement learning).
- **Classifying the output:** If we want the output of the model as a class, then we need to use classification techniques. If the output is a number, then we must use regression techniques and, if the output is a different cluster of inputs, then we should use clustering techniques.

**Step 2: Checking the algorithms in hand: **After classifying the problem, we have to look for the available algorithms that can be deployed for solving the classified problem.

**Step 3: Implementing the algorithms: **If there are multiple algorithms available, then we will implement each one of them, one by one. Finally, we would select the algorithm that gives the best performance.

**5. Explain Machine Learning, Artificial Intelligence, and Deep Learning?**

It is common to get confused between the three in-demand technologies: Machine Learning, Artificial Intelligence, and Deep Learning. These three technologies, though a little different from one another, are interrelated. While Deep Learning is a subset of Machine Learning, Machine Learning is a subset of Artificial Intelligence. Since some terms and techniques may overlap with each other while dealing with these technologies, it is easy to get confused between them.

Therefore, let’s learn about these technologies in detail so that you become capable of differentiating between them:

- **Machine Learning:** Machine Learning involves various statistical and Deep Learning techniques that allow machines to use their past experiences and get better at performing specific tasks without having to be monitored.
- **Artificial Intelligence:** Artificial Intelligence uses numerous Machine Learning and Deep Learning techniques that enable computer systems to perform tasks that require human intelligence, using logic and rules.
- **Deep Learning:** Deep Learning comprises several algorithms that enable software to learn from data by itself and perform various business tasks, including image and speech recognition. This is possible when the systems expose their multi-layered neural networks to large volumes of data for learning.

**6. What is clustering in Machine Learning?**

Clustering is a technique used in unsupervised learning that involves grouping data points. If you have a set of data points, you can make use of the clustering algorithm. This technique will allow you to classify all the data points into their particular groups. The data points that are thrown into the same category have similar features and properties, whereas the data points that belong to different groups have distinct features and properties. This method allows you to perform statistical data analysis. Let’s take a look at three of the most popular and useful clustering algorithms.

- **K-means clustering:** This algorithm is commonly used when there is data with no specific group or category. It allows you to find hidden patterns in the data, which can be used to classify the data into various groups. The variable *k* represents the number of groups the data is divided into, and the data points are clustered using the similarity of features. Here, the centroids of the clusters are used for labeling new data.
- **Mean-shift clustering:** The main aim of this algorithm is to update the center-point candidates to be the mean of the points in their region and, thereby, find the center points of all the groups. Unlike in k-means clustering, here you do not need to select the number of clusters, as it is discovered automatically by the mean shift.
- **Density-based spatial clustering of applications with noise (DBSCAN):** This clustering algorithm is based on density and has similarities with mean-shift clustering. There is no need to preset the number of clusters, but unlike mean-shift, DBSCAN identifies outliers and treats them as noise. Moreover, it can identify arbitrarily sized and shaped clusters without much effort.
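A minimal k-means sketch with scikit-learn, using two made-up groups of 2-D points, shows the labeling and centroid behavior described above:

```python
import numpy as np
from sklearn.cluster import KMeans

# Two obvious groups of toy 2-D points
points = np.array([[1, 1], [1.5, 2], [1, 1.5],
                   [8, 8], [8.5, 9], [9, 8]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(kmeans.labels_)              # cluster assignment for each point
print(kmeans.cluster_centers_)     # centroids, used for labeling new data
print(kmeans.predict([[0.5, 1]]))  # a new point falls into the nearby cluster
```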


**7. What is a hypothesis in Machine Learning?**

Machine Learning allows you to use the available dataset to understand a specific function that maps inputs to outputs in the best possible way. This problem is known as function approximation. Here, you need to use an approximation for the unknown target function that maps, in the best manner, all the plausible observations based on the given problem. A hypothesis in Machine Learning is a model that helps in approximating the target function and performing the necessary input-to-output mappings. The choice and configuration of algorithms allow you to define the space of plausible hypotheses that may be represented by the model.

In hypothesis notation, a lowercase h (*h*) is used for a specific hypothesis, while an uppercase H (*H*) is used for the hypothesis space that is being searched. Let’s briefly understand these notations:

- **Hypothesis (h):** A hypothesis is a specific model that helps in mapping inputs to outputs, which can further be used for evaluation and prediction.
- **Hypothesis set (H):** A hypothesis set consists of the space of hypotheses that can be used to map inputs to outputs and that can be searched. The general constraints include the choice of problem framing, the model, and the model configuration.

**8. What are the differences between Deep Learning and Machine Learning?**

- **Deep Learning:** Deep Learning allows machines to make various business-related decisions using artificial neural networks, which is one of the reasons why it needs a vast amount of data for training. Since a lot of computing power is involved, it requires high-end systems as well. The systems acquire various properties and features with the help of the given data, and the problem is solved in an end-to-end manner.
- **Machine Learning:** Machine Learning gives machines the ability to make business decisions without any external help, using the knowledge gained from past data. Machine Learning systems require relatively small amounts of data to train themselves, and most of the features need to be manually coded and understood in advance. Here, a given business problem is broken into two parts, which are solved individually. Once the solutions to both parts have been acquired, they are combined.

**9. What are the differences between Supervised and Unsupervised Machine Learning?**

- **Supervised learning:** Supervised learning algorithms are trained using labeled data. The models take direct feedback to confirm whether the predicted output is, indeed, correct. Both the input data and the output data are provided to the model, and the main aim is to train the model to predict the output when it receives new data. Supervised learning can largely be divided into two parts: classification and regression. It offers accurate results.
- **Unsupervised learning:** Unsupervised learning algorithms are trained using unlabeled data. These models do not take any feedback and, unlike in supervised learning, they identify hidden data trends. An unsupervised learning model is only provided with the input data, and its main aim is to identify hidden patterns to extract information from unknown sets of data. It can also be classified into two parts: clustering and association. Its results are comparatively less accurate.

**10. What is Bayes’ theorem in Machine Learning?**

Bayes’ theorem gives the probability of an event occurring using prior knowledge. In mathematical terms, for events A and B, P(A|B) = P(B|A) × P(A) / P(B). For a diagnostic test, this means the probability of the condition given a positive result equals the rate of true positives divided by the sum of the rates of true positives and false positives in the entire population.

Two of the most significant applications of Bayes’ theorem in Machine Learning are Bayesian optimization and Bayesian belief networks. The theorem is also the foundation of the branch of Machine Learning that involves the Naive Bayes classifier.
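A small worked example of the theorem, using made-up diagnostic-test numbers (the prior and rates below are illustrative assumptions):

```python
# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_disease = 0.01             # prior probability of the condition, P(A)
p_pos_given_disease = 0.95   # true positive rate, P(B|A)
p_pos_given_healthy = 0.05   # false positive rate

# Total probability of a positive test, P(B)
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Posterior: probability of the condition given a positive test
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))
```

Even with a 95% true positive rate, the posterior is small here because the prior is small, which is exactly the effect Bayes' theorem captures.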

**11. What is cross-validation in Machine Learning?**

The cross-validation method in Machine Learning allows you to evaluate the performance of a given Machine Learning algorithm by training and testing it on multiple samples drawn from the dataset. The dataset is broken into smaller parts that have the same number of rows; each part is selected in turn as the test set, while the remaining parts are kept as the training set. Cross-validation includes the following techniques:

- Holdout method
- K-fold cross-validation
- Stratified k-fold cross-validation
- Leave p-out cross-validation
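A minimal sketch of k-fold cross-validation with scikit-learn (using the bundled Iris dataset and logistic regression purely as stand-ins):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)

# 5-fold cross-validation: each fold takes a turn as the test set
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=KFold(n_splits=5, shuffle=True, random_state=0))
print(scores)        # one accuracy score per fold
print(scores.mean()) # overall estimate of model performance
```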

**12. What is entropy in Machine Learning?**

Entropy in Machine Learning measures the randomness in the data that needs to be processed. The higher the entropy in the given data, the more difficult it becomes to draw any useful conclusion from it. For example, let’s take the act of flipping a coin. The result is random as it does not favor heads or tails. Here, the result for any number of tosses cannot be predicted easily, as there is no definite relationship between the action of flipping and the possible outcomes.
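The coin example can be made concrete with Shannon's entropy formula, H = −Σ p·log₂(p), sketched below:

```python
import math

def entropy(probabilities):
    """Shannon entropy in bits: H = -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(entropy([0.5, 0.5]))  # fair coin: maximum randomness, 1 bit
print(entropy([0.9, 0.1]))  # biased coin: less randomness, lower entropy
print(entropy([1.0]))       # certain outcome: zero entropy
```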


**13. What is epoch in Machine Learning?**

An epoch in Machine Learning indicates one complete pass of the Machine Learning algorithm over the given training dataset. Generally, when there is a huge amount of data, it is grouped into several batches. Each of these batches goes through the model, and each such pass is referred to as an iteration. If the batch size comprises the complete training dataset, then the count of iterations is the same as that of epochs.

In case there is more than one batch, d × e = i × b is the formula used, wherein ‘d’ is the dataset size, ‘e’ is the number of epochs, ‘i’ is the number of iterations, and ‘b’ is the batch size.
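A quick sanity check of the formula with made-up numbers:

```python
dataset_size = 1000  # d: number of training examples
epochs = 5           # e: number of full passes over the data
batch_size = 100     # b: examples per batch

# d * e = i * b  =>  i = (d * e) / b
iterations = (dataset_size * epochs) // batch_size
print(iterations)  # 50 iterations in total, i.e., 10 per epoch
```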

**14. What are Bias and Variance in Machine Learning?**

- **Bias** is the difference between the average prediction of our model and the correct value. If the bias value is high, then the prediction of the model is not accurate. Hence, the bias value should be as low as possible to make the desired predictions.
- **Variance** measures how much the model’s prediction for a given point differs across different training sets. High variance may lead to large fluctuations in the output. Therefore, the model’s output should have low variance.

The below diagram shows the bias–variance trade-off:

Here, the desired result is the blue circle at the center. If we get off from the blue section, then the prediction goes wrong.



**15. What is Variance Inflation Factor?**

Variance Inflation Factor (VIF) estimates the amount of multicollinearity in a collection of regression variables.

VIF = Variance of the coefficient estimate in the full model / Variance of the coefficient estimate in a model with that single independent variable. Equivalently, for the *i*-th variable, VIF = 1 / (1 − R²), where R² is obtained by regressing the *i*-th variable on all the other independent variables.

We have to calculate this ratio for every independent variable. A high VIF indicates high collinearity of that independent variable with the others.
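A self-contained sketch of the 1 / (1 − R²) computation with NumPy, on synthetic data where one pair of variables is deliberately near-collinear (the data and the `vif` helper are illustrative, not a library API):

```python
import numpy as np

def vif(X, i):
    """VIF of column i: regress it on the other columns; VIF = 1 / (1 - R^2)."""
    y = X[:, i]
    others = np.delete(X, i, axis=1)
    # Add an intercept column for the auxiliary regression
    A = np.column_stack([np.ones(len(X)), others])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ beta
    r_squared = 1 - residuals.var() / y.var()
    return 1 / (1 - r_squared)

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
x3 = x1 + 0.1 * rng.normal(size=100)  # nearly collinear with x1
X = np.column_stack([x1, x2, x3])
print(vif(X, 0), vif(X, 1), vif(X, 2))  # high VIF for x1 and x3, low for x2
```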

**16. Explain false negative, false positive, true negative, and true positive with a simple example.**

**True Positive (TP)**: When the Machine Learning model **correctly** predicts the positive condition or class, it is said to produce a True Positive value.

**True Negative (TN)**: When the Machine Learning model **correctly** predicts the negative condition or class, it is said to produce a True Negative value.

**False Positive (FP)**: When the Machine Learning model **incorrectly** predicts the positive class for an actually negative condition, it is said to produce a False Positive value.

**False Negative (FN)**: When the Machine Learning model **incorrectly** predicts the negative class for an actually positive condition, it is said to produce a False Negative value.

**17. What is a Confusion Matrix?**

A confusion matrix is used to describe a model’s performance and gives a summary of predictions on classification problems. It assists in identifying the confusion between classes.

A confusion matrix gives the count of correct and incorrect predictions and also the types of errors made.

**Accuracy of the model:**

Accuracy = (TP + TN) / (TP + TN + FP + FN)

For example, consider a confusion matrix whose True Positive, True Negative, False Positive, and False Negative counts for a classification model are 200, 50, 10, and 60, respectively.

Thus, in our example:

Accuracy = (200 + 50) / (200 + 50 + 10 + 60) = 0.78

This means that the model’s accuracy is 0.78, corresponding to its True Positive, True Negative, False Positive, and False Negative values.
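The same bookkeeping can be sketched with scikit-learn on toy labels (the labels below are made up for illustration):

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Toy labels: 1 = positive class, 0 = negative class
y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 1]

# For binary labels, ravel() yields the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp, tn, fp, fn)

# Accuracy = (TP + TN) / (TP + TN + FP + FN)
print((tp + tn) / (tp + tn + fp + fn))
print(accuracy_score(y_true, y_pred))  # same value
```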

**18. What do you understand by Type I and Type II errors?**

**Type I Error**: A Type I error (False Positive) is an error where the outcome of a test rejects a condition that is actually true.

For example, a cricket match is going on and, when a batsman is not out, the umpire declares that he is out. This is a false positive condition. Here, the test does not accept the true condition that the batsman is not out.

**Type II Error**: Type II error (False Negative) is an error where the outcome of a test shows the acceptance of a false condition.

For example, the CT scan of a person shows that he does not have a disease but, in reality, he does have it. Here, the test accepts the false condition that the person does not have the disease.

**19. When should you use classification over regression?**

Both classification and regression are associated with prediction. Classification involves identifying values or entities that lie in a specific group. The regression method, on the other hand, entails predicting a response value from a continuous set of outcomes.

The classification method is chosen over regression when the output of the model needs to indicate which category the data points in a dataset belong to.

For example, we have some names of bikes and cars. We would not be interested in finding how these names are correlated to bikes and cars. Rather, we would check whether each name belongs to the bike category or to the car category.

**20. Explain Logistic Regression.**

Logistic regression is the appropriate regression analysis to use when the dependent variable is categorical or binary. Like all regression analyses, logistic regression is a technique for predictive analysis. Logistic regression is used to explain data and the relationship between one dependent binary variable and one or more independent variables. It is also employed to predict the probability of a categorical dependent variable.

We can use logistic regression in the following scenarios:

- To predict whether a citizen is a Senior Citizen (1) or not (0)
- To check whether a person is having a disease (Yes) or not (No)

There are three types of logistic regression:

**Binary Logistic Regression**: In this, there are only two outcomes possible.

**Example**: To predict whether it will rain (1) or not (0)

**Multinomial Logistic Regression**: In this, the output consists of three or more unordered categories.

**Example**: Prediction on the regional languages (Kannada, Telugu, Marathi, etc.)

**Ordinal Logistic Regression**: In ordinal logistic regression, the output consists of three or more ordered categories.

**Example**: Rating an Android application from 1 to 5 stars.
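A minimal binary logistic regression sketch for the senior-citizen scenario above, using hypothetical ages and labels:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical ages and a binary label: Senior Citizen (1) or not (0)
ages = np.array([[20], [25], [35], [45], [61], [65], [70], [80]])
is_senior = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = LogisticRegression().fit(ages, is_senior)
print(clf.predict([[30], [75]]))        # predicted classes for new ages
print(clf.predict_proba([[75]])[0, 1])  # probability of the positive class
```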


**21. Imagine, you are given a dataset consisting of variables having more than 30% missing values. Let’s say, out of 50 variables, 8 variables have missing values, which is higher than 30%. How will you deal with them?**

To deal with the missing values, we will do the following:

- We will specify a separate class for the missing values.
- Then, we will check the distribution of values and retain those missing values that define a pattern.
- Finally, we will group these into yet another class, while eliminating the rest.

**22. How do you handle the missing or corrupted data in a dataset?**

In Python Pandas, there are two methods that are very useful. We can use these two methods to locate the lost or corrupted data and discard those values:

- **isnull()**: For detecting the missing values, we can use the isnull() method.
- **dropna()**: For removing the columns/rows with null values, we can use the dropna() method.

Also, we can use **fillna()** to fill the null values with a placeholder value.
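A small sketch of these three methods on a made-up DataFrame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'age': [25, np.nan, 32],
                   'salary': [50000, 60000, np.nan]})

print(df.isnull())        # True wherever a value is missing
print(df.isnull().sum())  # missing-value count per column

print(df.dropna())              # drop rows containing any null
print(df.fillna(df.mean()))     # fill nulls with a placeholder (column mean)
```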


**23. What is PCA in Machine Learning?**

Firstly, this is one of the most important Machine Learning Interview Questions.

In the real world, we deal with multi-dimensional data. Thus, data visualization and computation become more challenging with the increase in dimensions. In such a scenario, we might have to reduce the dimensions to analyze and visualize the data easily. We do this by:

- Removing irrelevant dimensions
- Keeping only the most relevant dimensions

This is where we use Principal Component Analysis (PCA).

Finding a fresh collection of uncorrelated (orthogonal) dimensions and ranking them on the basis of variance are the goals of Principal Component Analysis.

**The Mechanism of PCA**:

- Compute the covariance matrix of the data objects
- Compute the eigenvectors and eigenvalues, sorted in descending order of eigenvalue
- Select the first *N* eigenvectors to get the new dimensions
- Finally, transform the initial n-dimensional data objects into *N* dimensions

**Example**: Below are the two graphs showing data points (objects) and two directions: one is ‘green’ and the other is ‘yellow.’ We got the Graph 2 by rotating the Graph 1 so that the x-axis and y-axis represent the ‘green’ and ‘yellow’ directions, respectively.

After the rotation of the data points, we can infer that the green direction (x-axis) gives us the line that best fits the data points.

Here, we are representing 2-dimensional data. But in real-life, the data would be multi-dimensional and complex. So, after recognizing the importance of each direction, we can reduce the area of dimensional analysis by cutting off the less-significant ‘directions.’
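An illustrative PCA sketch on synthetic 2-D data (the data below is generated for the example and stretched along one dominant direction, like the ‘green’ direction above):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# 2-D toy data stretched along one dominant direction
x = rng.normal(size=200)
data = np.column_stack([x, 0.5 * x + 0.05 * rng.normal(size=200)])

pca = PCA(n_components=2).fit(data)
print(pca.explained_variance_ratio_)  # first component captures most variance

# Keep only the most significant direction
reduced = PCA(n_components=1).fit_transform(data)
print(reduced.shape)
```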

Now, we will look into another important Machine Learning Interview Question on PCA.

**24. Why is rotation required in PCA? What will happen if you don’t rotate the components?**

Rotation is a significant step in PCA as it maximizes the separation within the variance obtained by components. Due to this, the interpretation of components becomes easier.

The motive behind doing PCA is to choose fewer components that can explain the greatest variance in a dataset. When rotation is performed, the original coordinates of the points get changed. However, there is no change in the relative position of the components.

If the components are not rotated, then we need more extended components to describe the variance.

**25. We know that one hot encoding increases the dimensionality of a dataset, but label encoding doesn’t. How?**

When we use **one-hot encoding**, there is an increase in the dimensionality of a dataset. The reason for the increase in dimensionality is that, for every class in the categorical variables, it forms a different variable.

**Example**: Suppose there is a variable ‘Color.’ It has three sub-levels: Yellow, Purple, and Orange. So, one-hot encoding ‘Color’ will create three different variables: Color.Yellow, Color.Purple, and Color.Orange.

In **label encoding**, the sub-classes of a variable get integer values such as **0** and **1** in a single column, so no new columns are created. For this reason, label encoding is typically used for binary variables.

This is the reason that one hot encoding increases the dimensionality of data and label encoding does not.
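Both encodings can be sketched with pandas (note that `get_dummies` separates the prefix with an underscore rather than the dot used above):

```python
import pandas as pd

df = pd.DataFrame({'Color': ['Yellow', 'Purple', 'Orange', 'Yellow']})

# One-hot encoding: one new column per class -> dimensionality grows
one_hot = pd.get_dummies(df['Color'], prefix='Color')
print(one_hot.columns.tolist())  # three columns for three classes

# Label encoding: classes mapped to integers in a single column
df['Color_label'] = df['Color'].astype('category').cat.codes
print(df['Color_label'].tolist())
```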


**26. What is Overfitting in Machine Learning and how can you avoid it?**

Overfitting happens when a model tries to learn from an inadequate dataset and ends up capturing its noise along with the signal. Broadly, the risk of overfitting decreases as the amount of data increases.

For small datasets, we can avoid overfitting with the cross-validation method. In this approach, we divide the dataset into two sections: a training set and a testing set. We use the training dataset to train the model and the testing dataset to test the model on new inputs.

This is how we can avoid overfitting.

**27. Why do we need a validation set and a test set?**

We split the data into three different categories while creating a model:

- **Training set**: We use the training set for building the model and adjusting its variables. However, we cannot rely on the correctness of a model built only on top of the training set; the model might give incorrect outputs when fed new inputs.
- **Validation set**: We use a validation set to look into the model’s response to samples that don’t exist in the training dataset. Then, we tune hyperparameters on the basis of the model’s estimated performance on the validation data.

When we evaluate the model’s response using the validation set, we are indirectly training the model with it. This may lead to overfitting of the model to this specific data, and such a model won’t be strong enough to give the desired response to real-world data.

- **Test set**: The test dataset is the subset of the actual dataset that has not yet been used to train the model. The model is unaware of this dataset, so by using it, we can compute the response of the created model to unseen data. We evaluate the model’s performance on the basis of the test dataset.

**Note**: We always expose the model to the test dataset after tuning the hyperparameters on top of the validation set.

As we know, the evaluation of the model on the basis of the validation set would not be enough. Thus, we use a test set for computing the efficiency of the model.
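The three-way split can be sketched with scikit-learn (the 60/20/20 proportions below are a common convention, not a rule):

```python
from sklearn.model_selection import train_test_split

X = list(range(100))
y = [i % 2 for i in range(100)]

# First carve out the test set, then split the rest into train and validation
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0)  # 0.25 of 80% = 20%

# 60% train, 20% validation, 20% test
print(len(X_train), len(X_val), len(X_test))
```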

**28. What is a Decision Tree in Machine Learning?**

A decision tree is used to explain the sequence of actions that must be performed to get the desired output. It is a hierarchical diagram that shows the actions.

We can create an algorithm for a decision tree on the basis of the hierarchy of actions that we have set.

In the above decision tree diagram, we have made a sequence of actions for driving a vehicle with/without a license.
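A minimal sketch of such a tree with scikit-learn, using hypothetical `[has_license, age]` features and made-up labels:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical features: [has_license, age]; label: allowed to drive (1) or not (0)
X = [[1, 30], [1, 18], [0, 30], [0, 16], [1, 45], [0, 22]]
y = [1, 1, 0, 0, 1, 0]

tree = DecisionTreeClassifier().fit(X, y)
print(export_text(tree, feature_names=['has_license', 'age']))  # the learned hierarchy
print(tree.predict([[1, 25], [0, 25]]))  # with license -> 1, without -> 0
```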

**29. Explain the difference between KNN and K-means Clustering.**

**K-nearest neighbors**: It is a supervised Machine Learning algorithm. In KNN, we give identified (labeled) data to the model. The model then classifies new points based on the labels of the closest labeled points.

**K-means clustering**: It is an unsupervised Machine Learning algorithm. In this, we give unidentified (unlabeled) data to the model. The algorithm then groups the points into clusters, assigning each point to the cluster whose mean (centroid) is nearest to it.

**30. What is Dimensionality Reduction?**

In the real world, we build Machine Learning models on top of features and parameters. These features can be multi-dimensional and large in number. Sometimes, the features may be irrelevant and it becomes a difficult task to visualize them.

Here, we use dimensionality reduction to cut down the irrelevant and redundant features with the help of principal variables. These principal variables are the subgroup of the parent variables that conserve the feature of the parent variables.

**31. Both being tree-based algorithms, how is Random Forest different from Gradient Boosting Algorithm (GBM)?**

The main difference between a random forest and GBM is the use of techniques. Random forest advances predictions using a technique called ‘bagging.’ On the other hand, GBM advances predictions with the help of a technique called ‘boosting.’

- **Bagging**: In bagging, we apply random sampling and divide the dataset into *N* samples. After that, we build a model on each sample by employing a single training algorithm. Then, we combine the final predictions by polling. Bagging helps increase the efficiency of a model by decreasing its variance and thus avoiding overfitting.
- **Boosting**: In boosting, the algorithm tries to review and correct the inadmissible predictions at the initial iteration. After that, the algorithm’s sequence of corrective iterations continues until we get the desired prediction. Boosting assists in reducing both bias and variance, making the weak learners strong.

**32. Suppose, you found that your model is suffering from high variance. Which algorithm do you think could handle this situation and why?**

**Handling High Variance**

- For handling issues of high variance, we should use the bagging algorithm.
- The bagging algorithm would split data into sub-groups with a replicated sampling of random data.
- Once the algorithm splits the data, we use random data to create rules using a particular training algorithm.
- After that, we use polling for combining the predictions of the model.
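The steps above can be sketched with scikit-learn's bagging ensemble on synthetic data (the dataset and parameters are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary classification data
X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bagging: bootstrap samples of the training data, one model per sample,
# predictions combined by voting (the default base learner is a decision tree)
bag = BaggingClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)
print(bag.score(X_test, y_test))
```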

**33. What is ROC curve and what does it represent?**

ROC stands for ‘Receiver Operating Characteristic.’ We use the ROC curve to graphically represent the trade-off between the true positive rate and the false positive rate.

In ROC, AUC (Area Under the Curve) gives us an idea about the accuracy of the model.

The above graph shows an ROC curve. The greater the area under the curve, the better the performance of the model.
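An ROC curve and its AUC can be sketched with scikit-learn on made-up labels and scores:

```python
from sklearn.metrics import roc_auc_score, roc_curve

y_true = [0, 0, 1, 1, 0, 1, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.7, 0.3]  # predicted probabilities

fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(fpr, tpr)                        # points of the ROC curve
print(roc_auc_score(y_true, y_score))  # area under the curve
```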

Next, we would be looking at Machine Learning Interview Questions on Rescaling, Binarizing, and Standardizing.

**34. What is Rescaling of data and how is it done?**

In real-world scenarios, the attributes present in data vary in scale. Rescaling the attributes to a common scale helps algorithms process the data efficiently.

We can rescale the data using Scikit-learn. The code for rescaling the data using MinMaxScaler is as follows:

```python
# Rescaling data
import pandas
import numpy
from sklearn.preprocessing import MinMaxScaler

names = ['Abhi', 'Piyush', 'Pranay', 'Sourav', 'Sid', 'Mike', 'pedi', 'Jack', 'Tim']
# 'url' must point to the CSV dataset to be rescaled
dataframe = pandas.read_csv(url, names=names)
array = dataframe.values

# Splitting the array into input and output
X = array[:, 0:8]
Y = array[:, 8]

scaler = MinMaxScaler(feature_range=(0, 1))
rescaledX = scaler.fit_transform(X)

# Summarizing the modified data
numpy.set_printoptions(precision=3)
print(rescaledX[0:5, :])
```

**35. What is Binarizing of data? How to Binarize?**

In most Machine Learning interviews, apart from theoretical questions, interviewers focus on the implementation part. So, these ML Interview Questions are focused on the implementation of the theoretical concepts.

Converting data into binary values on the basis of threshold values is known as the binarizing of data. The values that are less than the threshold are set to **0** and the values that are greater than the threshold are set to **1**. This process is useful when we have to perform feature engineering, and we can also use it for adding unique features.

We can binarize data using Scikit-learn. The code for binarizing the data using Binarizer is as follows:

```python
# Binarizing data
import pandas
import numpy
from sklearn.preprocessing import Binarizer

# 'url' should point to a CSV dataset with nine columns
names = ['Abhi', 'Piyush', 'Pranay', 'Sourav', 'Sid', 'Mike', 'pedi', 'Jack', 'Tim']
dataframe = pandas.read_csv(url, names=names)
array = dataframe.values

# Splitting the array into input and output
X = array[:, 0:8]
Y = array[:, 8]

binarizer = Binarizer(threshold=0.0).fit(X)
binaryX = binarizer.transform(X)

# Summarizing the modified data
numpy.set_printoptions(precision=3)
print(binaryX[0:5, :])
```

**36. How to Standardize data?**

Standardization is a method used for rescaling data attributes so that they have a mean of **0** and a standard deviation of **1**. The main objective of standardization is to bring all attributes to this common distribution.

We can standardize the data using Scikit-learn. The code for standardizing the data using StandardScaler is as follows:

```python
# Python code to standardize data (0 mean, 1 stdev)
import pandas
import numpy
from sklearn.preprocessing import StandardScaler

# 'url' should point to a CSV dataset with nine columns
names = ['Abhi', 'Piyush', 'Pranay', 'Sourav', 'Sid', 'Mike', 'pedi', 'Jack', 'Tim']
dataframe = pandas.read_csv(url, names=names)
array = dataframe.values

# Separate the array into input and output components
X = array[:, 0:8]
Y = array[:, 8]

scaler = StandardScaler().fit(X)
rescaledX = scaler.transform(X)

# Summarize the transformed data
numpy.set_printoptions(precision=3)
print(rescaledX[0:5, :])
```

**37. Executing a binary classification tree algorithm is a simple task. But, how does a tree splitting take place? How does the tree determine which variable to break at the root node and which at its child nodes?**

Gini index and node entropy help the binary classification tree make splitting decisions. Basically, the tree algorithm determines the feature that splits the data into the purest possible child nodes.

According to the Gini index, if we randomly pick two objects from a node, they should belong to the same class; for a perfectly pure node, the probability of this event is **1**.

To compute the Gini index, we should do the following:

- Compute Gini for each sub-node with the formula: the sum of the squares of the probabilities of success and failure (p^2 + q^2)
- Compute Gini for the split as the weighted average of the Gini scores of every node of the split
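The two steps above can be sketched in plain Python (the function names `gini_node` and `gini_split` are illustrative, not from any library):

```python
# Sketch: Gini computation for a binary split, following the two steps above
def gini_node(p):
    """Gini purity score p^2 + q^2 for a node with success probability p."""
    q = 1 - p
    return p ** 2 + q ** 2

def gini_split(left, right):
    """Weighted Gini for a split; `left`/`right` are lists of 0/1 labels."""
    n = len(left) + len(right)
    score = 0.0
    for node in (left, right):
        if node:
            p = sum(node) / len(node)
            score += gini_node(p) * len(node) / n
    return score

# A perfectly pure split scores 1.0; a perfectly mixed node scores 0.5
print(gini_split([1, 1, 1], [0, 0, 0]))  # 1.0
print(gini_node(0.5))                    # 0.5
```

The split with the highest weighted Gini score (the purest children) is the one the tree chooses.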

Now, entropy is the degree of impurity, given by the following:

Entropy = −p log2(p) − q log2(q)

where **p** and **q** are the probabilities of success and failure of the node.

When **Entropy = 0**, the node is homogeneous.

When **Entropy is high**, both classes are present in the node in a 50–50 proportion.

Finally, to determine the suitability of the node as a root node, the entropy should be very low.
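A binary entropy computation can be sketched as follows (assuming p is the success probability and q = 1 − p; the `entropy` helper is illustrative):

```python
# Sketch: binary entropy of a node
import math

def entropy(p):
    """Entropy -p*log2(p) - q*log2(q) for a node with success probability p."""
    q = 1 - p
    if p == 0 or q == 0:
        return 0.0  # a homogeneous node has zero entropy
    return -p * math.log2(p) - q * math.log2(q)

print(entropy(1.0))  # 0.0 -> homogeneous node
print(entropy(0.5))  # 1.0 -> 50-50 split, maximum impurity
```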

**38. What is SVM (Support Vector Machines) in Machine Learning?**

SVM is a Machine Learning algorithm that is mainly used for classification. It performs well even when the feature vector has high dimensionality.

Below is the code for the SVM classifier:

```python
# Importing required libraries
from sklearn import datasets
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Loading the Iris dataset
iris = datasets.load_iris()

# A -> features and B -> label
A = iris.data
B = iris.target

# Splitting A and B into train and test data
A_train, A_test, B_train, B_test = train_test_split(A, B, random_state=0)

# Training a linear SVM classifier
svm_model_linear = SVC(kernel='linear', C=1).fit(A_train, B_train)
svm_predictions = svm_model_linear.predict(A_test)

# Model accuracy for A_test
accuracy = svm_model_linear.score(A_test, B_test)

# Creating a confusion matrix
cm = confusion_matrix(B_test, svm_predictions)
```

**39. Implement the KNN classification algorithm.**

We will use the Iris dataset for implementing the KNN classification algorithm.

```python
# KNN classification algorithm
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
import numpy as np

iris_dataset = load_iris()
A_train, A_test, B_train, B_test = train_test_split(
    iris_dataset["data"], iris_dataset["target"], random_state=0)

kn = KNeighborsClassifier(n_neighbors=1)
kn.fit(A_train, B_train)

A_new = np.array([[8, 2.5, 1, 1.2]])
prediction = kn.predict(A_new)

print("Predicted target value: {}\n".format(prediction))
print("Predicted feature name: {}\n".format(iris_dataset["target_names"][prediction]))
print("Test score: {:.2f}".format(kn.score(A_test, B_test)))
```

Output:

```
Predicted target value: [0]
Predicted feature name: ['setosa']
Test score: 0.92
```

*Come to Intellipaat’s **Machine Learning Community** if you have more queries on Machine Learning Interview Questions!*