# 10 Machine Learning Algorithms for Beginners

Machine learning has gained immense traction since the Harvard Business Review article that termed 'Data Scientist' the 'sexiest job of the 21st century'. It has grown into one of the most popular ways to tackle modern problems. So, for those starting out in the field, here are the top 10 ML algorithms for beginners.


## 1- Artificial Neural Network

An artificial neural network is one of our crowning achievements. We create multiple nodes that are interconnected with each other, mimicking the neurons in our brain. In simple terms, each neuron takes in information from another neuron, performs work on it, and passes the result on to another neuron as output.

Each circular node represents an artificial neuron and an arrow represents a connection from the output of one neuron to the input of another.

Neural networks can be more useful when used to find interdependencies between various asset classes, rather than to predict a direct buy or sell choice.
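The "takes in information, performs work on it, transfers it as output" idea can be sketched as a single artificial neuron. This is a minimal illustration using NumPy; the weights and bias are arbitrary values chosen for the example, not learned parameters.

```python
import numpy as np

def neuron(inputs, weights, bias):
    """One artificial neuron: a weighted sum of its inputs,
    squashed through a sigmoid activation into (0, 1)."""
    z = np.dot(inputs, weights) + bias
    return 1.0 / (1.0 + np.exp(-z))

# Two inputs feeding a single neuron (illustrative values)
out = neuron(np.array([0.5, -1.0]), np.array([0.8, 0.2]), bias=0.1)
```

In a full network, the output of this neuron would become one of the inputs of the neurons in the next layer, which is exactly the node-to-node hand-off described above.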

## 2- K-means Clustering

In this kind of machine learning algorithm, the goal is to label the data points according to their similarity. We do not define the clusters before running the algorithm; instead, the algorithm discovers the clusters as it goes forward.

A basic example: given information about football players, we can use K-means clustering to label them according to their similarity. These clusters could be based on, say, a striker's tendency to score from free kicks or to make successful tackles, even though the algorithm is not given any predefined labels to begin with.

K-means clustering would be beneficial to traders who feel that there might be similarities between different assets which cannot be seen on the surface.
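The "find the clusters as it goes" loop is just two alternating steps: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. A bare-bones sketch with made-up 2-D data:

```python
import numpy as np

def kmeans(points, k, iters=10, seed=0):
    """Plain k-means: alternate between assigning points to the
    nearest centroid and moving each centroid to the mean of its points."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # distance from every point to every centroid
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean(axis=0)
    return labels, centroids

# Two obvious groups of 2-D points; no labels are given to the algorithm
data = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                 [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
labels, centroids = kmeans(data, k=2)
```

The algorithm recovers the two groups on its own, which is the sense in which the clusters are "found" rather than predefined.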

## 3- Naive Bayes

It is a classification technique based on Bayes’ Theorem with an assumption of independence among predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.

For example, a fruit may be considered to be an apple if it is red, round, and about 3 inches in diameter. Even if these features depend on each other or upon the existence of the other features, all of these properties independently contribute to the probability that this fruit is an apple and that is why it is known as ‘Naive’.

It is easy and fast to predict the class of a test data set, and it also performs well in multi-class prediction.

A Naive Bayes model is easy to build and particularly useful for very large data sets. Along with its simplicity, Naive Bayes can outperform even highly sophisticated classification methods.
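The "naive" independence assumption shows up as a simple product of per-feature probabilities. Here is a count-based sketch on a tiny, made-up fruit data set (the colour/shape values and counts are invented for illustration):

```python
from collections import Counter, defaultdict

def train_nb(samples):
    """Count-based Naive Bayes: estimate P(class) and
    P(feature_value | class) directly from frequencies."""
    class_counts = Counter(label for _, label in samples)
    feat_counts = defaultdict(Counter)
    for features, label in samples:
        for i, value in enumerate(features):
            feat_counts[label][(i, value)] += 1
    return class_counts, feat_counts

def predict_nb(class_counts, feat_counts, features):
    total = sum(class_counts.values())
    best, best_p = None, -1.0
    for label, count in class_counts.items():
        p = count / total
        for i, value in enumerate(features):
            # the "naive" step: treat every feature as independent given the class
            p *= feat_counts[label][(i, value)] / count
        if p > best_p:
            best, best_p = label, p
    return best

# Hypothetical fruit data: (colour, shape) -> label
data = [(("red", "round"), "apple"), (("red", "round"), "apple"),
        (("yellow", "long"), "banana"), (("yellow", "long"), "banana"),
        (("red", "long"), "banana")]
cc, fc = train_nb(data)
pred = predict_nb(cc, fc, ("red", "round"))
```

Each feature contributes its probability independently, exactly as in the apple example above; real implementations also add smoothing so unseen feature values do not zero out the product.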

## 4- Random Forest

The random forest algorithm was designed to address some of the limitations of decision trees.

A random forest contains decision trees, which are graphs of decisions representing a course of action or a statistical probability. Each individual tree is a Classification and Regression Tree (CART) model, and the multiple trees are combined into a single predictor.

To classify an object based on its attributes, each tree gives a classification, which is said to "vote" for that class. The forest then chooses the classification with the greatest number of votes. For regression, it takes the average of the outputs of the different trees.

Random Forest works in the following way:

1. Assume the number of cases as N. A sample of these N cases is taken as the training set.
2. Consider M to be the number of input variables. A number m is selected such that m < M; at each node, m variables are chosen at random out of the M, and the best split on these m variables is used to split the node. The value of m is held constant as the trees are grown.
3. Each tree is grown as large as possible.
4. By aggregating the predictions of n trees, predict the new data.

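Step 4, aggregating the predictions of the individual trees, can be sketched as follows (the trees themselves are assumed to be already trained; their outputs here are invented values):

```python
from collections import Counter
from statistics import mean

def aggregate_classification(tree_votes):
    """Classification: each tree 'votes'; the forest returns the majority class."""
    return Counter(tree_votes).most_common(1)[0][0]

def aggregate_regression(tree_outputs):
    """Regression: the forest returns the average of the trees' outputs."""
    return mean(tree_outputs)

# Hypothetical outputs from five classification trees and three regression trees
label = aggregate_classification(["cat", "dog", "cat", "cat", "dog"])
value = aggregate_regression([2.0, 3.0, 4.0])
```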

## 5- Recurrent Neural Networks (RNN)

The RNN is one of the most used and advanced machine learning algorithms; Siri and Google Assistant use RNNs in their programming. RNNs are essentially a type of neural network with a memory attached to each node, which makes it easy to process sequential data, i.e. data where one unit depends on the previous one.

One way to explain the advantage of an RNN over a normal neural network: suppose we need to process a word character by character. If the word is "trading", a normal neural network node would have forgotten the character "t" by the time it moves to "d", whereas a recurrent neural network will remember the character, as it has its own memory.
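That "memory" is just a hidden state that is fed back in at every step. A minimal sketch of processing "trading" character by character with NumPy (the weight matrices are random illustrative values, not trained ones):

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size = 4
vocab = sorted(set("trading"))
Wxh = rng.normal(0, 0.1, (hidden_size, len(vocab)))   # input -> hidden
Whh = rng.normal(0, 0.1, (hidden_size, hidden_size))  # hidden -> hidden (the "memory")

h = np.zeros(hidden_size)
for ch in "trading":
    x = np.zeros(len(vocab))
    x[vocab.index(ch)] = 1.0   # one-hot encode the character
    # the new state depends on the current input AND the previous state,
    # so information about earlier characters like "t" persists
    h = np.tanh(Wxh @ x + Whh @ h)
```

A feed-forward network has no `Whh @ h` term: each input is processed in isolation, which is exactly why it "forgets" earlier characters.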

## 6- Logistic Regression

While linear regression predictions are continuous values, logistic regression predictions are discrete values obtained after applying a transformation function.

Logistic regression is best suited for binary classification. It is named after the transformation function used in it, called the logistic function h(x) = 1/(1 + e^-x), which is an S-shaped curve.

In logistic regression, the output is in the form of probabilities of the default class (unlike linear regression, where the output is directly produced). As it is a probability, the output lies in the range of 0-1. The output (y-value) is generated by log-transforming the x-value using the logistic function h(x) = 1/(1 + e^-x). A threshold is then applied to force this probability into a binary classification.

To determine whether a tumour is malignant or not, the default variable is y = 1 (tumour = malignant); the x variable could be a measurement of the tumour, such as its size. The logistic function transforms the x-values of the various instances in the dataset into the range 0 to 1. If the probability crosses the threshold of 0.5, the tumour is classified as malignant.

The logistic regression equation P(x) = e^(b0 + b1*x) / (1 + e^(b0 + b1*x)) can be transformed into ln(P(x) / (1 - P(x))) = b0 + b1*x.

The goal of logistic regression is to use the training data to find the values of coefficients b0 and b1 such that it will minimize the error between the predicted outcome and the actual outcome. These coefficients are estimated using the technique of Maximum Likelihood Estimation.
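The fitting loop described above can be sketched with simple gradient ascent on the log-likelihood (a common way to carry out maximum likelihood estimation numerically). The tumour sizes and labels below are made-up illustrative data:

```python
import math

def sigmoid(z):
    """The logistic function h(z) = 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Estimate b0, b1 by gradient ascent on the log-likelihood,
    nudging the coefficients toward the observed outcomes."""
    b0 = b1 = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            b0 += lr * (y - p)        # gradient w.r.t. b0
            b1 += lr * (y - p) * x    # gradient w.r.t. b1
    return b0, b1

# Hypothetical tumour sizes (cm) and labels (1 = malignant)
sizes = [1.0, 1.5, 2.0, 3.0, 3.5, 4.0]
labels = [0, 0, 0, 1, 1, 1]
b0, b1 = fit_logistic(sizes, labels)
pred = sigmoid(b0 + b1 * 3.8)  # probability a 3.8 cm tumour is malignant
```

Applying the 0.5 threshold to `pred` yields the binary malignant/benign classification described earlier.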

## 7- CART

Classification and Regression Trees (CART) is one implementation of decision trees; others include ID3 and C4.5.

The non-terminal nodes are the root node and the internal node. The terminal nodes are the leaf nodes. Each non-terminal node represents a single input variable (x) and a splitting point on that variable; the leaf nodes represent the output variable (y). The model is used as follows to make predictions: walk the splits of the tree to arrive at a leaf node and output the value present at the leaf node.

Consider a decision tree that classifies whether a person will buy a sports car or a minivan depending on their age and marital status. If the person is over 30 years old and is not married, we walk the tree as follows: 'over 30 years?' -> yes -> 'married?' -> no. Hence, the model outputs a sports car.
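Walking the splits of such a tree is just a chain of conditionals. A sketch of the example above (the under-30 branch and the married outcome are plausible assumptions, since the original figure is not reproduced here):

```python
def choose_vehicle(age, married):
    # root node: split on the 'over 30 years?' variable
    if age > 30:
        # internal node: split on marital status
        # assumption: married over-30s buy the minivan
        return "minivan" if married else "sports car"
    # assumption: the under-30 branch outputs the sports car
    return "sports car"

pred = choose_vehicle(age=35, married=False)  # the walk described in the text
```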

## 8- PCA

Principal Component Analysis (PCA) is used to make data easy to explore and visualize by reducing the number of variables. This is done by capturing the maximum variance in the data into a new co-ordinate system with axes called ‘principal components’. Each component is a linear combination of the original variables and is orthogonal to one another. Orthogonality between components indicates that the correlation between these components is zero.

The first principal component captures the direction of the maximum variability in the data. The second principal component captures the remaining variance while being uncorrelated with the first component. Similarly, all successive principal components capture the remaining variance while being uncorrelated with the previous components.
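The recipe above, centre the data, find orthogonal directions of maximum variance, project onto them, can be sketched via eigendecomposition of the covariance matrix. The 3-D data here is synthetic, built so it genuinely varies along one direction:

```python
import numpy as np

def pca(X, n_components):
    """PCA via the covariance matrix: centre the data, take the
    eigenvectors with the largest eigenvalues as principal components."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # returned in ascending order
    order = np.argsort(eigvals)[::-1]        # sort by variance, descending
    components = eigvecs[:, order[:n_components]]
    return Xc @ components                   # project onto the new axes

rng = np.random.default_rng(1)
t = rng.normal(size=100)
# three variables that all move with one underlying factor, plus small noise
X = np.column_stack([t, 2 * t, -t]) + rng.normal(0, 0.01, (100, 3))
reduced = pca(X, n_components=1)
```

Because the three variables are driven by a single factor, one principal component captures almost all of the variance, which is the dimensionality reduction PCA is used for.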

## 9- Apriori

The Apriori algorithm is used in a transactional database to mine frequent itemsets and then generate association rules. It is popularly used in market basket analysis, where one checks for combinations of products that frequently co-occur in the database. In general, we write the association rule for ‘if a person purchases item X, then he purchases item Y’ as : X -> Y.

Example: if a person purchases milk and sugar, then he is likely to purchase coffee powder. This could be written in the form of an association rule as: {milk,sugar} -> coffee powder. Association rules are generated after crossing the threshold for support and confidence.

The Support measure helps prune the number of candidate itemsets to be considered during frequent itemset generation. This support measure is guided by the Apriori principle. The Apriori principle states that if an itemset is frequent, then all of its subsets must also be frequent.
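Support and confidence are simple frequency ratios, and the Apriori principle can be checked directly on a toy transactional database (the transactions below are invented to mirror the milk/sugar/coffee example):

```python
transactions = [
    {"milk", "sugar", "coffee"},
    {"milk", "sugar", "coffee"},
    {"milk", "bread"},
    {"sugar", "coffee"},
    {"milk", "sugar"},
]

def support(itemset):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

s_pair = support({"milk", "sugar"})
# confidence of the rule {milk, sugar} -> coffee
conf = support({"milk", "sugar", "coffee"}) / s_pair
# Apriori principle: every subset of a frequent itemset is at least as frequent
subset_ok = all(support({item}) >= s_pair for item in ["milk", "sugar"])
```

Rules whose support and confidence cross the chosen thresholds are kept; the subset property is what lets Apriori prune candidate itemsets without counting them all.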

## 10- Boosting Algorithms

Boosting is one of the most used machine learning techniques when massive loads of data have to be handled to make predictions with high accuracy. It is an ensemble learning approach that combines the predictive power of several base estimators to improve robustness.
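The core boosting idea, have each new base estimator focus on the examples the previous ones got wrong, can be sketched with an AdaBoost-style weight update (the misclassification pattern below is an invented example; the base estimators themselves are not shown):

```python
import math

def adaboost_round(weights, correct):
    """One AdaBoost-style round: up-weight the examples the current
    base estimator misclassified, so the next estimator focuses on them."""
    err = sum(w for w, c in zip(weights, correct) if not c) / sum(weights)
    alpha = 0.5 * math.log((1 - err) / err)  # this estimator's say in the final vote
    new = [w * math.exp(-alpha if c else alpha) for w, c in zip(weights, correct)]
    total = sum(new)
    return [w / total for w in new], alpha

weights = [0.25, 0.25, 0.25, 0.25]
# suppose a weak learner classifies the first three examples correctly
# and misclassifies the last one
weights, alpha = adaboost_round(weights, [True, True, True, False])
```

After the update, the misclassified example carries far more weight, which is how the ensemble steadily concentrates on the hard cases.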