Ensembles and Random Forest Analysis: How it Works

Ensemble methods combine multiple machine learning algorithms to produce better predictions than any single model could on its own. For example, consider an ensemble built from a base classifier and a corrector classifier. The base classifier makes the initial prediction of the target class, and that predicted class is then fed as an additional feature into the corrector classifier. The corrector can either return the same classification as the base classifier or override it when doing so improves accuracy; in effect, it attempts to fix the base classifier's errors, which tend to occur near its decision boundary. A natural choice for the base classifier is logistic regression, a parametric discriminative classifier. For the corrector, k-nearest neighbors is a good fit: it is a non-parametric classifier that predicts by taking a majority vote (or average) over the k nearest training examples.
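The base/corrector pattern above can be sketched from scratch. This is a minimal, hypothetical illustration (the toy data, function names, and hyperparameters are assumptions, not from the original text): a NumPy logistic regression serves as the base classifier, and a hand-rolled k-NN corrector votes over training points whose features have been augmented with the base classifier's own prediction.

```python
import numpy as np

def train_logistic(X, y, lr=0.1, steps=2000):
    # Base classifier: logistic regression fit by plain gradient descent.
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(X @ w + b)))      # sigmoid probabilities
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

def predict_logistic(w, b, X):
    return (1 / (1 + np.exp(-(X @ w + b))) >= 0.5).astype(int)

def knn_predict(X_train, y_train, X_query, k=3):
    # Corrector: majority vote of the k nearest training examples.
    preds = []
    for q in X_query:
        idx = np.argsort(np.linalg.norm(X_train - q, axis=1))[:k]
        preds.append(int(round(y_train[idx].mean())))
    return np.array(preds)

# Toy data (assumed for illustration): two overlapping 2-D clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([-1, -1], 0.7, (40, 2)),
               rng.normal([1, 1], 0.7, (40, 2))])
y = np.array([0] * 40 + [1] * 40)

# Train the base classifier, then augment features with its prediction
# so the k-NN corrector can learn where the base model goes wrong.
w, b = train_logistic(X, y)
X_aug = np.column_stack([X, predict_logistic(w, b, X)])

X_test = np.array([[-1.2, -0.8], [1.0, 1.3]])
X_test_aug = np.column_stack([X_test, predict_logistic(w, b, X_test)])
final = knn_predict(X_aug, y, X_test_aug, k=5)
```

In practice one would reach for scikit-learn's `LogisticRegression` and `KNeighborsClassifier` rather than these from-scratch versions; the sketch just makes the data flow between base and corrector explicit.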

Random Forest is a type of ensemble method that performs both regression and classification using many decision trees. It builds on a technique called Bootstrap Aggregation (bagging): each decision tree is trained on a different random sample of the training data, drawn with replacement, and the trees' predictions are then aggregated, typically by majority vote for classification or averaging for regression.
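The bagging step can be sketched as follows. This is a simplified, hypothetical illustration (names and data are assumptions): each "tree" is only a one-split decision stump, and it shows just the Bootstrap Aggregation part, so it omits the per-split random feature subsetting that a full Random Forest adds on top of bagging.

```python
import numpy as np

def fit_stump(X, y):
    # A one-split decision tree: choose the (feature, threshold, polarity)
    # with the lowest misclassification rate on this sample.
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            err = np.mean((X[:, j] >= t).astype(int) != y)
            for polarity, e in ((1, err), (0, 1 - err)):
                if best is None or e < best[0]:
                    best = (e, j, t, polarity)
    return best[1:]

def stump_predict(stump, X):
    j, t, polarity = stump
    raw = (X[:, j] >= t).astype(int)
    return raw if polarity == 1 else 1 - raw

def bagged_forest(X, y, n_trees=25, seed=0):
    # Bootstrap Aggregation: each tree trains on n rows drawn from the
    # training set *with replacement* (a bootstrap sample).
    rng = np.random.default_rng(seed)
    n = len(y)
    return [fit_stump(X[idx], y[idx])
            for idx in (rng.integers(0, n, size=n) for _ in range(n_trees))]

def forest_predict(trees, X):
    # Aggregate by majority vote across all trees.
    votes = np.mean([stump_predict(t, X) for t in trees], axis=0)
    return (votes >= 0.5).astype(int)

# Toy data (assumed for illustration): two overlapping 2-D clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([-1, -1], 0.7, (40, 2)),
               rng.normal([1, 1], 0.7, (40, 2))])
y = np.array([0] * 40 + [1] * 40)

trees = bagged_forest(X, y)
preds = forest_predict(trees, np.array([[-1.2, -0.8], [1.0, 1.3]]))
```

Because each bootstrap sample differs, the individual stumps disagree slightly, and the majority vote smooths out their individual errors; scikit-learn's `RandomForestClassifier` applies the same idea with full trees plus random feature selection at each split.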