This week we used the LearningTrees bundle (Random Forests) for classification after eliminating a set of domain-based common words, and we built the classifier's ground truth both before and after eliminating those words.
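To illustrate what eliminating common words can look like, here is a minimal Python sketch. It assumes that "common" means "appears in the most documents"; our actual selection criterion may differ. The helper names (`most_common_words`, `remove_words`) and the toy documents are hypothetical.

```python
from collections import Counter

def most_common_words(documents, n):
    """Return the n words that occur in the most documents (assumed criterion)."""
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each word once per document
    # Sort by descending document frequency, then alphabetically for determinism.
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return {word for word, _ in ranked[:n]}

def remove_words(documents, stop_set):
    """Drop every token that is in stop_set, keeping document order."""
    return [[t for t in tokens if t not in stop_set] for tokens in documents]

# Toy tokenized documents, for illustration only.
docs = [
    ["patient", "reports", "chest", "pain"],
    ["patient", "reports", "mild", "fever"],
    ["patient", "denies", "pain"],
]
common = most_common_words(docs, 2)
filtered = remove_words(docs, common)
```

Running the classifier once on `docs` and once on `filtered` gives the before and after comparison described above.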
We used the sentence vectors produced by CBOW as input features to the Random Forest, with about 20% of the data reserved for testing.
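The feature preparation can be sketched as follows. This is only an illustration: it assumes a sentence vector is the average of its word vectors (one common choice, not necessarily the one we used) and a seeded random 80/20 split. The functions `sentence_vector` and `train_test_split` and the two-dimensional toy vectors are hypothetical.

```python
import random

def sentence_vector(tokens, word_vectors, dim):
    """Average the vectors of the known words; all zeros if none are known."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle row indices with a fixed seed and hold out test_fraction of them."""
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    n_test = round(len(rows) * test_fraction)
    test = [rows[i] for i in idx[:n_test]]
    train = [rows[i] for i in idx[n_test:]]
    return train, test

# Toy word vectors standing in for trained CBOW embeddings.
word_vectors = {"chest": [1.0, 0.0], "pain": [0.0, 1.0], "fever": [1.0, 1.0]}
vec = sentence_vector(["chest", "pain", "unknown"], word_vectors, dim=2)

# Ten (features, label) rows; about 20% go to the test set.
rows = [([float(i), 0.0], i % 2) for i in range(10)]
train, test = train_test_split(rows)
```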
We converted our data (training and test) to the form used by the ML bundles, then separated the independent variables from the dependent variables. Classification expects dependent variables to be unsigned integers representing discrete class labels, so they are expressed using the DiscreteField layout.
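The conversion can be pictured with a small Python mock-up. It is not the ECL API: it assumes the cell-oriented layout of work item, record id, field number and value, and the namedtuples below only mimic that shape. `to_fields` is a hypothetical helper.

```python
from collections import namedtuple

# Python stand-ins for the cell-oriented layouts (assumed field names).
NumericField = namedtuple("NumericField", "wi id number value")
DiscreteField = namedtuple("DiscreteField", "wi id number value")

def to_fields(rows, wi=1):
    """Split (features, label) rows into independent and dependent cells."""
    independent, dependent = [], []
    for rec_id, (features, label) in enumerate(rows, start=1):
        # One cell per feature value: field numbers start at 1.
        for col, x in enumerate(features, start=1):
            independent.append(NumericField(wi, rec_id, col, float(x)))
        # Class labels must be non-negative integers.
        if not (isinstance(label, int) and label >= 0):
            raise ValueError(f"label must be an unsigned integer, got {label!r}")
        dependent.append(DiscreteField(wi, rec_id, 1, label))
    return independent, dependent

sample_rows = [([0.5, 0.25], 1), ([0.0, 1.0], 0)]
X, Y = to_fields(sample_rows)
```

Each sentence vector of dimension d becomes d independent cells plus one dependent cell carrying its class label.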
We were able to increase the accuracy by 3% after eliminating 5000 words.