The main research question was whether unlabeled data can be used alongside labeled data, an approach known as semi-supervised learning, to build a strong classification model.
Jafar Tanha presented several algorithms that advance the state of the art in semi-supervised learning. He observed that good probability estimation leads to the selection of a reliable subset of the newly labeled data. Tanha then combined a distance-based measure with the probability estimates of a decision tree to form the selection metric. From this he concluded that a combination of the base learner's probability estimates and a distance-based measure is a suitable selection metric for self-training. Tanha further used both the classifier predictions and the pairwise similarities among the data to sample from the unlabeled data. He showed that this combination achieved better results than state-of-the-art methods.
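The self-training idea described above can be illustrated with a short sketch. This is not Tanha's algorithm: to stay self-contained it swaps the decision tree for a simple nearest-centroid learner with softmax probabilities, and the function names, the product-based score, and the `per_round` and `rounds` parameters are all assumptions made for illustration.

```python
import math

def centroid_fit(X, y):
    """Stand-in base learner (assumption): one mean vector per class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in sums[c]] for c in sums}

def predict_proba(centroids, x):
    """Probability estimate: softmax over negative distances to the centroids."""
    scores = {c: math.exp(-math.dist(x, m)) for c, m in centroids.items()}
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

def selection_score(x, label, prob, X_lab, y_lab):
    """Hypothetical metric: confidence times closeness to labeled data of that class."""
    nearest = min(math.dist(x, xl) for xl, yl in zip(X_lab, y_lab) if yl == label)
    return prob * (1.0 / (1.0 + nearest))

def self_train(X_lab, y_lab, X_unlab, per_round=2, rounds=5):
    """Repeatedly move the best-scoring unlabeled points into the labeled set."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(rounds):
        if not pool:
            break
        model = centroid_fit(X_lab, y_lab)
        scored = []
        for idx, x in enumerate(pool):
            proba = predict_proba(model, x)
            label = max(proba, key=proba.get)
            score = selection_score(x, label, proba[label], X_lab, y_lab)
            scored.append((score, idx, label))
        scored.sort(reverse=True)          # highest selection score first
        chosen = scored[:per_round]
        for _, idx, label in chosen:
            X_lab.append(pool[idx])
            y_lab.append(label)            # pseudo-label from the current model
        chosen_idx = {idx for _, idx, _ in chosen}
        pool = [x for i, x in enumerate(pool) if i not in chosen_idx]
    return centroid_fit(X_lab, y_lab), X_lab, y_lab
```

Calling `self_train` with a few labeled points and a larger unlabeled pool returns the final model together with the enlarged labeled set. The point of the sketch is only that the ranking uses both the classifier's confidence and a distance to already-labeled data, rather than confidence alone.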
In many classification problems, such as object detection, action recognition, and document and web-page categorization, a large amount of unlabeled data is readily available, while labeled data is usually not easy to obtain. Assigning labels to data (annotating it) in domains such as medical diagnosis and bioinformatics requires special measurements or specialized devices, which are often expensive. Unlabeled data, on the other hand, is usually available in abundance, and collecting a large pool of it is an easy task. It is therefore both interesting and important to investigate methods that can learn effectively from both labeled and unlabeled data, known as semi-supervised learning.
J. Tanha: Ensemble Approaches to Semi-Supervised Learning.
The supervisor is Prof. H. Afsarmanesh and the co-supervisor is Dr. M.W. van Someren.
The ceremony is open to the general public.