"Semi-Supervised Learning", Professor Bing Liu's instruction slides, UIC CS583 course, Spring 2005.
Motivation:
To address the classification problem when only positive labeled items and unlabeled items are available (PU learning), with no labeled negative examples.
Contribution:
1). Argued that unlabeled data can increase classifier accuracy (a point established by Nigam et al. in 2000; Prof. Liu extended the idea). 2). Showed theoretically that EM is applicable to text classification, using methods such as Spy-EM, Naive Bayesian, and SVM. 3). Proposed a new metric for evaluating classifier performance in the situation where the F-score cannot be estimated.
Methods:
1). Spy-EM plants "spy" positive documents into the unlabeled set to identify reliable negative items, which then initialize EM. 2). The metric r^2/Pr(f(x)=1) is used, where r (the recall) can be estimated from the positive cases in the validation set and Pr(f(x)=1) can be estimated from the whole validation set.
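A minimal sketch of how the metric above could be computed, assuming we only have classifier outputs (1 = predicted positive) on the labeled-positive validation examples and on the whole validation set; the function and variable names are illustrative, not from the slides:

```python
def pu_metric(pred_pos, pred_all):
    """Estimate r^2 / Pr(f(x)=1) without any labeled negatives.

    pred_pos: 0/1 predictions on the positive validation examples,
              so their mean estimates the recall r.
    pred_all: 0/1 predictions on the whole validation set,
              so their mean estimates Pr(f(x)=1).
    """
    r = sum(pred_pos) / len(pred_pos)      # estimated recall r
    p_f1 = sum(pred_all) / len(pred_all)   # estimated Pr(f(x)=1)
    if p_f1 == 0:
        return 0.0
    return r * r / p_f1

# Example: 8 of 10 positives predicted positive, 20 of 100 overall.
score = pu_metric([1] * 8 + [0] * 2, [1] * 20 + [0] * 80)  # 0.8^2 / 0.2
```

The appeal of this quantity is that it is proportional to precision times recall (and hence tracks the F-score) yet needs no labeled negatives to estimate.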
Discussions:
The concept of EM is not hard. The key point is how to carry out the maximization in the M-step. The author's main contribution is to formulate the maximization as finding reliable negative cases while keeping the recall r above a threshold, and to prove that r^2/Pr(f(x)=1) is a proper metric for classifier performance.