Active Learning
Deep Active Learning (DAL): Pool-based active learning (AL) iteratively selects the most informative samples from a large pool of unlabeled i.i.d. data until either the base learner(s) reach a target level of performance or a fixed labeling budget is exhausted.
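A minimal sketch of the pool-based loop described above, using only the budget-based stopping rule. The toy nearest-centroid learner, the entropy-based ranking, and all names (`pool_based_al`, `fit_centroids`, `predict_proba`, `oracle`) are assumptions for illustration and are not part of the notes; a real DAL setup would train a deep network in place of `fit_centroids`.

```python
import math
import random


def entropy(probs):
    """Shannon entropy of a probability vector (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def fit_centroids(xs, ys):
    """Toy learner (assumption): one mean per class on 1-D inputs."""
    cents = {}
    for c in set(ys):
        pts = [x for x, y in zip(xs, ys) if y == c]
        cents[c] = sum(pts) / len(pts)
    return cents


def predict_proba(cents, x):
    """Softmax over negative squared distances to the class centroids."""
    logits = [-(x - m) ** 2 for m in cents.values()]
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    z = sum(exps)
    return [e / z for e in exps]


def pool_based_al(pool, oracle, n_init=4, batch_size=2, budget=12, seed=0):
    """Fit, score the unlabeled pool, query the top batch, repeat until budget."""
    rng = random.Random(seed)
    # Start from a small random labeled set.
    labels = {i: oracle(pool[i]) for i in rng.sample(range(len(pool)), n_init)}
    while len(labels) < budget:
        unlabeled = [i for i in range(len(pool)) if i not in labels]
        if not unlabeled:
            break
        model = fit_centroids([pool[i] for i in labels], list(labels.values()))
        # Rank unlabeled samples from most to least uncertain.
        ranked = sorted(
            unlabeled,
            key=lambda i: entropy(predict_proba(model, pool[i])),
            reverse=True,
        )
        for i in ranked[: min(batch_size, budget - len(labels))]:
            labels[i] = oracle(pool[i])  # query the annotator
    return labels


# Hypothetical usage: 50 points on [0, 1], true label is 1 when x > 0.5.
pool = [i / 49 for i in range(50)]
queried = pool_based_al(pool, oracle=lambda x: int(x > 0.5))
```

The performance-based stopping rule could be added by evaluating the model on a held-out set inside the loop and breaking once a target score is reached.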
Querying Strategies
uncertainty-based
Uncertainty-based DAL selects data samples with high aleatoric uncertainty or epistemic uncertainty.
- Aleatoric uncertainty refers to the natural uncertainty in data due to influences on data generation processes that are inherently random.
- Epistemic uncertainty comes from the modeling/learning process and is caused by a lack of knowledge.
Typical methods:
- Maximum Entropy (Entropy) selects the data x that maximize the predictive entropy.
- Margin selects the data x whose two most likely labels have the smallest difference in posterior probability.
- Least Confidence (LeastConf) selects the data x whose most likely label ŷ has the lowest posterior probability.
- **Bayesian Active Learning by Disagreement (BALD)** chooses data points that are expected to maximize the information gained about the model parameters ω, i.e. the mutual information between predictions and the model posterior: $\alpha_{\mathrm{BALD}}(x, M) = H_M[y \mid x] - \mathbb{E}_{p(\omega \mid D_l)}\big[H_M[y \mid x, \omega]\big]$.
- Mean Standard Deviation (MeanSTD) maximizes the mean standard deviation of the predicted probabilities over all $K$ classes: $\alpha_{\mathrm{MeanSTD}}(x, M) = \frac{1}{K} \sum_{k=1}^{K} \sqrt{\mathrm{Var}_{q(\omega)}[p(y = k \mid x, \omega)]}$.
- DeepFool Active Learning method (AdvDeepFool)
- Generative Adversarial Active Learning (GAAL)
- Bayesian Generative Active Deep Learning (BGADL)
- Batch Active learning by Diverse Gradient Embeddings (BADGE)
- Loss Prediction Loss (LPL)
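The five score-based methods at the top of this list (Entropy, Margin, LeastConf, BALD, MeanSTD) can be sketched as plain scoring functions, where a higher score means "query this sample first". This is an illustrative sketch, not a reference implementation: the function names are made up here, and `mc_probs` is assumed to be a list of T class-probability vectors for one input x, obtained from T stochastic forward passes (for example with dropout left on) that stand in for draws of ω.

```python
import math


def entropy(probs):
    """Shannon entropy of a probability vector (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def max_entropy_score(probs):
    """Entropy: higher predictive entropy means more uncertain."""
    return entropy(probs)


def margin_score(probs):
    """Margin: negated gap between the two most likely labels."""
    first, second = sorted(probs, reverse=True)[:2]
    return -(first - second)


def least_conf_score(probs):
    """LeastConf: one minus the probability of the most likely label."""
    return 1.0 - max(probs)


def mean_probs(mc_probs):
    """Average the T sampled probability vectors class by class."""
    t = len(mc_probs)
    return [sum(s[k] for s in mc_probs) / t for k in range(len(mc_probs[0]))]


def bald_score(mc_probs):
    """BALD: entropy of the mean prediction minus the mean per-sample entropy."""
    expected_h = sum(entropy(s) for s in mc_probs) / len(mc_probs)
    return entropy(mean_probs(mc_probs)) - expected_h


def mean_std_score(mc_probs):
    """MeanSTD: per-class standard deviation across samples, averaged over classes."""
    mean = mean_probs(mc_probs)
    t = len(mc_probs)
    stds = [
        math.sqrt(sum((s[k] - mean[k]) ** 2 for s in mc_probs) / t)
        for k in range(len(mean))
    ]
    return sum(stds) / len(stds)


# Hypothetical inputs: the sampled models agree on a 50/50 prediction ...
agree = [[0.5, 0.5], [0.5, 0.5]]
# ... versus confident models that contradict each other.
disagree = [[1.0, 0.0], [0.0, 1.0]]
```

Both hypothetical inputs have the same mean prediction and therefore the same predictive entropy, but only `disagree` gets a positive BALD or MeanSTD score. This is the distinction between aleatoric uncertainty (the models agree that the label is ambiguous) and epistemic uncertainty (the models disagree with each other).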
representativeness/diversity-based
Representativeness/diversity-based strategies select batches of samples that are representative of the unlabeled set. The intuition is that the selected examples, once labeled, can act as a surrogate for the entire dataset.
Typical methods:
- KMeans
- CoreSet
- Cluster-Margin
- Active-DPP
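As one concrete illustration of the "surrogate for the entire dataset" idea, the sketch below implements a greedy k-center selection of the kind used by CoreSet-style methods: each step picks the point that is currently farthest from everything already chosen. The function name `k_center_greedy` and the toy 2-D points are assumptions; in practice the points would be feature embeddings taken from the network.

```python
import math


def k_center_greedy(points, labeled_idx, k):
    """Greedily pick up to k indices, each the farthest from all chosen centers."""
    chosen = set(labeled_idx)
    # Distance from every point to its nearest already-chosen center.
    min_dist = [
        min((math.dist(p, points[c]) for c in chosen), default=math.inf)
        for p in points
    ]
    picked = []
    for _ in range(k):
        candidates = [j for j in range(len(points)) if j not in chosen]
        if not candidates:
            break
        i = max(candidates, key=lambda j: min_dist[j])
        picked.append(i)
        chosen.add(i)
        # The new center may now be the nearest one for some points.
        for j, p in enumerate(points):
            min_dist[j] = min(min_dist[j], math.dist(p, points[i]))
    return picked


# Hypothetical embeddings: two tight pairs and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
batch = k_center_greedy(points, labeled_idx=[0], k=2)
```

With index 0 already labeled, the selection takes one point from each remaining region of the space and skips the near-duplicate of the labeled point.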
combined strategies
DAL needs both larger batch sizes (which favors representativeness/diversity) and more precise decision boundaries for higher model performance (which favors uncertainty), so combined strategies have become the dominant approach. They aim to trade off uncertainty against representativeness/diversity in query selection.
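One simple way to picture the trade-off is a two-stage heuristic: shortlist the most uncertain samples, then spread the batch out within that shortlist. The sketch below is a hypothetical illustration of that idea and not any specific published method; `uncertain_then_diverse`, `shortlist_factor`, and the toy inputs are all assumptions.

```python
import math


def entropy(probs):
    """Shannon entropy of a probability vector (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def uncertain_then_diverse(probs, feats, batch_size, shortlist_factor=3):
    """Stage 1: keep the most uncertain samples. Stage 2: pick a spread-out batch."""
    order = sorted(range(len(probs)), key=lambda i: entropy(probs[i]), reverse=True)
    shortlist = order[: batch_size * shortlist_factor]
    if not shortlist:
        return []
    batch = [shortlist[0]]  # seed with the single most uncertain sample
    while len(batch) < min(batch_size, len(shortlist)):
        rest = [i for i in shortlist if i not in batch]
        # Farthest-first: maximize the distance to the nearest batch member.
        nxt = max(rest, key=lambda i: min(math.dist(feats[i], feats[b]) for b in batch))
        batch.append(nxt)
    return batch


# Hypothetical pool: three near-duplicate uncertain samples, one other
# uncertain sample far away, and two confident samples.
probs = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5], [0.55, 0.45], [0.99, 0.01], [0.98, 0.02]]
feats = [(0.0, 0.0), (0.01, 0.0), (0.02, 0.0), (5.0, 5.0), (9.0, 9.0), (8.0, 8.0)]
batch = uncertain_then_diverse(probs, feats, batch_size=2, shortlist_factor=2)
```

Ranking by entropy alone would fill the batch with two of the near-duplicates. The diversity stage replaces the second one with the distant uncertain sample, while the shortlist keeps the confident samples out of the batch.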