ML Notes (2): The Learning Problems

Outline

Lecture 2: The Learning Problems

Learning with Different Output Space
Learning with Different Data Label
Learning with Different Protocol
Learning with Different Input Space

Learning with Different Output Space

Binary, Multiclass, and Multilabel Classification

The Credit Approval Problem discussed in the previous post is a binary classification problem; email spam/non-spam detection is another example. A closely related setting is multiclass classification:

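In image recognition, for example, the task is to decide which type an object belongs to when there are several candidate types.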

Also related is multilabel classification: classify the input into multiple (or no) categories. That is, the object being classified may belong to several classes at the same time.

Multilabel classification can be reduced to multiple isolated binary classification problems, one per class. This approach is called Binary Relevance (BR). BR has the following drawbacks:

Isolation: hidden relations between labels are not exploited.
Imbalance: each binary problem has few "yes" examples and many "no" examples.
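The reduction behind Binary Relevance can be sketched in a few lines. This is a minimal illustration, not the course's implementation: the helper `binary_relevance_targets` and the toy labels are hypothetical. It only shows how multilabel targets split into one yes/no target list per class, and how the imbalance drawback shows up as a low positive rate.

```python
from typing import Dict, List, Set


def binary_relevance_targets(
    label_sets: List[Set[str]], classes: List[str]
) -> Dict[str, List[int]]:
    """Turn multilabel targets into one yes/no (1/0) target list per class."""
    return {
        c: [1 if c in labels else 0 for labels in label_sets]
        for c in classes
    }


# Hypothetical toy data: the label set of each of four images.
label_sets = [{"cat", "dog"}, {"cat"}, set(), {"bird"}]
classes = ["bird", "cat", "dog"]

# Each class now defines its own isolated binary classification problem.
targets = binary_relevance_targets(label_sets, classes)

# Fraction of "yes" examples per class; small values indicate imbalance.
positive_rate = {c: sum(t) / len(t) for c, t in targets.items()}
```

Each entry of `targets` would be handed to a separate binary classifier, which is exactly why relations between labels (for example, "cat" and "dog" often appearing together) are never seen by any single classifier.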

Regression: Patient Recovery Prediction Problem

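Regression means that the output produced by ML is a real number or lies in a range of real numbers, i.e. $\mathcal{Y} = \mathbb{R}$ or $\mathcal{Y} = [\text{lower}, \text{upper}] \subset \mathbb{R}$.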

Sophisticated Output: Image Generation Problems

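Some problems have a more complex form of output, such as image synthesis and other image manipulation tasks. In these cases the output has very high dimensionality.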

Learning with Different Data Label

Supervised

Supervised learning: every $x_n$ comes with a corresponding $y_n$

In supervised learning, every input in the data set has a corresponding output.

Unsupervised

Unsupervised learning: learning without $y_n$. It is used for problems such as clustering, density estimation, and outlier detection.

Self-supervised: Unsupervised+Self-defined Goal(s)

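Self-supervised learning sits between supervised and unsupervised learning. The data itself has no labels; instead, the model is pre-trained on a pretext task whose targets are derived from the data.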
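One commonly used pretext task is rotation prediction: rotate each unlabeled image by a known number of quarter turns and ask the model to predict that number. The sketch below only builds the self-defined training pairs; the helper names and the tiny 2x2 "image" are hypothetical, and no model is trained here.

```python
from typing import List, Tuple

Grid = List[List[int]]


def rotate90(grid: Grid) -> Grid:
    """Rotate a grid by 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def make_rotation_task(images: List[Grid]) -> List[Tuple[Grid, int]]:
    """Build (rotated image, number of clockwise quarter turns) pairs."""
    examples = []
    for img in images:
        rotated = img
        for k in range(4):
            examples.append((rotated, k))  # k is the self-defined label
            rotated = rotate90(rotated)
    return examples


# Hypothetical unlabeled data: one tiny 2x2 "image".
unlabeled = [[[1, 2], [3, 4]]]
pretext_examples = make_rotation_task(unlabeled)
```

The labels `0..3` cost nothing to produce, which is the point of a pretext task: the supervision comes from the data itself rather than from a human annotator.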

Semi-supervised

Semi-supervised learning: leverage unlabeled data to avoid 'expensive' labeling

In semi-supervised learning, a small part of the data set is labeled and most of it is unlabeled.

Weakly-supervised

Weakly-supervised learning: another realistic family to reduce labeling burden.

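Obtaining a large amount of fully labeled data is relatively hard, so weakly-supervised learning settles for weaker labels that are cheaper to collect, such as complementary labels (a label that names a class the example does not belong to).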

Reinforcement Learning

Reinforcement Learning: learn with 'partial/implicit information' (often sequentially)

There are no direct labels; instead, the machine is trained by giving different rewards for its different decisions.
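Learning from rewards instead of labels can be illustrated with a small multi-armed bandit. This is a simplified sketch under stated assumptions: the action names, the fixed reward table, and the epsilon-greedy strategy are all chosen for illustration and are not taken from the lecture.

```python
import random
from typing import Callable, Dict, List


def run_bandit(
    reward_fn: Callable[[str], float],
    actions: List[str],
    steps: int,
    epsilon: float = 0.1,
    seed: int = 0,
) -> Dict[str, float]:
    """Estimate the value of each action from rewards only (no labels)."""
    rng = random.Random(seed)
    value = {a: 0.0 for a in actions}
    count = {a: 0 for a in actions}
    for t in range(steps):
        if t < len(actions):
            a = actions[t]                   # try every action once first
        elif rng.random() < epsilon:
            a = rng.choice(actions)          # explore
        else:
            a = max(actions, key=value.get)  # exploit the best estimate
        r = reward_fn(a)                     # feedback for the chosen action only
        count[a] += 1
        value[a] += (r - value[a]) / count[a]  # running average of rewards
    return value


# Hypothetical, deterministic rewards for three ads.
true_reward = {"ad_a": 0.2, "ad_b": 1.0, "ad_c": 0.5}
estimates = run_bandit(true_reward.get, list(true_reward), steps=50)
best_action = max(estimates, key=estimates.get)
```

The learner never sees which action is "correct". It only observes the reward of the action it actually took, which is the partial, implicit information mentioned above.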

Mini Summary

Learning with Different Protocol

Batch Learning

batch supervised multiclass classification: learn from all known data.

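That is, all the data is given to the machine at once and learning happens in a single pass over that fixed data set. This is a very common ML protocol.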

Online Learning

online: hypothesis 'improves' through receiving data instances sequentially

After the model is deployed, it keeps updating itself with newly received data. Spam detection is a typical example.

In real-world applications, the online and batch protocols are often combined.
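The online protocol can be sketched with a perceptron-style learner that updates its weights one example at a time. This is an illustrative assumption, not the lecture's code: the feature layout (bias, "contains a suspicious word", "sender is a known contact") and the tiny email stream are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple


def online_perceptron(
    stream: Iterable[Tuple[Sequence[float], int]], dim: int
) -> Tuple[List[float], int]:
    """Process examples one by one; update the weights only on mistakes."""
    w = [0.0] * dim
    mistakes = 0
    for x, y in stream:  # y is +1 (spam) or -1 (non-spam)
        score = sum(wi * xi for wi, xi in zip(w, x))
        prediction = 1 if score > 0 else -1
        if prediction != y:
            # The hypothesis "improves" right after seeing this instance.
            w = [wi + y * xi for wi, xi in zip(w, x)]
            mistakes += 1
    return w, mistakes


# Hypothetical stream: [bias, suspicious word, known contact], label.
email_stream = [
    ([1.0, 1.0, 0.0], 1),
    ([1.0, 0.0, 1.0], -1),
    ([1.0, 1.0, 0.0], 1),
    ([1.0, 0.0, 1.0], -1),
]
weights, mistakes = online_perceptron(email_stream, dim=3)
```

In contrast to batch learning, the learner here never needs the whole data set in memory; a batch-trained model could also serve as the starting `w`, which is one way the two protocols combine.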

Active Learning

Active: improve hypothesis with fewer labels(hopefully) by asking questions strategically

The machine actively asks for the labels of selected inputs to help improve its own model.
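A small example shows why strategic questions can save labels. Assume, purely for illustration, a 1-D problem where the hidden label is +1 exactly when the input is at or above an unknown threshold. Querying the point in the middle of the still-uncertain region (a binary search) finds the boundary with far fewer labels than labeling the whole pool. The function and the `oracle` are hypothetical.

```python
from typing import Callable, List, Tuple


def active_learn_threshold(
    pool: List[float], oracle: Callable[[float], int], budget: int
) -> Tuple[int, int]:
    """Find the index of the first positive point, asking as few labels as possible."""
    pool = sorted(pool)
    lo, hi = 0, len(pool)  # the first positive index lies in [lo, hi]
    queries = 0
    while lo < hi and queries < budget:
        mid = (lo + hi) // 2  # most uncertain point: middle of the unknown region
        queries += 1
        if oracle(pool[mid]) == 1:
            hi = mid
        else:
            lo = mid + 1
    return lo, queries


# Hypothetical pool of 100 unlabeled points and a labeling oracle.
pool = [float(i) for i in range(100)]
oracle = lambda x: 1 if x >= 37 else -1
first_positive, queries_used = active_learn_threshold(pool, oracle, budget=10)
```

Here the boundary among 100 points is located with at most 7 label queries, while a passive learner might need to label every point. Real problems are rarely this clean, which is why the savings are only "hopefully" achieved.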

Mini Summary of ML Protocol

Learning with Different Input Space

Features can be divided into:

Concrete features
Raw features
Abstract features

Concrete Feature

Concrete features: the 'easy' one for ML

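Such features are very specific and directly helpful for training, for example the size of a coin in coin classification. Concrete features usually come from human insight into the problem.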

Raw feature

Raw features: often need human("feature engineering") or machines to convert to concrete ones.

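A raw feature is a low-level, unprocessed feature (often at the physical level). In digit recognition, for example, the individual pixel values are raw features, whereas the symmetry of the digit is a concrete feature.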
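The step from raw pixels to concrete features can be sketched by hand. The function below is a hypothetical example of feature engineering: it takes a small binary image and computes two concrete features, ink density and left-right symmetry. The exact definitions are assumptions made for illustration.

```python
from typing import List, Tuple


def concrete_features(image: List[List[int]]) -> Tuple[float, float]:
    """Convert raw 0/1 pixels into (ink density, left-right symmetry)."""
    rows, cols = len(image), len(image[0])
    density = sum(sum(row) for row in image) / (rows * cols)
    half = cols // 2
    # Count pixel pairs that differ from their mirror image.
    mismatches = sum(
        abs(row[c] - row[cols - 1 - c]) for row in image for c in range(half)
    )
    symmetry = 1 - mismatches / (rows * half)
    return density, symmetry


# Hypothetical 3x3 images: a vertical bar (like "1") and an "L" shape.
bar = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
ell = [[1, 0, 0], [1, 0, 0], [1, 1, 1]]

bar_density, bar_symmetry = concrete_features(bar)
ell_density, ell_symmetry = concrete_features(ell)
```

Nine raw pixel values collapse into two numbers that already separate the shapes: the bar is perfectly symmetric, while the "L" is not.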

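Raw features usually have to be converted into concrete features for ML to work well. The conversion can be done by humans or by machines; automatic conversion by a machine works as follows: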

Extract patterns from the image as features and check whether they are suitable; repeating this process refines the patterns step by step into concrete features.

Abstract Features

Abstract: again need 'feature conversion/extraction/construction'

These are fairly abstract features, such as a student's ID number, and they also need to be converted into concrete features.
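One simple form of feature conversion for IDs is one-hot encoding, sketched below. This is only one possible conversion chosen for illustration; the student IDs are hypothetical, and more useful features are typically extracted or constructed from how each ID behaves in the data.

```python
from typing import Dict, List, Tuple


def one_hot_encode(ids: List[str]) -> Tuple[List[List[int]], Dict[str, int]]:
    """Map each distinct ID to a position and encode every ID as a 0/1 vector."""
    index = {v: i for i, v in enumerate(sorted(set(ids)))}
    vectors = [
        [1 if index[v] == j else 0 for j in range(len(index))] for v in ids
    ]
    return vectors, index


# Hypothetical student IDs; the ID itself carries no numeric meaning.
student_ids = ["B03", "B01", "B03", "B02"]
vectors, index = one_hot_encode(student_ids)
```

The encoding avoids treating "B03" as if it were three times "B01", but it adds no information beyond identity, so it is a starting point rather than a finished concrete feature.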

Mini Summary

References

https://tzuruey.medium.com/neurips-day-7-self-supervised-learning-workshop-5ec57ce5eab1
https://www.csie.ntu.edu.tw/~htlin/course/ml21fall/

Roy's Blog