Thursday, July 16, 2015

Dealing with Imbalanced Data Sets

1. Use the PR (precision-recall) curve instead of AUROC (area under the receiver operating characteristic curve). When positives are rare, AUROC can look very high even though precision is poor.

- Stack Exchange: http://stats.stackexchange.com/questions/64047/effective-validity-of-auroc-as-performance-measure-what-about-very-high-auroc


2. Balance the data by downsampling the majority class or upsampling the minority class

3. Optimize the decision threshold (= cutoff)

4. ...and more to be explored
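
As a rough illustration of points 1 to 3, here is a minimal pure-Python sketch. Everything in it is an assumption made for the example: the function names, the toy labels and scores, and the choice of F1 as the quantity to maximize when picking the cutoff (other criteria are just as reasonable). In practice a library such as scikit-learn would do the curve computation for you.

```python
import random


def precision_recall_points(labels, scores):
    """Return (threshold, precision, recall) for every distinct score used as a cutoff."""
    total_pos = sum(labels)
    points = []
    for t in sorted(set(scores), reverse=True):
        # Predict positive when the score is at or above the cutoff.
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        precision = tp / (tp + fp)  # tp + fp >= 1 because t is one of the scores
        recall = tp / total_pos if total_pos else 0.0
        points.append((t, precision, recall))
    return points


def best_f1_threshold(labels, scores):
    """Pick the cutoff with the highest F1 (an assumed criterion, not the only option)."""
    best_t, best_f1 = None, -1.0
    for t, p, r in precision_recall_points(labels, scores):
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


def downsample_majority(rows, labels, seed=0):
    """Randomly drop majority-class rows until both classes have the same size."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [rows[i] for i in kept], [labels[i] for i in kept]


# Toy data (made up): 8 negatives, 2 positives.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.1, 0.15, 0.2, 0.3, 0.35, 0.4, 0.7, 0.6, 0.9]

threshold, f1 = best_f1_threshold(labels, scores)
balanced_rows, balanced_labels = downsample_majority(scores, labels)
```

On this toy data the default cutoff of 0.5 is not obviously the right one, which is the point of item 3: the cutoff is scanned along the PR curve rather than fixed in advance. Downsampling should be applied to the training split only, so that evaluation still reflects the real class ratio.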
