When working with real datasets in machine learning or data mining, we frequently encounter a two-class classification task. To add to our agony, the dataset is often skewed: samples from one class outnumber samples from the other. There are a few well-known techniques for getting around the problem. See my short PPT on “Technique to Handle Imbalanced Datasets”.
One of the most popular techniques for handling data imbalance is SMOTE (Synthetic Minority Over-sampling Technique). You can access our team's source code for SMOTE here on Matlab Central. Acknowledgement for the code: Atin Mathur, Ardhendhu Shekhar Tripathi.
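To make the idea concrete, here is a minimal sketch of the core SMOTE step in plain Python. It is not the Matlab Central code mentioned above. For each minority-class sample it finds the k nearest minority-class neighbours, picks one at random, and creates a new point somewhere on the line segment joining the two. The function name `smote`, its parameters (`n_per_sample`, `k`, `seed`), and the toy data are assumptions made for illustration only.

```python
import math
import random


def smote(minority, n_per_sample=1, k=5, seed=0):
    """Return synthetic minority samples built by interpolation.

    minority     : list of feature vectors (lists of floats), minority class only
    n_per_sample : synthetic points to create per original sample (illustrative name)
    k            : number of nearest minority neighbours to choose from
    seed         : seed for the random generator, for reproducibility
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for i, x in enumerate(minority):
        # k nearest neighbours of x among the other minority samples (Euclidean)
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        for _ in range(n_per_sample):
            nn = rng.choice(neighbours)
            gap = rng.random()  # position along the segment from x to nn, in [0, 1)
            synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


# Toy minority class with four 2-D samples (made-up values)
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority, n_per_sample=2, k=3)
print(len(new_points))  # 4 samples * 2 synthetic points each = 8
```

Because every synthetic point lies between two existing minority samples, the new points stay inside the region the minority class already occupies, which is the main difference from simply duplicating samples. This brute-force neighbour search is fine for small datasets; a larger one would call for a spatial index or an established library implementation.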
- Kotsiantis, Sotiris, Dimitris Kanellopoulos, and Panayiotis Pintelas. “Handling imbalanced datasets: A review.” GESTS International Transactions on Computer Science and Engineering 30.1 (2006): 25-36.
- Chawla, Nitesh V., et al. “SMOTE: synthetic minority over-sampling technique.” arXiv preprint arXiv:1106.1813 (2011).
- Ha, Thien M., and Horst Bunke. “Off-line, handwritten numeral recognition by perturbation method.” Pattern Analysis and Machine Intelligence, IEEE Transactions on 19.5 (1997): 535-539.