Machine Learning: Handling Imbalanced Datasets

When working with real datasets in machine learning or data mining, we quite frequently encounter a two-class classification task. To add to our agony, the dataset is often skewed: samples from one class (the majority class) far outnumber samples from the other (the minority class). There are a few well-known techniques to get around this problem. Access my short PPT on “Technique to Handle Imbalanced Datasets”.

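One of the most popular techniques for handling data imbalance is SMOTE (Synthetic Minority Over-sampling Technique). Rather than duplicating existing minority samples, SMOTE creates synthetic ones by interpolating between a minority sample and one of its nearest minority-class neighbours. You can access our team's source code for SMOTE here on MATLAB Central. Acknowledgement for the code: Atin Mathur, Ardhendhu Shekhar Tripathi.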

Download Code:–synthetic-minority-over-sampling-technique-

(Code acknowledgement: Ardhendhu Tripathi, Atin Mathur)
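
For readers who just want to see the idea, here is a minimal Python sketch of the SMOTE interpolation step. It is not the MATLAB code linked above, and it simplifies the original algorithm: base samples are picked at random instead of looping over every minority sample, and one random gap is used per synthetic point. The function name `smote` and its parameters `n_synthetic`, `k` and `seed` are my own choices for illustration.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their nearest minority-class neighbours (simplified SMOTE sketch)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)          # seeded for reproducible output
    k = min(k, len(minority) - 1)      # cannot have more neighbours than samples
    synthetic = []
    for _ in range(n_synthetic):
        # Assumption: pick the base sample at random (a simplification).
        i = rng.randrange(len(minority))
        base = minority[i]
        # Sort the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Choose one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        # The new point lies on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority, n_synthetic=6, k=2)
```

Because every synthetic point is a convex combination of two real minority samples, it always stays inside the region already covered by the minority class. The synthetic points are then appended to the training set only, never to the test set.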


  1. Kotsiantis, Sotiris, Dimitris Kanellopoulos, and Panayiotis Pintelas. “Handling imbalanced datasets: A review.” GESTS International Transactions on Computer Science and Engineering 30.1 (2006): 25-36.
  2. Chawla, Nitesh V., et al. “SMOTE: synthetic minority over-sampling technique.” arXiv preprint arXiv:1106.1813 (2011).
  3. Ha, Thien M., and Horst Bunke. “Off-line, handwritten numeral recognition by perturbation method.” Pattern Analysis and Machine Intelligence, IEEE Transactions on 19.5 (1997): 535-539.
