Datasets

These datasets are publicly available for research.

Sensitivity analysis datasets (used to test the input relevance of supervised learning models):

Synthetic datasets:

Regression:

  • ssin.csv
  • psin.csv
  • int2.csv
  • tree.csv
  • fri1.csv

Classification:

  • ssin-2.csv -- ssin-2c and ssin-2p
  • ssin-n2p.csv
  • int2-3c.csv
  • int2-8p.csv

Real-World datasets:

  • Bank dataset: bank.csv -- the dataset in .csv format, and bank-names.txt -- a description of the dataset. Also available at the UCI repository
  • cmc -- available at the UCI repository
  • servo -- available at the UCI repository
  • white wine quality -- available at the UCI repository

Please cite this reference as a source for the synthetic datasets:

  • P. Cortez and M. Embrechts. Using Sensitivity Analysis and Visualization Techniques to Open Black Box Data Mining Models. In Information Sciences, Elsevier, 225:1-17, March 2013. http://dx.doi.org/10.1016/j.ins.2012.10.039
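These datasets are meant for checking whether a sensitivity analysis (SA) method recovers which inputs a fitted model actually relies on. Below is a minimal sketch of a one-dimensional SA. It assumes the fitted model is exposed as a plain Python function, uses the input means as the baseline, and measures sensitivity as the range of the model's output. The function name `one_d_sa_relevance`, the `levels` default, and the toy model are all hypothetical illustrations, not the exact procedure from the cited paper.

```python
from statistics import mean
from typing import Callable, Sequence


def one_d_sa_relevance(
    model: Callable[[Sequence[float]], float],
    data: Sequence[Sequence[float]],
    levels: int = 7,
) -> list[float]:
    """Hypothetical sketch: input relevance via 1-D sensitivity analysis (range measure)."""
    if levels < 2:
        raise ValueError("levels must be at least 2")
    n_inputs = len(data[0])
    # Assumed baseline: every input held at its mean over the dataset.
    baseline = [mean(row[j] for row in data) for j in range(n_inputs)]
    sensitivities = []
    for a in range(n_inputs):
        lo = min(row[a] for row in data)
        hi = max(row[a] for row in data)
        outputs = []
        for k in range(levels):
            x = list(baseline)
            # Sweep only input a over evenly spaced values in [lo, hi].
            x[a] = lo + (hi - lo) * k / (levels - 1)
            outputs.append(model(x))
        # Range of responses: how far the output moves when only input a changes.
        sensitivities.append(max(outputs) - min(outputs))
    total = sum(sensitivities)
    # Normalize so relevances sum to 1 (all zeros for a constant model).
    return [s / total if total > 0 else 0.0 for s in sensitivities]


# Toy model (illustrative only): y = 3*x0 + x1, and x2 is irrelevant.
def toy_model(x: Sequence[float]) -> float:
    return 3 * x[0] + x[1]


toy_data = [[0, 0, 0], [1, 1, 1]]
relevance = one_d_sa_relevance(toy_model, toy_data)
print(relevance)  # [0.75, 0.25, 0.0]
```

With a real learner, `toy_model` would be replaced by the trained model's prediction function and `toy_data` by rows loaded from one of the CSV files above. A good SA method should give near-zero relevance to inputs that the data-generating function ignores.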
