0.6

scikits.learn 0.6 was released in December 2010. It is marked by the inclusion of several new modules and a general renaming of old ones. It is also marked by the inclusion of new examples, including applications to real-world datasets.

Changelog

  • New stochastic gradient descent module by Peter Prettenhofer. The module comes with complete documentation and examples.
  • Improved svm module: memory consumption has been reduced by 50%, a heuristic now sets class weights automatically, and weights can be assigned to samples (see SVM: Weighted samples for an example).
  • New Gaussian Processes module by Vincent Dubourg. This module also has great documentation and some very neat examples. See Gaussian Processes regression: basic introductory example or Gaussian Processes classification example: exploiting the probabilistic output for a taste of what can be done.
  • It is now possible to use liblinear’s Multi-class SVC (option multi_class in svm.LinearSVC)
  • New features and performance improvements of text feature extraction.
  • Improved sparse matrix support, both in main classes (grid_search.GridSearchCV) and in modules scikits.learn.svm.sparse and scikits.learn.linear_model.sparse.
  • Lots of cool new examples were added, along with a new section that uses real-world datasets. These include: Faces recognition example using eigenfaces and SVMs, Species distribution modeling, Libsvm GUI, Wikipedia principal eigenvector and others.
  • Faster Least Angle Regression algorithm. It is now 2x faster than the R version in the worst case and up to 10x faster in some cases.
  • Faster coordinate descent algorithm. In particular, the full path version of lasso (linear_model.lasso_path()) is more than 200x faster than before.
  • It is now possible to get probability estimates from a linear_model.LogisticRegression model.
  • Module renaming: the glm module has been renamed to linear_model, the gmm module has been included in the more general mixture module, and the sgd module has been included in linear_model.
  • Lots of bug fixes and documentation improvements.
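
As an illustration of what a probability estimate from a logistic regression model means, here is a minimal sketch in plain Python. It is not the library's code or API: the function name predict_proba_binary and the coefficient values are assumptions made up for this example, standing in for an already fitted binary model. The only fact it relies on is the standard logistic function applied to the linear decision value.

```python
import math


def predict_proba_binary(weights, intercept, x):
    """Return (P(y=0), P(y=1)) for one sample under a binary logistic model.

    `weights` and `intercept` are hypothetical fitted coefficients; this
    helper is an illustration, not part of scikits.learn.
    """
    # Linear decision value: w . x + b
    score = sum(w * xi for w, xi in zip(weights, x)) + intercept
    # Logistic function, written so that exp() never overflows
    if score >= 0:
        p1 = 1.0 / (1.0 + math.exp(-score))
    else:
        e = math.exp(score)
        p1 = e / (1.0 + e)
    return 1.0 - p1, p1


# A sample that lies exactly on the decision boundary (score == 0)
p0, p1 = predict_proba_binary([2.0, -1.0], 0.5, [1.0, 2.5])
print(p0, p1)
```

A sample on the decision boundary gets equal probability for both classes; samples further from the boundary get probabilities closer to 0 or 1.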

People

People that made this release possible, preceded by number of commits:

0.5

Changelog

New classes

  • Support for sparse matrices in some classifiers of modules svm and linear_model (see svm.sparse.SVC, svm.sparse.SVR, svm.sparse.LinearSVC, linear_model.sparse.Lasso, linear_model.sparse.ElasticNet)
  • New pipeline.Pipeline object to compose different estimators.
  • Recursive Feature Elimination routines in module Feature selection.
  • Addition of various classes capable of cross validation in the linear_model module (linear_model.LassoCV, linear_model.ElasticNetCV, etc.).
  • New, more efficient LARS algorithm implementation. The Lasso variant of the algorithm is also implemented. See linear_model.lars_path, linear_model.LARS and linear_model.LassoLARS.
  • New Hidden Markov Models module (see classes hmm.GaussianHMM, hmm.MultinomialHMM, hmm.GMMHMM)
  • New module feature_extraction (see class reference)
  • New FastICA algorithm in module scikits.learn.fastica
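
The idea of composing estimators, as the pipeline.Pipeline entry above describes, can be sketched in plain Python. This is only an illustration of the pattern under stated assumptions, not the library's implementation: the classes MaxAbsScale, MeanThreshold and SimplePipeline are hypothetical and exist only for this example. The sketch assumes every step but the last exposes fit and transform, and the last step exposes fit and predict.

```python
class MaxAbsScale:
    """Hypothetical transformer: divide each feature by its max absolute value."""

    def fit(self, X, y=None):
        n_features = len(X[0])
        # Fall back to 1.0 for all-zero columns to avoid dividing by zero
        self.scale_ = [max(abs(row[j]) for row in X) or 1.0
                       for j in range(n_features)]
        return self

    def transform(self, X):
        return [[v / s for v, s in zip(row, self.scale_)] for row in X]


class MeanThreshold:
    """Hypothetical final step: predict 1 when a row sum exceeds the training mean."""

    def fit(self, X, y=None):
        sums = [sum(row) for row in X]
        self.threshold_ = sum(sums) / len(sums)
        return self

    def predict(self, X):
        return [int(sum(row) > self.threshold_) for row in X]


class SimplePipeline:
    """Chain transformers, then delegate to the final estimator."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, X, y=None):
        # Fit each transformer on the output of the previous one
        for step in self.steps[:-1]:
            X = step.fit(X, y).transform(X)
        self.steps[-1].fit(X, y)
        return self

    def predict(self, X):
        # Apply the already fitted transformers, then predict
        for step in self.steps[:-1]:
            X = step.transform(X)
        return self.steps[-1].predict(X)


X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
model = SimplePipeline([MaxAbsScale(), MeanThreshold()]).fit(X)
predictions = model.predict(X)
print(predictions)
```

The benefit of the pattern is that the whole chain behaves like a single estimator, so the same fit and predict calls work whether one step or several are involved.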

Documentation

Fixes

  • API changes: variable names now adhere to PEP-8 and are more meaningful.
  • Fixes for svm module to run on a shared memory context (multiprocessing).
  • It is again possible to generate latex (and thus PDF) from the sphinx docs.

Examples

External dependencies

  • Joblib is now a dependency of this package, although it is shipped with it (scikits.learn.externals.joblib).

Removed modules

  • Module ann (Artificial Neural Networks) has been removed from the distribution. Users who want this sort of algorithm should take a look at pybrain.

Misc

  • New sphinx theme for the web page.

Authors

The following is a list of authors for this release, preceded by number of commits:

  • 262 Fabian Pedregosa
  • 240 Gael Varoquaux
  • 149 Alexandre Gramfort
  • 116 Olivier Grisel
  • 40 Vincent Michel
  • 38 Ron Weiss
  • 23 Matthieu Perrot
  • 10 Bertrand Thirion
  • 7 Yaroslav Halchenko
  • 9 Virgile Fritsch
  • 6 Edouard Duchesnay
  • 4 Mathieu Blondel
  • 1 Ariel Rokem
  • 1 Matthieu Brucher

0.4

Changelog

Major changes in this release include:

  • Coordinate Descent algorithm (Lasso, ElasticNet) refactoring & speed improvements (roughly 100x faster).
  • Coordinate Descent Refactoring (and bug fixing) for consistency with R’s package GLMNET.
  • New metrics module.
  • New GMM module contributed by Ron Weiss.
  • Implementation of the LARS algorithm (without Lasso variant for now).
  • feature_selection module redesign.
  • Migration to Git as version control system.
  • Removal of obsolete attrselect module.
  • Rename of private compiled extensions (added underscore).
  • Removal of legacy unmaintained code.
  • Documentation improvements (both docstring and rst).
  • Improvement of the build system to (optionally) link with MKL. Also, provide a lite BLAS implementation in case no system-wide BLAS is found.

  • Lots of new examples.
  • Many, many bug fixes ...
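
For readers unfamiliar with the coordinate descent approach mentioned in the list above, the following is a minimal plain-Python sketch of the textbook cyclic coordinate descent update for the Lasso. It is not the library's implementation, and the changelog does not describe that implementation; the function names and the objective scaling, (1 / (2 n)) * ||y - Xw||^2 + alpha * ||w||_1, are assumptions chosen for this example.

```python
def soft_threshold(value, threshold):
    """Shrink `value` towards zero by `threshold` (the Lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Hypothetical helper: minimize (1/(2n))||y - Xw||^2 + alpha*||w||_1."""
    n_samples, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    residual = list(y)  # equals y - Xw, since w starts at zero
    for _ in range(n_iter):
        for j in range(n_features):
            column = [row[j] for row in X]
            z = sum(v * v for v in column) / n_samples
            if z == 0.0:
                continue  # an all-zero feature cannot reduce the loss
            # Correlation of feature j with the residual that excludes
            # feature j's own current contribution
            rho = sum(v * (r + v * w[j])
                      for v, r in zip(column, residual)) / n_samples
            new_wj = soft_threshold(rho, alpha) / z
            delta = new_wj - w[j]
            if delta != 0.0:
                # Keep the residual in sync with the updated coefficient
                residual = [r - v * delta for v, r in zip(column, residual)]
                w[j] = new_wj
    return w


# Two orthogonal features: the first is informative, the second is nearly not
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [2.0, 0.1, 2.0, 0.1]
coef = lasso_coordinate_descent(X, y, alpha=0.5)
print(coef)
```

Each coefficient is updated in turn while the others are held fixed, and the soft-thresholding step is what drives weakly correlated coefficients exactly to zero.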

Authors

The committer list for this release is the following (preceded by number of commits):

  • 143 Fabian Pedregosa
  • 35 Alexandre Gramfort
  • 34 Olivier Grisel
  • 11 Gael Varoquaux
  • 5 Yaroslav Halchenko
  • 2 Vincent Michel
  • 1 Chris Filo Gorgolewski