'fastText' Wrapper for Text Classification and Word Representation • fastrtext

R Documentation | Release Notes | FAQ | Multilingual pretrained models

R wrapper for fastText C++ code from Facebook.

FastText is an open-source, free, lightweight library that allows users to learn text representations and text classifiers. It works on standard, generic hardware. Models can later be reduced in size to even fit on mobile devices.

License

Installation

You can install the fastrtext package from CRAN or GitHub as follows:

# From CRAN
install.packages("fastrtext")

# From GitHub
# install.packages("devtools")
devtools::install_github("pommedeterresautee/fastrtext")

Documentation

All the updated documentation can be reached at this address.

API

API documentation can be reached at this address.

In particular, command line options are listed there.

Supervised learning (text classification)

Data for a multi-class task are embedded in this package.
Follow this link to learn a model and then measure the accuracy in 5 minutes.
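As a rough idea of what that workflow looks like, here is a minimal sketch. It assumes the embedded datasets are named train_sentences and test_sentences, with columns "class.text" and "text"; those names, the temporary file handling and the hyperparameter values are illustrative choices and not a recommended configuration.

```r
library(fastrtext)

# Example data shipped with the package (dataset and column names are assumptions)
data("train_sentences")
data("test_sentences")

# fastText expects one example per line, with the label prefixed by "__label__"
train_labels <- paste0("__label__", train_sentences[, "class.text"])
train_texts <- tolower(train_sentences[, "text"])
train_file <- tempfile()
writeLines(text = paste(train_labels, train_texts), con = train_file)

# Train a classifier; the arguments mirror the fastText command line options
model_file <- tempfile()
execute(commands = c("supervised", "-input", train_file, "-output", model_file,
                     "-dim", 20, "-lr", 1, "-epoch", 20, "-wordNgrams", 2,
                     "-verbose", 1))

# fastText appends ".bin" to the output path it is given
model <- load_model(paste0(model_file, ".bin"))

# Predict in memory: one named probability per sentence (top label by default)
test_texts <- tolower(test_sentences[, "text"])
predictions <- predict(model, sentences = test_texts)

# Strip the label prefix if present, then compare with the true classes
predicted <- sub("^__label__", "", names(unlist(predictions)))
accuracy <- mean(predicted == test_sentences[, "class.text"])
print(accuracy)
```

The accuracy varies from run to run because training is not deterministic.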

Unsupervised learning (word representation)

Data for a word representation learning task are embedded in this package.
Follow this link for a 5-minute tutorial on learning vector representations of words (also known as word embeddings).
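For orientation, a minimal sketch of the unsupervised workflow could look like the following. It assumes the embedded dataset is named train_sentences and has a "text" column; the choice of skipgram and the default hyperparameters are only for illustration.

```r
library(fastrtext)

# Example data shipped with the package (dataset and column names are assumptions)
data("train_sentences")

# For word representations the input is plain text, one sentence per line
texts <- tolower(train_sentences[, "text"])
corpus_file <- tempfile()
writeLines(text = texts, con = corpus_file)

# Learn skipgram embeddings; the arguments mirror the fastText command line options
model_file <- tempfile()
execute(commands = c("skipgram", "-input", corpus_file, "-output", model_file,
                     "-verbose", 1))

# fastText appends ".bin" to the output path it is given
model <- load_model(paste0(model_file, ".bin"))

# Words known to the model
dict <- get_dictionary(model)
print(head(dict, 5))

# Embeddings for a few words: one row per word
words <- head(dict, 5)
vectors <- get_word_vectors(model, words)
print(dim(vectors))
```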

Alternatives

Why not use the command line client?

You can call the command line client from R using system("fasttext ...");
To get predictions, you need to write the data to a file, run the prediction from the command line, then read the results back;
fastrtext makes your life easier by performing all these operations in memory;
It takes less time and uses fewer commands;
It is easy to install directly from R.

Why not use fastTextR?

fastrtext implements both the supervised and unsupervised parts of fastText (fastTextR implements only the unsupervised part);
with fastrtext, predictions can be made in memory (fastTextR requires you to write the sentences to disk and then read the predictions back);
the original fastText source code embedded in fastTextR is not up to date (it is missing several new features and bug fixes added since January 2017).

References

Please cite [1] if you use this code for learning word representations, or [2] if you use it for text classification.

Enriching Word Vectors with Subword Information

[1] P. Bojanowski*, E. Grave*, A. Joulin, T. Mikolov, Enriching Word Vectors with Subword Information

@article{bojanowski2016enriching,
  title={Enriching Word Vectors with Subword Information},
  author={Bojanowski, Piotr and Grave, Edouard and Joulin, Armand and Mikolov, Tomas},
  journal={arXiv preprint arXiv:1607.04606},
  year={2016}
}

Bag of Tricks for Efficient Text Classification

[2] A. Joulin, E. Grave, P. Bojanowski, T. Mikolov, Bag of Tricks for Efficient Text Classification

@article{joulin2016bag,
  title={Bag of Tricks for Efficient Text Classification},
  author={Joulin, Armand and Grave, Edouard and Bojanowski, Piotr and Mikolov, Tomas},
  journal={arXiv preprint arXiv:1607.01759},
  year={2016}
}

FastText.zip: Compressing text classification models

[3] A. Joulin, E. Grave, P. Bojanowski, M. Douze, H. Jégou, T. Mikolov, FastText.zip: Compressing text classification models

@article{joulin2016fasttext,
  title={FastText.zip: Compressing text classification models},
  author={Joulin, Armand and Grave, Edouard and Bojanowski, Piotr and Douze, Matthijs and J{\'e}gou, Herv{\'e} and Mikolov, Tomas},
  journal={arXiv preprint arXiv:1612.03651},
  year={2016}
}

(* These authors contributed equally.)