A lot of people have asked me which models we use for recommendations at Spotify, so I wanted to share some insights. Here are benchmarks for some of the models. Note that not all of them are used in production.

*Performance of recommender models*

This particular benchmark looks at how well each model can rank “related artists”. More information about the models:

- vector_exp: A proprietary latent factor method trained on all log data using Hadoop (more than 50B events).
- word2vec: Google's open source word2vec. Trained on subsampled (5%) playlist data using skip-grams and 40 factors.
- rnn: Recurrent neural network trained on session data (users playing tracks in sequence). Each layer has 40 nodes, with hierarchical softmax for the output layer and dropout for regularization.
- koren: Collaborative Filtering for Implicit Feedback Datasets. Trained on the same data as vector_exp. Runs on Hadoop, 40 factors.
- lda: Latent Dirichlet Allocation with 400 topics. Runs on Hadoop, using the same dataset as above.
- freebase: A latent factor model trained on artist entities in the Freebase dump.
- plsa: Probabilistic Latent Semantic Analysis, using 40 factors and the same dataset and framework as above. More factors give significantly better results, but still nothing that can compete with the other models.
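To make the "related artists" task concrete, here is a minimal sketch of how a latent factor model could produce such a ranking: each artist gets a vector, and candidates are sorted by cosine similarity to the query artist. The artist names, the 3-dimensional vectors, and the choice of cosine similarity are all assumptions for illustration. The post does not say how the models above score or rank artists.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def related_artists(query, vectors, top_k=3):
    """Rank all other artists by similarity to `query`, best first."""
    q = vectors[query]
    scored = [
        (name, cosine(q, vec))
        for name, vec in vectors.items()
        if name != query
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical factors; the models in the list use 40 factors, not 3.
artist_vectors = {
    "artist_a": [0.9, 0.1, 0.0],
    "artist_b": [0.8, 0.2, 0.1],
    "artist_c": [0.0, 0.1, 0.9],
    "artist_d": [0.1, 0.9, 0.2],
}

ranking = related_artists("artist_a", artist_vectors)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

A benchmark like the one described would then compare rankings of this kind against some reference list of related artists, whatever the exact metric is.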

Again, not all of these models are in production. Conversely, other algorithms that are not included above are in production. This is just a selection of things we have experimented with. In particular, it’s interesting to note that neither PLSA nor LDA performs very well. Taking the sequence into account (rnn, word2vec) seems to add a lot of value, but our best model (vector_exp) is a pure bag-of-words model.
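The difference between a sequence-aware view and a bag-of-words view of the same data can be sketched with training pairs. A skip-gram model such as word2vec only pairs items that fall within a context window, so order and distance matter. A bag-of-words view treats every pair of items in the same collection as co-occurring. The track names, the window size, and both helper functions below are hypothetical, and this is not a description of how any of the models above build their training data.

```python
from itertools import permutations


def skipgram_pairs(sequence, window=1):
    """(center, context) pairs for items at most `window` positions apart."""
    pairs = []
    for i, center in enumerate(sequence):
        lo = max(0, i - window)
        hi = min(len(sequence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, sequence[j]))
    return pairs


def bag_pairs(items):
    """All ordered pairs of distinct items, ignoring position entirely."""
    return list(permutations(items, 2))


# Hypothetical playlist of four tracks.
playlist = ["track_1", "track_2", "track_3", "track_4"]

seq = skipgram_pairs(playlist, window=1)
bag = bag_pairs(playlist)

print(len(seq), "skip-gram pairs")
print(len(bag), "bag-of-words pairs")
print(("track_1", "track_4") in seq, ("track_1", "track_4") in bag)
```

With a window of 1, the first and last tracks are never paired in the skip-gram view, while the bag-of-words view links them like any other pair.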