Bucketing: A Technique To Reduce Training Time For Seq2Seq Models


Sequence-to-sequence models have great applications in many NLP tasks such as building chatbots, machine translation, and question answering. Here we will discuss a small enhancement to them by adding a concept called bucketing.

Why We Need Bucketing:

Suppose we have input sequences of different lengths. In that case we are supposed to pad them before feeding them into our model. However, padding can be quite wasteful if the majority of our sequences are short and a few of them are very long, which is quite common in the real world. For example, in a movie-review corpus most reviews might be 10–20 words long while some are 1,000 words long. In that case we would need to pad most of the sentences with more than 900 pad tokens each. This can slow the whole training process exorbitantly. To do away with this problem we apply a technique called bucketing.
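To make the waste concrete, here is a minimal sketch (using hypothetical review lengths) that compares the total number of padded positions against the number of useful tokens when every sequence is padded to the global maximum:

```python
# Hypothetical review lengths: mostly short, with one very long outlier.
lengths = [12, 15, 18, 10, 14, 1000]

# Padding every sequence to the global maximum length:
global_pad = max(lengths) * len(lengths)   # total positions the model must process
useful = sum(lengths)                      # positions that hold real tokens

print(global_pad)  # 6000
print(useful)      # 1069
```

Here over 80% of the processed positions are pure padding, which is exactly the cost bucketing tries to avoid.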


In this process, we first sort the sequences by length. Then each input sequence is assigned to a bucket of a certain length. For example, sequences of length 5–10 can all be assigned to a bucket of length 10, and we pad all of them to length 10. We do the same for the longer sequences. The following image gives an intuitive idea of bucketing.

(Source: StackOverflow)
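The idea above can be sketched in a few lines of Python. This is a minimal illustration, not a library implementation: the bucket boundaries, the `pad_id`, and the choice to truncate over-long sequences to the largest bucket are all assumptions for the example.

```python
def assign_bucket(length, boundaries):
    """Return the smallest bucket boundary that fits the sequence, or None."""
    for b in boundaries:
        if length <= b:
            return b
    return None  # longer than the largest bucket


def bucket_and_pad(sequences, boundaries, pad_id=0):
    """Sort sequences by length, assign each to a bucket, and pad
    every sequence up to its bucket's boundary length."""
    buckets = {b: [] for b in boundaries}
    for seq in sorted(sequences, key=len):
        b = assign_bucket(len(seq), boundaries)
        if b is None:              # over-long sequences: truncate (one possible policy)
            b = boundaries[-1]
            seq = seq[:b]
        buckets[b].append(seq + [pad_id] * (b - len(seq)))
    return buckets


# Example: four sequences, buckets of length 10, 20 and 50.
seqs = [[1] * 7, [2] * 9, [3] * 15, [4] * 30]
buckets = bucket_and_pad(seqs, boundaries=[10, 20, 50])
```

With these boundaries, the two short sequences are padded only to length 10 instead of to the corpus maximum of 30.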

Bucketing is implemented while dividing the data into batches. Each batch is drawn from a single bucket, so all sequences in a batch share the same padded length. This way we reduce the number of time steps for different batches while training the sequence-to-sequence model.
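A minimal sketch of this batching step, assuming `buckets` is a mapping from bucket length to the already-padded sequences in that bucket (the function name and seeding scheme are illustrative, not from any specific library):

```python
import random

def bucketed_batches(buckets, batch_size, seed=0):
    """Build batches drawn from one bucket at a time, so every sequence
    in a batch shares the same padded length (and hence time-step count)."""
    rng = random.Random(seed)
    batches = []
    for seqs in buckets.values():
        seqs = list(seqs)
        rng.shuffle(seqs)                 # shuffle within each bucket
        for i in range(0, len(seqs), batch_size):
            batches.append(seqs[i:i + batch_size])
    rng.shuffle(batches)                  # shuffle batch order across buckets
    return batches
```

Because a batch never mixes buckets, the short-sequence batches run with far fewer time steps than the occasional long-sequence batch, instead of every batch paying for the longest sequence in the corpus.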

This small but efficient technique can be considered an enhancement to our sequence-to-sequence model, one that can noticeably speed up training.
