Michał’s notes

A function for scheduling retries in a job queue

Let’s imagine, that you need to execute a task, that can not be completed successfully before some unpredictable point in the future. E.g., you are trying to pull an outcome of some asynchronous operation. You wish to get your task done possibly soon, so you are retrying until it eventually succeeds.

That is a problem which I have come across while integrating an on-line store with an ERP. The communication was one-directional, the e-commerce could talk to ERP, but not the other way. I needed to pull information about the order progress: if it has been accepted for fulfilment, if all products have been reserved, if it has been shipped etc.

You may be rescheduling tasks with different intervals depending on the characteristic of the problem. However, hitting counterpart system every minute will usually be just a waste of resources. If a shop has all ordered products in stock, it is likely that an order will be ready for shipping the same day, and otherwise it may take even a week. Let’s focus on that case.

We would start from short intervals, that would grow with time, up to some convenience maximum value. The function that behaves in this way is inverse tangent, arctan.

Arctan plot

Of course, we need to transform it first. Number of already failed attempts, denoted as n, will be the input. The interval in minutes is the output. We would like to have no delay before the first execution: f(0)=0. The second attempt should take place about an hour after the first one: f(1)≈60. The maximum interval should be 24 hours (1440 minutes). After applying all these constraints we would get:

\(f(n) =1440 \frac{2}{\pi} \arctan (\frac{n^3}{15})\)

We use n3 to speed up the function ascendant. The \(\frac{2}{\pi}\) is the inverse of the upper bound of arctan. Below you can see a plot of values of this function for n ∈ <0, 20> expressed in hours:

Plot of number of attempts and following intervals.

And these are exact values in hours:

n f(n)/60 n f(n)/60 n f(n)/60 n f(n)/60
0 0.00 1  1.022 2 7.49 3 16.25
4 20.48 5 22.18 6 22.94 7 23.33
8 23.55 9 23.69 10 23.77 11 23.83

How could it be used? Let’s imagine having an MySQL based queue for some small workload. We could have a table tasks with columns id longint, done boolean, updated_at timestamp, attempts int, task blob. Then we could use the following query to fetch tasks for execution:

SELECT 
  id, task
FROM 
  tasks 
WHERE 
  NOT done
  AND updated_at < DATE_SUB(NOW(), INTERVAL 917 * atan(pow(attempts, 3)/15) MINUTE)

There are of course other functions, that could be used in similar situation, like sigmoid or sequence \({(1 + \frac{1}{n})}^n\).