A function for scheduling retries in a job queue
Let’s imagine, that you need to execute a task, that can not be completed successfully before some unpredictable point in the future. E.g., you are trying to pull an outcome of some asynchronous operation. You wish to get your task done possibly soon, so you are retrying until it eventually succeeds.
That is a problem which I have come across while integrating an on-line store with an ERP. The communication was one-directional, the e-commerce could talk to ERP, but not the other way. I needed to pull information about the order progress: if it has been accepted for fulfilment, if all products have been reserved, if it has been shipped etc.
You may be rescheduling tasks with different intervals depending on the characteristic of the problem. However, hitting counterpart system every minute will usually be just a waste of resources. If a shop has all ordered products in stock, it is likely that an order will be ready for shipping the same day, and otherwise it may take even a week. Let’s focus on that case.
We would start from short intervals, that would grow with time, up to some convenience maximum value. The function that behaves in this way is inverse tangent, arctan.
Of course, we need to transform it first. Number of already failed attempts, denoted as n, will be the input. The interval in minutes is the output. We would like to have no delay before the first execution: f(0)=0. The second attempt should take place about an hour after the first one: f(1)≈60. The maximum interval should be 24 hours (1440 minutes). After applying all these constraints we would get:
\(f(n) =1440 \frac{2}{\pi} \arctan (\frac{n^3}{15})\)
We use n3 to speed up the function ascendant. The \(\frac{2}{\pi}\) is the inverse of the upper bound of arctan. Below you can see a plot of values of this function for n ∈ <0, 20> expressed in hours:
And these are exact values in hours:
n | f(n)/60 | n | f(n)/60 | n | f(n)/60 | n | f(n)/60 |
---|---|---|---|---|---|---|---|
0 | 0.00 | 1 | 1.022 | 2 | 7.49 | 3 | 16.25 |
4 | 20.48 | 5 | 22.18 | 6 | 22.94 | 7 | 23.33 |
8 | 23.55 | 9 | 23.69 | 10 | 23.77 | 11 | 23.83 |
How could it be used? Let’s imagine having an MySQL based queue for some small workload. We could have a table tasks with columns id longint, done boolean, updated_at timestamp, attempts int, task blob. Then we could use the following query to fetch tasks for execution:
SELECT id, task FROM tasks WHERE NOT done AND updated_at < DATE_SUB(NOW(), INTERVAL 917 * atan(pow(attempts, 3)/15) MINUTE)
There are of course other functions, that could be used in similar situation, like sigmoid or sequence \({(1 + \frac{1}{n})}^n\).