Over the past few months I’ve been dealing with some champagne problems like, “How are we going to process all this data for so many customers?” (please pardon the humblebrag). The answer to problems like that is almost always to introduce a queueing system of some sort.
I would like to share with you some guidelines that I’ve used for designing the SQS queues at Blissfully.
NOTE: While these guidelines could apply to any queueing system, I am specifically using AWS’s Simple Queue Service (SQS) with AWS Lambda and my own sqs-lambda-bridge. If you need to use these technologies together, check it out!
For the highest throughput, you’d want to have only one queue, with lots of workers pulling jobs from it. That said, there are a few general rules about when it makes sense to split your work across different queues.
1. Some jobs are more time-critical than others
If all jobs are otherwise equal, but some need to be completed sooner than others, create a dedicated queue for time-critical work and add to it sparingly. For example, you might create separate queues for different classes of customers (free users vs. paying customers), or separate queues for batch vs. interactive uses.
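To make the first rule concrete, here is a minimal Python sketch that models two queues in memory. The queue names interactive and batch and the helper functions are hypothetical, and the deques stand in for real SQS queues. With SQS you would more likely run a separate pool of workers per queue than poll in strict priority order; this just shows the routing and draining logic.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple

# Hypothetical queue names; each deque stands in for one real SQS queue.
PRIORITY_ORDER: Tuple[str, ...] = ("interactive", "batch")


def make_queues() -> Dict[str, Deque[str]]:
    """Create one empty in-memory queue per priority level."""
    return {name: deque() for name in PRIORITY_ORDER}


def enqueue(queues: Dict[str, Deque[str]], job: str, time_critical: bool = False) -> None:
    """Send a job to the dedicated queue only if it is genuinely time-critical."""
    queues["interactive" if time_critical else "batch"].append(job)


def next_job(queues: Dict[str, Deque[str]]) -> Optional[str]:
    """Return the next job, draining higher-priority queues first."""
    for name in PRIORITY_ORDER:
        if queues[name]:
            return queues[name].popleft()
    return None


if __name__ == "__main__":
    queues = make_queues()
    enqueue(queues, "nightly-report")
    enqueue(queues, "user-clicked-sync", time_critical=True)
    print(next_job(queues))  # user-clicked-sync
    print(next_job(queues))  # nightly-report
```

Strict priority polling like this starves the batch queue whenever the interactive queue never empties, which is one more reason to add to the time-critical queue sparingly.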
2. Different jobs consume different resources
With Lambda we have amazing scalability, right? But not all functions are stateless. A function may open a connection to an RDS database, which caps the number of concurrent connections (around 1000, depending on the instance class), or it may call a 3rd party API that enforces strict rate limits. Each function should be sent to a queue named for the scarcest resource it consumes. For example:
Lambda: for stateless stuff that is only limited by your account’s Lambda concurrency.
Aurora: for functions which make a connection to the database. Set the concurrency tag to something conservative with respect to your Aurora connection limit (max 1000 probably).
Some3rdPartyAPI: for a 3rd party API that you can’t hit too often.
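The routing rule boils down to picking the resource with the smallest concurrency budget. Here is a minimal Python sketch, assuming each queue's budget is known up front. The queue names come from the list above, but the numeric budgets and the function names in ROUTES are hypothetical placeholders, not real AWS limits.

```python
from typing import Dict, Iterable

# Hypothetical concurrency budgets, one per queue. The numbers are placeholders,
# not real AWS limits; derive yours from your account and database settings.
QUEUE_CONCURRENCY: Dict[str, int] = {
    "Lambda": 1000,        # bounded only by account-level Lambda concurrency
    "Aurora": 200,         # kept well below the database connection limit
    "Some3rdPartyAPI": 5,  # the third-party rate limit is the bottleneck
}


def queue_for(resources: Iterable[str]) -> str:
    """Return the queue named for the scarcest resource a function consumes.

    Every function runs on Lambda, so "Lambda" is always a candidate.
    """
    candidates = {"Lambda", *resources}
    unknown = candidates - set(QUEUE_CONCURRENCY)
    if unknown:
        raise ValueError(f"no queue configured for: {sorted(unknown)}")
    # The scarcest resource is the one with the smallest concurrency budget.
    return min(candidates, key=lambda name: QUEUE_CONCURRENCY[name])


# Hypothetical function names mapped to the queue for the resources they touch.
ROUTES = {
    "resizeImage": queue_for([]),
    "syncUsers": queue_for(["Aurora"]),
    "importFromVendor": queue_for(["Aurora", "Some3rdPartyAPI"]),
}
```

A function that touches both the database and the rate-limited API lands on the Some3rdPartyAPI queue, because that queue's low concurrency also keeps its database connections well under the Aurora budget.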
3. Some jobs may need to be canceled
There is exactly one O(1) operation in SQS, and that is purgeQueue: a single call empties the queue, no matter how many messages it holds. If there’s a class of function invocation that we may need to cancel, dedicate a queue to it. All SQS messages are immutable, so selectively deleting a message requires that you first receive it. Receiving a message makes it invisible to other clients and increments its receive count, which may move it closer to the dead-letter queue; only then can you delete it. In-flight messages are also invisible, so you won’t necessarily find everything on the first pass.
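The cost of selective deletion is easier to see in a toy model. The following Python sketch is a hypothetical in-memory simulation, not the SQS API: ToyQueue, cancel_where, and expire_visibility are made-up names, and the dead-letter redrive is simplified. It shows why a cancel pass can miss in-flight messages and can push the messages it never meant to touch toward the dead-letter queue.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(eq=False)  # compare messages by identity, not by contents
class Message:
    body: str
    receive_count: int = 0
    in_flight: bool = False  # received but not yet deleted: invisible to others


@dataclass
class ToyQueue:
    """Toy in-memory model of SQS visibility rules; NOT the real SQS API."""

    max_receive_count: int = 3  # plays the role of a redrive policy's maxReceiveCount
    messages: List[Message] = field(default_factory=list)
    dead_letters: List[Message] = field(default_factory=list)

    def send(self, body: str) -> None:
        self.messages.append(Message(body))

    def purge(self) -> None:
        # A single call empties the queue. (Real PurgeQueue can take up to
        # 60 seconds to finish deleting.)
        self.messages.clear()

    def receive(self) -> Optional[Message]:
        for msg in list(self.messages):
            if msg.in_flight:
                continue  # in-flight messages are invisible
            if msg.receive_count >= self.max_receive_count:
                # Already received too many times: redrive to the DLQ instead.
                self.messages.remove(msg)
                self.dead_letters.append(msg)
                continue
            msg.receive_count += 1
            msg.in_flight = True
            return msg
        return None

    def expire_visibility(self) -> None:
        # Simulate the visibility timeout elapsing for every in-flight message.
        for msg in self.messages:
            msg.in_flight = False

    def cancel_where(self, should_cancel: Callable[[str], bool]) -> int:
        """Selective deletion: the only option is to receive, inspect, delete."""
        deleted = 0
        while (msg := self.receive()) is not None:
            if should_cancel(msg.body):
                self.messages.remove(msg)
                deleted += 1
            # Non-matching messages stay in flight with a higher receive count.
        return deleted
```

For example, with max_receive_count=1, a cancel pass over three messages deletes the one it targets, but the other two have now been received once; when their visibility timeout expires they go straight to the dead-letter queue without ever reaching a worker. A message already held by a worker is skipped entirely. A dedicated queue avoids all of this: purge it and move on.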
4. Queue-level features
Some features can only be configured for a whole queue, and some of them are mutually exclusive. For example, standard queues can’t deduplicate messages or guarantee ordering, and FIFO queues can’t delay individual messages. If you need mutually exclusive features, make more queues.
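One way to apply this rule is to check each class of jobs against what each queue type supports before creating any queues. Here is a minimal Python sketch, assuming you describe each job class with the hypothetical JobNeeds flags below. FifoQueue and ContentBasedDeduplication are real CreateQueue attribute names, and FIFO queue names must end in .fifo, but turning on content-based deduplication (instead of sending an explicit MessageDeduplicationId with each message) is an assumption made for brevity.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class JobNeeds:
    """Hypothetical description of what a class of jobs requires from its queue."""

    deduplication: bool = False
    ordering: bool = False
    per_message_delay: bool = False


def queue_type(needs: JobNeeds) -> str:
    """Pick the SQS queue type a class of jobs needs, or fail if the needs conflict."""
    fifo_only = needs.deduplication or needs.ordering
    # FIFO queues don't support per-message delays, so these needs conflict.
    if fifo_only and needs.per_message_delay:
        raise ValueError("no single queue type supports these features; split the jobs")
    return "fifo" if fifo_only else "standard"


def create_queue_args(base_name: str, needs: JobNeeds) -> Tuple[str, Dict[str, str]]:
    """Build a queue name and attributes shaped like SQS CreateQueue parameters."""
    if queue_type(needs) == "standard":
        return base_name, {}
    attributes = {"FifoQueue": "true"}
    if needs.deduplication:
        # Assumption: dedupe on a hash of the message body.
        attributes["ContentBasedDeduplication"] = "true"
    # FIFO queue names must end in ".fifo".
    return base_name + ".fifo", attributes
```

Job classes that raise ValueError here are the ones to split across separate queues.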