I have an API that I want to open to outside access, but it needs to be protected from abuse. I wanted to limit the number of hits from a given source without tracking every single hit in a database. Specifically, I wanted a method that could track millions of sources without requiring insane amounts of disk or memory.
I ended up with the concept of a "bucket" of hits in a given time window. All the hits from a source within the same window are grouped into one bucket, so there is a single count to track per source rather than a record per hit. My back-of-the-envelope math suggests that each source should require between 40 and 50 bytes of memory to track. This makes it feasible to track and limit a million clients using roughly 50MB of memory (1,000,000 sources at 50 bytes each). Some type of caching system is required. Memcached, Redis, or a simple in-memory key/value cache would work perfectly.
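
To make the bucket idea concrete, here is a minimal sketch in Python. It is an illustration under assumptions, not the actual implementation: the `BucketLimiter` class, its `allow` method, and the `limit`, `window_seconds`, and `clock` parameters are all hypothetical names, and a plain dict stands in for Memcached or Redis. A Python dict entry also uses more memory than the 40 to 50 byte estimate, which assumes a compact cache record.

```python
import time


class BucketLimiter:
    """Counts hits per source inside fixed time windows ("buckets").

    Hypothetical sketch: a dict stands in for the key/value cache.
    """

    def __init__(self, limit, window_seconds, clock=time.time):
        self.limit = limit            # max hits allowed per bucket
        self.window = window_seconds  # bucket length in seconds
        self.clock = clock            # injectable so the sketch can be tested
        self.buckets = {}             # source -> (bucket_id, hit_count)

    def allow(self, source):
        """Record a hit and return True if the source is under its limit."""
        # Every timestamp inside the same window maps to the same bucket id.
        bucket_id = int(self.clock() // self.window)
        stored_id, count = self.buckets.get(source, (bucket_id, 0))

        # A new window has started, so the old count no longer applies.
        if stored_id != bucket_id:
            count = 0

        if count >= self.limit:
            self.buckets[source] = (bucket_id, count)
            return False

        self.buckets[source] = (bucket_id, count + 1)
        return True
```

Calling `BucketLimiter(limit=100, window_seconds=60).allow("203.0.113.7")` would record one hit for that source and return `True` until the 101st hit inside the same minute. This sketch never evicts idle sources; with Memcached or Redis, an expiry on each key roughly equal to the window length would presumably handle that.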