<div align="center">
<h1>mCaptcha Cache</h1>
<p>
<strong>
Redis module that implements the
<a href="https://en.wikipedia.org/wiki/Leaky_bucket"
>leaky bucket algorithm</a
>
</strong>
</p>
[![CI (Linux)](https://github.com/mCaptcha/cache/actions/workflows/linux.yml/badge.svg)](https://github.com/mCaptcha/cache/actions/workflows/linux.yml)
[![AGPL License](https://img.shields.io/badge/license-AGPL-blue.svg?style=flat-square)](http://www.gnu.org/licenses/agpl-3.0)
[![dependency status](https://deps.rs/repo/github/mCaptcha/cache/status.svg)](https://deps.rs/repo/github/mCaptcha/cache)
<br />
[![Chat](https://img.shields.io/badge/matrix-+mcaptcha:matrix.batsense.net-purple?style=flat-square)](https://matrix.to/#/+mcaptcha:matrix.batsense.net)
</div>
## Features
- [x] Timers for individual counts
- [x] Clustering
- [x] Persistence through RDB
- [ ] Persistence through AOF
## Motivation
[mCaptcha](https://github.com/mCaptcha/mCaptcha) uses a
[leaky-bucket](https://en.wikipedia.org/wiki/Leaky_bucket)-enabled counter to
keep track of traffic/challenge requests.

- At `t=0` (where `t` is time), if someone visits an mCaptcha-protected website, the
  counter for that website is initialized and set to 1.
- The counter should also automatically decrement (by 1) after a certain period, say
  at `t=cooldown`. We call this the cooldown period; it is constant for a
  website.
- If at `t=x` (where `x<cooldown`) another user visits the same website,
  the counter becomes 2 and will auto-decrement at `t = cooldown + x`
  for the second user.

Note that, for the decrement to work, we require two different timers
that go off at two different instants. The current version (`v0.1.3`) of
[`libmcaptcha`](https://github.com/mCaptcha/libmcaptcha/) implements
this with internal data structures and timers, something that can't
be shared across several machines in a distributed setting.

So we figured we'd use Redis to solve this problem and get
synchronisation and persistence for free.

This Redis module implements auto-decrement on a special
data type (which is also defined in this module).
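
The counter behaviour described above can be sketched in a few lines of Python. This is only an illustration, not the module's implementation: the class name `LeakyCounter` and its methods are hypothetical, and time is passed in explicitly as `now` instead of using real timers.

```python
import heapq


class LeakyCounter:
    """Illustrative leaky-bucket counter: every visit adds 1 and
    schedules its own decrement `cooldown` seconds later."""

    def __init__(self, cooldown):
        self.cooldown = cooldown
        self.value = 0
        self._due = []  # min-heap of instants at which a decrement is due

    def _leak(self, now):
        # Apply every decrement whose instant has already passed.
        while self._due and self._due[0] <= now:
            heapq.heappop(self._due)
            self.value -= 1

    def visit(self, now):
        self._leak(now)
        self.value += 1
        heapq.heappush(self._due, now + self.cooldown)
        return self.value

    def get(self, now):
        self._leak(now)
        return self.value


counter = LeakyCounter(cooldown=30)
counter.visit(now=0)   # counter is 1, decrement due at t=30
counter.visit(now=10)  # counter is 2, second decrement due at t=40
```

Each visit needs its own decrement instant, which is why a single timer per counter is not enough.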
## How does it work?
If a timer is supposed to go off to
decrement key `myCounter` at `t=y` (where `y` is an instant in the future):

1. A hashmap called `mcaptcha_cache:decrement:y` (the prefix might vary) is
   created with key-value pairs `keyName: DecrementCount` (`myCounter: 1` in
   our case).
2. A timer is created to go off at `t=y`.
3. Any further decrement operations that are scheduled for `t=y` are
   registered with the same hashmap (`mcaptcha_cache:decrement:y`).
4. At `t=y`, a procedure is executed that reads
   all values of the hashmap (`mcaptcha_cache:decrement:y`) and performs
   all registered decrements. When it's done, it cleans itself up.

This way, we are not spinning up a timer for every decrement operation but
instead one for every such hashmap, which we call a "pocket".
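
The batching idea can be sketched as follows. This is a simplified model in Python, not the module's code: `PocketScheduler`, `count`, and `fire` are hypothetical names, pockets are plain dictionaries keyed by the expiry instant, and `fire` stands in for the timer callback.

```python
class PocketScheduler:
    """Illustrative model: one pending-decrement map ("pocket") per instant."""

    def __init__(self):
        self.counters = {}  # counter name -> current value
        self.pockets = {}   # expiry instant -> {counter name: decrement count}

    def count(self, name, leak_rate, now):
        # Increment the counter and register its decrement in the pocket
        # for the instant at which it should leak.
        self.counters[name] = self.counters.get(name, 0) + 1
        pocket = self.pockets.setdefault(now + leak_rate, {})
        pocket[name] = pocket.get(name, 0) + 1
        return self.counters[name]

    def fire(self, expiry):
        # Stand-in for the timer callback: apply all decrements registered
        # for this instant, then drop the pocket.
        for name, decrement in self.pockets.pop(expiry, {}).items():
            remaining = self.counters.get(name, 0) - decrement
            if remaining > 0:
                self.counters[name] = remaining
            else:
                self.counters.pop(name, None)

    def get(self, name):
        return self.counters.get(name, 0)


scheduler = PocketScheduler()
scheduler.count("myCounter", leak_rate=30, now=0)
scheduler.count("myCounter", leak_rate=30, now=0)
scheduler.count("otherCounter", leak_rate=30, now=0)
# Three decrements are pending, but they share a single pocket (and timer).
```

In this model the number of timers equals `len(scheduler.pockets)`, regardless of how many decrements are pending.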
### Gotchas:
This module creates and manages data of three types:

1. `mcaptcha_cache:captcha:y`, where `y` (last character) is variable
2. `mcaptcha_cache:pocket:x`, where `x` (last character) is variable
3. `mcaptcha:timer:z`, where `z` (last character) is the pocket name from
   item 2 (see [Hacks](#hacks)).

**WARNING: Please don't modify these manually. If you do, Redis
will panic.**

This module cleans up after itself, so manual cleanup is
unnecessary. If you have needs that are not met by this module and you
wish to access or mutate its data manually, please open an
[issue](https://github.com/mCaptcha/cache/issues). I'd be happy to help.
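
If an application shares the Redis instance with this module, a small guard can help avoid touching these keys by accident. The sketch below is a hypothetical client-side helper (the name `is_module_managed` is made up for illustration) and assumes the three prefixes listed above are used unchanged.

```python
# Prefixes taken from the list above; assumed to be unchanged in your setup.
MODULE_KEY_PREFIXES = (
    "mcaptcha_cache:captcha:",
    "mcaptcha_cache:pocket:",
    "mcaptcha:timer:",
)


def is_module_managed(key: str) -> bool:
    """Return True if `key` falls in a namespace managed by the module."""
    return key.startswith(MODULE_KEY_PREFIXES)
```

Application code could call this before any write or delete and refuse to proceed when it returns `True`.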
## Usage
There are two ways to run `cache`:
1. [Using docker](#docker)
2. [On bare-metal](#bare-metal)
### Docker

Use the image from Docker Hub:

```bash
$ docker run -p 6379:6379 mcaptcha/cache
```

or build from source:

#### Build

```bash
$ docker build -t mcaptcha/cache .
```

#### Run

```bash
$ docker run -p 6379:6379 mcaptcha/cache
```
### Bare-metal
#### Build
Make sure you have Rust installed:
https://www.rust-lang.org/tools/install
Then, build as usual:
```bash
cargo build --release
```
#### Run
```bash
redis-server --loadmodule ./target/release/libcache.so
```

### Commands

Every counter has a name and a leak rate in seconds.

#### Create/Increment counter

If the counter exists, its count is incremented. Otherwise, the counter is created.

```redis
MCAPTCHA_CACHE.COUNT <counter-name> <leak-rate-in-seconds>
```

#### Get counter value

```redis
MCAPTCHA_CACHE.GET <counter-name>
```
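
From application code, the two commands can be sent through any Redis client that supports raw commands. The helpers below are a minimal sketch, not an official client: the function names are hypothetical, `client` is assumed to be any object with an `execute_command` method (such as a redis-py `Redis` instance), and the replies are returned as-is because their exact format is not documented here.

```python
def count(client, counter_name, leak_rate_seconds):
    """Create the counter if it doesn't exist, otherwise increment it."""
    return client.execute_command(
        "MCAPTCHA_CACHE.COUNT", counter_name, leak_rate_seconds
    )


def get(client, counter_name):
    """Fetch the current value of the counter."""
    return client.execute_command("MCAPTCHA_CACHE.GET", counter_name)
```

With redis-py installed and the module loaded, `client` could be created with `redis.Redis(host="localhost", port=6379)` and passed to `count(client, "mycounter", 45)`.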
## Benchmark
**NOTE:** These benchmarks are for reference only. Do not depend upon
them too much. When in doubt, please craft and run benchmarks that are
better suited to your workload.

To run benchmarks locally, launch a Redis server with the module loaded and run:

```bash
$ ./scripts/bench.sh
```

- platform: `Intel Core i7-9750H`

```bash
running set and get without pipelining
SET: 125046.89 requests per second, p50=0.199 msec
GET: 124502.00 requests per second, p50=0.199 msec

mCaptcha cache without pipelining
MCAPTCHA_CACHE.COUNT mycounter 45: 124828.37 requests per second, p50=0.215 msec

running set and get with pipelining
SET: 1353179.88 requests per second, p50=0.487 msec
GET: 1633987.00 requests per second, p50=0.383 msec

mCaptcha cache with pipelining
MCAPTCHA_CACHE.COUNT mycounter 45: 385653.69 requests per second, p50=1.959 msec
```

## Hacks

I couldn't find a way to persist timers to disk (`RDB`/`AOF`), so I'm
using a dummy record (`mcaptcha:timer:*`, see [Gotchas](#gotchas)) that
expires after an arbitrary time (see `POCKET_EXPIRY_OFFSET` in
[`lib.rs`](./src/lib.rs)). When that expiry occurs, I derive the key of
the pocket from the values that are passed to the expiration event handlers
and clean up both the pocket and the counters registered with the
pocket.

Ideally, timers themselves would be persisted, but I couldn't find a way to
do that.
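
The expiry-driven clean-up can be modelled roughly as below. This is a Python sketch under stated assumptions, not the code in `lib.rs`: it assumes the dummy record's key is the `mcaptcha:timer:` prefix followed by the pocket's full key, and the function names and in-memory dictionaries are hypothetical stand-ins for the real event handler and Redis data.

```python
from typing import Dict, Optional

TIMER_PREFIX = "mcaptcha:timer:"  # prefix of the dummy record, as listed in Gotchas


def pocket_key_from_timer(expired_key: str) -> Optional[str]:
    """Derive the pocket key from an expired dummy-record key.

    Assumption: the part after the prefix is the pocket's full key.
    """
    if not expired_key.startswith(TIMER_PREFIX):
        return None
    return expired_key[len(TIMER_PREFIX):]


def on_timer_expired(
    expired_key: str,
    pockets: Dict[str, Dict[str, int]],
    counters: Dict[str, int],
) -> bool:
    """Apply the pocket's pending decrements and remove the pocket."""
    pocket_key = pocket_key_from_timer(expired_key)
    if pocket_key is None or pocket_key not in pockets:
        return False  # not one of our dummy records, or nothing to clean up
    for name, decrement in pockets.pop(pocket_key).items():
        remaining = counters.get(name, 0) - decrement
        if remaining > 0:
            counters[name] = remaining
        else:
            counters.pop(name, None)
    return True
```

Because the dummy record is an ordinary key with an expiry, it survives a restart along with the rest of the persisted data, which is what a timer alone does not do.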