These single-pass, “one-touch” algorithms are fast enough to enable real-time processing.
Sketches can be represented in an updatable or compact form. The compact form is smaller, immutable and faster to merge.
Some of the Java sketches have been designed to be instantiated and operated off-heap, which eliminates costly serialization and deserialization.
The sketch data structures are “additive” and embarrassingly parallelizable. Sketches can be merged without losing accuracy.
Hash Seed Handling. Additional protection for managing hash seeds, which is particularly important when processing sensitive user identifiers. Available with Theta Sketches.
Pre-Sampling. Built-in up-front sampling for cases where additional control is required to limit overall memory consumption when dealing with millions of sketches. Available with Theta Sketches.
Memory Package. Large query systems often require their own heaps outside the JVM in order to better manage garbage collection latencies. The Java sketches use this package to operate on memory outside the JVM heap.
Built-in Upper-Bound and Lower-Bound estimators. You are never in the dark about how good an estimate the sketch is providing. All the sketches can report upper and lower bounds on the estimate for a given confidence level.
User-configurable trade-offs of accuracy vs. storage space, as well as other performance tuning options.
Small Footprint Per Sketch. The operating and storage footprints for both row-oriented and column-oriented storage are minimized with compact binary representations, which are much smaller than the raw input stream and have a well-defined upper bound on size.
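
Several of the properties above (single-pass updates, lossless merging, error bounds, a size cap set by one tuning parameter, and a shared hash seed) can be seen together in a toy example. The following is a minimal sketch of a K-Minimum-Values distinct counter written only for illustration. It is not the library's implementation or API: the class name ToyKmvSketch, its methods, the hash function, and the normal-approximation bounds are all assumptions made for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.TreeSet;

/** Toy K-Minimum-Values sketch. Illustrative only; NOT the library's implementation. */
final class ToyKmvSketch {
    private final int k;      // max retained hashes: the accuracy vs. size knob
    private final long seed;  // hash seed; sketches must share it to be merged
    private final TreeSet<Long> minHashes = new TreeSet<>(); // k smallest hashes seen

    ToyKmvSketch(int k, long seed) {
        if (k < 3) throw new IllegalArgumentException("k must be at least 3");
        this.k = k;
        this.seed = seed;
    }

    /** Seeded 63-bit hash (hypothetical choice): FNV-1a, then a splitmix64-style mix. */
    static long hash(String item, long seed) {
        long h = 0xcbf29ce484222325L ^ seed;
        for (byte b : item.getBytes(StandardCharsets.UTF_8)) {
            h = (h ^ (b & 0xff)) * 0x100000001b3L;
        }
        h = (h ^ (h >>> 30)) * 0xbf58476d1ce4e5b9L;
        h = (h ^ (h >>> 27)) * 0x94d049bb133111ebL;
        return (h ^ (h >>> 31)) >>> 1; // drop the sign bit
    }

    /** Single pass: each item is hashed once and never revisited. */
    void update(String item) { offer(hash(item, seed)); }

    private void offer(long h) {
        if (minHashes.size() < k) {
            minHashes.add(h);
        } else if (h < minHashes.last() && minHashes.add(h)) {
            minHashes.pollLast(); // evict the largest so at most k hashes remain
        }
    }

    /** The k smallest hashes of a union are the k smallest of both retained sets. */
    ToyKmvSketch merge(ToyKmvSketch other) {
        if (k != other.k || seed != other.seed) {
            throw new IllegalArgumentException("k and seed must match");
        }
        ToyKmvSketch out = new ToyKmvSketch(k, seed);
        for (long h : minHashes) out.offer(h);
        for (long h : other.minHashes) out.offer(h);
        return out;
    }

    int retainedEntries() { return minHashes.size(); } // never exceeds k

    double estimate() {
        if (minHashes.size() < k) return minHashes.size(); // fewer than k distinct: exact
        double kthFraction = minHashes.last() / 0x1p63;    // k-th smallest hash in [0, 1)
        return (k - 1) / kthFraction;
    }

    /** Rough relative standard error; zero while the sketch is still exact. */
    private double rse() { return minHashes.size() < k ? 0.0 : 1.0 / Math.sqrt(k - 2); }

    // Crude normal-approximation bounds, assumed for this example only.
    double upperBound(double numStdDev) { return estimate() * (1 + numStdDev * rse()); }
    double lowerBound(double numStdDev) {
        return Math.max(0.0, estimate() * (1 - numStdDev * rse()));
    }

    public static void main(String[] args) {
        ToyKmvSketch left = new ToyKmvSketch(1024, 9001L);
        ToyKmvSketch right = new ToyKmvSketch(1024, 9001L);
        for (int i = 0; i < 100_000; i++) {
            (i % 2 == 0 ? left : right).update("user-" + i);
        }
        ToyKmvSketch merged = left.merge(right);
        System.out.printf("estimate=%.0f bounds=[%.0f, %.0f] retained=%d%n",
            merged.estimate(), merged.lowerBound(2), merged.upperBound(2),
            merged.retainedEntries());
    }
}
```

In this toy, merging two partial sketches yields the same retained hashes as feeding the whole stream into one sketch, which is the sense in which merging loses no accuracy. Raising k tightens the bounds at the cost of more retained entries, and the retained set never grows beyond k regardless of stream length.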