Use the following table to compare the capabilities of the different sketch families.
All sketches provide methods for obtaining a posteriori error bounds.
Sketch | Languages | Set Operations | System Integrations | Misc. | ||||||||||||||
Type | Class Name | Java | C++ | Python7 | Union | Intersection | Difference | Jaccard | Hive | Pig | Druid1 | Spark2 | PostgreSQL (C++) | Concurrent | Compact | Off Java Heap | ||
Major Sketches | ||||||||||||||||||
Cardinality/CPC | CpcSketch | Y | Y | Y | Y | Y | Y | Y | Y | |||||||||
Cardinality/HLL | HllSketch | Y | Y | Y | Y | Y | Y | Y | Y | Y | ||||||||
Cardinality/Theta | Sketch | Y | Y | Y | Y | Y | Y | Y4 | Y | Y | Y | Y | Y | Y | Y | Y | ||
Cardinality/Tuple | Sketch<S> | Y | Y | Y | Y | Y | Y | Y | ||||||||||
Quantiles/Cormode | DoublesSketch | Y | Y | Y | Y | Y | Y | Y | ||||||||||
Quantiles/Cormode | ItemsSketch<T> | Y | Y | Y | Y | |||||||||||||
Quantiles/KLL | KllDoublesSketch | Y | Y | Y6 | Y | Y | Y | Y | Y | |||||||||
Quantiles/KLL | KllFloatsSketch | Y | Y | Y6 | Y | Y | Y | Y | Y | Y | ||||||||
Quantiles/KLL | KLLSketch<T> | Y | Y | |||||||||||||||
Quantiles/REQ | FloatsSketch | Y | Y | Y6 | ||||||||||||||
Frequencies | LongsSketch | Y | Y | Y | Y | |||||||||||||
Frequencies | ItemsSketch<T> | Y | Y | Y | Y | Y | Y | Y5 | ||||||||||
Sampling/Reservoir | ReservoirLongsSketch | Y | Y | |||||||||||||||
Sampling/Reservoir | ReservoirItemsSketch<T> | Y | Y | Y | ||||||||||||||
Sampling/VarOpt | VarOptItemsSketch<T> | Y | Y | Y | Y | Y | ||||||||||||
Specialty Sketches | ||||||||||||||||||
Cardinality/FM85 | UniqueCountMap | Y | ||||||||||||||||
Cardinality/Tuple | FdtSketch | Y | Y | Y | Y | |||||||||||||
Cardinality/Tuple | ArrayOfDoublesSketch | Y | Y | Y | Y | Y | Y | Y | Y | Y | ||||||||
Cardinality/Tuple | DoubleSketch | Y | Y | Y | Y | |||||||||||||
Cardinality/Tuple | IntegerSketch | Y | Y | Y | Y | |||||||||||||
Cardinality/Tuple | ArrayOfStringsSketch | Y | Y | Y | Y | |||||||||||||
Cardinality/Tuple | EngagementTest3 | Y | Y | Y | Y |
1 Integrated into Druid.
2 Spark example code is on the website. Theta Sketch is the only sketch we have tried in Spark; this does not mean that other sketches cannot be used.
3 Tuple Sketch: Example Code in test/…/tuple/aninteger.
4 Theta Sketch: C++/Python has no implementation of Jaccard yet.
5 Frequent Items Sketch: the PostgreSQL integration is implemented for strings only.
6 KLL & REQ Sketch: Python is implemented for floats and ints only.
7 See Python Install Instructions
See Research/References for references in […]
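To make the "a posteriori error bounds" statement and the Set Operations columns (Union, Intersection, Jaccard) more concrete, here is a minimal sketch in plain Python. It is a toy K-Minimum-Values sketch written only for illustration and is not the DataSketches library or its API: the names `ToyThetaSketch`, `union`, `intersection_estimate`, and `jaccard` are hypothetical, and the bounds use a simple normal approximation that is an assumption of this example.

```python
import hashlib
import math


class ToyThetaSketch:
    """Toy K-Minimum-Values sketch (illustrative; NOT the DataSketches API)."""

    def __init__(self, k=1024):
        self.k = k
        self.theta = 1.0     # sampling threshold in (0, 1]
        self.hashes = set()  # retained hash values, all below theta

    @staticmethod
    def _hash(item):
        # Map an item to a pseudo-uniform float in [0, 1).
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2.0 ** 64

    def _trim(self):
        # Keep the k smallest hashes; theta becomes the (k+1)-th smallest.
        if len(self.hashes) > self.k:
            ordered = sorted(self.hashes)
            self.theta = ordered[self.k]
            self.hashes = set(ordered[: self.k])

    def update(self, item):
        h = self._hash(item)
        if h < self.theta and h not in self.hashes:
            self.hashes.add(h)
            self._trim()

    def estimate(self):
        return len(self.hashes) / self.theta

    def bounds(self, num_std_dev=2.0):
        # A posteriori bounds computed from the sketch's own state.
        # Assumption: normal approximation with RSE = sqrt((1 - theta) / retained).
        est = self.estimate()
        retained = len(self.hashes)
        if retained == 0 or self.theta >= 1.0:
            return est, est  # exact mode: no sampling error
        rse = math.sqrt((1.0 - self.theta) / retained)
        lower = max(float(retained), est * (1.0 - num_std_dev * rse))
        upper = est * (1.0 + num_std_dev * rse)
        return lower, upper


def union(a, b):
    # Union: take the smaller theta, merge the hashes below it, then trim.
    out = ToyThetaSketch(min(a.k, b.k))
    out.theta = min(a.theta, b.theta)
    out.hashes = {h for h in a.hashes | b.hashes if h < out.theta}
    out._trim()
    return out


def intersection_estimate(a, b):
    # Intersection: hashes retained by both sketches, below the smaller theta.
    theta = min(a.theta, b.theta)
    common = {h for h in a.hashes & b.hashes if h < theta}
    return len(common) / theta


def jaccard(a, b):
    u = union(a, b).estimate()
    return intersection_estimate(a, b) / u if u else 0.0


a, b = ToyThetaSketch(), ToyThetaSketch()
for i in range(60_000):
    a.update(i)           # items 0 .. 59,999
    b.update(i + 30_000)  # items 30,000 .. 89,999

print("estimate of |A|:", a.estimate(), "bounds:", a.bounds(2.0))
print("estimate of |A union B|:", union(a, b).estimate())
print("estimate of Jaccard(A, B):", jaccard(a, b))
```

In this example the true values are |A| = 60,000, |A ∪ B| = 90,000, and a Jaccard index of 1/3. The printed estimates should land near those values, with the bounds widening as `k` decreases. For real workloads, use the library classes listed in the table rather than this toy.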