
Sketch Features Matrix

Use the following table to compare the capabilities of the different sketch families.

All sketches provide methods for computing a posteriori error bounds.

Sketch | Languages | Set Operations | System Integrations | Misc.
Type | Class Name | Java | C++ | Python⁷ | Union | Intersection | Difference | Jaccard | Hive | Pig | Druid¹ | Spark² | PostgreSQL (C++) | Concurrent | Compact | Off Java Heap
Major Sketches
Cardinality/CPC CpcSketch Y Y Y Y Y Y Y Y
Cardinality/HLL HllSketch Y Y Y Y Y Y Y Y Y
Cardinality/Theta Sketch Y Y Y Y Y Y Y⁴ Y Y Y Y Y Y Y Y
Cardinality/Tuple Sketch<S> Y Y Y Y Y Y Y
Quantiles/Cormode DoublesSketch Y Y Y Y Y Y Y
Quantiles/Cormode ItemsSketch<T> Y Y Y Y
Quantiles/KLL FloatsSketch Y Y Y⁶ Y Y Y Y
Quantiles/KLL KLLSketch<T> Y Y
Quantiles/REQ FloatsSketch Y Y Y⁶
Frequencies LongsSketch Y Y Y Y
Frequencies ItemsSketch<T> Y Y Y Y Y Y Y⁵
Sampling/Reservoir ReservoirLongsSketch Y Y
Sampling/Reservoir ReservoirItemsSketch<T> Y Y Y
Sampling/VarOpt VarOptItemsSketch<T> Y Y Y Y Y
Specialty Sketches
Cardinality/FM85 UniqueCountMap Y
Cardinality/Tuple FdtSketch Y Y Y Y
Cardinality/Tuple ArrayOfDoublesSketch Y Y Y Y Y Y Y Y Y
Cardinality/Tuple DoubleSketch Y Y Y Y
Cardinality/Tuple IntegerSketch Y Y Y Y
Cardinality/Tuple ArrayOfStringsSketch Y Y Y Y
Cardinality/Tuple EngagementTest³ Y Y Y Y

¹ Integrated into Druid.
² Spark example code is available on the website. Theta Sketch is the only sketch we have tried in Spark; this does not mean that other sketches cannot be used.
³ Tuple Sketch: example code in test/…/tuple/aninteger.
⁴ Theta Sketch: C++/Python has no implementation of Jaccard yet.
⁵ Frequent Items Sketch: the PostgreSQL integration is implemented for Strings only.
⁶ KLL & REQ Sketch: the Python implementation supports floats and ints only.
⁷ See Python Install Instructions.
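
To make the a posteriori error bounds and the Union and Jaccard columns more concrete, here is a toy K-Minimum-Values (KMV) distinct-count sketch. It is a minimal illustration only, not the DataSketches implementation: the class `KmvSketch`, the function `jaccard`, their method names, and the normal-approximation bounds (relative standard error taken as 1/sqrt(k-2)) are all assumptions of this example, and the library's own classes and bounds computations may differ.

```python
import hashlib
import math


class KmvSketch:
    """Toy K-Minimum-Values sketch (hypothetical, for illustration only)."""

    def __init__(self, k=512):
        if k < 3:
            raise ValueError("k must be at least 3")
        self.k = k
        self.hashes = set()  # the (at most) k smallest hash values seen

    @staticmethod
    def _hash(item):
        # Map the item to a pseudo-uniform float in [0, 1).
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2.0 ** 64

    def update(self, item):
        h = self._hash(item)
        if h in self.hashes:
            return  # duplicate item, nothing changes
        if len(self.hashes) < self.k:
            self.hashes.add(h)
            return
        largest = max(self.hashes)
        if h < largest:
            # Evict the largest retained hash to keep only the k smallest.
            self.hashes.remove(largest)
            self.hashes.add(h)

    def estimate(self):
        if len(self.hashes) < self.k:
            return float(len(self.hashes))  # exact mode: nothing evicted
        return (self.k - 1) / max(self.hashes)  # standard KMV estimator

    def bounds(self, num_std_dev=2):
        """Approximate (lower, upper) bounds, computed from sketch state."""
        est = self.estimate()
        if len(self.hashes) < self.k:
            return est, est
        rse = 1.0 / math.sqrt(self.k - 2)  # assumed normal approximation
        return est * (1 - num_std_dev * rse), est * (1 + num_std_dev * rse)

    def union(self, other):
        merged = KmvSketch(min(self.k, other.k))
        combined = sorted(self.hashes | other.hashes)
        merged.hashes = set(combined[: merged.k])
        return merged


def jaccard(a, b):
    """Estimate |A ∩ B| / |A ∪ B| from two KmvSketch instances."""
    u = a.union(b)
    if not u.hashes:
        return 1.0  # two empty sets are treated here as identical
    both = sum(1 for h in u.hashes if h in a.hashes and h in b.hashes)
    return both / len(u.hashes)
```

Feeding items 0..2999 into one sketch and 2000..4999 into another, `a.union(b).estimate()` should land near 5000 and `jaccard(a, b)` near 0.2, with `bounds()` giving an interval around each estimate that tightens as `k` grows.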

Definitions

Type

See Research/References for the sources cited in square brackets below.

  • Cardinality/CPC: Implementation and extension of [LAN17].
  • Cardinality/HLL: Derivation and extension of [FFGM07].
  • Cardinality/Theta: Derivation and extension of [DLRT16].
  • Cardinality/Tuple: An extension of the Theta family that adds attributes to each hash key.
  • Quantiles/Cormode: Derivation and extension of [AC+13].
  • Quantiles/KLL: Derivation and extension of [KLL16].
  • Frequencies: Derivation and extension of [ABL+17].
  • Sampling/Reservoir: Derivation and extension of [K98], Vol 2, Section 3.4.2, Algorithm R.
  • Sampling/VarOpt: Derivation and extension of [CDKLT09].
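
Algorithm R, cited above for the Sampling/Reservoir family, maintains a uniform random sample of k items from a stream whose length is not known in advance. The following is a minimal sketch of the basic algorithm only; the function name `reservoir_sample` and its parameters are hypothetical, and it is not the code of the reservoir sketch classes, which the table describes as a derivation and extension of the basic scheme.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return a uniform random sample of up to k items from an iterable."""
    rng = rng or random.Random()
    reservoir = []
    for n, item in enumerate(stream, start=1):
        if n <= k:
            # Fill phase: the first k items are always kept.
            reservoir.append(item)
        else:
            # The n-th item replaces a random slot with probability k / n.
            j = rng.randrange(n)  # uniform integer in [0, n)
            if j < k:
                reservoir[j] = item
    return reservoir
```

For example, `reservoir_sample(range(1_000_000), 10)` returns 10 values in a single pass while holding only 10 items in memory. Passing a seeded `random.Random` instance as `rng` makes the sample reproducible.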