
Apache Pinot Integration

Apache Pinot has built-in support for most major sketch families from Apache DataSketches, exposed as aggregation and transformation functions in its SQL dialect.


select distinctCountThetaSketch(
  'country'=''USA'' AND 'state'=''CA'', 'device'=''mobile'', 'SET_INTERSECT($1, $2)'
)
from table
where country = 'USA' or device = 'mobile...'
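
To make the role of SET_INTERSECT($1, $2) in the query above more concrete, here is a minimal sketch that uses exact Java sets in place of Theta sketches. It is only an analogy: the userId column, the row layout, and the class name are assumptions made for illustration, and a real Theta sketch would return an estimate rather than an exact count. Here $1 stands for the set of identifiers in rows matching the first predicate and $2 for the set matching the second.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SetIntersectAnalogy {
    // Each row is {userId, country, state, device}; this layout is hypothetical.
    static int intersectCount(List<String[]> rows) {
        Set<String> first = new HashSet<>();   // $1: country = 'USA' AND state = 'CA'
        Set<String> second = new HashSet<>();  // $2: device = 'mobile'
        for (String[] r : rows) {
            if ("USA".equals(r[1]) && "CA".equals(r[2])) {
                first.add(r[0]);
            }
            if ("mobile".equals(r[3])) {
                second.add(r[0]);
            }
        }
        // Exact-set stand-in for SET_INTERSECT($1, $2)
        first.retainAll(second);
        return first.size();
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"u1", "USA", "CA", "desktop"},
            new String[] {"u1", "USA", "NY", "mobile"},
            new String[] {"u2", "USA", "CA", "mobile"},
            new String[] {"u3", "CAN", "ON", "mobile"});
        System.out.println(intersectCount(rows));
    }
}
```

In this toy data u1 satisfies the two predicates in different rows and is still counted, which is the difference between intersecting two sets and combining both predicates into a single row-level filter.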

Cardinality Estimation


Frequent Items

Advanced Integration

Raw Output Mode

Supported functions have ‘raw’ variants that return the binary representation of the sketch itself, rather than a final estimate, so it can be processed further outside Pinot.


select percentileRawKll(ArrDelayMinutes, 90) as sketch
from airlineStats

The query returns a Base64-encoded string: BQEPC...

The output can then be decoded and read with the DataSketches Java library:

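import java.util.Arrays;
import java.util.Base64;
import org.apache.datasketches.kll.KllDoublesSketch;
import org.apache.datasketches.memory.Memory;

// 'encoded' holds the Base64 string returned by the query above
byte[] decodedBytes = Base64.getDecoder().decode(encoded);
KllDoublesSketch sketch = KllDoublesSketch.wrap(Memory.wrap(decodedBytes));
System.out.println("Min, Median, Max values:");
System.out.println(Arrays.toString(sketch.getQuantiles(new double[]{0, 0.5, 1})));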
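
Before handing the bytes to a sketch class, it can be useful to check that the Base64 payload looks like a sketch at all. The following is a minimal, standard-library-only sketch of that idea. It decodes just the first complete Base64 group of the sample output ("BQEP") and prints the first three bytes. Reading those bytes as preamble size, serial version, and family ID is an assumption about the serialized layout, and the class and method names are hypothetical.

```java
import java.util.Base64;

public class SketchHeaderPeek {
    // Decodes a Base64 string and returns its first three bytes as unsigned values.
    static int[] firstThreeBytes(String base64) {
        byte[] bytes = Base64.getDecoder().decode(base64);
        return new int[] {bytes[0] & 0xFF, bytes[1] & 0xFF, bytes[2] & 0xFF};
    }

    public static void main(String[] args) {
        // "BQEP" is the first full 4-character group of the sample output; the rest is elided.
        int[] header = firstThreeBytes("BQEP");
        // Assumed layout: [0] preamble size in ints, [1] serial version, [2] family ID.
        System.out.printf("byte0=%d byte1=%d byte2=%d%n", header[0], header[1], header[2]);
    }
}
```

For the sample prefix the three bytes are 5, 1, and 15. Under the layout assumption above, a third byte of 15 would be consistent with the KLL family, but the sketch class itself remains the authoritative check when it wraps the full byte array.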

Pre-built Sketch Ingestion

Apache Pinot can also ingest pre-built sketch objects either via Kafka (Realtime) or Spark (Batch) and merge them when doing aggregations.