Apache Pinot Integration

Apache Pinot has built-in support for most major sketch families from Apache DataSketches, exposed as aggregation and transformation functions in its SQL dialect.

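Example (estimating the number of distinct sketchCol values that appear both in rows matching the first predicate and in rows matching the second):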

select distinctCountThetaSketch(
  sketchCol,
  'nominalEntries=1024',
  'country = ''USA'' AND state = ''CA''', 'device = ''mobile''', 'SET_INTERSECT($1, $2)'
)
from table
where country = 'USA' or device = 'mobile...'
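Each predicate argument selects its own subset of the rows that pass the WHERE clause, Pinot unions the sketches for each predicate separately, and SET_INTERSECT($1, $2) intersects the first and second of those results before estimating the cardinality. The stdlib-only Java sketch below computes the exact quantity that this query approximates, using HashSets over a few made-up rows. The Row class, the helper names and the data are hypothetical, and Pinot itself operates on Theta sketches rather than exact sets.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

public class SetIntersectDemo {
    // Hypothetical row shape: the distinct-counted column plus the filter dimensions
    static final class Row {
        final String sketchCol, country, state, device;

        Row(String sketchCol, String country, String state, String device) {
            this.sketchCol = sketchCol;
            this.country = country;
            this.state = state;
            this.device = device;
        }
    }

    // Exact analogue of one predicate argument: the distinct sketchCol values it selects
    static Set<String> distinctWhere(List<Row> rows, Predicate<Row> p) {
        Set<String> out = new HashSet<>();
        for (Row r : rows) {
            if (p.test(r)) out.add(r.sketchCol);
        }
        return out;
    }

    // Exact analogue of SET_INTERSECT($1, $2) followed by taking the cardinality
    static int intersectCount(List<Row> rows, Predicate<Row> p1, Predicate<Row> p2) {
        Set<String> s = distinctWhere(rows, p1);
        s.retainAll(distinctWhere(rows, p2));
        return s.size();
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
            new Row("u1", "USA", "CA", "mobile"),
            new Row("u2", "USA", "CA", "desktop"),
            new Row("u3", "USA", "NY", "mobile"),
            new Row("u2", "USA", "TX", "mobile"),
            new Row("u4", "UK", "", "mobile"));
        int n = intersectCount(rows,
            r -> r.country.equals("USA") && r.state.equals("CA"),
            r -> r.device.equals("mobile"));
        System.out.println(n); // prints 2 (u1 and u2)
    }
}
```

Here u2 counts toward the intersection even though its California row and its mobile row are different rows: the intersection is taken over distinct values, not over individual rows.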

Function families are provided for Cardinality Estimation, Quantiles, and Frequent Items.

Advanced Integration

Raw Output Mode

Supported functions have ‘raw’ variants that return a serialized binary representation of the sketch instead of the final result, so that the sketch can be processed further outside Pinot.

Example (the query returns the serialized KLL sketch of ArrDelayMinutes rather than its 90th percentile):

select percentileRawKll(ArrDelayMinutes, 90) as sketch
from airlineStats

The result is a Base64-encoded string: BQEPC...

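The output can be decoded and queried in Java with the DataSketches library:

import java.util.Arrays;
import java.util.Base64;
import org.apache.datasketches.kll.KllDoublesSketch;
import org.apache.datasketches.memory.Memory;

// encoded holds the full Base64 string returned by the query above
byte[] decodedBytes = Base64.getDecoder().decode(encoded);
KllDoublesSketch sketch = KllDoublesSketch.wrap(Memory.wrap(decodedBytes));
System.out.println("Min, Median, Max values:"); // normalized ranks 0, 0.5 and 1
System.out.println(Arrays.toString(sketch.getQuantiles(new double[]{0, 0.5, 1})));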
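For intuition about what the normalized ranks 0, 0.5 and 1 passed to getQuantiles mean, the stdlib-only Java sketch below computes exact quantiles from sorted data under an inclusive rank rule: the quantile at rank r is the smallest value whose normalized rank (the fraction of values less than or equal to it) is at least r. This is only the exact reference that a KLL sketch approximates within its rank error; the helper name exactQuantile and the sample delays are hypothetical, and the rank convention used by a given DataSketches version may differ in edge cases.

```java
import java.util.Arrays;

public class ExactQuantiles {
    // Smallest value in sorted whose normalized rank is at least r, for r in [0, 1]
    static double exactQuantile(double[] sorted, double r) {
        int n = sorted.length;
        int idx = (int) Math.ceil(r * n) - 1; // index of the first value covering rank r
        return sorted[Math.max(0, Math.min(n - 1, idx))];
    }

    public static void main(String[] args) {
        // Hypothetical arrival delays in minutes
        double[] delays = {12, -3, 45, 0, 7};
        double[] sorted = delays.clone();
        Arrays.sort(sorted);
        double[] ranks = {0, 0.5, 1};
        double[] q = new double[ranks.length];
        for (int i = 0; i < ranks.length; i++) {
            q[i] = exactQuantile(sorted, ranks[i]);
        }
        System.out.println(Arrays.toString(q)); // prints [-3.0, 7.0, 45.0]
    }
}
```

Rank 0 always yields the minimum and rank 1 the maximum, which is why the three-element rank array above reads as min, median and max.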

Pre-built Sketch Ingestion

Apache Pinot can also ingest pre-built sketch objects, either in real time via Kafka or in batch via Spark, and merge them when computing aggregations.
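Ingesting pre-built sketches works because sketches are mergeable: merging sketches built on separate partitions gives the same result as one sketch built on all of the data. The Java sketch below illustrates this with a toy K-Minimum-Values (KMV) distinct-count sketch, the idea underlying the Theta family. It is a hypothetical, stdlib-only stand-in (the class names, K = 256 and the SplitMix64-style hash are assumptions), not the DataSketches implementation or the serialized format that Pinot ingests.

```java
import java.util.TreeSet;

public class KmvMergeDemo {
    static final int K = 256; // number of smallest hash values retained (assumed size)

    // SplitMix64 mixing step applied to String.hashCode, mapped to a double in [0, 1)
    static double hash(String item) {
        long z = item.hashCode() + 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        z = z ^ (z >>> 31);
        return (z >>> 11) * 0x1.0p-53;
    }

    static final class Kmv {
        final TreeSet<Double> mins = new TreeSet<>();

        void update(String item) {
            offer(hash(item));
        }

        // Keep only the K smallest distinct hash values seen so far
        void offer(double h) {
            mins.add(h);
            if (mins.size() > K) mins.pollLast();
        }

        // Keeping the K smallest values of both inputs yields exactly the sketch
        // that would have been built on the union of their underlying data
        Kmv merge(Kmv other) {
            Kmv out = new Kmv();
            for (double h : mins) out.offer(h);
            for (double h : other.mins) out.offer(h);
            return out;
        }

        double estimate() {
            if (mins.size() < K) return mins.size(); // fewer than K distinct hashes: exact count
            return (K - 1) / mins.last();             // standard KMV estimator
        }
    }

    public static void main(String[] args) {
        Kmv partA = new Kmv(), partB = new Kmv(), all = new Kmv();
        // Two overlapping "partitions" of users, e.g. built upstream in different jobs
        for (int i = 0; i < 6000; i++) { partA.update("user-" + i); all.update("user-" + i); }
        for (int i = 4000; i < 10000; i++) { partB.update("user-" + i); all.update("user-" + i); }
        Kmv merged = partA.merge(partB);
        System.out.println(merged.mins.equals(all.mins)); // prints true
        System.out.printf("estimate=%.0f, true=10000%n", merged.estimate());
    }
}
```

The merged sketch retains exactly the same hash values as the sketch built on all 10,000 users, and the overlapping users 4000 to 5999 are not double-counted. The estimate itself carries the usual KMV relative error of roughly 1/sqrt(K), about 6% here.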