U - Update type
S - Summary type

public abstract class DataToSketch<U,S extends org.apache.datasketches.tuple.UpdatableSummary<U>>
extends org.apache.pig.EvalFunc<org.apache.pig.data.Tuple>
implements org.apache.pig.Accumulator<org.apache.pig.data.Tuple>
Note: Strings as values are normally typed as DataType.CHARARRAY, which will be encoded as UTF-8 before being submitted to the sketch. If a different encoding is required for cross-platform compatibility, encode these values before submitting them in a DataBag, and type them as DataType.BYTEARRAY.
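The note above can be illustrated with a minimal sketch in plain Java, assuming the values are encoded before they reach Pig. The class `StringEncodingSketch` and its `encode` helper are hypothetical names used only for illustration; they are not part of this library. The sketch shows that the same text yields different bytes under different charsets, which is why all platforms feeding a sketch need to agree on one encoding.

```java
final class StringEncodingSketch {

  // Hypothetical helper: encodes a string with an explicitly chosen charset so
  // that every platform feeding the sketch produces identical bytes for the same text.
  static byte[] encode(String value, java.nio.charset.Charset charset) {
    return value.getBytes(charset);
  }

  public static void main(String[] args) {
    String key = "caf\u00e9";
    byte[] utf8 = encode(key, java.nio.charset.StandardCharsets.UTF_8);
    byte[] utf16 = encode(key, java.nio.charset.StandardCharsets.UTF_16LE);
    // The byte sequences differ, so a sketch would see them as different values.
    System.out.println(utf8.length + " bytes as UTF-8, " + utf16.length + " bytes as UTF-16LE");
    // Assumption: in a Pig job the chosen bytes would then be carried as a
    // bytearray value (for example wrapped in org.apache.pig.data.DataByteArray)
    // rather than as a chararray.
  }
}
```

Which charset to pick depends on what the other platforms use; the only requirement suggested by the note is that the choice is the same everywhere.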
| Constructor and Description |
|---|
| DataToSketch(int sketchSize, float samplingProbability, org.apache.datasketches.tuple.SummaryFactory<S> summaryFactory)<br>Constructs a function given a sketch size, sampling probability and summary factory. |
| DataToSketch(int sketchSize, org.apache.datasketches.tuple.SummaryFactory<S> summaryFactory)<br>Constructs a function given a sketch size, summary factory and default sampling probability of 1. |
| DataToSketch(org.apache.datasketches.tuple.SummaryFactory<S> summaryFactory)<br>Constructs a function given a summary factory, default sketch size and default sampling probability of 1. |
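The constructor details describe sketchSize as being forced to a power of 2. A minimal sketch of that rounding follows, under one assumption: a value that is already a power of 2 is kept unchanged. The documentation's wording ("greater than given value") does not settle this, and `ceilingPowerOfTwo` is a hypothetical helper, not the library's own routine.

```java
final class SketchSizeRounding {

  // Hypothetical helper: returns the smallest power of 2 that is >= n,
  // for 1 <= n <= 2^30. Assumes exact powers of 2 are left as they are.
  static int ceilingPowerOfTwo(int n) {
    if (n < 1 || n > (1 << 30)) {
      throw new IllegalArgumentException("n must be in [1, 2^30]: " + n);
    }
    int highest = Integer.highestOneBit(n);
    return (highest == n) ? n : highest << 1;
  }

  public static void main(String[] args) {
    System.out.println(ceilingPowerOfTwo(1000)); // prints 1024
    System.out.println(ceilingPowerOfTwo(4096)); // prints 4096
    System.out.println(ceilingPowerOfTwo(5000)); // prints 8192
  }
}
```

Under this reading, requesting a sketch size of 5000 would give a nominal size of 8192 entries, so passing a power of 2 directly makes the resulting size explicit.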
| Modifier and Type | Method and Description |
|---|---|
| void | accumulate(org.apache.pig.data.Tuple inputTuple) |
| void | cleanup() |
| org.apache.pig.data.Tuple | exec(org.apache.pig.data.Tuple inputTuple) |
| org.apache.pig.data.Tuple | getValue() |
Methods inherited from class org.apache.pig.EvalFunc:
allowCompileTimeCalculation, finish, getArgToFuncMapping, getCacheFiles, getInputSchema, getLoadCaster, getLogger, getPigLogger, getReporter, getReturnType, getSchemaName, getSchemaType, getShipFiles, isAsynchronous, needEndOfAllInputProcessing, outputSchema, progress, setEndOfAllInput, setInputSchema, setPigLogger, setReporter, setUDFContextSignature, warn

public DataToSketch(org.apache.datasketches.tuple.SummaryFactory<S> summaryFactory)

summaryFactory - an instance of SummaryFactory

public DataToSketch(int sketchSize,
org.apache.datasketches.tuple.SummaryFactory<S> summaryFactory)

sketchSize - parameter controlling the size of the sketch and the accuracy. It represents the nominal number of entries in the sketch. Forced to the nearest power of 2 greater than the given value.
summaryFactory - an instance of SummaryFactory

public DataToSketch(int sketchSize,
float samplingProbability,
org.apache.datasketches.tuple.SummaryFactory<S> summaryFactory)

sketchSize - parameter controlling the size of the sketch and the accuracy. It represents the nominal number of entries in the sketch. Forced to the nearest power of 2 greater than the given value.
samplingProbability - parameter from 0 to 1 inclusive
summaryFactory - an instance of SummaryFactory

public void accumulate(org.apache.pig.data.Tuple inputTuple)
throws IOException

accumulate in interface org.apache.pig.Accumulator<org.apache.pig.data.Tuple>
Throws: IOException

public void cleanup()

cleanup in interface org.apache.pig.Accumulator<org.apache.pig.data.Tuple>

public org.apache.pig.data.Tuple getValue()

getValue in interface org.apache.pig.Accumulator<org.apache.pig.data.Tuple>

public org.apache.pig.data.Tuple exec(org.apache.pig.data.Tuple inputTuple)
throws IOException

exec in class org.apache.pig.EvalFunc<org.apache.pig.data.Tuple>
Throws: IOException

Copyright © 2015–2020 The Apache Software Foundation. All rights reserved.