public class DataToSketch
extends org.apache.pig.EvalFunc<org.apache.pig.data.Tuple>
implements org.apache.pig.Accumulator<org.apache.pig.data.Tuple>, org.apache.pig.Algebraic
| Modifier and Type | Class and Description |
|---|---|
| static class | DataToSketch.Initial: Class used to calculate the initial pass of an Algebraic sketch operation. |
| static class | DataToSketch.IntermediateFinal: Class used to calculate the intermediate or final combiner pass of an Algebraic sketch operation. |
| Constructor and Description |
|---|
| DataToSketch(): Default constructor. |
| DataToSketch(int nomEntries, float p, long seed): Base constructor. |
| DataToSketch(String nomEntriesStr): String constructor. |
| DataToSketch(String nomEntriesStr, String pStr): String constructor. |
| DataToSketch(String nomEntriesStr, String pStr, String seedStr): Full string constructor. |
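
Pig passes UDF constructor arguments as string literals, which is why the string constructors mirror the (int, float, long) parameters of the base constructor. The following is a minimal sketch of that delegation pattern, using a hypothetical stand-in class named SketchParams rather than DataToSketch itself. The parsing calls and the argument values are assumptions for illustration; this page does not state how DataToSketch converts its string arguments.

```java
// Hypothetical stand-in class, NOT the real DataToSketch.
// It only illustrates a string constructor delegating to a typed base constructor.
final class SketchParams {
    final int nomEntries;
    final float p;
    final long seed;

    // Typed "base" constructor, mirroring the (int, float, long) signature above.
    SketchParams(int nomEntries, float p, long seed) {
        this.nomEntries = nomEntries;
        this.p = p;
        this.seed = seed;
    }

    // "Full string" constructor: parse each argument, then delegate.
    // Assumption: standard parseInt / parseFloat / parseLong conversions.
    SketchParams(String nomEntriesStr, String pStr, String seedStr) {
        this(Integer.parseInt(nomEntriesStr),
             Float.parseFloat(pStr),
             Long.parseLong(seedStr));
    }

    public static void main(String[] args) {
        // Arbitrary illustrative values, not defaults of any library.
        SketchParams params = new SketchParams("4096", "0.5", "12345");
        System.out.println(params.nomEntries + " " + params.p + " " + params.seed);
    }
}
```

A malformed argument such as "4k" would raise a NumberFormatException in this sketch, so values are best validated before they reach the script.
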
| Modifier and Type | Method and Description |
|---|---|
| void | accumulate(org.apache.pig.data.Tuple inputTuple): An Accumulator version of the standard exec() method. |
| void | cleanup(): Cleans up the UDF state after being called using the Accumulator interface. |
| org.apache.pig.data.Tuple | exec(org.apache.pig.data.Tuple inputTuple): Top-level exec function. |
| String | getFinal() |
| String | getInitial() |
| String | getIntermed() |
| org.apache.pig.data.Tuple | getValue(): Returns the sketch that has been built up by multiple calls to accumulate(org.apache.pig.data.Tuple). |
| org.apache.pig.impl.logicalLayer.schema.Schema | outputSchema(org.apache.pig.impl.logicalLayer.schema.Schema input) |

Methods inherited from class org.apache.pig.EvalFunc: allowCompileTimeCalculation, finish, getArgToFuncMapping, getCacheFiles, getInputSchema, getLoadCaster, getLogger, getPigLogger, getReporter, getReturnType, getSchemaName, getSchemaType, getShipFiles, isAsynchronous, needEndOfAllInputProcessing, progress, setEndOfAllInput, setInputSchema, setPigLogger, setReporter, setUDFContextSignature, warn

public DataToSketch()

public DataToSketch(String nomEntriesStr)
Parameters:
nomEntriesStr - See Nominal Entries.

public DataToSketch(String nomEntriesStr, String pStr)
Parameters:
nomEntriesStr - See Nominal Entries.
pStr - See Sampling Probability, p.

public DataToSketch(String nomEntriesStr, String pStr, String seedStr)
Parameters:
nomEntriesStr - See Nominal Entries.
pStr - See Sampling Probability, p.
seedStr - See Update Hash Seed.

public DataToSketch(int nomEntries, float p, long seed)
Parameters:
nomEntries - See Nominal Entries.
p - See Sampling Probability, p.
seed - See Update Hash Seed.

public org.apache.pig.data.Tuple exec(org.apache.pig.data.Tuple inputTuple)
throws IOException
If a large number of calls is anticipated, leveraging either the Algebraic or Accumulator interfaces is recommended. Pig normally handles this automatically.
Internally, this method presents the inner Datum Tuples to a new Sketch, which is returned as a Sketch Tuple.
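
The Accumulator call pattern referred to above can be sketched as follows. This is a minimal illustration using a hypothetical stand-in class, DistinctAccumulator, with plain Java collections in place of Pig Tuples and an exact set in place of a sketch. It only shows the order of calls: repeated accumulate calls with batches of input, one getValue call for the result, then cleanup to reset state.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class AccumulatorPatternDemo {

    // Hypothetical stand-in, NOT the real DataToSketch or a Pig interface.
    // An exact set of distinct items plays the role of the sketch.
    static final class DistinctAccumulator {
        private Set<String> seen = new HashSet<>();

        // Called once per batch of input; state carries over between calls.
        void accumulate(List<String> batch) {
            seen.addAll(batch);
        }

        // Returns the result built up by all accumulate calls so far.
        int getValue() {
            return seen.size();
        }

        // Resets state so the instance can be reused for the next group.
        void cleanup() {
            seen = new HashSet<>();
        }
    }

    public static void main(String[] args) {
        DistinctAccumulator acc = new DistinctAccumulator();
        acc.accumulate(Arrays.asList("a", "b"));
        acc.accumulate(Arrays.asList("b", "c"));
        System.out.println(acc.getValue()); // 3 distinct items: a, b, c
        acc.cleanup();
        System.out.println(acc.getValue()); // 0 after cleanup
    }
}
```
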
Input Tuple
Note: Strings as values are normally typed as DataType.CHARARRAY and are encoded as UTF-8 before being submitted to the sketch. If a different encoding is required for cross-platform compatibility, encode the values before placing them in a DataBag and type them as DataType.BYTEARRAY.
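
The reason encoding matters is that the same string produces different byte sequences under different charsets, so values encoded differently on two platforms would not be treated as the same item. The sketch below uses only the Java standard library; the class name EncodingDemo and the sample string are illustrative assumptions, and it does not call any Pig or sketch API.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class EncodingDemo {

    // Encodes a string with an explicit charset, producing bytes that could
    // then be supplied as a byte-array value instead of a character string.
    static byte[] encode(String value, Charset charset) {
        return value.getBytes(charset);
    }

    public static void main(String[] args) {
        String value = "caf\u00e9"; // "cafe" with an acute accent on the e

        byte[] utf8 = encode(value, StandardCharsets.UTF_8);
        byte[] utf16 = encode(value, StandardCharsets.UTF_16LE);

        System.out.println(utf8.length);                 // 5
        System.out.println(utf16.length);                // 8
        System.out.println(Arrays.equals(utf8, utf16));  // false
    }
}
```
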
Sketch Tuple
Specified by:
exec in class org.apache.pig.EvalFunc<org.apache.pig.data.Tuple>
Parameters:
inputTuple - A tuple containing a single bag, containing Datum Tuples.
Throws:
IOException - from Pig.

public org.apache.pig.impl.logicalLayer.schema.Schema outputSchema(org.apache.pig.impl.logicalLayer.schema.Schema input)
Overrides:
outputSchema in class org.apache.pig.EvalFunc<org.apache.pig.data.Tuple>

public void accumulate(org.apache.pig.data.Tuple inputTuple)
throws IOException
An Accumulator version of the standard exec() method.
Specified by:
accumulate in interface org.apache.pig.Accumulator<org.apache.pig.data.Tuple>
Parameters:
inputTuple - A tuple containing a single bag, containing Datum Tuples.
Throws:
IOException - by Pig.
See Also:
exec(org.apache.pig.data.Tuple), "org.apache.pig.Accumulator.accumulate(org.apache.pig.data.Tuple)"

public org.apache.pig.data.Tuple getValue()
Returns the sketch that has been built up by multiple calls to accumulate(org.apache.pig.data.Tuple).
Specified by:
getValue in interface org.apache.pig.Accumulator<org.apache.pig.data.Tuple>
Returns:
See exec(org.apache.pig.data.Tuple) for return tuple format.

public void cleanup()
Cleans up the UDF state after being called using the Accumulator interface.
Specified by:
cleanup in interface org.apache.pig.Accumulator<org.apache.pig.data.Tuple>

public String getInitial()
Specified by:
getInitial in interface org.apache.pig.Algebraic

public String getIntermed()
Specified by:
getIntermed in interface org.apache.pig.Algebraic

public String getFinal()
Specified by:
getFinal in interface org.apache.pig.Algebraic

Copyright © 2015–2020 The Apache Software Foundation. All rights reserved.