We saw an example of this when the
Pow function detected overflow:
for (int i = 0; i < exponent; i++) {
    long preresult = result;
    result *= base;
    if (preresult > result) {
        // We overflowed. Give a warning, but do not throw an
        // exception.
        warn("Overflow!", PigWarning.TOO_LARGE_FOR_INT);
        // Returning null will indicate to Pig that we failed but
        // we want to continue execution.
        return null;
    }
}
warn, a method of EvalFunc, takes a message that you provide as well as a warning code.
The warning codes are in org.apache.pig.PigWarning, including several user-defined
codes that you can use if none of the provided codes matches your situation. These
warnings are aggregated by Pig and reported to the user at the end of the job.
Warning and returning null is convenient because it allows your job to continue. When
you are processing billions of records, you do not want your job to fail because one
record out of all those billions had a chararray where you expected an int. Given enough
data, the odds are overwhelming that a few records will be bad, and most calculations
will be fine if a few data points are missing.
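The warn-and-return-null pattern can be sketched outside of Pig as well. The following is a standalone illustration, not Pig's actual Pow source; the class and method names are ours, and it uses Math.multiplyExact (Java 8+) as a stricter overflow check than comparing the previous result to the new one:

```java
// Standalone sketch of the warn-and-return-null pattern: compute
// base^exponent, returning null on overflow rather than throwing.
public class PowSketch {

    // Returns base^exponent as a Long, or null if the result
    // overflows a long.
    public static Long powOrNull(long base, int exponent) {
        long result = 1;
        for (int i = 0; i < exponent; i++) {
            try {
                // multiplyExact throws ArithmeticException on overflow.
                result = Math.multiplyExact(result, base);
            } catch (ArithmeticException e) {
                // In a real EvalFunc you would call
                // warn("Overflow!", PigWarning.TOO_LARGE_FOR_INT) here.
                return null;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(powOrNull(2, 10));  // 1024
        System.out.println(powOrNull(10, 30)); // null (overflows a long)
    }
}
```

The caller treats a null result as "this record could not be computed" and moves on, which is exactly how Pig interprets a null returned from exec.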
For errors that are not tolerable, your UDF can throw an exception. If Pig catches an
exception, it will assume that you are asking to stop everything, and it will cause the
task to fail. Hadoop will then restart your task. If any particular task fails three times,
Hadoop will not restart it again. Instead, it will kill all the other tasks and declare the
job a failure.
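The two strategies can be contrasted in a short sketch. This is illustrative plain Java, not Pig's API: a malformed field is treated as tolerable (count a warning, skip the record), while a missing input is intolerable (throw, which in a real UDF would fail the task):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch contrasting tolerable errors (warn and skip)
// with intolerable ones (throw and fail).
public class ErrorStrategies {
    static int warnings = 0;

    // Tolerable error: a bad field yields null, and we keep going.
    static Integer parseOrNull(String field) {
        try {
            return Integer.parseInt(field);
        } catch (NumberFormatException e) {
            warnings++;  // stands in for EvalFunc's warn(...)
            return null;
        }
    }

    static List<Integer> process(List<String> records) {
        if (records == null) {
            // Intolerable error: stop everything.
            throw new IllegalStateException("no input records");
        }
        List<Integer> out = new ArrayList<>();
        for (String r : records) {
            Integer v = parseOrNull(r);
            if (v != null) {
                out.add(v);  // bad records are dropped, good ones kept
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> good = process(List.of("1", "oops", "3"));
        System.out.println(good + " warnings=" + warnings);
        // prints: [1, 3] warnings=1
    }
}
```

The design choice mirrors the discussion above: per-record problems should not kill a job over billions of records, but a condition that invalidates the whole computation should fail fast so Hadoop can retry or abort.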