When a processor's onTrigger method is invoked, two arguments are passed in: `ProcessContext context` and `final ProcessSession session`.
Looking at the ProcessContext code:
The Javadoc explains it well: ProcessContext is a bridge between the processor and the NiFi framework. In practice it is mainly used inside a processor to read the property values configured for that processor in the UI. Here we focus on the session variable. As the first screenshot shows, a fresh session is created for every processor execution, and when the processor runs with multiple threads, each thread gets its own session.
ProcessSession session
The Javadoc shows that a session encompasses every operation a processor may perform on FlowFiles — get, clone, read, modify, remove, and so on — and guarantees that these operations are atomic. A session is bound to a single processor, ensuring that at any given moment a FlowFile can be accessed by only one thread of one processor.
Let's pick an implementation:
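To make this contract concrete, here is a minimal sketch of the typical onTrigger pattern. The `FlowFile` and `Session` types below are simplified stand-ins written for this illustration only; the real classes live in `org.apache.nifi.*` and do considerably more.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

public class SessionUsageSketch {
    // Simplified stand-in for a FlowFile: just a bag of attributes.
    static class FlowFile {
        final Map<String, String> attributes = new HashMap<>();
    }

    // Simplified stand-in for ProcessSession: pulls from an input queue
    // and transfers FlowFiles to a named output relationship.
    static class Session {
        final Queue<FlowFile> input;
        final Map<String, Queue<FlowFile>> outputs = new HashMap<>();

        Session(Queue<FlowFile> input) {
            this.input = input;
        }

        FlowFile get() {
            return input.poll(); // null when nothing is queued
        }

        FlowFile putAttribute(FlowFile ff, String key, String value) {
            ff.attributes.put(key, value);
            return ff;
        }

        void transfer(FlowFile ff, String relationship) {
            outputs.computeIfAbsent(relationship, r -> new ArrayDeque<>()).add(ff);
        }
    }

    // Typical onTrigger shape: get, guard against null, modify, transfer.
    static void onTrigger(Session session) {
        FlowFile flowFile = session.get();
        if (flowFile == null) {
            return; // nothing queued; the framework will schedule us again later
        }
        flowFile = session.putAttribute(flowFile, "handled.by", "MyProcessor");
        session.transfer(flowFile, "success");
    }

    public static void main(String[] args) {
        Queue<FlowFile> queue = new ArrayDeque<>();
        queue.add(new FlowFile());
        Session session = new Session(queue);
        onTrigger(session);
        System.out.println(session.outputs.get("success").peek().attributes.get("handled.by"));
    }
}
```

The null check after `get()` matters in the real API as well: a processor can be triggered when its input queue is empty.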
public final class StandardProcessSession implements ProcessSession, ProvenanceEventEnricher {
private static final int SOURCE_EVENT_BIT_INDEXES = (1 << ProvenanceEventType.CREATE.ordinal())
| (1 << ProvenanceEventType.FORK.ordinal())
| (1 << ProvenanceEventType.JOIN.ordinal())
| (1 << ProvenanceEventType.RECEIVE.ordinal())
| (1 << ProvenanceEventType.FETCH.ordinal());
private static final AtomicLong idGenerator = new AtomicLong(0L);
private static final AtomicLong enqueuedIndex = new AtomicLong(0L);
// determines how many things must be transferred, removed, modified in order to avoid logging the FlowFile ID's on commit/rollback
public static final int VERBOSE_LOG_THRESHOLD = 10;
public static final String DEFAULT_FLOWFILE_PATH = "./";
private static final Logger LOG = LoggerFactory.getLogger(StandardProcessSession.class);
private static final Logger claimLog = LoggerFactory.getLogger(StandardProcessSession.class.getSimpleName() + ".claims");
private static final int MAX_ROLLBACK_FLOWFILES_TO_LOG = 5;
private final Map<Long, StandardRepositoryRecord> records = new ConcurrentHashMap<>();
private final Map<String, StandardFlowFileEvent> connectionCounts = new ConcurrentHashMap<>();
private final Map<FlowFileQueue, Set<FlowFileRecord>> unacknowledgedFlowFiles = new ConcurrentHashMap<>();
private final Map<ContentClaim, ByteCountingOutputStream> appendableStreams = new ConcurrentHashMap<>();
private final RepositoryContext context;
private final TaskTermination taskTermination;
private final Map<FlowFile, Integer> readRecursionSet = new HashMap<>();// set used to track what is currently being operated on to prevent logic failures if recursive calls occurring
private final Set<FlowFile> writeRecursionSet = new HashSet<>();
private final Map<FlowFile, Path> deleteOnCommit = new HashMap<>();
private final long sessionId;
private final String connectableDescription;
private Map<String, Long> countersOnCommit;
private Map<String, Long> immediateCounters;
private final Set<String> removedFlowFiles = new HashSet<>();
private final Set<String> createdFlowFiles = new HashSet<>();
private final StandardProvenanceReporter provenanceReporter;
private int removedCount = 0; // number of flowfiles removed in this session
private long removedBytes = 0L; // size of all flowfiles removed in this session
private long bytesRead = 0L;
private long bytesWritten = 0L;
private int flowFilesIn = 0, flowFilesOut = 0;
private long contentSizeIn = 0L, contentSizeOut = 0L;
private ContentClaim currentReadClaim = null;
private ContentClaimInputStream currentReadClaimStream = null;
private long processingStartTime;
// List of InputStreams that have been opened by calls to {@link #read(FlowFile)} and not yet closed
private final Map<FlowFile, InputStream> openInputStreams = new ConcurrentHashMap<>();
// List of OutputStreams that have been opened by calls to {@link #write(FlowFile)} and not yet closed
private final Map<FlowFile, OutputStream> openOutputStreams = new ConcurrentHashMap<>();
// maps a FlowFile to all Provenance Events that were generated for that FlowFile.
// we do this so that if we generate a Fork event, for example, and then remove the event in the same
// Session, we will not send that event to the Provenance Repository
private final Map<FlowFile, List<ProvenanceEventRecord>> generatedProvenanceEvents = new HashMap<>();
// when Forks are generated for a single parent, we add the Fork event to this map, with the Key being the parent
// so that we are able to aggregate many into a single Fork Event.
private final Map<FlowFile, ProvenanceEventBuilder> forkEventBuilders = new HashMap<>();
private Checkpoint checkpoint = null;
private final ContentClaimWriteCache claimCache;
    // ... constructors and methods omitted ...
}
Only the member variables of the session are shown here.
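One detail worth a side note is the `SOURCE_EVENT_BIT_INDEXES` constant at the top: it ORs one bit per enum ordinal into an int, so that checking whether an event type is a "source" event becomes a single AND instead of a Set lookup. Here is the same technique with a stand-in enum (the real class uses `ProvenanceEventType`; the membership check itself happens elsewhere in the class):

```java
public class OrdinalBitMask {
    // Stand-in enum for this illustration; ProvenanceEventType is used the same way.
    enum EventType { CREATE, FORK, JOIN, RECEIVE, FETCH, DROP, SEND }

    // OR together one bit per "source" event type, keyed by ordinal,
    // mirroring SOURCE_EVENT_BIT_INDEXES in StandardProcessSession.
    static final int SOURCE_EVENT_BITS = (1 << EventType.CREATE.ordinal())
            | (1 << EventType.FORK.ordinal())
            | (1 << EventType.JOIN.ordinal())
            | (1 << EventType.RECEIVE.ordinal())
            | (1 << EventType.FETCH.ordinal());

    // Membership test: one bitwise AND, no allocation, no hashing.
    static boolean isSourceEvent(EventType type) {
        return (SOURCE_EVENT_BITS & (1 << type.ordinal())) != 0;
    }

    public static void main(String[] args) {
        System.out.println(isSourceEvent(EventType.FORK));  // true
        System.out.println(isSourceEvent(EventType.DROP));  // false
    }
}
```

`java.util.EnumSet` implements the same idea behind a Set interface; hand-rolling the mask, as here, keeps the check branch-free and allocation-free on a hot path.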
From fields such as the `records` map and `checkpoint`, we can already make a good guess: when a processor creates, removes, or routes a FlowFile, the change is merely recorded in the session first, and only takes real effect after commit.
As line 28 of the first screenshot shows, once the processor's onTrigger method returns, the framework automatically calls the session's commit method.
Next, we will examine the session's methods one by one:
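This commit-on-success (and rollback-on-failure) flow can be sketched with a toy staging session. All types below are self-contained stand-ins, not NiFi's actual classes; the real StandardProcessSession stages far more state, including content claims and provenance events:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class StagingSessionSketch {
    // Simplified stand-in for a repository record: one pending routing decision.
    static class PendingTransfer {
        final String flowFileId;
        final String destination;

        PendingTransfer(String flowFileId, String destination) {
            this.flowFileId = flowFileId;
            this.destination = destination;
        }
    }

    // The session only records changes. Nothing reaches the destination
    // queue until commit(); rollback() throws the staged records away.
    static class Session {
        final List<PendingTransfer> records = new ArrayList<>(); // like the 'records' map
        final Queue<PendingTransfer> destinationQueue;

        Session(Queue<PendingTransfer> destinationQueue) {
            this.destinationQueue = destinationQueue;
        }

        void transfer(String flowFileId, String destination) {
            records.add(new PendingTransfer(flowFileId, destination)); // staged only
        }

        void commit() {
            destinationQueue.addAll(records); // changes become visible now
            records.clear();
        }

        void rollback() {
            records.clear(); // staged changes never take effect
        }
    }

    public static void main(String[] args) {
        Queue<PendingTransfer> queue = new ArrayDeque<>();
        Session session = new Session(queue);

        // Mimic the framework: run the "processor" body, then commit on
        // success, roll back on failure.
        try {
            session.transfer("ff-1", "success"); // what onTrigger would do
            session.commit();                    // framework calls this afterwards
        } catch (RuntimeException e) {
            session.rollback();
        }
        System.out.println(queue.size()); // 1: visible only after commit
    }
}
```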
get()
rollback()
commit()
Summary
The session is a staging area for a processor's results, and its rollback and commit methods decide whether those results actually take effect.