NiFi Source Code Study (3): ProcessSession

东南_bit

Category: NIFI. Tags: java, big data

Article link: https://blog.csdn.net/qq_26394851/article/details/124176456

When a processor's onTrigger method is invoked, it receives two parameters: ProcessContext context and final ProcessSession session.

Stepping into the ProcessContext code:

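The comments are fairly clear: ProcessContext is a bridge between the processor and the NiFi framework. In practice it is mainly used inside a processor to read the properties configured for that processor in the UI. Our main interest here is the session variable. As the first figure shows, a new session is created every time the processor runs, and when several threads are running, each thread creates its own.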
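To make the two roles concrete, here is a minimal sketch using simplified stand-in types. `SketchContext`, `SketchSession`, and `SketchSessionFactory` are hypothetical names invented for this example, not NiFi classes. The assumption is only what the text states: the context exposes configured properties, and each trigger gets a fresh session with its own id.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for ProcessContext: read-only access to configured properties.
final class SketchContext {
    private final Map<String, String> properties;

    SketchContext(Map<String, String> properties) {
        this.properties = Collections.unmodifiableMap(new HashMap<>(properties));
    }

    // Returns the configured value, or null if the property was never set.
    String getProperty(String name) {
        return properties.get(name);
    }
}

// Hypothetical stand-in for a session: it only carries an id in this sketch.
final class SketchSession {
    private final long sessionId;

    SketchSession(long sessionId) {
        this.sessionId = sessionId;
    }

    long getSessionId() {
        return sessionId;
    }
}

// Hands out a new session per trigger; the AtomicLong keeps ids unique across threads.
final class SketchSessionFactory {
    private final AtomicLong idGenerator = new AtomicLong(0L);

    SketchSession createSession() {
        return new SketchSession(idGenerator.getAndIncrement());
    }
}
```

Calling `createSession()` twice, from the same or from different threads, yields two distinct session objects with different ids, which mirrors the "one session per execution" behavior described above.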

ProcessSession session

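The comments show that the session covers every operation a processor can perform on FlowFiles (get, clone, read, modify, remove, and so on) and guarantees that these operations are atomic. A session is bound to a single processor and ensures that a FlowFile can be accessed by only one thread of one processor at any given time.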
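One way to picture the "one thread at a time" guarantee is a session that takes a FlowFile off a shared queue when it fetches it. The sketch below is an illustrative assumption, not NiFi's actual mechanism: `ExclusiveSessionSketch` is a hypothetical class, and FlowFiles are represented as plain strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical session: get() removes an item from the shared queue, so no other
// session can obtain the same item while this session holds it.
final class ExclusiveSessionSketch {
    private final Queue<String> sharedQueue;
    private final List<String> held = new ArrayList<>();

    ExclusiveSessionSketch(Queue<String> sharedQueue) {
        this.sharedQueue = sharedQueue;
    }

    // Returns the next queued item, or null if the queue is empty.
    String get() {
        String flowFile = sharedQueue.poll();
        if (flowFile != null) {
            held.add(flowFile);
        }
        return flowFile;
    }

    // Snapshot of the items this session currently holds.
    List<String> heldItems() {
        return new ArrayList<>(held);
    }
}
```

If the shared queue is a thread-safe implementation such as `java.util.concurrent.ConcurrentLinkedQueue`, two sessions polling concurrently can never receive the same element.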

Let's pick one implementation:

public final class StandardProcessSession implements ProcessSession, ProvenanceEventEnricher {
    private static final int SOURCE_EVENT_BIT_INDEXES = (1 << ProvenanceEventType.CREATE.ordinal())
        | (1 << ProvenanceEventType.FORK.ordinal())
        | (1 << ProvenanceEventType.JOIN.ordinal())
        | (1 << ProvenanceEventType.RECEIVE.ordinal())
        | (1 << ProvenanceEventType.FETCH.ordinal());

    private static final AtomicLong idGenerator = new AtomicLong(0L);
    private static final AtomicLong enqueuedIndex = new AtomicLong(0L);

    // determines how many things must be transferred, removed, modified in order to avoid logging the FlowFile ID's on commit/rollback
    public static final int VERBOSE_LOG_THRESHOLD = 10;
    public static final String DEFAULT_FLOWFILE_PATH = "./";

    private static final Logger LOG = LoggerFactory.getLogger(StandardProcessSession.class);
    private static final Logger claimLog = LoggerFactory.getLogger(StandardProcessSession.class.getSimpleName() + ".claims");
    private static final int MAX_ROLLBACK_FLOWFILES_TO_LOG = 5;

    private final Map<Long, StandardRepositoryRecord> records = new ConcurrentHashMap<>();
    private final Map<String, StandardFlowFileEvent> connectionCounts = new ConcurrentHashMap<>();
    private final Map<FlowFileQueue, Set<FlowFileRecord>> unacknowledgedFlowFiles = new ConcurrentHashMap<>();
    private final Map<ContentClaim, ByteCountingOutputStream> appendableStreams = new ConcurrentHashMap<>();
    private final RepositoryContext context;
    private final TaskTermination taskTermination;
    private final Map<FlowFile, Integer> readRecursionSet = new HashMap<>();// set used to track what is currently being operated on to prevent logic failures if recursive calls occurring
    private final Set<FlowFile> writeRecursionSet = new HashSet<>();
    private final Map<FlowFile, Path> deleteOnCommit = new HashMap<>();
    private final long sessionId;
    private final String connectableDescription;

    private Map<String, Long> countersOnCommit;
    private Map<String, Long> immediateCounters;

    private final Set<String> removedFlowFiles = new HashSet<>();
    private final Set<String> createdFlowFiles = new HashSet<>();

    private final StandardProvenanceReporter provenanceReporter;

    private int removedCount = 0; // number of flowfiles removed in this session
    private long removedBytes = 0L; // size of all flowfiles removed in this session
    private long bytesRead = 0L;
    private long bytesWritten = 0L;
    private int flowFilesIn = 0, flowFilesOut = 0;
    private long contentSizeIn = 0L, contentSizeOut = 0L;

    private ContentClaim currentReadClaim = null;
    private ContentClaimInputStream currentReadClaimStream = null;
    private long processingStartTime;

    // List of InputStreams that have been opened by calls to {@link #read(FlowFile)} and not yet closed
    private final Map<FlowFile, InputStream> openInputStreams = new ConcurrentHashMap<>();
    // List of OutputStreams that have been opened by calls to {@link #write(FlowFile)} and not yet closed
    private final Map<FlowFile, OutputStream> openOutputStreams = new ConcurrentHashMap<>();

    // maps a FlowFile to all Provenance Events that were generated for that FlowFile.
    // we do this so that if we generate a Fork event, for example, and then remove the event in the same
    // Session, we will not send that event to the Provenance Repository
    private final Map<FlowFile, List<ProvenanceEventRecord>> generatedProvenanceEvents = new HashMap<>();

    // when Forks are generated for a single parent, we add the Fork event to this map, with the Key being the parent
    // so that we are able to aggregate many into a single Fork Event.
    private final Map<FlowFile, ProvenanceEventBuilder> forkEventBuilders = new HashMap<>();

    private Checkpoint checkpoint = null;
    private final ContentClaimWriteCache claimCache;


    // methods omitted
}

Only the session's member variables are shown here.
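The first field in the listing, SOURCE_EVENT_BIT_INDEXES, packs a set of enum constants into a single int by setting one bit per ordinal. The following sketch shows that technique in isolation. `SketchEventType` is a trimmed, hypothetical stand-in for ProvenanceEventType, and `isSourceEvent` is a helper added for illustration; how the real class consults the mask is not shown in the excerpt.

```java
// Hypothetical, trimmed stand-in for ProvenanceEventType.
enum SketchEventType { CREATE, FORK, JOIN, RECEIVE, FETCH, DROP, ROUTE }

final class SourceEventMaskSketch {
    // One bit per "source" event type, positioned by the enum ordinal.
    static final int SOURCE_EVENT_BITS = (1 << SketchEventType.CREATE.ordinal())
        | (1 << SketchEventType.FORK.ordinal())
        | (1 << SketchEventType.JOIN.ordinal())
        | (1 << SketchEventType.RECEIVE.ordinal())
        | (1 << SketchEventType.FETCH.ordinal());

    // Membership test: a single AND against the precomputed mask.
    static boolean isSourceEvent(SketchEventType type) {
        return (SOURCE_EVENT_BITS & (1 << type.ordinal())) != 0;
    }
}
```

The mask makes the membership check a constant-time bit operation, at the cost of depending on the enum having fewer than 32 constants for an int.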

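From these two variables we can already make a rough guess: when a processor creates, removes, or routes a FlowFile, the operation is only recorded in the session at first, and it takes effect only after commit.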
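The "record first, apply on commit" idea can be sketched as follows. This is a simplified model under stated assumptions, not the StandardProcessSession logic: `StagingSessionSketch` is a hypothetical class, FlowFiles are plain string ids, a `Set<String>` stands in for the repository, and the two pending sets are assumed to play the role of the createdFlowFiles and removedFlowFiles fields in the listing.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical session that stages changes and applies them only on commit.
final class StagingSessionSketch {
    private final Set<String> repository; // stands in for the durable state
    private final Set<String> createdFlowFiles = new HashSet<>();
    private final Set<String> removedFlowFiles = new HashSet<>();

    StagingSessionSketch(Set<String> repository) {
        this.repository = repository;
    }

    // Records the creation; the repository is untouched until commit().
    void create(String flowFileId) {
        createdFlowFiles.add(flowFileId);
    }

    // Records the removal; an item created in this same session is simply forgotten.
    void remove(String flowFileId) {
        if (!createdFlowFiles.remove(flowFileId)) {
            removedFlowFiles.add(flowFileId);
        }
    }

    // Applies all pending changes to the repository, then resets the session.
    void commit() {
        repository.removeAll(removedFlowFiles);
        repository.addAll(createdFlowFiles);
        clearPending();
    }

    // Discards all pending changes; the repository stays as it was.
    void rollback() {
        clearPending();
    }

    private void clearPending() {
        createdFlowFiles.clear();
        removedFlowFiles.clear();
    }
}
```

Until `commit()` is called, the repository set is unchanged no matter how many `create` and `remove` calls the session has seen, and `rollback()` leaves it exactly as it was.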

Line 28 of the code in the first figure shows that once the processor's onTrigger method returns, the session's commit method is called automatically.
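The automatic commit can be pictured as a thin wrapper around the processor's own logic. The sketch below is a hedged illustration of that pattern, not a copy of the framework code: `TxSession`, `TriggerBody`, and `AutoCommitRunner` are hypothetical names, and rolling back on failure is an assumption added to complete the picture.

```java
// Hypothetical minimal session interface for this sketch.
interface TxSession {
    void commit();
    void rollback();
}

// Hypothetical processor body: the logic a processor author would write.
interface TriggerBody {
    void onTrigger(TxSession session);
}

final class AutoCommitRunner {
    // Runs the body, commits on normal return, and rolls back then rethrows on failure.
    static void trigger(TxSession session, TriggerBody body) {
        try {
            body.onTrigger(session);
            session.commit();
        } catch (RuntimeException | Error e) {
            session.rollback();
            throw e;
        }
    }
}
```

With this shape, the processor author never calls commit explicitly; a normal return from the body is what makes the staged work take effect.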

Below we look at these session methods one by one.

get()

rollback()

commit()

Summary

The session is a staging area for a processor's results; the rollback and commit methods decide whether those results take effect.
