Apache NiFi consists of a web server, a flow controller, and processors, all of which execute within a JVM (Java Virtual Machine) on a host operating system. It also includes three repositories, as shown in the figure: the FlowFile repository, the content repository, and the provenance repository. Every piece of data or metadata that NiFi handles is stored in these repositories. The architecture of NiFi is as follows:


FlowFile Repository

This repository stores the current state and attributes of every FlowFile that passes through NiFi's data flows. By default, it is located in the root directory of the NiFi installation. The location can be changed by setting the "nifi.flowfile.repository.directory" property.
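As a rough illustration of how this property might be read, the sketch below parses a properties-style text and resolves the FlowFile repository directory against the NiFi root. The sample text, the `/opt/nifi` root, and the helper names `parse_properties` and `flowfile_repo_dir` are assumptions made for this example; they are not part of NiFi.

```python
from pathlib import Path

# Hypothetical excerpt of a NiFi properties file, used only for this sketch.
SAMPLE = """\
# FlowFile repository settings
nifi.flowfile.repository.directory=./flowfile_repository
"""

def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props

def flowfile_repo_dir(props, nifi_home):
    """Resolve the configured directory; relative paths are taken from the NiFi root."""
    raw = props.get("nifi.flowfile.repository.directory", "./flowfile_repository")
    path = Path(raw)
    return path if path.is_absolute() else Path(nifi_home) / path

if __name__ == "__main__":
    # "/opt/nifi" is an assumed installation root.
    print(flowfile_repo_dir(parse_properties(SAMPLE), "/opt/nifi"))
```

Changing the value on the right-hand side of the property to an absolute path would make the helper return that path unchanged.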

Content Repository

This repository holds the content of every FlowFile in NiFi. Its default directory is also in the root directory of NiFi, and it can be changed using the "nifi.content.repository.directory.default" property. ("org.apache.nifi.controller.repository.FileSystemRepository" is the name of the default implementation class, set through "nifi.content.repository.implementation"; it is not the directory property.) This directory can consume a lot of disk space, so make sure the installation disk has enough free space.
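Because the content repository can grow large, it may help to check free space on the disk that holds it. The following is a minimal sketch using Python's standard `shutil.disk_usage`; the helper names `free_gib` and `has_room` and the 50 GiB threshold are hypothetical choices for illustration, not NiFi recommendations.

```python
import shutil

def free_gib(directory):
    """Return the free space, in GiB, on the disk that holds `directory`."""
    usage = shutil.disk_usage(directory)
    return usage.free / 1024**3

def has_room(directory, required_gib):
    """True if at least `required_gib` GiB are free on that disk."""
    return free_gib(directory) >= required_gib

if __name__ == "__main__":
    # "." stands in for the content repository directory; 50 is an assumed threshold.
    print(f"free: {free_gib('.'):.1f} GiB, enough: {has_room('.', 50)}")
```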

Provenance Repository

The provenance repository stores all provenance event data. Event data is indexed and searchable within each location. It lets the user inspect the history of a FlowFile, because it tracks and stores every event for every FlowFile that passes through Apache NiFi. It also helps with troubleshooting if an issue occurs while a FlowFile is being processed.

The provenance repository comes in two types:

  1. Volatile provenance repository: All provenance data in this repository is lost after a restart.
  2. Persistent provenance repository: The default directory of the persistent provenance repository is in the root directory of Apache NiFi. It can be changed using the "nifi.provenance.repository.directory.default" property. The implementation class, "org.apache.nifi.provenance.PersistentProvenanceRepository", is selected through "nifi.provenance.repository.implementation".
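The difference between the two types can be sketched as a small check on the configured implementation class. This assumes the implementation is selected through the `nifi.provenance.repository.implementation` property and that the volatile class is `org.apache.nifi.provenance.VolatileProvenanceRepository`; the helper name and the sample dictionaries are hypothetical.

```python
# Assumed class name of the in-memory (volatile) implementation.
VOLATILE = "org.apache.nifi.provenance.VolatileProvenanceRepository"

def provenance_survives_restart(props):
    """Return False for the volatile implementation, True for any other one."""
    impl = props.get("nifi.provenance.repository.implementation")
    if impl is None:
        raise KeyError("nifi.provenance.repository.implementation is not set")
    return impl != VOLATILE

if __name__ == "__main__":
    # Hypothetical configurations, written as already-parsed dictionaries.
    volatile_cfg = {"nifi.provenance.repository.implementation": VOLATILE}
    persistent_cfg = {
        "nifi.provenance.repository.implementation":
            "org.apache.nifi.provenance.PersistentProvenanceRepository",
        "nifi.provenance.repository.directory.default": "./provenance_repository",
    }
    print(provenance_survives_restart(volatile_cfg))
    print(provenance_survives_restart(persistent_cfg))
```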

