Learn how to efficiently manage and reuse data flow designs in Napita using Flow Definitions.
Flow Definitions are akin to Templates: reusable data flow components and configurations that can be saved and reused across instances. They enable users to create reusable flow templates, which can be shared, imported, and customized across different Napita instances, fostering consistency, standardization, and reuse of data flow designs.
Users can customize imported Flow Definitions by adjusting parameter values, modifying connections, or adding components to meet specific requirements. Existing flows can be reused either in the same instance or in an entirely new instance.
In Napita, you can download a flow definition by right-clicking on the process group you want to reuse in other instances and selecting Download Flow Definition. A JSON file containing the flow definition will be downloaded.
Ensure the flow definition is downloaded without external controller services, as their defined schemas may not suit all flow definitions. For instance, database connection details are external services that differ for each client; such details should not be saved in flow templates.
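As a quick sanity check before sharing a flow definition, the downloaded JSON can be inspected programmatically. The sketch below is a minimal example; the file name is a placeholder, and the key names (flowContents, controllerServices, properties) follow the structure NiFi typically uses for flow definition exports, so verify them against your own file before relying on it.

```python
import json

# Load the flow definition downloaded from Napita (path is illustrative).
with open("approved_orders_flow.json") as f:
    flow_definition = json.load(f)

def find_controller_services(group):
    """Recursively collect controller services defined inside the exported group."""
    services = list(group.get("controllerServices", []))
    for child in group.get("processGroups", []):
        services.extend(find_controller_services(child))
    return services

# 'flowContents' usually holds the exported process group in NiFi flow definitions.
services = find_controller_services(flow_definition.get("flowContents", {}))
for svc in services:
    print(svc.get("name"), "-", svc.get("type"))
    # Review each service's properties for environment-specific values
    # (DB URLs, credentials, SFTP hosts) that should not travel with the template.
    for prop, value in (svc.get("properties") or {}).items():
        print("   ", prop, "=", value)
```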
Once the flow definition is downloaded, import it as a new process group to reuse the flow in other instances.
Navigate to the parent process group where you want to create the new flow.
Drag the Process Group icon from the menu to the canvas.
Click the Browse icon, upload the downloaded JSON file, and name the process group appropriately for the required process flow.
When creating a flow from the flow definition within the same instance, it is important to create a new parameter context for the new flow. If the parameter context remains the same, any changes to a parameter will also be reflected in the source flow. To avoid this, follow these steps:
Right-click on the canvas and select Configure to edit the process group configuration.
Switch to the General tab, where you will find the option labeled Process Group Parameter Context.
Click on the dropdown menu next to "Process Group Parameter Context." Scroll down to the bottom of the list where you'll find the option to "Create New Parameter Context." Choose this option to create a new parameter context for the flow.
Add the Name of the Parameter Context. Choose a descriptive name that reflects the purpose or function of the flow to maintain clarity and organization.
Click Apply to save the parameter context.
This ensures that any modifications made to parameters within this flow will be isolated to its specific context, preventing unintended effects on other parts of the system.
Parameter contexts manage dynamic values shared across processors or components within a data flow, containing details from the original flow definition. When transferring the flow definition between instances, replace the parent parameter context with the correct parent processor's parameter context for inheritance. Follow these steps:
Right-click on the processor's canvas.
Select Parameter from the options.
Navigate to the Inheritance tab and remove the parameter contexts of the source processor.
Select the desired parameter context from the left.
Click Apply to save the inheritance of the parameter context.
Once the parameter context is inherited, you can verify through the following steps:
Right-click on the canvas and select Parameter from the options.
Navigate to the Parameters tab, where all the parameters of that processor are listed. Parameters with the edit icon belong to that processor, while parameters with an arrow icon belong to the parent processor.
Click on the arrow icon for any parameter. This action will lead you to the parameter context of the parent parameter.
Navigate to the settings page to verify that the correct parameter context is inherited.
While most parameters are inherited from the parent process groups, some parameters are specific to each process group. The following parameters need to be added to the processors:
Destination Path: Specifies the path where the flow places files on the SFTP server. This value must also be added to the Remote Path property of the processor that puts the file on SFTP; click Configure and add the remote path name.
Feed File Name with Prefix: Add a meaningful file name with a prefix, such as a timestamp, for easy identification.
Source SQL Query: This parameter contains the SQL query required for the processor to perform its action.
Date Time Format: Specifies the date time format for the files. It's crucial for accurate representation.
File Name Extension: Select whether the file is .csv or .json to ensure compatibility with other systems and accurate file reading.
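As an illustration, the parameters above might be filled in as follows for a hypothetical approved-orders export. Every name and value here is an example only (source.sql.query appears elsewhere in this guide; the other names are placeholders), not a required convention:

```python
# Hypothetical values for the process-group-specific parameters described above.
flow_parameters = {
    "destination.path": "/home/client-sftp/exports/orders",   # remote SFTP directory
    "feed.file.name.prefix": "ApprovedOrders",                # prefix for easy identification
    "source.sql.query": (
        "SELECT order_id, order_date, grand_total "
        "FROM order_header WHERE status_id = 'ORDER_APPROVED'"
    ),                                                         # illustrative SQL only
    "date.time.format": "yyyy-MM-dd HH:mm:ss",                 # format used in exported files
    "file.name.extension": "csv",                              # csv or json
}

for name, value in flow_parameters.items():
    print(f"{name} = {value}")
```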
To configure the file name, locate the processor named Update file name.
Right-click and select the configure option. Go to properties and enter the query in the filename field.
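The filename value is typically built with NiFi Expression Language so each export gets a unique, time-stamped name. The expression below is only a sketch; the parameter references are hypothetical and should be replaced with the names defined in your own parameter context.

```python
# Example NiFi Expression Language value for the filename field, shown here as a
# Python string for readability. ${now():format(...)} appends the current timestamp;
# the #{...} parameter references are hypothetical placeholders.
filename_expression = (
    "#{feed.file.name.prefix}_"
    "${now():format('yyyyMMddHHmmss')}"
    ".#{file.name.extension}"
)
print(filename_expression)
```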
The Database Connection Pooling (DBCP) service enables efficient and reliable connections to relational databases. It essentially acts as a pool manager for database connections, allowing processors to reuse existing connections instead of creating new ones for each operation. DBCP services are not part of the parameters; therefore, open the processor's configuration, go to Properties, and select the DBCP service from the dropdown menu.
The Record Writers service facilitates writing data records to various data storage systems or destinations in a structured format. Since default controller services are removed when downloading flow definitions, configure the record writer property through the following steps:
Right-click on the processor and select Configure.
Navigate to the property Record writer.
Select the appropriate record writer from the dropdown menu.
Configure the record writer service by clicking the arrow next to the service, then click the settings icon to configure it.
Navigate to the properties tab and update the service as per your requirements.
Services can only be updated once disabled. Be cautious, as disabling a service affects all associated processors. Once the services are updated, right-click on the canvas and select Enable All Controller Services to re-enable them.
Before executing any flow, check the DBCP and SFTP connections to ensure credentials are accurate.
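Those connection checks can also be scripted outside Napita before enabling the flow. The sketch below assumes a MySQL-backed DBCP service and uses the third-party pymysql and paramiko packages; hostnames and credentials are placeholders for the values configured in your controller services and processors.

```python
import pymysql      # pip install pymysql
import paramiko     # pip install paramiko

# Placeholder credentials -- replace with the values configured in your
# DBCP controller service and SFTP processors.
DB = {"host": "db.example.com", "user": "oms_read", "password": "secret", "database": "oms"}
SFTP = {"hostname": "sftp.example.com", "port": 22, "username": "feeds", "password": "secret"}

def check_database():
    """Open and close a single connection to confirm the DBCP credentials work."""
    conn = pymysql.connect(**DB)
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT 1")
            print("Database connection OK:", cur.fetchone())
    finally:
        conn.close()

def check_sftp():
    """Log in over SFTP and list the remote directory to confirm access."""
    transport = paramiko.Transport((SFTP["hostname"], SFTP["port"]))
    transport.connect(username=SFTP["username"], password=SFTP["password"])
    sftp = paramiko.SFTPClient.from_transport(transport)
    print("SFTP connection OK, remote root contains:", sftp.listdir(".")[:5])
    sftp.close()
    transport.close()

if __name__ == "__main__":
    check_database()
    check_sftp()
```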
Once all settings are configured and verified, run the process group:
Right-click on the canvas of the processor group, then click on Start.
All flows will start processing.
If you want to run each flow manually, right-click on the flow file, click Run Once, and repeat the same for each flow file.
Learn how managing processors in Napita streamlines dataflows. Organize them in groups for efficient monitoring and modification.
The ability to view and manage processors within the Napita platform is crucial for users engaged in managing complex dataflows and ensuring the smooth processing of data between various systems. Processors, as components in Napita, play a vital role in tasks like data ingestion, transformation, and routing. They act as workers, handling incoming data and performing actions on it based on defined configurations.
Organizing processors within process groups and parent process groups offers users a structured approach to managing their data workflows. This feature enhances efficiency by allowing users to easily locate, monitor, and modify processors as needed within Napita. By providing a clear overview of the processors involved in specific functions, users can quickly identify areas for optimization or troubleshooting.
Understanding the hierarchy of processors within process groups and parent process groups is essential for users to grasp the overall dataflow architecture within NiFi. It helps them comprehend how data moves through different stages of processing and where specific actions are performed. This visibility is invaluable for maintaining data integrity and ensuring the reliability of the overall system.
Access Napita Interface: Begin by accessing the Napita interface, where processors are managed. This is typically done through a web browser by entering the URL of your Napita instance.
Locate Parent Processor Group: Within Napita, navigate to the parent processor group associated with the instance you're managing. These groups are often named according to their function or the systems they interact with. For example, you may find a parent processor group named demo-oms for managing data flows related to the demo-oms instance.
Navigate Through Hierarchy: Double-click on the parent processor group to explore its contents. Inside, you'll find process groups organized based on specific functions or tasks, such as data ingestion, transformation, or routing.
Identify Relevant Process Group: Locate the process group that corresponds to the specific function or task you want to manage. For instance, if you're interested in monitoring flow for approved orders, look for the process group labeled Approved Orders Flow.
View Processors: Double-click on the identified process group to view all the processors contained within it. Processors are represented as individual components responsible for performing various tasks on data as it flows through the NiFi system.
The feature to manage Process Groups within Napita is integral for users orchestrating complex data workflows. By offering a multitude of options through the context menu, users gain control over the configuration, monitoring, and optimization of their data pipelines.
Access Process Group Options:
Right-click on the desired Process Group within Napita to open the context menu.
Configure:
Choose this option to establish or modify the configuration of the Process Group, enabling customization according to specific business requirements.
Variables:
Select this option to create or configure variables within Napita, providing flexibility in managing dynamic data processing scenarios.
Enter Group:
Use this option to enter the Process Group and access its contents for configuration or monitoring purposes.
Start/Stop:
Start or stop the Process Group based on operational requirements, ensuring efficient resource utilization and workflow execution.
Run Once:
Execute a selected Processor exactly once, based on configured execution settings. However, this only works with Timer-driven and CRON-driven scheduling strategies.
Enable/Disable:
Enable or disable all processors within the Process Group to control data processing activities and optimize system performance.
View Status History:
Open a graphical representation of the Process Group's statistical information over time, aiding in performance monitoring and troubleshooting.
View Connections:
Navigate to upstream or downstream connections to visualize and analyze data flow within the Process Group, facilitating troubleshooting and optimization efforts.
Center in View:
Center the view of the canvas on the selected Process Group for improved visibility and navigation within the interface.
Group:
Create a new Process Group containing the selected Process Group and any other components selected on the canvas, facilitating organizational management of data workflows.
Download Flow Definition:
Download the flow definition of the Process Group as a JSON file, enabling backup, restoration, and version control of configurations.
Create Template:
Generate a template from the selected Process Group, allowing for reuse and standardization of data processing workflows.
Copy:
Copy the selected Process Group to the clipboard for duplication or relocation within the canvas, providing flexibility in designing data workflows.
Empty All Queues:
Remove all FlowFiles from all queues within the selected Process Group, facilitating maintenance and resource optimization.
Delete:
Permanently delete the selected Process Group, enabling users to clean up outdated or unnecessary components from the system.
Discover how to effectively manage Napita's Processors' operations using the Scheduling tab, influencing data flow within the platform.
The Scheduling tab within the Processor Configuration dialog in Napita offers crucial settings for managing how a Processor operates, impacting the flow of data within the platform. Let's break down its significance and provide step-by-step instructions on how users can utilize this feature effectively.
Step-by-Step Usage Instructions:
Accessing the Scheduling Tab:
Right-click on the Processor within Napita.
Select the Configure option from the context menu. Alternatively, double-click on the Processor.
Navigate to the Scheduling tab within the Configuration dialog.
Selecting Scheduling Strategy:
Choose a scheduling strategy based on processing needs:
Timer Driven: Timer-driven scheduling operates by scheduling the Processor to execute at regular intervals. This straightforward approach is suitable for tasks requiring periodic processing, such as batch data updates or routine maintenance activities. Users can configure the timing of execution using the Run Schedule option, defining the frequency at which the Processor operates based on predefined intervals.
Event Driven: For scenarios demanding real-time responsiveness and dynamic processing, the Event Driven scheduling mode presents an experimental yet intriguing option. In this mode, the Processor is triggered to run by specific events, typically initiated when FlowFiles enter connections linked to the Processor. While offering potential benefits in terms of real-time data handling, users should exercise caution with this mode, as its experimental nature means it may not be supported by all Processors and could introduce unpredictability into production environments.
CRON Driven: The CRON Driven scheduling mode provides the utmost flexibility, enabling users to define precise scheduling patterns using CRON expressions. This approach is particularly well-suited for complex scheduling requirements where specific timing and periodicity are essential. With CRON expressions, users can specify intricate schedules, encompassing various time intervals and patterns for Processor execution. However, the CRON Driven mode introduces more configuration complexity than the other scheduling strategies, requiring users to understand CRON syntax; you can validate your expressions with any standard CRON expression checker.
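NiFi's CRON Driven strategy uses Quartz-style expressions with six fields (seconds, minutes, hours, day of month, month, day of week). The schedules below are illustrative examples, shown as Python strings for reference:

```python
# Quartz-style CRON expressions: second minute hour day-of-month month day-of-week
cron_examples = {
    "0 * * * * ?":    "run at the start of every minute",
    "0 0/15 * * * ?": "run every 15 minutes",
    "0 0 2 * * ?":    "run daily at 2:00 AM",
    "0 30 6 ? * MON": "run every Monday at 6:30 AM",
}

for expression, meaning in cron_examples.items():
    print(f"{expression:<16} -> {meaning}")
```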
Configuring Concurrent Tasks:
Determine the number of threads the Processor will use simultaneously with the Concurrent Tasks option.
Increasing this value can enhance data processing speed but may impact system resources.
Defining Run Schedule:
Specify how often the Processor should run:
For Timer-driven strategy: define a time duration (e.g., 1 second, 5 minutes).
For CRON-driven strategy: refer to CRON expression format for scheduling details.
Managing Execution:
Choose between All Nodes or Primary Node for Processor execution.
All Nodes schedules the Processor on every node in the cluster, while Primary Node limits it to the primary node only.
Adjusting Run Duration:
Slide the Run Duration slider to balance between lower latency and higher throughput.
Prioritize lower latency for quicker processing or higher throughput for more efficient resource utilization.
Applying Changes:
After configuring settings, click Apply to implement changes or Cancel to discard them.
The Bulletin feature in Napita provides users with real-time notifications about the status and events occurring within the data flow. This feature significantly enhances users' ability to track the health and performance of their data pipelines, enabling them to promptly address any issues or concerns.
Significance and Benefits:
Real-time Monitoring: The Bulletin feature offers users immediate visibility into events and issues happening within their data flow, allowing for proactive monitoring and management.
Enhanced Visibility: By displaying bulletins at both the component and system levels, users gain comprehensive insights into the status and health of their data flow, empowering them to make informed decisions.
Troubleshooting Assistance: Bulletins provide valuable context and information about warnings, errors, and other noteworthy events, facilitating efficient troubleshooting and problem resolution.
Customizable Alert Levels: Users can configure the bulletin level to suit their monitoring needs, ensuring they receive notifications for events of specific severity levels, such as warnings and errors.
Step-by-Step Usage Instructions:
Accessing Bulletin Settings:
Navigate to the Processor Configuration dialog by selecting the desired processor.
Click on the Settings tab within the Processor Configuration dialog.
Configuring Bulletin Level:
Scroll down to locate the Bulletin level option.
Choose the desired bulletin level (e.g., DEBUG, INFO, WARN, ERROR) based on your monitoring requirements.
This setting determines the minimum severity level of bulletins that will be displayed in the User Interface.
Monitoring Bulletins:
Observe the bulletin icons displayed on components in Napita.
Hover over the icon with your mouse to view a tooltip providing details such as the time, severity, message, and node (if clustered) associated with the bulletin.
Viewing System-Level Bulletins:
Check the Status bar near the top of the page for system-level bulletins.
Hover over the system-level bulletin icon to view relevant information.
Accessing the Bulletin Board Page:
Open the Global Menu.
Select the Bulletin Board Page to view and filter bulletins from all components.
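Bulletins can also be pulled programmatically from the NiFi REST API. The bulletin-board endpoint below is part of standard NiFi, but the response field names are given from memory, so treat them as assumptions and adjust after inspecting a real response; authentication handling is omitted for brevity.

```python
import requests

NIFI_URL = "https://napita.hotwax.io/nifi-api"   # base REST URL for the instance

# Fetch recent bulletins from the bulletin board (add the auth/session handling
# your instance requires; this sketch assumes token-based or anonymous access).
response = requests.get(f"{NIFI_URL}/flow/bulletin-board", params={"limit": 20})
response.raise_for_status()

# Field names ('bulletinBoard', 'bulletins', 'bulletin') are assumptions based on
# typical NiFi responses -- verify against your instance.
for entry in response.json().get("bulletinBoard", {}).get("bulletins", []):
    bulletin = entry.get("bulletin", {})
    print(bulletin.get("timestamp"), bulletin.get("level"), bulletin.get("message"))
```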
By following these steps, users can effectively utilize the Bulletin feature within Napita to monitor their data flow and ensure smooth operation.
With the Bulletin feature, Napita users can maintain the reliability, performance, and efficiency of their data pipelines by staying informed about critical events and taking proactive measures to address them.
Learn how to configure and verify crucial properties like Database Connection Pooling (DBCP) and Secure File Transfer Protocol (SFTP) for efficient data handling in Napita.
Processors are components designed to execute tasks on data within a system's dataflows. They handle tasks like data ingestion, transformation, routing, and interaction. Properties within processors are settings dictating how a processor operates and handles data. These settings allow users to customize processor behavior, including parameters like database connections (DBCP), SFTP details, etc.
Users configure these properties through Napita during processor setup. Verifying processor properties during creation ensures that entered values are acceptable. While additional properties may need configuration based on specific requirements, database connection (DBCP) and SFTP properties are mandatory for processor execution. If a property's value is invalid, the processor cannot be executed or utilized until the value is verified.
Database Connection Pooling (DBCP) within Napita is crucial for efficient management and reuse of database connections. By implementing DBCP, users can enhance workflow efficiency, particularly when using processors like ExecuteSQLRecord and QueryDatabaseTableRecord.
This feature reduces the overhead of creating new database connections for each operation, optimizing resource utilization and improving performance. DBCP streamlines database operations by managing and sharing connections among different processors, reducing the time and resources needed for connection establishment.
Verifying the DBCP service at the parent level ensures consistency and validity of connection properties across different processors, minimizing errors or inconsistencies in the database configuration.
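Conceptually, a connection pool hands out existing connections and takes them back when a task finishes, rather than opening a fresh connection per query. The sketch below illustrates that idea with a generic pool; it is not Napita's implementation, just a simplified model of what the DBCP service does on your behalf.

```python
from queue import Queue

class SimpleConnectionPool:
    """Toy illustration of connection pooling: connections are created once
    and reused, instead of being opened and closed for every operation."""

    def __init__(self, create_connection, size=5):
        self._pool = Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(create_connection())

    def acquire(self):
        return self._pool.get()        # blocks until a connection is free

    def release(self, conn):
        self._pool.put(conn)           # return the connection for reuse

# Example usage with a stand-in "connection" object:
pool = SimpleConnectionPool(create_connection=lambda: object(), size=2)
conn = pool.acquire()
# ... run a query with conn ...
pool.release(conn)                     # the same connection can now serve the next task
```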
Access Processor Configuration: Right-click on the desired processor (e.g., ExecuteSQLRecord or QueryDatabaseTableRecord) and select Configure to open the Configure Processor window.
Select Database Pooling Service: Within the configuration window, specify the Database Pooling Service in settings related to database connections.
Choose Service from Dropdown: Select the appropriate Database Connection Pooling service from the dropdown menu to manage and reuse database connections efficiently.
Verify Properties: After selecting the Database Pooling Service, verify the associated properties by clicking the Verify Properties button to ensure the correctness of specified values, identifying potential issues or inconsistencies in the database configuration.
The SFTP (Secure File Transfer Protocol) service in Napita facilitates secure file transfer between the platform and remote servers. By using SFTP, users can exchange files securely with external systems, ensuring data integrity and confidentiality. SFTP enables seamless and secure file transfer operations within the HotWax Commerce ecosystem. Whether retrieving files from remote servers or uploading files securely, SFTP provides a reliable method for data exchange with external systems. This is relevant for integrating HotWax Commerce with other systems or performing data exchange operations with external partners. Verifying SFTP properties confirms that connection details are correctly configured, preventing data corruption or loss during file transfers.
Access Processor Configuration: Right-click on the processor associated with SFTP operations (e.g., GetSFTP or PutSFTP) and select Configure to open the Configure Processor window.
Enter SFTP Properties: Locate the fields for SFTP properties in the configuration window, including Hostname, Port, Username, Password, and Remote Path. Input relevant values for each property based on file transfer requirements.
Verify Properties: Once the necessary SFTP credentials are entered, verify the properties by clicking the Verify Properties button to ensure correct and valid configuration.
This document outlines the Standard Operating Procedure (SOP) for diagnosing and resolving issues where the data exported from HotWax Commerce does not match the required data specifications.
HotWax Commerce uses Napita to transform and export data. If the SQL query in NiFi (Napita) is incorrect, it can result in exporting data that does not meet the client's requirements. This SOP will guide you through the steps to identify and rectify such issues.
Access the Exported Data:
Navigate to the location where the exported data is stored (e.g., SFTP location).
Download and review the exported data file.
Compare with Required Data:
Obtain the data requirements from the client.
Compare the exported data against the required data specifications to identify discrepancies.
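A lightweight way to spot discrepancies is to diff the exported file against the client's specification with pandas. The file name, column names, and expected schema below are hypothetical placeholders.

```python
import pandas as pd

# Hypothetical expected schema agreed with the client.
required_columns = {"order_id", "order_date", "grand_total", "customer_email"}

exported = pd.read_csv("ApprovedOrders_20240101.csv")   # file pulled from the SFTP location

missing = required_columns - set(exported.columns)
unexpected = set(exported.columns) - required_columns

print("Rows exported:", len(exported))
print("Missing columns:", missing or "none")
print("Unexpected columns:", unexpected or "none")

# Spot-check a business rule, e.g. only approved orders should be present
# (column and status values are illustrative).
if "status_id" in exported.columns:
    print(exported["status_id"].value_counts())
```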
Check the Last Sync:
Verify the last sync time to ensure that the latest data has been exported.
Review Recent Changes:
Check for any recent changes in the data requirements or the Napita setup.
Access NiFi:
Log in to the Napita instance.
Locate the Relevant Process Groups:
Identify the parent process groups related to the data export.
Drill down to the relevant root process groups where the data transformation occurs.
Stop the Processors:
Right-click on the Napita canvas.
Stop the processors to prevent further data export during troubleshooting.
Access Parameters:
Select the parameters option to open a new module with all existing parameters of the group.
Search for the SQL Query:
Look for the parameter named source.sql.query.
Review and Modify the SQL Query:
Study the current SQL query to understand its logic.
Modify the SQL query as per the client’s data requirements.
Run the Processors:
Run the processors once to generate a new data export.
Check the results in the SFTP location.
Verify the Data:
Compare the newly exported data with the required data specifications.
Ensure that the data now matches the client's requirements.
Resume Processors:
If the data is accurate, schedule the processors to resume regular operation.
Monitor the first few exports to ensure continued accuracy.
In Napita, a queue acts as a temporary storage buffer that facilitates the seamless transfer of data between processors within a data flow. As data moves from one processor to another, it is temporarily stored in these queues, allowing for efficient management of data flow and ensuring smooth processing.
While queues offer several advantages, they may occasionally require maintenance to ensure the integrity and efficiency of the data flow. One common scenario that requires attention is when a processor fails because of a corrupted file in the queue. For example, consider a data flow where one processor retrieves files from an SFTP location and subsequent processors process them further. If a file retrieved from the SFTP location has an invalid format, it can prevent the subsequent processor from executing successfully. In such cases, simply replacing the invalid file on the SFTP location may not suffice, because the processor will still attempt to process the previous file, resulting in repeated failures.
To address this issue effectively, empty the queue containing the invalid file and replace it with a valid file from the data source. The data flow can then resume with the latest, valid data, ensuring accurate processing and preventing further disruptions.
Identify the Queue: Determine which queue in your data flow is holding the invalid file. This queue is typically located between the processor that retrieved the file and the subsequent processor that failed to execute.
Empty the Queue:
Right-click on the queue located before the failing processor.
Select the "Empty Queue" option to remove the corrupted file from the queue.
Re-run the Processor:
Right-click on the processor located prior to the emptied queue.
Choose the "Run Once" option to execute the processor again and list a new file from the source (e.g., SFTP location).
This action ensures that the latest file is listed in the queue for processing.
Verify Queue Contents:
To verify that the queue is now holding the correct file, right-click on the queue.
Select the "List Queues" option. This will display all files currently listed in the queue.
Click on the eye icon next to the file name to view and verify the data of the latest file.
Process the File:
After confirming that the correct file is in the queue, right-click on the subsequent processor that processes the file and click the Run Once button.
Ensure that the file is successfully processed by monitoring the processor's status and logs.
Schedule Processors:
Once the file is successfully processed, you can schedule the processors in your Napita flow as needed.
Verify that the data flow is functioning as expected by monitoring subsequent data processing steps.
By following these troubleshooting steps, you can effectively manage queues in Napita and ensure smooth data processing within your workflows, addressing issues such as invalid files promptly and efficiently.
Napita is a data integration tool designed to automate data flow between systems in real-time. It provides an intuitive interface for designing, controlling, and monitoring data flows, making it ideal for simple data ingestion tasks and complex data transformation scenarios. HotWax Commerce OMS relies on Napita as a pivotal component for seamless communication with external systems like Netsuite ERP, enabling smooth data exchange and integration within the ecosystem.
Discover a glossary of Napita terms.
DataFlow Manager (DFM)
In NiFi, a DFM has the authority to manage the flow of data. This includes tasks like adding, removing, and modifying various components within the data flow.
Canvas
In NiFi, the canvas refers to the graphical interface where DataFlow Managers (DFMs) design and visualize their dataflows. It's the workspace where components are added, connected, and configured to create data processing pipelines.
Component
Components in NiFi are the building blocks used to construct dataflows on the canvas. These include Processors, Ports, Connections, Process Groups, Remote Process Groups, Funnels, and others. Each component serves a specific function within the data flow and can be configured to tailor its behavior according to the data processing requirements.
FlowFile
A FlowFile in NiFi represents a piece of data. It consists of two main parts: FlowFile Attributes, which provide context or metadata about the data, and FlowFile Content, which is the actual data being processed.
Attributes
In NiFi, attributes provide metadata or contextual information about the data being processed. Each FlowFile in NiFi carries a set of attributes along with its content. These attributes are key-value pairs that describe various characteristics of the data. Common attributes include UUID (a unique identifier for the FlowFile), filename (a human-readable name for the data file), and path (a hierarchical value indicating the storage location). Attributes play a crucial role in routing, transformation, and decision-making within the data flow.
Processor
Processors are components responsible for performing actions on FlowFiles, such as listening for incoming data, transforming it, or routing it to different destinations.
Relationship
Each Processor in NiFi has Relationships associated with it, indicating the possible outcomes of processing a FlowFile. These relationships determine where the FlowFile should be routed next.
Connection
Connections in NiFi link components together, allowing the flow of data between them. Each connection has one or more Relationships, and it includes a FlowFile Queue to manage the data being transferred.
Controller Service
Controller Services provide reusable configurations or resources for other components in NiFi. For example, the StandardSSLContextService can be used to configure SSL settings across multiple processors.
Reporting Task
Reporting Tasks in NiFi generate background reports on various aspects of the data flow, providing insights into system performance and activity.
Parameter Provider
Parameter Providers supply external parameters to Parameter Contexts in NiFi, allowing for dynamic configuration of components.
Funnel
A Funnel component in NiFi merges data from multiple Connections into a single stream, simplifying the data flow.
Process Group
Process Groups allow for the organization and abstraction of components within the data flow. They enable DFMs to manage complex dataflows more effectively.
Port
Ports in NiFi provide connectivity between Process Groups and other components in the data flow, facilitating data exchange.
Remote Process Group
Remote Process Groups enable the transfer of data between different instances of NiFi, useful for distributed data processing scenarios.
Bulletin
Bulletins provide real-time monitoring and feedback on the status of components within NiFi, helping DFMs identify issues or concerns.
Template
Templates in NiFi allow DFMs to save and reuse portions of the data flow, streamlining the development process and promoting code reuse.
flow.xml.gz
The flow.xml.gz file stores the configuration of the dataflow in NiFi. It is automatically updated as changes are made and can be used for rollback purposes if needed.
Discover how Napita's Data Provenance feature enables users to monitor, troubleshoot, and optimize dataflows by tracking the journey of data objects in real-time.
Napita's Data Provenance feature is a critical tool for users involved in monitoring and troubleshooting dataflows. It provides detailed information about the journey of data objects (FlowFiles) as they move through the system, enabling users to track, analyze, and understand data transformations, routing decisions, and processing events in real-time. By offering insights into data lineage, event details, and attribute modifications, Data Provenance empowers users to ensure dataflow compliance, optimize performance, and swiftly identify and resolve issues.
Step-by-Step Usage Instructions:
Access Data Provenance Page:
Right-click on the desired dataflow within Napita.
Select the View Data Provenance option from the menu.
Explore Data Provenance Information:
In the Data Provenance dialog window, review the most recent Data Provenance information available.
Utilize search and filter options to locate specific items or events within the dataflow.
View Event Details:
Click the View Details icon (i) for each event to open a dialog window with three tabs: Details, Attributes, and Content.
Review event details on the Details tab, including event type, timestamp, component, and associated FlowFile UUIDs.
Analyze Attributes:
Navigate to the Attributes tab to view the attributes present on the FlowFile at the time of the event.
Optionally, select the Only show modified checkbox to display only the attributes that were modified as a result of the processing event.
Replaying FlowFiles in Napita empowers users to inspect, troubleshoot, and validate data processing within their workflows. Whether it's verifying the correctness of data transformations or testing configuration changes, the ability to replay FlowFiles provides users with a powerful tool for ensuring the reliability and efficiency of their dataflow.
Step-by-Step Usage Instructions:
Access FlowFile Details:
Right-click on the desired processor within Napita.
Select the View Details option from the context menu.
Navigate to Content Tab:
In the View Details dialog window, navigate to the Content tab.
Replay FlowFile:
Review information about the FlowFile's content, such as its location and size.
Click the Submit button to replay the FlowFile at its current point in the flow.
Optionally, click the Download button to download a copy of the FlowFile's content.
Replay Last Event from Processor:
Right-click on the desired Processor within Napita.
Select the Replay last event option from the context menu.
Choose whether to replay the last event from just the Primary Node or from all nodes.
This SOP outlines the steps required to configure and manage SFTP Retry for Fetch SFTP and Put SFTP processors in Apache NiFi, ensuring adherence to best practices. URL: https://napita.hotwax.io/nifi/
Navigate to Apache NiFi > Processor Group > Fetch/Put SFTP Processor.
Set the comms.failure relationship to Retry. Configure the following values:
Number of Retry Attempts: 2
Retry Back Off Policy: Penalize
Retry Back Off Duration: 10 min (default)
Penalty Duration: 30 sec (default)
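These retry settings mean each failed FlowFile is penalized and retried a limited number of times before being routed onward. The sketch below models that behavior generically; the fetch function and durations are placeholders, and in NiFi the penalization and routing are handled by the framework itself.

```python
import time

RETRY_ATTEMPTS = 2        # "Number of Retry Attempts"
PENALTY_SECONDS = 30      # "Penalty Duration" (30 sec default)

def fetch_from_sftp():
    """Placeholder for the Fetch SFTP operation; raises on a comms failure."""
    raise ConnectionError("simulated comms.failure")

def run_with_retry():
    for attempt in range(1, RETRY_ATTEMPTS + 1):
        try:
            return fetch_from_sftp()
        except ConnectionError as err:
            print(f"Attempt {attempt} failed: {err}; penalizing for {PENALTY_SECONDS}s")
            time.sleep(PENALTY_SECONDS)
    # After the configured attempts are exhausted, the FlowFile is routed to the
    # retried relationship's target -- here, the 'SFTP Fetch Fail' funnel.
    print("Retries exhausted; routing to the SFTP Fetch Fail funnel")

run_with_retry()
```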
Add a funnel to the Fetch SFTP Processor.
Redirect the following relationships to the funnel:
comms.failure
permission.denied
not.found
Name the connected relationship: SFTP Fetch Fail.
The relationship name must match exactly.
Set the [failure, reject] relationship to Retry. Configure the following values:
Number of Retry Attempts: 2
Retry Back Off Policy: Penalize
Retry Back Off Duration: 10 min (default)
Penalty Duration: 30 sec (default)
Add a funnel to the Put SFTP Processor.
Redirect the following relationships to the funnel:
failure
reject
Name the connected relationship: SFTP Put Fail.
The relationship name must match exactly.
---
Access the SFTP processor where the files are queued.
Redirect the funnel relationships (SFTP Fetch Fail
or SFTP Put Fail
) back to the original processor by connecting the funnel to the respective processor.
This will create a loop to re-run the failures.
Process all the queued files.
Perform this action for both the Fetch and Put SFTP processors as applicable.
Once the queue has been processed and cleared, remove the connection between the funnel and the original processor to prevent an infinite loop in case of future failures.
Ensure the queued files are correctly processed after redirection.
Click on the hamburger icon in NiFi's main navigation bar.
Select Summary.
A new pop-up window titled "NiFi Summary" will appear.
Go to the Connections tab.
Search for the relationships "SFTP Fetch Fail" or "SFTP Put Fail" in the list.
Select By Name.
Sort the Queue (Size) column in descending order by clicking the column header.
Click on the Arrow Icon corresponding to the desired relationship to directly navigate to the associated processor.
Review the queued files for the processor and follow the resolution steps mentioned above to ensure proper processing.
Ensure all relationship names and funnel configurations strictly adhere to the specified formats:
SFTP Fetch Fail
SFTP Put Fail
Check for queued files periodically to prevent bottlenecks in data flow.