
What Is NiFi?

● A data flow automation system; an Apache Software Foundation project, commercially supported by Cloudera
● Written in Java
● Apache License 2.0
● Cluster based and scalable
● Has a web-based user interface
● Highly extensible
● Offers data flow monitoring
NiFi History

● Based on NiagaraFiles, developed by the NSA
● Open sourced by the NSA in 2014
● Commercialised by Onyara Inc
● Onyara purchased by Hortonworks in 2015
● Hortonworks merged into Cloudera (announced in 2018)
● Cloudera plans full open source path
How Does NiFi Work?

● NiFi runs in a JVM on each server in the cluster
● Uses ZooKeeper for cluster coordination and leader election
– One node is elected as the Cluster Coordinator
– One node is elected as the Primary Node
● Each JVM encapsulates
– Web server
– Processors / Extensions
– Repositories for
● FlowFile / Content / Data Provenance
NiFi Architecture 1
NiFi Architecture 2

● Web Server hosts the UI for monitoring and administration
● Flow Controller manages extensions and schedules their resources (threads)
● FlowFile Processors 1 .. N – the actual data flow workers
– Each processor performs one step of a NiFi data flow
● Extensions allow remote system connectivity
– Can be user defined
● FlowFile Repo – tracks the state of FlowFiles currently in the flow
● Content Repo – holds the content of FlowFiles in transit
● Provenance Repo – stores historic data flow (provenance) events
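
To make the division of labour between the three repositories concrete, here is a toy in-memory sketch. It is not NiFi code: `ToyFlowFile`, `ToyRepositories` and their methods are hypothetical names invented for illustration, and real NiFi persists all three repositories to disk. The event names are loosely modelled on NiFi provenance event types.

```python
from dataclasses import dataclass, field
from uuid import uuid4


@dataclass
class ToyFlowFile:
    """Hypothetical stand-in for a FlowFile: attributes plus a pointer to content."""
    attributes: dict
    content_claim: str  # key into the content repo, not the bytes themselves
    flowfile_id: str = field(default_factory=lambda: str(uuid4()))


class ToyRepositories:
    """Illustrative only: three in-memory stores mirroring the three repositories."""

    def __init__(self):
        self.flowfile_repo = {}    # flowfile_id -> ToyFlowFile (current state)
        self.content_repo = {}     # claim id -> bytes (data in transit)
        self.provenance_repo = []  # append-only history of (event, flowfile_id)

    def create(self, data: bytes, attributes: dict) -> ToyFlowFile:
        claim = str(uuid4())
        self.content_repo[claim] = data
        flowfile = ToyFlowFile(dict(attributes), claim)
        self.flowfile_repo[flowfile.flowfile_id] = flowfile
        self.provenance_repo.append(("CREATE", flowfile.flowfile_id))
        return flowfile

    def update_attribute(self, flowfile: ToyFlowFile, key: str, value: str) -> None:
        # Only the FlowFile record changes; the content is untouched.
        flowfile.attributes[key] = value
        self.provenance_repo.append(("ATTRIBUTES_MODIFIED", flowfile.flowfile_id))

    def drop(self, flowfile: ToyFlowFile) -> None:
        # The FlowFile leaves the flow, but its history stays in provenance.
        del self.flowfile_repo[flowfile.flowfile_id]
        self.provenance_repo.append(("DROP", flowfile.flowfile_id))
```

The point of the sketch is that changing an attribute never copies content, and that dropping a FlowFile removes it from the current state while its history remains queryable.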
NiFi Performance

● NiFi server RAM use is limited by the JVM memory (heap) settings
● Garbage collection rate is important
● nifi.properties file for performance config, e.g.
– nifi.ui.autorefresh.interval (browser performance)
– nifi.queue.swap.threshold (queue size at which FlowFiles are swapped to disk)
– nifi.provenance.repository.index.threads
● Increase for high-volume flows
– nifi.provenance.repository.implementation
● WriteAheadProvenanceRepository might cause Java garbage collection issues
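
As a small aid for reviewing these settings, the sketch below pulls the four properties named above out of the text of a nifi.properties file. The helper names (`parse_properties`, `performance_settings`) are hypothetical, and the values in `SAMPLE` are illustrative only; they are not a tuning recommendation and may differ from the defaults in your NiFi version.

```python
PERFORMANCE_KEYS = (
    "nifi.ui.autorefresh.interval",
    "nifi.queue.swap.threshold",
    "nifi.provenance.repository.index.threads",
    "nifi.provenance.repository.implementation",
)


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def performance_settings(text: str) -> dict:
    """Return only the performance-related properties listed on the slide."""
    props = parse_properties(text)
    return {key: props[key] for key in PERFORMANCE_KEYS if key in props}


# Illustrative values only; check your own nifi.properties.
SAMPLE = """\
# excerpt from a hypothetical nifi.properties
nifi.ui.autorefresh.interval=30 sec
nifi.queue.swap.threshold=20000
nifi.provenance.repository.index.threads=2
nifi.provenance.repository.implementation=org.apache.nifi.provenance.WriteAheadProvenanceRepository
nifi.web.http.port=8080
"""

if __name__ == "__main__":
    for key, value in performance_settings(SAMPLE).items():
        print(f"{key} = {value}")
```

To run it against a real installation, read the file with `open(path).read()` and pass the text to `performance_settings`.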
NiFi Flow Management

● Guaranteed data delivery
● Uses a persistent write-ahead log and the content repository
● Queue buffering / back pressure
● Queue priority configuration
● Per-flow configuration (latency vs. throughput)
● UI-based data flow builds
● UI-based data flow monitoring
● UI-based data provenance
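
Queue buffering, back pressure and prioritisation can be sketched with a small bounded priority queue. This is a toy model, not NiFi's implementation: `ToyConnectionQueue` and its threshold parameter are hypothetical. In NiFi itself, back pressure stops the upstream component from being scheduled, whereas this sketch simply refuses the new item.

```python
import heapq
import itertools


class ToyConnectionQueue:
    """Hypothetical model of a connection: a prioritised, bounded queue."""

    def __init__(self, back_pressure_threshold: int):
        self.back_pressure_threshold = back_pressure_threshold
        self._heap = []
        self._order = itertools.count()  # tie-breaker: FIFO among equal priorities

    def is_full(self) -> bool:
        """True when back pressure should be applied to the upstream producer."""
        return len(self._heap) >= self.back_pressure_threshold

    def offer(self, item, priority: int = 0) -> bool:
        """Enqueue unless back pressure is engaged; lower number = higher priority."""
        if self.is_full():
            return False
        heapq.heappush(self._heap, (priority, next(self._order), item))
        return True

    def poll(self):
        """Return the highest-priority item, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

With a threshold of 2, a third `offer` is refused until the consumer calls `poll`, and items come out in priority order regardless of arrival order.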
NiFi Ease Of Use 1

● Visually create dataflows in real time
● Changes take immediate effect
● Use flow templates for existing flow types
● Data provenance for
– Problem tracking
– Data compliance issues
– Step through historic data transforms
● Fine-grained data investigation using the UI & repositories
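
Stepping through historic data transforms amounts to walking a chain of provenance events. The sketch below shows the idea on an in-memory list. `ToyProvenanceEvent` and `lineage` are hypothetical names, the sample events are made up, and NiFi's own provenance repository and lineage view are considerably richer than this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToyProvenanceEvent:
    """Hypothetical, simplified provenance record."""
    flowfile_id: str
    event_type: str
    component: str
    parent_id: Optional[str] = None  # set when this FlowFile was derived from another


def lineage(events, flowfile_id):
    """Return the events for a FlowFile and its ancestors, oldest ancestor first."""
    by_id = {}
    for event in events:
        by_id.setdefault(event.flowfile_id, []).append(event)

    chain = []
    seen = set()
    current = flowfile_id
    while current is not None and current not in seen:
        seen.add(current)  # guard against accidental cycles in the sample data
        own = by_id.get(current, [])
        chain = own + chain  # ancestors are prepended as we walk upwards
        current = next((e.parent_id for e in own if e.parent_id), None)
    return chain


# Made-up sample history; component names are illustrative labels.
SAMPLE_EVENTS = [
    ToyProvenanceEvent("A", "RECEIVE", "GetFile"),
    ToyProvenanceEvent("A", "CONTENT_MODIFIED", "CompressContent"),
    ToyProvenanceEvent("B", "FORK", "SplitText", parent_id="A"),
    ToyProvenanceEvent("B", "SEND", "PutFile"),
    ToyProvenanceEvent("C", "RECEIVE", "GetFile"),
]

if __name__ == "__main__":
    for event in lineage(SAMPLE_EVENTS, "B"):
        print(event.flowfile_id, event.event_type, event.component)
```

Asking for the lineage of FlowFile "B" returns the events of its parent "A" followed by its own, which is the kind of question problem tracking and compliance checks need answered.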
NiFi Ease Of Use – User Interface
NiFi Ease Of Use – Example Flow
NiFi Security

● DataFlow based encryption / decryption
● Two-way SSL/TLS (mutual authentication)
● User access control
● Pluggable / extendable authorization possible
● DataFlow level authorization supports
– Component-level access within a flow
– Multi-tenant access / sharing
– Multi-tenancy even within a single flow
NiFi Extensible / Scalable

● Many NiFi points of extension
– Processors, Controller Services, Reporting Tasks
– Prioritizers, Custom User Interfaces
● NiFi Site-to-Site (S2S) protocol for distributed communication
● Extension conflicts avoided using NiFi Archives (NARs)
● Scale out by adding NiFi cluster instances
● Scale NiFi concurrent tasks up and down
NiFi Further Information

● For further information see
– https://nifi.apache.org
– https://en.wikipedia.org/wiki/Apache_NiFi
– http://vision.cloudera.com/cloudera-dataflow/

The Cloudera link is included because Cloudera DataFlow (CDF) now uses NiFi for
edge data and flow management.
Available Books

● See “Big Data Made Easy”
– Apress, Jan 2015
● See “Mastering Apache Spark”
– Packt, Oct 2015
● See “Complete Guide to Open Source Big Data Stack”
– Apress, Jan 2018
● Find the author on Amazon
– www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
● Connect on LinkedIn
– nz.linkedin.com/pub/mike-frampton/20/630/385
Contact Us

● Feel free to get in touch at
– info@semtech-solutions.co.nz
● Or connect on LinkedIn
● I'm always interested in
– New technology
– Opportunities
– Technology based issues
– Big data integration
