by boredgamer2 on 5/11/20, 5:13 PM with 135 comments
by cs02rm0 on 5/11/20, 6:52 PM
* I've seen customers fall into the trap of thinking they don't need expensive developers because you can drag and drop; anyone who can use a mouse can crack on with NiFi.
* It persisted its config to an XML file, including the positions of boxes on the UI. Trying to keep this config in source control with multiple devs working on it was impossible.
* Some people take the view that you should use 'native' NiFi processors and not custom code. This results in huge graphs of processors with 1000s of boxes and lines between them that you have to follow. Made both better and worse by being able to descend and ascend levels in the graph; the complexity that way quickly becomes insane.
* You're essentially programming with it. I've no doubt you could use it to write, say, an XMPP server if so inclined. Which means you can do a great many things of huge complexity. Programming tools have developed models for inheritance and composition, abstraction, static analysis, etc., which NiFi just didn't have. The amount of repeated logic I've seen its configuration accumulate is beyond anything I've seen from any novice programmer.
I ended up feeling like it could be an OK choice in a very small number of places, but I never got to work on one of those. The NSA linking together multiple systems with a light touch is possibly one such use case. For most everyone else, I couldn't recommend it.
by _57jb on 5/11/20, 6:13 PM
It installs like an appliance and feels like you are grappling with a legacy tool weighed down by a classic view on architecture and maintenance.
We had built a data pipeline and it was for very high-scale data. The theory of it was very much like a TIBCO type approach around data-pipelines.
Sadly the reality was also like a TIBCO type approach around data-pipelines.
One person's experience and opinion, and I am super jaded by it due to a vendor cramming it down one of our directors' throats, who subsequently crammed it down ours after we warned how it would turn out. It ended up being a very leaky and obtuse abstraction that didn't belong in our data pipeline once you planned how it would be maintained longer-term.
I ultimately left that company. It had as much to do with their leadership and tooling dictation as anything else; NiFi was one of many pains. I am sure there are places using NiFi that will never outgrow the tool, so take this with a grain of salt.
Said company ultimately struggled for the very reasons those of us who left predicted (the tooling pipeline was a mess, and they were thrashing on trying to get it right, constantly breaking things by forcing this solution, along with others, into the flow; lots of finger-pointing).
Sucks to have that "I told you so" moment when you never wanted that outcome for them... I just couldn't be a part of their spiral anymore.
by gopalv on 5/11/20, 6:43 PM
I like to think of it like Scribe from FB, but with an extremely dynamic configuration protocol.
The places where it really shines are where you can't get away with those 3, and the problem is actually something that needs a system which can apply back-pressure and modify flows all the way to the source - it is a spiderweb data collection tool.
So someone trying to build Complex Event Processing workflows or time-range join operations with it will probably succeed at small scale, but will start pulling their hair out at the 5-10 GB/s rate.
So its real utility is that it deploys outside your DC, not inside it.
This is the Site-to-Site functionality, and MiNiFi is the smallest chunk of it, which can be shrunk down to a simple C++ agent you can deploy in every physical location (say a warehouse or grocery store).
The actually useful part of that is the SDLC cycle for NiFi, which lets you push updates to a flow. So you might start with low-granularity parsing of your payment logs on the remote side, but you can switch your attention over to it & remove sampling on the fly if you want.
If you're an airline flying over the arctic, you might have an airline-rated MiNiFi box on board which sends low traffic until a central controller pushes a "give me more info on fuel rates" request.
Or a cold chain warehouse which is monitoring temperature on average, until you notice spikes and ask for granular data to compare to power fluctuations.
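Those adjust-granularity-on-demand examples can be sketched outside NiFi too. Here is a minimal, hypothetical Python edge agent that ships averages until a controller asks for full granularity; the control messages and batch size are invented for illustration, not anything MiNiFi actually defines:

```python
import statistics

class EdgeAgent:
    """Hypothetical edge collector: ships averages until told to go granular."""

    def __init__(self):
        self.granular = False
        self.buffer = []

    def handle_control(self, message):
        # A central controller can flip the agent into granular mode on the fly.
        if message == "granular":
            self.granular = True
        elif message == "summary":
            self.granular = False
            self.buffer.clear()

    def ingest(self, reading):
        """Return the records to ship upstream for this reading, if any."""
        if self.granular:
            return [reading]            # ship every sample
        self.buffer.append(reading)
        if len(self.buffer) >= 5:       # ship one average per 5 samples
            avg = statistics.mean(self.buffer)
            self.buffer.clear()
            return [avg]
        return []
```

The point of the pattern is that the sampling policy lives behind a remotely switchable flag, so the change needs no redeploy at the edge.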
It is a data extraction & collection tool, not a processing and reporting tool (though it can do that, it is still a tool for bringing data after extraction/sampling, not enrichment).
by monstrado on 5/11/20, 6:21 PM
A good way to get started with NiFi is to use it as a highly available quartz-cron scheduler. For example, running "some process" every 5 seconds.
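NiFi processors can run on a CRON-driven schedule using Quartz-style expressions (e.g. `*/5 * * * * ?` fires every five seconds). As a rough Python sketch of what such a fixed-interval trigger computes, not NiFi's actual scheduler:

```python
import time

def next_fire_time(now: float, interval_seconds: int) -> float:
    """Next fire time on a fixed-interval schedule, aligned to the epoch
    (roughly what a quartz-style '*/5 * * * * ?' trigger computes)."""
    return now - (now % interval_seconds) + interval_seconds

def run_scheduler(job, interval_seconds=5, iterations=3):
    """Toy scheduling loop: sleep until the next aligned tick, then run the job."""
    for _ in range(iterations):
        wake = next_fire_time(time.time(), interval_seconds)
        time.sleep(max(0.0, wake - time.time()))
        job()
```

What NiFi adds over this toy loop is the "highly available" part: in a cluster, primary-node-only scheduling means the job still fires if one node dies.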
Disclaimer: I'm an Apache NiFi committer.
An article you might find interesting about its ability to scale:
https://blog.cloudera.com/benchmarking-nifi-performance-and-...
Disclaimer v2: I used to work at Cloudera
by taftster on 5/11/20, 9:38 PM
NiFi gives insight to your enterprise data streams in a way that allows "active" dataflow management. If a system is down, NiFi allows dataflow operations to make changes and deal with problems directly, right at tier 1 support.
It's often the case that an enterprise software developer has an ongoing role of ensuring the healthy state of the applications from their team. They don't just develop, they are frequently on call and must ensure that data is flowing properly. NiFi helps decouple those roles, so that the operations of dataflow can be actively managed by a dedicated support team that is more tightly integrated with the "mission" of their dataflow.
NiFi additionally offers some features that most programmers skip to help with the resiliency of the application. For example:
- the concept of "back pressure" is baked into NiFi. This helps ensure that downstream systems don't get overrun by data, allowing NiFi to send upstream signals to slow or buffer the stream.
- data provenance, the ability to see where every piece of data in the system originated and was delivered (the pedigree of the data). Includes the ability to "replay" data as needed.
- dynamic routing, allowing a dataflow operator to actively manage a stream: splicing it, or stopping delivery to one destination and delivering to another. Sources and sinks can be temporarily stopped and queued data placed onto another route. Representational forms can be changed (csv -> xml -> json, avro), and even schemas can be changed per stream.
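The back-pressure idea can be illustrated in plain Python with a bounded queue; this is a conceptual sketch, not NiFi's implementation. A full queue makes the fast producer block until the slow consumer catches up, so nothing downstream gets overrun:

```python
import queue
import threading

def produce(q, items):
    for item in items:
        # put() blocks when the queue is full -- the upstream "slows down"
        # instead of overrunning the downstream consumer.
        q.put(item)
    q.put(None)  # sentinel: no more data

def consume(q, out):
    while True:
        item = q.get()
        if item is None:
            break
        out.append(item)

# A queue bounded at 2 entries: the producer can never run more than
# 2 items ahead of the consumer.
q = queue.Queue(maxsize=2)
out = []
producer = threading.Thread(target=produce, args=(q, list(range(10))))
consumer = threading.Thread(target=consume, args=(q, out))
producer.start(); consumer.start()
producer.join(); consumer.join()
```

In NiFi the equivalent knobs are the object-count and data-size thresholds on each connection, which propagate the slowdown hop by hop toward the source.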
Anyone can write a shell script that uses curl to connect with a data source, piping to grep/sed/awk and sending to a database. NiFi is more about visualizing that dataflow, seeing it in real-time, and making adjustments to it as needed. It also helps answer the "what happens when things go wrong" question, the ability to back-off if under contention, or replay in case of failure.
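As a hedged sketch of what that curl-and-awk approach is missing, here is the retry-and-replay logic such a pipeline would need to grow; the `send` callable stands in for any real sink (database insert, HTTP POST, etc.):

```python
replay_queue = []  # failed records parked here for later replay, not dropped

def deliver_with_retry(record, send, max_attempts=3):
    """Try to deliver a record; on repeated failure, park it for replay.

    `send` is a placeholder for the real sink. Real code would also back off
    between attempts (e.g. sleep 2**attempt seconds) instead of retrying hot.
    """
    for attempt in range(max_attempts):
        try:
            send(record)
            return True
        except Exception:
            continue
    replay_queue.append(record)
    return False
```

NiFi gives you this queue-and-replay behavior (plus the provenance trail to find what failed) without writing it yourself for every flow.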
(disclaimer: affiliated with NiFi)
by banjoriver on 5/11/20, 7:52 PM
Out of the box it is incredibly powerful and easy to use; in particular, its data provenance, monitoring, queueing, and back-pressure capabilities are hard to match; a custom solution would take extensive development to even come close to its features.
It is not code, and that means it is resistant to code-based tooling. For years its critical weakness was migrating flows between environments, but this has been mostly resolved. If you are in a place with dev teams and separate ops teams, and lots of process required to make prod changes, this was problematic.
However, the GUI flow programming is insanely powerful and is ideal when you need to do rapid prototyping, or quickly adapt existing pipelines; this same power and flexibility means that you can shoot yourself in the foot. As others have said, this is not a tool for non technical people; you need to understand systems, resource management, and the principles of scaling high volume distributed workloads.
This flow-based visual approach makes it easier for someone coming later to understand what is happening. I've seen a solution that required a dozen Redis containers, multiple programming languages, ZooKeeper, a custom GUI, and mediocre operational visibility be migrated to a simple NiFi flow of 10 connected squares in a row. The complexity of the custom solution, even though it was very stable and had nice code quality, meant that it became legacy debt quickly after it was deployed. Now that same dataflow is much easier to understand, and has great operational monitoring.
Some suggestions:
- Limit NiFi's scope to data routing and movement, and avoid data transformations or ETL in the flow. This ensures you can scale to your network limits, and aren't CPU/memory bound by transforming content.
- Constrain the scope of each NiFi instance; don't deploy 100s of flows onto a single cluster.
- You can do a lot with a single node; only go to a cluster for HA and when you know you need the scale.
by unixhero on 5/12/20, 7:53 AM
I know of a massive installation [0], which is about to be open sourced, where Apache NiFi is used as a key component in the middle of the stack. No dismissal of the capabilities this package offers intended.
[0] https://sikkerhetsfestivalen.no/bidrag2019/138
slides [slide #32]: https://static1.squarespace.com/static/5c2f61585b409bfa28a47...
by pacofvf on 5/11/20, 6:14 PM
by yawz on 5/11/20, 6:34 PM
by corndoge on 5/11/20, 6:03 PM
by rfsliva on 5/11/20, 7:55 PM
by endlessmike89 on 5/11/20, 6:39 PM
by haddr on 5/11/20, 6:44 PM
by sixhobbits on 5/11/20, 10:10 PM
> An easy to use, powerful, and reliable system.
This is the title. That's the most important sentence, and it's absolutely meaningless.
It's bad enough that everything has to "sell" - just describe what your product does and I'll decide if I need it or not. Don't try to convince me.
If you have to sell, do it by differentiating yourself from your competitors. No one is calling themselves "Difficult to use, weak, and unreliable", so saying the opposite is not differentiation.
When did we accept marketing-speak as the default mode of communication? Can't we have some landing pages that are essays? Or even a few paragraphs instead of trying-to-be-catchy bullet-point phrases in a large font?
by pazo on 5/11/20, 8:14 PM
by josephmosby on 5/11/20, 10:16 PM
* It doesn't need much in the way of dependencies to run. If you can get Java onto a machine, you can probably get NiFi to run on that machine. That is HUGE if you are operating in an environment where getting any new dependency installed on a machine is an operational nightmare.
* It doesn't require a lot of overhead. Specifically, no database.
* You can write components for it that don't require a whole lot of tweaking for small changes to the incoming data. So, if I have a machine processing a JSON file that looks like XXYX and another machine processing a nearly identical JSON file that looks like XYXX, the tweaks can be made pretty easily.
So, if you're looking for a lightweight, low overhead, easily configurable tool that may be running in an environment where you've got to run lots of little instances that are mostly similar but not quite, NiFi is great.
If you are running a centralized data pipeline where you have a dedicated team of data engineers to keep the data flowing, there are better options out there.
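The "small tweaks for nearly identical data" point above can be sketched in Python: one processor parameterized by a field mapping, so each instance differs only in configuration rather than code. The field names here are invented for illustration:

```python
import json

def make_processor(field_map):
    """Build a processor that renames incoming JSON fields per `field_map`,
    so near-identical sources differ only in configuration."""
    def process(raw):
        record = json.loads(raw)
        return {out: record[src] for src, out in field_map.items()}
    return process

# Two sources with slightly different field names, same downstream shape:
site_a = make_processor({"ts": "timestamp", "val": "value"})
site_b = make_processor({"time": "timestamp", "reading": "value"})
```

This is the essence of running "lots of little instances that are mostly similar but not quite": the variation lives in a config mapping, not in forked code.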
by tspann on 5/11/20, 7:27 PM
by Sodman on 5/12/20, 1:15 AM
They have a built-in source control product called "NiFi Registry", which can even be backed by git. The workflow for promoting flows between environments feels clunky though, especially as so much environment-specific configuration is required once your number of components gets high enough.
Moving our Java, Ruby or Go code between environments or handling versioning and releases was a piece of cake, in comparison.
by tomrod on 5/11/20, 6:32 PM
If so, how does it compare to SSIS, dbt, and other projects (please name!)?
Otherwise, what is an analogous toolset?
by benjaminwootton on 5/11/20, 6:23 PM
Think: if order value > 100, and the customer has ordered 3 times in the last hour, and the product will be in stock tomorrow.
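That rule can be sketched as a windowed check in plain Python; the event shape and the one-hour window are assumptions for illustration, and a real CEP engine would maintain the window incrementally rather than rescanning:

```python
from datetime import datetime, timedelta

def should_flag(order, recent_orders, stock_tomorrow):
    """Hypothetical CEP rule: order value > 100, the customer placed >= 3
    orders in the last hour, and the product will be in stock tomorrow."""
    one_hour_ago = order["time"] - timedelta(hours=1)
    count = sum(
        1 for o in recent_orders
        if o["customer"] == order["customer"] and o["time"] >= one_hour_ago
    )
    return (
        order["value"] > 100
        and count >= 3
        and stock_tomorrow.get(order["product"], 0) > 0
    )
```

A GUI tool for this space would mostly be exposing the predicate, the window size, and the joined lookup (stock) as configurable boxes rather than code.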
Kafka streams, Flink and Dataflow are super powerful and I think there is room for a GUI tool.
Would be great to hear experiences of NiFi in this domain or discuss the space with any experienced users. Will add contact details in my profile.
by kentosi on 5/12/20, 2:06 AM
I watched one of the explanation videos and it brought back memories.
My dislike back then, which I hope they've addressed now, is that while everything looked fine and dandy while designing things in the UI, when something broke it was a whole heap of generated XML no one could read.
by jszymborski on 5/11/20, 6:24 PM
I have a problem where I want to stream data to an ML layer and then stream that to a web app (e.g. Laravel or Django).
Reading the docs here, it seems like this would solve my problem, but I was wondering if people had alternatives, given that people seem to think poorly of this application.
by aasasd on 5/11/20, 8:52 PM
by ibishvintilli on 5/12/20, 11:04 AM
by dikei on 5/12/20, 1:22 AM
However, it does not handle small records well, and deploying custom processors is a pain, so don't use it to replace your stream processing framework.
by gatorbait83 on 5/11/20, 6:13 PM
by takeda on 5/11/20, 8:42 PM
by onetrickwolf on 5/11/20, 7:57 PM
by throwawaysea on 5/11/20, 6:58 PM
by fmakunbound on 5/12/20, 4:40 AM
by yalogin on 5/11/20, 11:51 PM
by dmtroyer on 5/11/20, 6:45 PM
by hestefisk on 5/11/20, 10:33 PM
by meh206 on 5/12/20, 7:52 AM
by iofiiiiiiiii on 5/12/20, 7:45 PM
But what is it?
by century19 on 5/11/20, 6:02 PM
by J0_k3r on 5/12/20, 3:04 AM
by mberning on 5/11/20, 6:08 PM