by jrcplus on 8/18/21, 12:39 AM with 3 comments
We're talking on the order of dozens/hundreds of devices posting data every min or so, not petabytes.
Should I be following a data lake architecture and doing MQTT → S3 / Redshift → ??? → PostgreSQL → Grafana? That seems like a lot of work / maintenance cost / overkill (?).
Or just write an MQTT → PostgreSQL bridge (rough sketch of what I mean at the end of this post)? Why isn't there an off-the-shelf solution, or am I missing something? I feel like this must be a common scenario?
Or should I be looking more closely at enterprise solutions, e.g. Snowflake?
Known constraints:
• Grafana supports PostgreSQL as a data source (but not Amazon Redshift).
• Kinesis Firehose supports S3 and Redshift as destinations (but not RDS / PostgreSQL).
• Telegraf doesn't (officially) support PostgreSQL as an output.
Bottom line: I am not a data engineer and want to avoid being one :) Maybe someday we'll be able to afford one, but in the meantime, I want to set things up so it's at least pointing in the right direction, without significant time/money cost.
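For concreteness, the bridge idea above would be something like this untested sketch (paho-mqtt 1.x-style callbacks and psycopg2 assumed; topic and table names are made up):

```python
# Untested sketch of an MQTT -> PostgreSQL bridge.
# Assumed schema:
#   CREATE TABLE readings (ts timestamptz DEFAULT now(), device_id text, payload jsonb);
import json

import paho.mqtt.client as mqtt
import psycopg2

conn = psycopg2.connect("dbname=iot user=iot host=localhost")

def on_connect(client, userdata, flags, rc):
    # Subscribe once connected; '+' matches the device-id segment of the topic.
    client.subscribe("devices/+/telemetry")

def on_message(client, userdata, msg):
    device_id = msg.topic.split("/")[1]
    data = json.loads(msg.payload)
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO readings (device_id, payload) VALUES (%s, %s)",
            (device_id, json.dumps(data)),
        )
    conn.commit()  # a commit per message is fine at this volume; batch if ingest grows

client = mqtt.Client()
client.on_connect = on_connect
client.on_message = on_message
client.connect("mqtt.example.com", 1883)
client.loop_forever()
```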
by bsmth on 8/18/21, 12:16 PM
Instead of writing an MQTT -> PostgreSQL bridge, you could use Telegraf to listen to MQTT topics and write data to QuestDB over InfluxDB line protocol when it meets certain criteria. One of our users shared the tooling they use in industrial IoT and a tutorial here for exactly this use case: https://questdb.io/tutorial/2020/08/25/questitto/#stack
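A rough idea of the Telegraf config for this (broker, topic, and QuestDB address are placeholders; the payload format will depend on how the devices encode data):

```toml
[[inputs.mqtt_consumer]]
  servers = ["tcp://mqtt.example.com:1883"]
  topics = ["devices/+/telemetry"]
  data_format = "json"   # adjust to whatever the devices actually send

# QuestDB accepts InfluxDB line protocol over TCP (port 9009 by default).
[[outputs.socket_writer]]
  address = "tcp://questdb.example.com:9009"
  data_format = "influx"
```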
by Andys on 8/18/21, 3:54 AM
Ingest speed is tunable via number of shards and translog sync frequency, so no need for message queue if things get busy.
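In Elasticsearch terms those knobs are per-index settings; a rough sketch with the Python client (index name and values are illustrative, 7.x-style API assumed):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
es.indices.create(
    index="device-readings",
    body={
        "settings": {
            "number_of_shards": 2,                  # more shards -> more parallel ingest
            "index.translog.durability": "async",   # fsync the translog in the background...
            "index.translog.sync_interval": "30s",  # ...every 30s instead of on every request
        }
    },
)
```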
by mattbillenstein on 8/18/21, 12:58 AM