by harrisreynolds on 1/6/22, 7:37 PM with 10 comments
What is the best solution for storing this data that is fast and supports very large datasets?
For context, the product competes in a geo-spatial market and loads GPS data from a large number of vehicles that are updating every 5-10 seconds.
We are considering Apache Pinot, but I am curious what the HN community would recommend here.
Thank you for any input!!
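For a sense of scale, here is a rough back-of-the-envelope sketch. The fleet size and row width are placeholder assumptions (all I'll say publicly is "a large number of vehicles"); only the 5-second update interval comes from the description above.

    # Rough ingest estimate; VEHICLES and BYTES_PER_ROW are made-up assumptions.
    VEHICLES = 50_000          # hypothetical fleet size
    SECONDS_PER_POINT = 5      # worst case: one GPS fix every 5 seconds
    BYTES_PER_ROW = 100        # rough guess: id, timestamp, lat/lon, speed, heading

    rows_per_day = VEHICLES * (86_400 // SECONDS_PER_POINT)
    rows_per_year = rows_per_day * 365
    raw_bytes_per_year = rows_per_year * BYTES_PER_ROW

    print(f"{rows_per_day:,} rows/day")      # 864,000,000 rows/day
    print(f"{rows_per_year:,} rows/year")    # 315,360,000,000 rows/year
    print(f"~{raw_bytes_per_year / 1e12:.1f} TB/year uncompressed")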
by stocktech on 1/6/22, 7:58 PM
by zX41ZdbW on 1/6/22, 8:42 PM
> the product competes in a geo-spatial market and loads GPS data from a large number of vehicles that are updating every 5-10 seconds
There are multiple companies from this field that are using ClickHouse: https://clickhouse.com/docs/en/introduction/adopters/
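As a minimal sketch of what a GPS-ping table could look like there, using the clickhouse-driver Python package (the table name, column names, and partitioning scheme are illustrative, not anyone's actual schema):

    from datetime import datetime
    from clickhouse_driver import Client

    client = Client('localhost')  # assumes a locally running ClickHouse server

    client.execute("""
        CREATE TABLE IF NOT EXISTS gps_pings (
            vehicle_id UInt32,
            ts         DateTime,
            lat        Float64,
            lon        Float64,
            speed_kmh  Float32
        )
        ENGINE = MergeTree
        PARTITION BY toYYYYMM(ts)
        ORDER BY (vehicle_id, ts)  -- keeps each vehicle's pings contiguous for fast range scans
    """)

    # Insert one sample ping.
    client.execute(
        'INSERT INTO gps_pings (vehicle_id, ts, lat, lon, speed_kmh) VALUES',
        [(42, datetime.utcnow(), 32.78, -96.80, 61.5)],
    )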
by ammar_x on 1/6/22, 10:22 PM
BigQuery is easy to use, and its version of SQL is quite powerful.
On AWS, there is Athena, which queries data stored in S3 and has the same on-demand query price as BigQuery ($5 per TB scanned). However, from my experience, I still recommend BigQuery.
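A rough sketch with the google-cloud-bigquery client of a day-partitioned table clustered by vehicle, which is what keeps the per-TB-scanned pricing under control (project, dataset, and column names are made up):

    from google.cloud import bigquery

    client = bigquery.Client()  # assumes default GCP credentials are configured

    schema = [
        bigquery.SchemaField("vehicle_id", "INT64"),
        bigquery.SchemaField("ts", "TIMESTAMP"),
        bigquery.SchemaField("lat", "FLOAT64"),
        bigquery.SchemaField("lon", "FLOAT64"),
    ]

    table = bigquery.Table("my-project.telemetry.gps_pings", schema=schema)
    # Partition by day so queries over a time window only scan the relevant days.
    table.time_partitioning = bigquery.TimePartitioning(
        type_=bigquery.TimePartitioningType.DAY, field="ts"
    )
    # Cluster by vehicle so per-vehicle queries scan even less.
    table.clustering_fields = ["vehicle_id"]

    client.create_table(table)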
by samspenc on 1/6/22, 9:31 PM
They let you store huge amounts of data and, as long as you design the primary key properly, serve needle-in-the-haystack lookups in milliseconds as well.
There are some tradeoffs, of course: most engineers I've worked with who move from an RDBMS to these tools find the lack of first-class support for secondary indices and SQL or SQL-like queries to be a bummer.
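As a sketch of what "design the primary key properly" can mean for this kind of workload, assuming a Bigtable- or Cassandra-style store (the key layout and names are illustrative):

    # Hypothetical row-key layout: prefixing with vehicle_id keeps one vehicle's
    # pings contiguous, and the reversed, zero-padded timestamp makes
    # "latest N points for this vehicle" a cheap prefix scan from the top.
    def row_key(vehicle_id: int, epoch_seconds: int) -> str:
        MAX_TS = 9_999_999_999               # far-future sentinel
        reverse_ts = MAX_TS - epoch_seconds  # newest rows sort first
        return f"{vehicle_id:010d}#{reverse_ts:010d}"

    print(row_key(42, 1_641_500_000))  # '0000000042#8358499999'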
by karmakaze on 1/7/22, 12:55 AM
by evv555 on 1/6/22, 8:47 PM
by prirun on 1/6/22, 10:33 PM
by ubadair on 1/6/22, 10:51 PM
Not affiliated, but I know people who work there.
by nojito on 1/6/22, 8:11 PM