by gary4gar on 10/19/14, 3:43 PM with 51 comments
for this write-heavy app, we store all orders in redis which is working beautifully. We are growing at incredible pace, we keep adding new restaurants with hundreds of locations that keep sending us crazy amount of data. Except there is one problem -- redis keeps running out of memory every month! As, everything which doesn't have to be in memory is in memory.
This is why we contemplating to switch to mysql. As we really don't need to keep all data in memory. here are we numbers of current redis database:
used_memory_human:39.83G
dbsize: 34706870
Here is what we store in redis as Hash: id - integer
location_id - integer
stored_at - timestamp
token - string
transaction_no - integer
menu_items - string(comma seprated list of all menu items that customer ordered along with their price & Qty)
order_amount - decimal
order_subtotal_amount - decimal
order_amount_payable - decimal
order_datetime - timestamp
employee_id - integer
employee_name - string
pos_type - string
post_version - string
restaurant_id - integer
So, looking for some advice on:1. moving from redis to mysql is good idea? how will it effect us in long run as we will need to keep updating our indexes & partition scheme to cater to huge demand.
2. What other databases(relational or non-relational) would be suited for this use case than redis?
3. Or we are all wrong, as redis is made for storing this type of data. so, we just keep using redis & upgrading our machines every month?
by justinsb on 10/19/14, 4:23 PM
First of all, 3 million rows a month is not that much from a database point of view - if evenly distributed (though I'm sure it's not), it would be about 1 per second. (It is impressive to me from a business point of view though). You don't need Redis's throughput, and you are hitting Redis's limits with respect to memory.
1) This is _exactly_ the use case which relational databases were built and optimized for (fixed schema, OLTP, presumably some analytic queries). You might want to normalize the data a little (e.g. the menu_items, employee_name, pos_type & pos_version), but you don't have to - though this would enable faster & richer querying.
On your schema changes: I bet you'll be much happier using a relational database which takes care of this stuff for you, then you will be implementing it yourself in Redis. You probably won't need partitioning either. If you do end up needing partitioning, I would guess your DB is "trivial" to partition by restaurant.
I suspect you'll also find that a relational database stores the data more efficiently, particularly if you normalize a few things. Your data will probably fit into RAM again, but a relational database can cope even when it doesn't.
2) Any relational database would be a great match for this, and would allow rich querying (e.g. by employee, by date, by restaurant). My personal bias is to prefer Postgres.
3) I don't see what Redis is getting you here. It doesn't support rich querying, is memory bound, and you don't need the throughput it promises.
by thinkingkong on 10/19/14, 5:56 PM
1. As with most things it depends, but I'll make the mistake of assuming you don't need all of this data in memory and you won't be iterating over the entire set of data on a regular (<1m) basis. In that case, it most likely makes sense to move to a relational database. You should ask yourself what kind of queries you're running on the data and pick something appropriate. But for the data you're listing, then yeah - go with a relational DB.
2. Again "It depends". Postgresql is the most popular alternative to MySQL and I'd say at this point is more in vogue with the developer community. It's supported on AWS RDS and Heroku has a great postgres.app for OSX. Your data is so simple that I'd encourage you to start there. In the future when you have specific needs then you can research more appropriate database technologies.
3. From what you've shown us then you should most likely move to a relational database. The size should shrink dramatically, too.
by hkarthik on 10/19/14, 4:43 PM
by nielskrijger on 10/19/14, 8:23 PM
"Write-heavy" means your application processes thousands of records per minute. Three million orders on a monthly basis doesn't look like too big of a deal. Any database should be able to manage that easily. Redis would have been my last pick to do that. Redis I only use when to keep high-volatile non-essential data in memory, not much else.
Key in picking your database is how your data is being used. Are the most recent data records accessed most often and the rest almost never? MongoDB and an RDBMS will do quite nicely. Do you really expect _extremely_ heavy growth, a NoSQL datastore might be better. Are DBA's in short supply? Use a managed datastore.
As a big fan of NoSQL, I'd say; be cautious when using NoSQL datastores. Any SQL database will do the majority of workloads quite nicely and offer you with plenty of tools to do any type of query you might need to. NoSQL databases do analytical queries usually quite poorly; separating OLTP and OLAP is painful and costly for smaller apps, only for that reason it is best to avoid them in most circumstances.
Above all; use what you can run cheapest. The available skillset should play a role too, if most engineers are familiar with SQL Server, use SQL Server... I generally use MySQL or PostgreSQL on RDS (AWS) given DBA's are hard to find. I use DynamoDB when I have extremely high dataloads. I use MongoDB when developing a common app for NodeJS because NodeJS simply works very well with MongoDB… I consider other NoSQL datastores only when processing many millions of records per day. For all the rest (and majority of use cases); pick an RDBMS.
// In production I've used Oracle DB, MySQL, PostgreSQL, MongoDB, DynamoDB, Cassandra, Google Cloud Datastore, Redis, RedShift and Elasticsearch.
by sganguly on 10/20/14, 5:34 PM
Look for a NoSQL solution which will help in the flexi-schema model. Helps you add different types of data models relatively easily. Check out Aerospike (http://www.aerospike.com), the product scales like scaling problems have disappeared from this planet. All of the characteristics that you would need based on what you have stated is in the product. Best of all it is free for startups. Just give it shot. Open Source server and client libraries in multiple programming languages. (Am a big "C" and Python guys so like them client libs a lot). Lots of startups and many large software companies use them.
If you like it and works for your application tell others, if not tell us what did not work.
I ain't sellin and you ain't buyin;
Ciao
by kruk on 10/19/14, 6:51 PM
3) Redis is a great tool in its own right but it's more of a key-value data store than a full fledged database. It's fantastic for caching, tasks queuing or cross application communication.
by ColinCera on 10/20/14, 4:22 AM
You could continue to use Redis as a write cache, although a message queue or something like Kafka might be better for that purpose, and of course as a read cache, while using Cloudant or some other "real" database for permanent storage.
(On a side note, you described your application as "write-heavy" but you also said you get "about 3 Million individual customer orders on monthly basis" — which works out to just slightly more than one record to insert per second, on average, which doesn't seem "write-heavy" to me. I understand that you get orders submitted in batches, rather than evenly distributed, but it still doesn't seem especially write heavy. At any rate, a database like Cloudant can take as many writes as you want to throw at it.)
(Disclaimer: I don't work for Cloudant; I am a satisfied customer; I don't use Cloudant for everything.)
by rmetzler on 10/19/14, 6:46 PM
Your use case, running a restaurant is certainly a mixed use case. You'll certainly want to query the latest orders if this is going from the waiter to the kitchen. But yesterday's orders aren't that much important.
As others have mentioned, this data is exactly the kind of data you would store in a relational database. Just be sure to not query for the last 10 orders or something like that. I'm not a real database expert, but I've made this mistake before and as far as I know ordering by date usually means a full table scan.
RDBMS shine when you just store your data and run interesting queries on that data whenever you want to answer a question about that data. Redis is the other way around. If you want to query your data, you'll have to think about your question ahead of time and store it in an appropriate data structure.
What I was wondering is, do you shard your data in different databases? I would guess, every customer should have their on
by bendmorris on 10/19/14, 5:19 PM
Redis does have its own (official but not at a stable release yet) cluster implementation. (http://redis.io/topics/cluster-spec) I'm not sure how production ready it is - has anyone here tried it?
by itamarhaber on 10/20/14, 9:42 PM
Alternatively, if you want to continue using Redis - with the current or a slimmer version of the data model - without having to worry about machine upgrades or clusters, then I recommend that you look at http://redislabs.com/redis-cloud (disclaimer - I work at Redis Labs). Our managed service is built using production-proven Redis clusters that allow your database to scale well beyond the limits of any single node, without you having do anything or change a line of code.
by ScottBurson on 10/19/14, 6:19 PM
I can't comment on the non-relational options as I've never used them, but for your data volume I would expect Postgres to be adequate.
by auganov on 10/20/14, 2:34 AM
by revelation on 10/19/14, 6:11 PM
by jrochkind1 on 10/19/14, 6:41 PM
If you decide it makes sense to switch to an rdbms, you should probably consider postgres rather than MySQL.
by rubiquity on 10/19/14, 4:34 PM
by ayushghosh on 10/19/14, 7:04 PM
I think redis should have current data only which may be needed to process or crunch daily sales or so etc.
by sirdavidhuang on 10/19/14, 5:58 PM
Apache: The Apache Cassandra database is the right choice when you need scalability and high availability without compromising performance. Linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data. Cassandra's support for replicating across multiple datacenters is best-in-class, providing lower latency for your users and the peace of mind of knowing that you can survive regional outages.
http://planetcassandra.org/getting-started-with-apache-cassa...
by mud_dauber on 10/19/14, 7:30 PM
by diminish on 10/19/14, 5:56 PM
>> menu_items - string(comma seprated list of all menu items that customer ordered along with their price & Qty)
you may prefer to rework the menu_items as an independent table depending on your uses cases and if you don't have strong reason against.
by resca79 on 10/19/14, 6:55 PM
by kxo on 10/19/14, 6:19 PM
by Kiro on 10/19/14, 7:17 PM
by imanaccount247 on 10/19/14, 6:46 PM
Yes. It is a very poor choice for your needs.
>to mysql
No. It is a very poor choice in general.
>What other databases
Postgresql obviously. I'm honestly shocked that people are still considering mysql in 2014.
by davismwfl on 10/19/14, 11:53 PM
What I would consider though is keeping your Redis up front for accepting/processing the records in real time coming in and then batching them into your RDBMS. This provides you the ability to do reporting and other analytics through the RDBMS, but utilize Redis as a buffer for the writes. It also gives you the ability to massage the data into the specific schema that best fits the usage of the data in the RDBMS, but allowing the most efficient schema coming into Redis from the POS systems. It adds complexity though so that would have to be weighed.