from Hacker News

Ask HN: Going from Redis to MySQL. Good idea?

by gary4gar on 10/19/14, 3:43 PM with 51 comments

We are building a SaaS backend for restaurants using Rails. We integrate directly with POS, so each POS keeps sending customer orders that we store for later processing. we have this POS integration going at about 1,000 locations which send us about 3 Million individual customer orders on monthly basis.

for this write-heavy app, we store all orders in redis which is working beautifully. We are growing at incredible pace, we keep adding new restaurants with hundreds of locations that keep sending us crazy amount of data. Except there is one problem -- redis keeps running out of memory every month! As, everything which doesn't have to be in memory is in memory.

This is why we contemplating to switch to mysql. As we really don't need to keep all data in memory. here are we numbers of current redis database:

      used_memory_human:39.83G 
      dbsize: 34706870

Here is what we store in redis as Hash:

      id - integer
      location_id - integer
      stored_at  - timestamp
      token - string
      transaction_no - integer
      menu_items - string(comma seprated list of all menu items that customer ordered along with their price & Qty)
      order_amount - decimal
      order_subtotal_amount - decimal
      order_amount_payable - decimal
      order_datetime - timestamp
      employee_id - integer
      employee_name - string
      pos_type - string
      post_version - string
      restaurant_id - integer

So, looking for some advice on:

1. moving from redis to mysql is good idea? how will it effect us in long run as we will need to keep updating our indexes & partition scheme to cater to huge demand.

2. What other databases(relational or non-relational) would be suited for this use case than redis?

3. Or we are all wrong, as redis is made for storing this type of data. so, we just keep using redis & upgrading our machines every month?

by justinsb on 10/19/14, 4:23 PM
Yes, I think it's a great idea.
First of all, 3 million rows a month is not that much from a database point of view - if evenly distributed (though I'm sure it's not), it would be about 1 per second. (It is impressive to me from a business point of view though). You don't need Redis's throughput, and you are hitting Redis's limits with respect to memory.
1) This is _exactly_ the use case which relational databases were built and optimized for (fixed schema, OLTP, presumably some analytic queries). You might want to normalize the data a little (e.g. the menu_items, employee_name, pos_type & pos_version), but you don't have to - though this would enable faster & richer querying.
On your schema changes: I bet you'll be much happier using a relational database which takes care of this stuff for you, then you will be implementing it yourself in Redis. You probably won't need partitioning either. If you do end up needing partitioning, I would guess your DB is "trivial" to partition by restaurant.
I suspect you'll also find that a relational database stores the data more efficiently, particularly if you normalize a few things. Your data will probably fit into RAM again, but a relational database can cope even when it doesn't.
2) Any relational database would be a great match for this, and would allow rich querying (e.g. by employee, by date, by restaurant). My personal bias is to prefer Postgres.
3) I don't see what Redis is getting you here. It doesn't support rich querying, is memory bound, and you don't need the throughput it promises.
by thinkingkong on 10/19/14, 5:56 PM
I'll try and answer your 3 questions.
1. As with most things it depends, but I'll make the mistake of assuming you don't need all of this data in memory and you won't be iterating over the entire set of data on a regular (<1m) basis. In that case, it most likely makes sense to move to a relational database. You should ask yourself what kind of queries you're running on the data and pick something appropriate. But for the data you're listing, then yeah - go with a relational DB.
2. Again "It depends". Postgresql is the most popular alternative to MySQL and I'd say at this point is more in vogue with the developer community. It's supported on AWS RDS and Heroku has a great postgres.app for OSX. Your data is so simple that I'd encourage you to start there. In the future when you have specific needs then you can research more appropriate database technologies.
3. From what you've shown us then you should most likely move to a relational database. The size should shrink dramatically, too.
by hkarthik on 10/19/14, 4:43 PM
Any relational DB will work but I would look at Postgres before going with MySQL.
by nielskrijger on 10/19/14, 8:23 PM
Without going into any level of technical depth, I'll offer my two cents.
"Write-heavy" means your application processes thousands of records per minute. Three million orders on a monthly basis doesn't look like too big of a deal. Any database should be able to manage that easily. Redis would have been my last pick to do that. Redis I only use when to keep high-volatile non-essential data in memory, not much else.
Key in picking your database is how your data is being used. Are the most recent data records accessed most often and the rest almost never? MongoDB and an RDBMS will do quite nicely. Do you really expect _extremely_ heavy growth, a NoSQL datastore might be better. Are DBA's in short supply? Use a managed datastore.
As a big fan of NoSQL, I'd say; be cautious when using NoSQL datastores. Any SQL database will do the majority of workloads quite nicely and offer you with plenty of tools to do any type of query you might need to. NoSQL databases do analytical queries usually quite poorly; separating OLTP and OLAP is painful and costly for smaller apps, only for that reason it is best to avoid them in most circumstances.
Above all; use what you can run cheapest. The available skillset should play a role too, if most engineers are familiar with SQL Server, use SQL Server... I generally use MySQL or PostgreSQL on RDS (AWS) given DBA's are hard to find. I use DynamoDB when I have extremely high dataloads. I use MongoDB when developing a common app for NodeJS because NodeJS simply works very well with MongoDB… I consider other NoSQL datastores only when processing many millions of records per day. For all the rest (and majority of use cases); pick an RDBMS.
// In production I've used Oracle DB, MySQL, PostgreSQL, MongoDB, DynamoDB, Cassandra, Google Cloud Datastore, Redis, RedShift and Elasticsearch.
by sganguly on 10/20/14, 5:34 PM
Hi Gary4gar, Read the points that you stated in your question. If you focus on growth (clearly your wins are going to bring more data updates as your increase the customer base) you should look into a solution that does not tie you down with data model and also does not expect you to worry about sharding in the future. Having done sharding and when one keeps growing the data volume and having to do shards it becomes painful and hurts the development flow. Been there many time and done it quite a bit myself. (Am a technical guys myself, http://www.linkedin.com/in/sukantag)
Look for a NoSQL solution which will help in the flexi-schema model. Helps you add different types of data models relatively easily. Check out Aerospike (http://www.aerospike.com), the product scales like scaling problems have disappeared from this planet. All of the characteristics that you would need based on what you have stated is in the product. Best of all it is free for startups. Just give it shot. Open Source server and client libraries in multiple programming languages. (Am a big "C" and Python guys so like them client libs a lot). Lots of startups and many large software companies use them.
If you like it and works for your application tell others, if not tell us what did not work.
I ain't sellin and you ain't buyin;
Ciao
by kruk on 10/19/14, 6:51 PM
1,2) As so many have pointed out already, this type of data is what relational databases excel at manipulating. Unless you already have a MySQL server running, I'd recommend you try PostgreSQL. It offers great performance and scalability.
3) Redis is a great tool in its own right but it's more of a key-value data store than a full fledged database. It's fantastic for caching, tasks queuing or cross application communication.
by ColinCera on 10/20/14, 4:22 AM
If you don't want to worry about database/server administration and scaling issues, I'd recommend looking at Cloudant. It's quite inexpensive, takes virtually no administration effort, scales automatically, and is fast and flexible.
You could continue to use Redis as a write cache, although a message queue or something like Kafka might be better for that purpose, and of course as a read cache, while using Cloudant or some other "real" database for permanent storage.
(On a side note, you described your application as "write-heavy" but you also said you get "about 3 Million individual customer orders on monthly basis" — which works out to just slightly more than one record to insert per second, on average, which doesn't seem "write-heavy" to me. I understand that you get orders submitted in batches, rather than evenly distributed, but it still doesn't seem especially write heavy. At any rate, a database like Cloudant can take as many writes as you want to throw at it.)
(Disclaimer: I don't work for Cloudant; I am a satisfied customer; I don't use Cloudant for everything.)
by rmetzler on 10/19/14, 6:46 PM
TL;DR: you don't have to use one or another, use both!
Your use case, running a restaurant is certainly a mixed use case. You'll certainly want to query the latest orders if this is going from the waiter to the kitchen. But yesterday's orders aren't that much important.
As others have mentioned, this data is exactly the kind of data you would store in a relational database. Just be sure to not query for the last 10 orders or something like that. I'm not a real database expert, but I've made this mistake before and as far as I know ordering by date usually means a full table scan.
RDBMS shine when you just store your data and run interesting queries on that data whenever you want to answer a question about that data. Redis is the other way around. If you want to query your data, you'll have to think about your question ahead of time and store it in an appropriate data structure.
What I was wondering is, do you shard your data in different databases? I would guess, every customer should have their on
by bendmorris on 10/19/14, 5:19 PM
Why did you choose Redis in the first place?
Redis does have its own (official but not at a stable release yet) cluster implementation. (http://redis.io/topics/cluster-spec) I'm not sure how production ready it is - has anyone here tried it?
by itamarhaber on 10/20/14, 9:42 PM
Regarding 3 - I don't know if you (or anyone for that matter) are wrong or right about this (or anything), but there are ways to use Redis w/o upgrading to bigger machines. Redis cluster, as mentioned in @bendmorris comment, is one way to go about it.
Alternatively, if you want to continue using Redis - with the current or a slimmer version of the data model - without having to worry about machine upgrades or clusters, then I recommend that you look at http://redislabs.com/redis-cloud (disclaimer - I work at Redis Labs). Our managed service is built using production-proven Redis clusters that allow your database to scale well beyond the limits of any single node, without you having do anything or change a line of code.
by ScottBurson on 10/19/14, 6:19 PM
I wouldn't even consider using MySQL. Postgres (aka PostgreSQL) is an industrial-strength open-source RDBMS.
I can't comment on the non-relational options as I've never used them, but for your data volume I would expect Postgres to be adequate.
by auganov on 10/20/14, 2:34 AM
Would be helpful if you shared what your reads are like. Are they limited to a single customer or span the whole db? How fast do you need the "read-side" [or some portion of it] to be aware of the new writes? Also remember that there's nothing stopping you from using multiple databases. I.e. if you run complex analytically queries that are just impossible to execute even in soft real time it's very reasonable to have a separate database for that does that very well.
by revelation on 10/19/14, 6:11 PM
I don't have an opinion re redis to MySQL, but it seems like right now the cheapest options is to simply buy a server with 256+ GB of RAM. These are pretty common by now.
by jrochkind1 on 10/19/14, 6:41 PM
There's no easy answer i don't think. it depends.
If you decide it makes sense to switch to an rdbms, you should probably consider postgres rather than MySQL.
by rubiquity on 10/19/14, 4:34 PM
Riak[0] could also potentially be a good fit for your use case. Unlike Redis, Riak is distributed. I've heard of several companies in the POS space using it.
0 - http://basho.com/riak/
by ayushghosh on 10/19/14, 7:04 PM
Use a document store like mongoDB, use redis for frequently used data only, Mysql is also great, maria db would be better or percona.
I think redis should have current data only which may be needed to process or crunch daily sales or so etc.
by sirdavidhuang on 10/19/14, 5:58 PM
I think Apache Cassandra DB is what you are looking for.
Apache: The Apache Cassandra database is the right choice when you need scalability and high availability without compromising performance. Linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data. Cassandra's support for replicating across multiple datacenters is best-in-class, providing lower latency for your users and the peace of mind of knowing that you can survive regional outages.
http://cassandra.apache.org/
http://planetcassandra.org/getting-started-with-apache-cassa...
by mud_dauber on 10/19/14, 7:30 PM
Here's another vote. Definitely move away from Redis for the reasons already stated. Choosing MySQL or Postgres depends on your dev team's relative comfort with each.
by diminish on 10/19/14, 5:56 PM
mysql, mariadb or postgresql would be ok.
>> menu_items - string(comma seprated list of all menu items that customer ordered along with their price & Qty)
you may prefer to rework the menu_items as an independent table depending on your uses cases and if you don't have strong reason against.
by resca79 on 10/19/14, 6:55 PM
I think redis is excellent choice for evil writes. Redis has many kinds of configuration[1] Do you use the base package configuration or you add additional config options?
[1]http://redis.io/topics/persistence
by kxo on 10/19/14, 6:19 PM
You could try FoundationDB, with or without the SQL layer.
by Kiro on 10/19/14, 7:17 PM
2. MongoDB.
by imanaccount247 on 10/19/14, 6:46 PM
>moving from redis
Yes. It is a very poor choice for your needs.
>to mysql
No. It is a very poor choice in general.
>What other databases
Postgresql obviously. I'm honestly shocked that people are still considering mysql in 2014.
by davismwfl on 10/19/14, 11:53 PM
I don't think you are to this point yet, but a RDBMS under heavy write load can run into a lot of headaches too, requiring a very carefully designed schema and proper disk sub-system (lots of IOPS), plus the need to have a solid DBA to maintain it. However, non of that is a vote against moving to the RDBMS, so I do think you should seriously consider Postgres or MySQL though as it seems like a good fit.
What I would consider though is keeping your Redis up front for accepting/processing the records in real time coming in and then batching them into your RDBMS. This provides you the ability to do reporting and other analytics through the RDBMS, but utilize Redis as a buffer for the writes. It also gives you the ability to massage the data into the specific schema that best fits the usage of the data in the RDBMS, but allowing the most efficient schema coming into Redis from the POS systems. It adds complexity though so that would have to be weighed.