from Hacker News

Ask HN: What made SQL databases so popular?

by jmilinion on 3/10/13, 9:33 AM with 55 comments

I propose a persistent storage solution for ANY idea to a traditional technical community. Immediately, a SQL relational database is proposed as the solution. I suggest other solutions. I'm sad now because everyone is telling me that I MUST implement it in SQL otherwise I'm wasting my time. According to them, once SQL stops handling the solution, then can I look at the other "eccentric" solutions.

I'm trying to figure out how did SQL get to be the de facto solution to every single persistent data problem out there? Data can be stored in so many ways but a SQL relational database is pushed by so many people that their voice can be overwhelming at times. Why is this?

Added Note: I feel sad for the SQL language since it is one of the few ways many programmers learn about declarative programming. With that being said, this SQL language is different from the more commonly used form of "SQL" which is the nickname people call the well known SQL "relational" databases

  • by buster on 3/10/13, 10:36 AM

    Quite honestly i often do not understand the extreme focus HN puts on NoSQL databases, as if it would be the new solution that fits every problem. The point is that SQL solves a lot of problems rather well and every NoSQL database has its own drawbacks. In the end you have a lot of NoSQL databases that solve some specific problems particularly well but have major drawbacks in other situations. SQL databases tend to fit a lot of solutions quite well, they are tested, stable, do transactions, can even scale and (for me) most important: SQL is a standard, the query language is well understood and documented, it is relatively easy to switch between SQL databases. But it's for practical purposes impossible to switch between a given set of NoSQL databases, let alone you need to learn a lot of new things or need another set of developers. Recent example in my project: I had a problem that did fit particularly well into Redis (small, self contained writes) but i also needed good querying capabilites. So after fiddling around with Redis and trying to make it work i had to abandon Redis. Next choice would have been: MongoDB or MySQL. The project already uses a MySQL server and i am easily able to replace MySQL with SQLite, Postgres or Oracle by using an ORM (SQLAlchemy). So the choice was easy. MongoDB would have been the worse choice but a possibility.

    I'd wish someone would come up with a common query language that could be reused on different NoSQL databases, that would make NoSQL much more an option. UnSQL[1] looks abandoned :(

    So after all: Every problem has its solutions, every program a good fit of a database. Either NoSQL or SQL, don't just point to some NoSQL database because it is the current buzzword, but ONLY because it can solve a problem better then a given SQL database. Then it doesn't matter if you call it SQL or NoSQL. Your NoSQL solution simply needs to redeem the benefits of SQL (standardized query language, transactions) with other benefits.

    [1] http://www.unqlspec.org/display/UnQL/Home

  • by justinsb on 3/10/13, 10:16 AM

    "Democracy is the worst form of government except all those other forms that have been tried from time to time."

    Relational DBs & SQL aren't perfect, but they've proven better than 30 years of attempts to replace them. I think the burden is (rightly) on you to explain why your alternative is better. There are applications where other approaches are appropriate (e.g. web search); if you can't explain why something is better solved using a non-relational system, then perhaps it isn't.

  • by indymike on 3/10/13, 12:14 PM

    Short answer: SQL databases were easier to use for developers and non-developers than other client server relational database systems in the late 80s and early 90s (dbase derivatives, btrieve and ctree were common then). Tools like Btrieve and ctree were largely libraries, and really didn't have the ad-hoc query capability that SQL has. Dbase suffered from incompatible implementations... so an application written on one Dbase clone wouldn't work exactly right in another. Old school client-server databases like Btrieve and ctree were extremely tightly coupled, and there was little separation of concerns between storage and logic. Lots of messy, messy code... and writing the storage and retrieval parts of your application were non-trivial. Since end users had no way to do their own queries (outside Dbase), most reporting was written in C, pascal or even BASIC. There were a few reporting tools like Crystal Reports that tried to solve the reporting problem, but it was often just as easy to roll the report in C...

    When the internet took off, you had MS-SQL server on Windows and several open source SQL servers on Linux (MySQL, mSQL and Postgres) that freed web developers from having to write lots of custom data storage/retrieval code. A singe SELECT or UPDATE command did what 150-200 lines of C could do. It also meant that web apps could go to market without advanced reporting capabilities or CRUD for detailed configurations because you could always just roll an SQL query to deal with that later.

    SO, here is what you are up against in selling noSQL vs SQL:

    * Old developers who lived through the 80s and 90s database hell and see SQL as the thing that freed us.

    * An extremely mature toolchain vs. new applications without the toys needed by non-developers (reporting, ad-hoc query for end users, integration with spreadsheets, etc...)

    * Familiarity with the query language. Lots of devs know SQL

    * Risk of something new.

    * Fear of having your job turn into writing reports constantly because no one else in the company can.

    SO if you want to be persuasive in selling a noSQL solution:

    * Define the problem clearly and be prepared to show why Mongo, Couch or whatever is the right solution.

    * Avoid arguing that SQL isn't relational. You might be right, but Oracle, IBM, Microsoft, Ashton-Tate, Sybase, Progress and everyone else with an SQL product has outspent you on marketing.

    * Be ready to deal with questions about reports, ad-hoc queries, stored procedures and security. Each of these issues is solved and documented in mature SQL based systems.

    * Giving Sally in accounting access to a unix command line is not going to be as well recieved as giving her access to Excel, Access, MS SQL Management Studio or Navicat.

  • by omarqureshi on 3/10/13, 10:17 AM

    In no particular order:

    1. Transactions are not implemented in most of the "eccentric" solutions

    2. SQL solutions are more battle tested.

    3. The relational model fits most solutions quite well

    Where the relational model doesn't fit or transactions are not required, then it may be worth looking at other solutions.

    Polyglot persistence is also another moving part too, it needs time, expertise and infrastructure to deal with, so even if a non relational model makes sense and there is currently no need for transactions. Who knows when transactions would be a good idea?

  • by antirez on 3/10/13, 12:34 PM

    SQL is popular because for many classes of problems it is awesome and uses a very powerful abstraction (tables) and query language. Most implementations are also very robust and able to provide very interesting guarantees (like ACID).

    The only problem of SQL was that the computer industry was trying to solve everything with SQL databases, that is not a good idea.

    It's good that there are other databases now, but I don't think SQL is going to face away. For instance many NoSQL databases like MongoDB, RethinkDB, may well be SQL databases with different implementation tradeoffs but just picked a different query language because... well because... I don't know why honestly. Not that's a problem but the model is very very similar to objects with fields (tables) I can query.

  • by benologist on 3/10/13, 10:12 AM

    It's the de facto solution because for most use cases it is a valid solution and for many it is the best solution.

    If it wasn't developers would leave it in the past like any other obsolete technology. Especially today technology competes on merit before marketing or anything else.

  • by emin_gun_sirer on 3/10/13, 1:29 PM

    My group developed the latest, fastest NoSQL system on the scene (http://hyperdex.org), one that provides ACID transactions as well as a ton of other exciting features like scalability, consistency and fault-tolerance.

    Yet when someone asks this question, I usually respond with "try a traditional RDBMS first."

    The simple reason for this is that there is additional, implicit information in the question: the fact that someone is asking it indicates that they are not up-to-date on one of the most exciting revolutions taking place in data management. Such users will require extensive hand-holding, and there are a lot of SQL-related resources out there that can help them with this.

    But there is a twist: what people are doing is kind of like how some religions have to turn you away before you become a true convert. Will an RDBMS user be completely happy? Perhaps, but if the application is at all demanding, they will wake up in the middle of the night because of a wedged DB, running out of internal table space. Or have to frantically hire DBAs to "optimize" their queries. Or hand money hand over fist to Oracle. Or discover that an AWK script is faster than their multi-million dollar RDBMS installation.

    So it's not like people are turning you away from NoSQL altogether -- you will end up with a NoSQL system eventually, they're just making sure that you do this for the right reasons.

  • by davidad_ on 3/10/13, 10:57 AM

    I recommend this book: http://amzn.to/13NgjU9 (affiliate link), which I bought for inspiration on how to build my own database from scratch and finally put down, after reading several chapters, with a reasoned appreciation for the way things are.

    To simplify, the SQL data model exquisitely balances:

    * expressivity - queries involving both GROUP BYs and subqueries, which I've needed more than once, are challenging at best to translate into Mongo's query model, which is one of the most expressive outside SQL

    * speed - as long as your query can run on a single machine, SQL query planners are the best, period. Other models tend to be more horizontally scalable, but this was not a priority for most of database history, nor for most real-world situations.

    * compactness - the on-disk overhead of a SQL database is fairly limited, about 40-50% in practice. Obviously, a flat file has even less overhead (<10%, generally), but some non-SQL systems consume storage willy-nilly (it's not uncommon to see XML overheads surpass 500%, and Mongo overhead hovers around 90-100%).

    * robustness - in the SQL standard, "undefined behavior" is kept to an absolute minimum. In particular, transactions are invaluable anti-Heisenbugs, but cascading rules, sanity constraints, and fixed schemas also reduce production gotchas (at the cost of development agility, of course).

  • by atsaloli on 3/10/13, 10:08 AM

  • by RyanZAG on 3/10/13, 10:58 AM

  • by jamespitts on 3/10/13, 12:09 PM

    Relational databases provide a _framework_ for structuring data and retrieval. And it is an approach informed by decades of practical experience and, well, plain mathematics.

    Would you try to program a web application without a framework? Most of us do, at least once or twice in our career, not thinking about the next guy who needs to maintain our POS spaghetti. Perhaps if you're very bright or experienced you could do a good job structuring a web app from scratch, but in effect you would writing your own framework. Still it makes sense to depend on and learn from the great web frameworks -- rails, django, etc. -- so you can think about getting your real work done.

    Same goes for sql. It is structured to keep us from being an idiot.

    Stand on the shoulders of giants.

  • by glimmung on 3/10/13, 11:05 AM

    You seem to be asking this question in a vacuum, which makes it impossible to address the costs and benefits of relational vs. alternative approaches with any context. What types of problem are you trying to solve? What alternatives to an RDBMS are you proposing to solve them with?

    Having said that, I write transactional business applications (often dismissively referred to as "CRUD Apps), and in that context an RDBMS is a sensible default, but not for every part - so on Google App Engine for example I'm using CloudSQL for the business data, but Google's data store for the session data - because that has very different characteristics.

    [1] I'm reading "SQL" as shorthand for relational, although the two are hardly synonymous.

  • by michaelochurch on 3/10/13, 1:10 PM

    Relational databases have a sound mathematical model. You may not think of it this way, but a table can be seen as a predicate in logical terms. You can frame most logical queries as a chaining of predicates and functions, and you can create arbitrary (finite) predicates using tables. That's powerful.

    NoSQL came into style with the web, in which you have large data sets but don't need most of the relational features. You might have a trillion Wuphfs, but you'll never need ad-hoc queries because you're doing mostly CRUD on the Wuphfs. Sharded SQL is a painful way to deal with that problem space. NoSQL systems are often better adapted to it. NoSQL is almost always less powerful abstractly but easier to scale.

    The major problem that I've seen with NoSQL systems is that they tend not to handle the edge cases very well. Remember the old joke about the man who goes to the doctor and says, "It hurts when I do this", so the doctor says "Don't do that". Sometimes, you have to "do that". Many NoSQL products are also quite low-level, requiring that more traditional database functionality be written in client code. That gets nasty-- slow, flimsy, hard to coordinate. You don't want, for example, clients to be responsible for generating sequential unique IDs because the clients all have to agree on what the rules are, in addition to generating a lot of communication overhead each time a new block of IDs is requested.

    Furthermore, if you're using NoSQL to build something that will be marketed as a database product (e.g. within your company) and are exposed to requirement accretion, you'll find that the requirements are often inspired by expectations derived from relational models: ACID transactions being the big one.

    I am not saying that SQL is the best for all problems, but it is a damn good technology. Personally, I think that a lot of SQL hate comes from the limitations of MySQL. If that's the source of your headaches, look into Postgres. Postgres is surprising in how modern and powerful it is.

  • by drucken on 3/10/13, 4:56 PM

    Nothing else offered Codd's relational model of:

    0. simple user commands

    1. data independence (from hardware and software implementation)

    2. automatic navigation (instead of single record access).

    The rest was competitive history with the giant of the time, IBM, who created and used SQL in their relational System R product. SQL variants rapidly became commercial successes, with the earliest external adoption and most notable being that by Oracle.

    In time, SQL became a standard and, as a relatively simple declarative language (the HTML of data if you will), it is easy to learn and widespread.

  • by 7952 on 3/10/13, 12:14 PM

    This is just a personal opinion. But as apps scale you run into concurrency problems. You want to maintain a reliable consistent state but due to latency and resource limits that is difficult to do within your own code. So the responsibility is moved onto another system designed for that purpose. NoSQL simply solves the same problem in a slightly different way. The consistent state is allowed to be abstract and doesn't need to be transactional.

    The real dichotomy is between handling state yourself in code and letting outside software do it for you.

  • by TheZenPsycho on 3/10/13, 10:48 AM

    Whoah whoah whoah. SQL IS NOT RELATIONAL. Let's put that to rest right now. You can have relational databases if you like, but put SQL in front of it and now you don't have a relational database system anymore. you have an SQL database. This is because SQL actually violates a few of CODD's axioms for what constitutes a "relational" database. The tragedy of NoSQL is really the setting up of this dichotomy between "SQL/Relational" and "NoSQL/NotRelational" which ignores the possibility that "SQL CAN'T BE RELATIONAL" and "Relational with no SQL"
  • by xd on 3/10/13, 8:25 PM

    The majority of HN won't like me for saying it .. but the major factor in NoSQLs "popularity" is not needing to learn SQL. Which is a crying shame, as it's one of the corner stones of computer science.
  • by arvit on 3/10/13, 10:35 AM

    Relational DBs work for processing of business transactions -- when you need to get the amount of money and all the other details exactly right, when you need to retrieve the right record according to a variety of criteria which may come from other types of data, and when you need to be sure the info is inserted in exactly the right format.

    For this, there are DB transactions, SQL SELECT with joins, and schemas.

    Much of this is possible in other technologies, but RDB does it well.

  • by zobzu on 3/10/13, 10:27 AM

    Simple. Same reason people code in C or Python or Bash. And not C#, Ruby, ZSH, or much worse: their own implementation or whatever: It standard. Everyone knows how it works. It's proven.

    So unless it's really not good enough, diversity for the sake of diversity is the opposite of being efficient.

    Hence, you're asking the wrong question. In fact, you shouldn't ask a question. Show or prove your solution brings a REAL benefit. If you can't, you're indeed wasting your time.

  • by manidoraisamy on 3/10/13, 7:35 PM

    Tools. All data processing tools right from analytics to integration need to work with the datastore. RDBMS has accumulated these tools for so many years. It is not a level playing field for NoSQL to beat that ecosystem.
  • by amrnt on 3/10/13, 12:20 PM

    Direct answer, SQL is the only thing that learnt at schools.
  • by ExpiredLink on 3/10/13, 10:32 AM

    - Codd's relational model

    - ACID

    - SQL

    - Chen's ERM

  • by rbanffy on 3/10/13, 10:58 AM

    What exactly are you trying to store? Depending on what you are persisting, SQL may make perfect sense.
  • by VLM on 3/10/13, 12:35 PM

    Amazing no one's quoted the inner platform effect.

    http://en.wikipedia.org/wiki/Inner-platform_effect

    Hmm, SQL query too slow takes 10 seconds for one query. I know, I can use a non-SQL solution. Each query will take 100 milliseconds. Whoops turns out the users need all the relational "stuff" for my reporting and also in my CRUD to keep it consistent enough. I mean, everybody knows databases are consistent and codd normal form and relational, and nosql is a database, therefore when the users spec a database they'll be fine either way? Well no big deal I'll just write my own JOIN. After all, I'm as good of a programmer as the team at Oracle who's been working on JOINs since before I was born, so I'm sure a couple cans of red bull and I'll have something better. Whoops, each "nosql" query does in fact take 100 milliseconds but now I need to run 1e6 of them and shove them into my homemade JOIN routine, so I've now "improved" my total overall system time from 10 seconds to 10 kiloseconds. Whoops.

    I've never been able to implement a nosql solution at work because I always get stuck in inner-platform tar pit, or massive consistency issues making the CRUD a nightmare, etc. Usually if you complain about that in the nosql community you get pretty intense complaints that your business is doin' it wrong, lol at a paying gig thats not going to work very well... I'd like to try something new but I haven't had the business case for it yet, or rephrased a deep enough analysis has so far always excluded nosql solutions and that Might indicate a positive analysis isn't deep enough. Hear hooves, think horses not zebras. I've bought the "seven databases in seven weeks" book and its an excellent introduction unfortunately its something like a guided introduction into smooshing up against the business requirements limits of each DB as much as a technical intro.

    Another point is the original request was "persistent storage" not why does nosql not work most of the time. Back in the olden days we called that a "filesystem" but lots of $5/mo webhosters don't allow that type of thing, but they do give you mysql... Also its hard for the PHP coder types to not shoot themselves in their foot with filesystem access, for example, allow the user of the order.php form to store their picture in username.gif what could go wrong other than lack of input sanitizing and some goofball is going to register the username order and upload a "picture" with the filename order.php that actually does something rather naughty, etc. Even if it works, next week, the boss asks to store two image files now, one full size and one icon size now. Hmm the filesystem soln doesn't scale... Rather than all that input sanitation, wouldn't it be simpler to just dump it into a blob column in a mysql table?

  • by ucee054 on 3/10/13, 12:35 PM

    "I'm sad now because"

    Why would you be sad? SQL databases are just a tool, and happen to be the right tool. Why does that make you sad?

    It's like saying "I wanted to eat soup with chopsticks and everyone told me to use a spoon instead. That made me sad."

  • by martinced on 3/10/13, 1:36 PM

    Woaw. HN, the place created by pg, who advocated trying to "beat the average" and have "frighteningly ambitious startup ideas", etc.

    Yet HN has become the place for the status-quo. Where people are going to vote everything encouraging the status-quo (like articles about the benefits of C# and SQL) as if it was the holy gospel...

    "Immediately, a SQL relational database is proposed as the solution."

    For a start you're probably not trying to solve a problem involving big-data ; )

    Ask the big data players who are using fast and gigantic key/value stores how did SQL work for them...

    For smaller needs, then SQL offers, basically: ACID. Most companies do not care about what SQL can really do: they'll wrap everything into ORMs and perform business logic outside of SQL and in crazy models that aren't relational at all and then store back everything into SQL.

    What most business care about is ACID, not being truly relational (which SQL DBs ain't anyway), not set theory.

    The most saddening thing about traditional SQL DBs is that there's no basis for queries: the same query shall return different results depending on when the query is made (because most SQL DBs are "update in place" DBs / value-oriented DBs).

    In the Real-World [TM] this is very problematic because when you can't reproduce a problem you have to try to reproduce the state the DB was in at a particular time and query from there. It is a gigantic time waster.

    In my view for really big data needs SQL simply doesn't cut it while for smaller needs I still want ACID but I want something CRA ("create read append-only"), not CRUD.

    I have nothing against using something like Datomic (which is CRA) backed with a SQL datastore.

    But the "traditional" SQL DB that the herd --bent on never creating anything disruptive-- is using? Not for me.

    CRUD is dead baby. CRUD is dead.

    There are, today, people who've read and understood pg's writings who are using Lisp dialects (like Clojure) and CRA DBs (like Datomic, backed or not by SQL) to gain a competitive edge.

    This kind of stuff is the reason why I come to HN. Not to read why the status-quo is acceptable (even if it's all most of the HN crowd knows).