by jmilinion on 3/10/13, 9:33 AM with 55 comments
I'm trying to figure out how did SQL get to be the de facto solution to every single persistent data problem out there? Data can be stored in so many ways but a SQL relational database is pushed by so many people that their voice can be overwhelming at times. Why is this?
Added Note: I feel sad for the SQL language since it is one of the few ways many programmers learn about declarative programming. With that being said, this SQL language is different from the more commonly used form of "SQL" which is the nickname people call the well known SQL "relational" databases
by buster on 3/10/13, 10:36 AM
I'd wish someone would come up with a common query language that could be reused on different NoSQL databases, that would make NoSQL much more an option. UnSQL[1] looks abandoned :(
So after all: Every problem has its solutions, every program a good fit of a database. Either NoSQL or SQL, don't just point to some NoSQL database because it is the current buzzword, but ONLY because it can solve a problem better then a given SQL database. Then it doesn't matter if you call it SQL or NoSQL. Your NoSQL solution simply needs to redeem the benefits of SQL (standardized query language, transactions) with other benefits.
by justinsb on 3/10/13, 10:16 AM
Relational DBs & SQL aren't perfect, but they've proven better than 30 years of attempts to replace them. I think the burden is (rightly) on you to explain why your alternative is better. There are applications where other approaches are appropriate (e.g. web search); if you can't explain why something is better solved using a non-relational system, then perhaps it isn't.
by indymike on 3/10/13, 12:14 PM
When the internet took off, you had MS-SQL server on Windows and several open source SQL servers on Linux (MySQL, mSQL and Postgres) that freed web developers from having to write lots of custom data storage/retrieval code. A singe SELECT or UPDATE command did what 150-200 lines of C could do. It also meant that web apps could go to market without advanced reporting capabilities or CRUD for detailed configurations because you could always just roll an SQL query to deal with that later.
SO, here is what you are up against in selling noSQL vs SQL:
* Old developers who lived through the 80s and 90s database hell and see SQL as the thing that freed us.
* An extremely mature toolchain vs. new applications without the toys needed by non-developers (reporting, ad-hoc query for end users, integration with spreadsheets, etc...)
* Familiarity with the query language. Lots of devs know SQL
* Risk of something new.
* Fear of having your job turn into writing reports constantly because no one else in the company can.
SO if you want to be persuasive in selling a noSQL solution:
* Define the problem clearly and be prepared to show why Mongo, Couch or whatever is the right solution.
* Avoid arguing that SQL isn't relational. You might be right, but Oracle, IBM, Microsoft, Ashton-Tate, Sybase, Progress and everyone else with an SQL product has outspent you on marketing.
* Be ready to deal with questions about reports, ad-hoc queries, stored procedures and security. Each of these issues is solved and documented in mature SQL based systems.
* Giving Sally in accounting access to a unix command line is not going to be as well recieved as giving her access to Excel, Access, MS SQL Management Studio or Navicat.
by omarqureshi on 3/10/13, 10:17 AM
1. Transactions are not implemented in most of the "eccentric" solutions
2. SQL solutions are more battle tested.
3. The relational model fits most solutions quite well
Where the relational model doesn't fit or transactions are not required, then it may be worth looking at other solutions.
Polyglot persistence is also another moving part too, it needs time, expertise and infrastructure to deal with, so even if a non relational model makes sense and there is currently no need for transactions. Who knows when transactions would be a good idea?
by antirez on 3/10/13, 12:34 PM
The only problem of SQL was that the computer industry was trying to solve everything with SQL databases, that is not a good idea.
It's good that there are other databases now, but I don't think SQL is going to face away. For instance many NoSQL databases like MongoDB, RethinkDB, may well be SQL databases with different implementation tradeoffs but just picked a different query language because... well because... I don't know why honestly. Not that's a problem but the model is very very similar to objects with fields (tables) I can query.
by benologist on 3/10/13, 10:12 AM
If it wasn't developers would leave it in the past like any other obsolete technology. Especially today technology competes on merit before marketing or anything else.
by emin_gun_sirer on 3/10/13, 1:29 PM
Yet when someone asks this question, I usually respond with "try a traditional RDBMS first."
The simple reason for this is that there is additional, implicit information in the question: the fact that someone is asking it indicates that they are not up-to-date on one of the most exciting revolutions taking place in data management. Such users will require extensive hand-holding, and there are a lot of SQL-related resources out there that can help them with this.
But there is a twist: what people are doing is kind of like how some religions have to turn you away before you become a true convert. Will an RDBMS user be completely happy? Perhaps, but if the application is at all demanding, they will wake up in the middle of the night because of a wedged DB, running out of internal table space. Or have to frantically hire DBAs to "optimize" their queries. Or hand money hand over fist to Oracle. Or discover that an AWK script is faster than their multi-million dollar RDBMS installation.
So it's not like people are turning you away from NoSQL altogether -- you will end up with a NoSQL system eventually, they're just making sure that you do this for the right reasons.
by davidad_ on 3/10/13, 10:57 AM
To simplify, the SQL data model exquisitely balances:
* expressivity - queries involving both GROUP BYs and subqueries, which I've needed more than once, are challenging at best to translate into Mongo's query model, which is one of the most expressive outside SQL
* speed - as long as your query can run on a single machine, SQL query planners are the best, period. Other models tend to be more horizontally scalable, but this was not a priority for most of database history, nor for most real-world situations.
* compactness - the on-disk overhead of a SQL database is fairly limited, about 40-50% in practice. Obviously, a flat file has even less overhead (<10%, generally), but some non-SQL systems consume storage willy-nilly (it's not uncommon to see XML overheads surpass 500%, and Mongo overhead hovers around 90-100%).
* robustness - in the SQL standard, "undefined behavior" is kept to an absolute minimum. In particular, transactions are invaluable anti-Heisenbugs, but cascading rules, sanity constraints, and fixed schemas also reduce production gotchas (at the cost of development agility, of course).
by atsaloli on 3/10/13, 10:08 AM
by RyanZAG on 3/10/13, 10:58 AM
In particular, you can read this book: http://www.nap.edu/openbook.php?record_id=6323&page=159
by jamespitts on 3/10/13, 12:09 PM
Would you try to program a web application without a framework? Most of us do, at least once or twice in our career, not thinking about the next guy who needs to maintain our POS spaghetti. Perhaps if you're very bright or experienced you could do a good job structuring a web app from scratch, but in effect you would writing your own framework. Still it makes sense to depend on and learn from the great web frameworks -- rails, django, etc. -- so you can think about getting your real work done.
Same goes for sql. It is structured to keep us from being an idiot.
Stand on the shoulders of giants.
by glimmung on 3/10/13, 11:05 AM
Having said that, I write transactional business applications (often dismissively referred to as "CRUD Apps), and in that context an RDBMS is a sensible default, but not for every part - so on Google App Engine for example I'm using CloudSQL for the business data, but Google's data store for the session data - because that has very different characteristics.
[1] I'm reading "SQL" as shorthand for relational, although the two are hardly synonymous.
by michaelochurch on 3/10/13, 1:10 PM
NoSQL came into style with the web, in which you have large data sets but don't need most of the relational features. You might have a trillion Wuphfs, but you'll never need ad-hoc queries because you're doing mostly CRUD on the Wuphfs. Sharded SQL is a painful way to deal with that problem space. NoSQL systems are often better adapted to it. NoSQL is almost always less powerful abstractly but easier to scale.
The major problem that I've seen with NoSQL systems is that they tend not to handle the edge cases very well. Remember the old joke about the man who goes to the doctor and says, "It hurts when I do this", so the doctor says "Don't do that". Sometimes, you have to "do that". Many NoSQL products are also quite low-level, requiring that more traditional database functionality be written in client code. That gets nasty-- slow, flimsy, hard to coordinate. You don't want, for example, clients to be responsible for generating sequential unique IDs because the clients all have to agree on what the rules are, in addition to generating a lot of communication overhead each time a new block of IDs is requested.
Furthermore, if you're using NoSQL to build something that will be marketed as a database product (e.g. within your company) and are exposed to requirement accretion, you'll find that the requirements are often inspired by expectations derived from relational models: ACID transactions being the big one.
I am not saying that SQL is the best for all problems, but it is a damn good technology. Personally, I think that a lot of SQL hate comes from the limitations of MySQL. If that's the source of your headaches, look into Postgres. Postgres is surprising in how modern and powerful it is.
by drucken on 3/10/13, 4:56 PM
0. simple user commands
1. data independence (from hardware and software implementation)
2. automatic navigation (instead of single record access).
The rest was competitive history with the giant of the time, IBM, who created and used SQL in their relational System R product. SQL variants rapidly became commercial successes, with the earliest external adoption and most notable being that by Oracle.
In time, SQL became a standard and, as a relatively simple declarative language (the HTML of data if you will), it is easy to learn and widespread.
by 7952 on 3/10/13, 12:14 PM
The real dichotomy is between handling state yourself in code and letting outside software do it for you.
by TheZenPsycho on 3/10/13, 10:48 AM
by xd on 3/10/13, 8:25 PM
by arvit on 3/10/13, 10:35 AM
For this, there are DB transactions, SQL SELECT with joins, and schemas.
Much of this is possible in other technologies, but RDB does it well.
by zobzu on 3/10/13, 10:27 AM
So unless it's really not good enough, diversity for the sake of diversity is the opposite of being efficient.
Hence, you're asking the wrong question. In fact, you shouldn't ask a question. Show or prove your solution brings a REAL benefit. If you can't, you're indeed wasting your time.
by manidoraisamy on 3/10/13, 7:35 PM
by amrnt on 3/10/13, 12:20 PM
by ExpiredLink on 3/10/13, 10:32 AM
- ACID
- SQL
- Chen's ERM
by rbanffy on 3/10/13, 10:58 AM
by VLM on 3/10/13, 12:35 PM
http://en.wikipedia.org/wiki/Inner-platform_effect
Hmm, SQL query too slow takes 10 seconds for one query. I know, I can use a non-SQL solution. Each query will take 100 milliseconds. Whoops turns out the users need all the relational "stuff" for my reporting and also in my CRUD to keep it consistent enough. I mean, everybody knows databases are consistent and codd normal form and relational, and nosql is a database, therefore when the users spec a database they'll be fine either way? Well no big deal I'll just write my own JOIN. After all, I'm as good of a programmer as the team at Oracle who's been working on JOINs since before I was born, so I'm sure a couple cans of red bull and I'll have something better. Whoops, each "nosql" query does in fact take 100 milliseconds but now I need to run 1e6 of them and shove them into my homemade JOIN routine, so I've now "improved" my total overall system time from 10 seconds to 10 kiloseconds. Whoops.
I've never been able to implement a nosql solution at work because I always get stuck in inner-platform tar pit, or massive consistency issues making the CRUD a nightmare, etc. Usually if you complain about that in the nosql community you get pretty intense complaints that your business is doin' it wrong, lol at a paying gig thats not going to work very well... I'd like to try something new but I haven't had the business case for it yet, or rephrased a deep enough analysis has so far always excluded nosql solutions and that Might indicate a positive analysis isn't deep enough. Hear hooves, think horses not zebras. I've bought the "seven databases in seven weeks" book and its an excellent introduction unfortunately its something like a guided introduction into smooshing up against the business requirements limits of each DB as much as a technical intro.
Another point is the original request was "persistent storage" not why does nosql not work most of the time. Back in the olden days we called that a "filesystem" but lots of $5/mo webhosters don't allow that type of thing, but they do give you mysql... Also its hard for the PHP coder types to not shoot themselves in their foot with filesystem access, for example, allow the user of the order.php form to store their picture in username.gif what could go wrong other than lack of input sanitizing and some goofball is going to register the username order and upload a "picture" with the filename order.php that actually does something rather naughty, etc. Even if it works, next week, the boss asks to store two image files now, one full size and one icon size now. Hmm the filesystem soln doesn't scale... Rather than all that input sanitation, wouldn't it be simpler to just dump it into a blob column in a mysql table?
by ucee054 on 3/10/13, 12:35 PM
Why would you be sad? SQL databases are just a tool, and happen to be the right tool. Why does that make you sad?
It's like saying "I wanted to eat soup with chopsticks and everyone told me to use a spoon instead. That made me sad."
by martinced on 3/10/13, 1:36 PM
Yet HN has become the place for the status-quo. Where people are going to vote everything encouraging the status-quo (like articles about the benefits of C# and SQL) as if it was the holy gospel...
"Immediately, a SQL relational database is proposed as the solution."
For a start you're probably not trying to solve a problem involving big-data ; )
Ask the big data players who are using fast and gigantic key/value stores how did SQL work for them...
For smaller needs, then SQL offers, basically: ACID. Most companies do not care about what SQL can really do: they'll wrap everything into ORMs and perform business logic outside of SQL and in crazy models that aren't relational at all and then store back everything into SQL.
What most business care about is ACID, not being truly relational (which SQL DBs ain't anyway), not set theory.
The most saddening thing about traditional SQL DBs is that there's no basis for queries: the same query shall return different results depending on when the query is made (because most SQL DBs are "update in place" DBs / value-oriented DBs).
In the Real-World [TM] this is very problematic because when you can't reproduce a problem you have to try to reproduce the state the DB was in at a particular time and query from there. It is a gigantic time waster.
In my view for really big data needs SQL simply doesn't cut it while for smaller needs I still want ACID but I want something CRA ("create read append-only"), not CRUD.
I have nothing against using something like Datomic (which is CRA) backed with a SQL datastore.
But the "traditional" SQL DB that the herd --bent on never creating anything disruptive-- is using? Not for me.
CRUD is dead baby. CRUD is dead.
There are, today, people who've read and understood pg's writings who are using Lisp dialects (like Clojure) and CRA DBs (like Datomic, backed or not by SQL) to gain a competitive edge.
This kind of stuff is the reason why I come to HN. Not to read why the status-quo is acceptable (even if it's all most of the HN crowd knows).