from Hacker News

Ask HN: What are the best way to learn advanced SQL?

by slinger on 3/21/18, 5:03 PM with 10 comments

As a web developer what are the best resources and exercises to learn advanced SQL? How do you choose which open source database best fits an application use-case?
  • by lastofus on 3/22/18, 6:09 AM

    PostgreSQL is hands down the way to go when you need a relational DB (which is 99% of the time).

    Some features to read up on:

      * COALESCE, CASE
      * Aggregate functions + GROUP BY + HAVING
      * Subqueries
      * The LEFT/RIGHT/INNER JOINs, and things you can achieve with JOIN conditions
      * Different index types and how to use them
      * Different constraint types and how they will save your ass
      * Using temporary tables
      * Views + Materialized Views
      * CTEs, including recursive
      * Full text search
      * Trigger functions
      * Window function
      * DISTINCT ON vs DISTINCT & GROUP BY (a PG extension - why it's useful)
      * Partial/functional indexes
      * Schemas - especially useful for multi-tenancy
      * COPY to load in datasets fast 
      * Transaction isolation levels
      * Manual locking
      * EXPLAIN + EXPLAIN ANALYZE
    
    The above is just scratching the surface, but it's what I've found most useful doing web dev. The PG manual covers all these things well, along with supplemental stuff that can be googled. If you read the PG manual cover to cover, your mind you be blown by what the DB can accomplish.

    For learning about performance and what's going on under the hood, check out "PostgreSQL 9.0 High Performance".

    Most of the really complex queries I've written consist of writing several CTEs/sub-queries/views/temp tables which create a number of intermediate tables containing transformed data, which in turn are available to the final top level query. There's a lot of overlap with functional programming here: build small queries, and use them as reusable building blocks.

    If you find yourself needing to dynamically generate complicated queries, instead of trying to construct them w/ string concatenation in your app, try building temporary views, and then referencing those views in later queries, as opposed to doing it all in one go. I usually have one function/method per view in the app, and I can call the needed functions to create the foundation for a later query.

  • by harperlee on 3/21/18, 5:34 PM

    Just use postgres.

    Read this (covers performance and internals): https://use-the-index-luke.com

    And this (covers "advanced" SQL, more than the usual functions): http://www.windowfunctions.com

  • by combatentropy on 3/22/18, 2:56 AM

    Like any language it takes a mixture of reading and practice. If you just read, it won't stick, because you have nothing to hang it onto. If you just practice, you'll adopt hacks that work but aren't the best way to do it. So it would be nice if you had a job that demands you regularly come up with new SQL. Try to prepare your data as much as possible in the database, instead of partly in the database and partly in your middle language (say, Python). So, for example, try to make web pages that use the Mustache template system (a very strict and minimal one) and use Python (or whatever) only to pull the data from the database and stuff it into the template. That will make you learn the esoteric features of your database.

    I agree with others that you should try to use Postgres. It's easy to get going on Linux, but if for whatever reason you can't right now, SQLite is good, too. Really, Postgres or SQLite, there is no technical need for any other database.

    Supposing that you can go with Postgres, then your reading assignment is easy: just read the Postgres documentation from start to finish. Of course you won't get it the first time, and it might be a multimonth affair, but it's about as good as any book I've found. For some reason, all books on SQL are really boring. My very first book was Databases Demystified, which was a fine book and helped me learn "normalization," which is a fancy word for Don't Repeat Yourself --- except it's talking about reducing redundancy in data instead of code.

  • by smt88 on 3/21/18, 5:22 PM

    If you need a relational schema (and you usually do), then you should use Postgres. Don't make it complicated for no reason. It can also store documents as JSONB.

    If you aren't certain that you need a graph DB, don't use one.

    What do you mean by advanced SQL? The language itself is simple once you understand subqueries. The underlying math can be hard, but modern RDBMS abstract a lot of the performance considerations away by optimizing your queries for you.

  • by zer00eyz on 3/21/18, 6:51 PM

    Sql is a bit more "binary" than that - if you can write one query, and read the manual "advanced" isn't really a thing.

    However, it is all going to be for naught if you don't understand what is underneath the database - for that I highly recommend "database design for mear mortals" - I think I have bought about 6-8 copies of the book and every single one has been "relocated" (stolen, or not returned or given away).

    Understanding HOW an RDBMS works is critical to understanding how to make the best use of one, and then measure the trade offs between any given one.

    Unless your starting out with a problem/domain that requires you to do things that might be feature bound - as an example if your working with geospatial data, then by all means that should be a driver in your selection. If your doing basics like sign up, auth, and cms work then pick one and give it a whirl (they are mostly interchangeable at that level).

    EVERY database has idiosyncrasies. Learning those takes time, and experiences and sometimes running face first into them.

  • by JosephHatfield on 3/21/18, 10:12 PM

    Check out Joe Celko's book, SQL for Smarties; There are some mind-blowing SQL techniques in that book.
  • by gigatexal on 3/22/18, 9:26 PM

    What I do is write the query that gets me the data to answer the question that I need no matter how crappy. And then using the engine and statistics and maybe diagramming out the relations between tables and the indexes try to refactor it to be better. In the end what matters is that the data returned is right. Optimizing after you have that makes most sense in my head and that's when you can go off and feel confident in doing so ( because your deliverable is done, the correct data returned ) and research more "advanced" sql to get your query better.