from Hacker News

Where is all the book data?

by hkhn on 11/10/22, 8:02 AM with 11 comments

  • by eslaught on 11/11/22, 4:49 PM

    Oh, hey. I was just looking at this article the other day.

    I think this article makes some good points, but it's a little too absolutist about this data. As far as I can tell, if you are an author or industry professional, you can get access to the following data:

    * If you want data on your own books: sign up for Amazon Author Central and they'll give you the BookScan data on your books. This is free. https://press.aboutamazon.com/2010/12/weekly-nielsen-booksca...

    * If you want data on comps (i.e., comparable books, or books you are competing against): sign up for Publishers Marketplace and pay for the monthly package ($25/month on top of the PM subscription). This lets you track 5 ISBNs (and, I assume, you can pick new ISBNs every month). https://www.publishersmarketplace.com/bookscan/about.cgi#dat...

    * If you want public library checkout data: as linked in the article, go to the Seattle Public Library. Free. https://data.seattle.gov/Community/Checkouts-by-Title/tmmm-y...

    The situation only really gets ugly if you want access to broad market data (i.e., across all ISBNs for a given time window, and covering a majority of retail outlets). The best I've been able to find is this comment by Kristen McLean from NPD: https://countercraft.substack.com/p/no-most-books-dont-sell-...

    But this data is (a) limited and (b), I think, has some pretty serious issues [1]. I emailed Kristen to try to address this, but so far I haven't had a response. (If anyone has connections that might help, please contact me!)

    And if you want access to the data yourself, you're looking at something like $2,500 USD, and the terms are pretty restrictive. https://www.publishersmarketplace.com/bookscan/about.cgi#pri... https://www.publishersmarketplace.com/bookscan/terms.shtml

    I am actively working on improving this situation, and I've got some ideas for what we could do while still abiding by NPD's terms. If that's something that interests you, please contact me (see my profile).

    [1]: My issue with Kristen's analysis is that it follows (in her words) a "conveyor belt" pattern. That is, the time window is fixed. Within that window, some books have been on the market 364 days, and some may have been on the market 1 day. So it's not surprising that some books have very few sales: they may simply not have been on the market long enough. And you can't just say, "well, double the numbers to account for the average case," because I'm pretty sure that doesn't work. But without real data I can't fix this.
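    To make the footnote's point about a fixed window concrete, here is a minimal sketch built on assumptions that are not from the comment: each book's weekly sales decay by a hypothetical `half_life_weeks` (or stay flat), and release dates are spread uniformly across a 52-week window. It compares the sales the window captures with each book's full first-year sales, which shows why a flat 2x multiplier cannot be trusted without knowing the real shape of sales curves.

```python
from statistics import mean
from typing import Optional

WINDOW_WEEKS = 52  # the fixed reporting window (the "conveyor belt"): one year


def weekly_sales(week: int, half_life_weeks: Optional[float]) -> float:
    """Relative sales in a book's week `week` on sale (week 0 = launch week).

    Hypothetical model, not real data: sales halve every `half_life_weeks`;
    None means sales stay flat.
    """
    if half_life_weeks is None:
        return 1.0
    return 2 ** (-week / half_life_weeks)


def capture_ratio(weeks_on_market: int, half_life_weeks: Optional[float]) -> float:
    """Share of a book's first-year sales that falls inside the window, for a
    book released `weeks_on_market` weeks before the window closes."""
    observed = sum(weekly_sales(w, half_life_weeks) for w in range(weeks_on_market))
    first_year = sum(weekly_sales(w, half_life_weeks) for w in range(WINDOW_WEEKS))
    return observed / first_year


def average_capture(half_life_weeks: Optional[float]) -> float:
    """Average capture ratio when release dates are spread uniformly across
    the window (an assumption), i.e. 1 to 52 weeks on the market."""
    return mean(capture_ratio(k, half_life_weeks) for k in range(1, WINDOW_WEEKS + 1))


if __name__ == "__main__":
    for label, half_life in [("flat sales", None), ("4-week half-life", 4)]:
        avg = average_capture(half_life)
        print(f"{label}: window captures {avg:.0%} of first-year sales on average; "
              f"doubling would report {2 * avg:.0%}")
```

    Under these invented curves, flat sales give an average capture of about 51%, so doubling happens to land close on average (while still being wrong for any individual book), whereas a 4-week half-life gives about 90%, so doubling would overstate sales by roughly 80%. Which case holds depends on real sales curves, which is exactly the data that isn't available.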

  • by afandian on 11/11/22, 3:03 PM

    This isn't about sales data, as in this article, but about bibliographic metadata. Check out POSI, the Principles for Open Scholarly Infrastructure:

    https://openscholarlyinfrastructure.org/

    There are people dedicated to open metadata and open systems to work with it.

    https://openscholarlyinfrastructure.org/posse/

    (I work at Crossref)

  • by epaulson on 11/11/22, 4:16 PM

    The book data I wish were more accessible is bibliographic data. I wish there were a cheap ISBN API (cheap enough for an individual to afford) that I could use to look up all of the data for my book from just a barcode scan. I know there are some API providers for that, but their plans are clearly meant for big users, not for someone who just wants to use it a couple hundred times.

    This is something the Library of Congress should run, or maybe one of the university library consortia, like the consortium formerly named the Committee on Institutional Cooperation (which has renamed itself the 'Big Ten Academic Alliance', because football - https://btaa.org/library/Libraries )
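    The barcode-scan step of the lookup described above is the easy part. The EAN-13 barcode on a book is its ISBN-13 when it starts with the Bookland prefix 978 or 979, and the last digit is a check digit that catches most misreads. A minimal sketch of that validation in Python follows; the function name `isbn13_from_barcode` is hypothetical, and it deliberately stops before any lookup, since the comment's complaint is that no affordable lookup API exists for individuals.

```python
def isbn13_from_barcode(scanned: str) -> str:
    """Return a validated ISBN-13 from a scanned EAN-13 barcode string.

    Raises ValueError if the scan is not a 13-digit Bookland (978/979) EAN
    or if its check digit does not match.
    """
    digits = scanned.strip().replace("-", "")
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"expected 13 digits, got {scanned!r}")
    if not digits.startswith(("978", "979")):
        raise ValueError("not a Bookland EAN, so not an ISBN")
    # ISBN-13 check digit: weight the first 12 digits 1, 3, 1, 3, ...
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits[:12]))
    expected = (10 - total % 10) % 10
    if int(digits[12]) != expected:
        raise ValueError("check digit mismatch; rescan the barcode")
    return digits
```

    For example, `isbn13_from_barcode("978-0-306-40615-7")` returns `"9780306406157"`, while a single mistyped final digit raises `ValueError`. The validated string is what would then be sent to whichever lookup service you use.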

  • by dglass on 11/11/22, 3:55 PM

    While I agree with the general argument in the article that sales data and similar metrics should be public, I think there's a lot more that can be done to unlock all of the knowledge stored in books. There are vast amounts of knowledge that humanity has built up over centuries that are either hard to find or hard to access unless you know where to look. How does someone like me discover that knowledge for a topic I'm interested in?

    I wrote a book that was recently published to help junior and mid-level programmers build up the soft skills they need to advance their careers[0]. The book was published by Holloway[1]. They have an interesting platform to solve this problem, which is why I chose to publish with them. They publish works primarily through their online reader, which is indexable by search engines. So someone searching for "How to get up to speed on a new codebase" in their preferred search engine could stumble across the chapter titled "How to read unfamiliar code"[2] and read a free preview of the book. Over time, people can discover and access the knowledge stored in any book published on Holloway's platform.

    Another nice side effect of the platform is that books can be updated over time, so outdated knowledge or content can be revised, updated, and re-indexed by the search engines as knowledge about a topic evolves.

    If you're considering writing a book, or have a manuscript and are looking for a publisher, I'd recommend giving Holloway a look to see if it would be a good fit.

    [0]: https://www.holloway.com/b/junior-to-senior

    [1]: https://www.holloway.com/

    [2]: https://www.holloway.com/g/junior-to-senior/sections/how-to-...

  • by kmeisthax on 11/11/22, 4:55 PM

    One of the bigger complaints about AI art generation is that "oh, it'll become a closed-loop system, and then we'll all be sitting in our chairs watching a neural network spew art all day while human artists starve to death". This is kind of funny because, if Public Books' article here is even remotely true, the existing publishing system is already a closed loop. Publishers only commission or purchase works that match the particular taste profiles already trained into the sales data. If you want to make something new, the publishing companies have already boycotted and cancelled you.