by cconcepts on 5/3/19, 10:56 AM with 135 comments
He buys bulk lots from deceased estates and bookstores that are closing down. Entire shipping containers are being gifted to him and showing up at his barn. The barn is full, and he is now storing books in shipping containers outside.
There are great-quality books in this quagmire, but it takes hours of searching to find them. I figured HN might be able to point me to a solution where I could quickly photograph the front cover and have a script/Google Images compare the image to online info to index the title and author, and then perhaps list them online...
I dunno, it just seems like such a treasure trove of books that he will sell for practically nothing because he loves books and hopes that they will find their way to people who want them - the barrier is allowing customers to find what they are looking for.
Thoughts?
by walterbell on 5/5/19, 2:57 PM
Source: people who have spent years trying to find the names of the 8,000 books in R.A. Lafferty's personal library, lost after he died. About 300 title names have been recovered, https://www.ralafferty.org/tulsa-books/
by pjc50 on 5/5/19, 9:06 AM
I've seen a few of these, and the minimum difference between "pulp waiting to happen" and "bookshop" is basic shelving. Use different shelves by category: fiction vs non-fiction and their subdivisions. Within the shelves, alphabetise. Now it's possible for browsers to actually find things. When you put them on Amazon, this will also help you find them for shipping.
This process will also help you find the stacks of duplicates. You'll have a crate of 50 Shades and Twilight and Stephen King. The Stephen King will eventually resell; the others won't.
This page from the excellent Barter Books on their acceptance policy may be of some help: https://www.barterbooks.co.uk/html/About%20Us/Incoming%20Boo...
by vessenes on 5/5/19, 2:01 PM
A number of years ago a West Coast startup spent quite a lot of time on a product that could identify books by their spine image, which I think is what you want here; finding ISBNs and barcode-scanning them is totally impractical at this scale.
A few months before they closed up shop, I introduced them to Brewster Kahle at the Internet Archive and convinced them to leave a copy of their database with the Internet Archive.
I have no idea what happened after that, but I believe they did send the data over. Machine learning is vastly different today than when they launched, and even back then they had enough data that they could get 10/12 of my books in a single photo; I really encourage you to get in touch with the archive.
The company was called BitLit, then Shelfie.
As a side note, I got interested because I thought it would be great to get spine images through an API for rendering my ebooks as a library in VR/AR; I still think this would be cool.
by crispyambulance on 5/5/19, 11:53 AM
> the barrier is allowing customers to find what they are looking for.
I am really glad that there are people like the old man who are willing to do stuff like this, and people like yourself who are willing to help. The real barrier, I think, is a bit more complicated than just being able to find stuff. He will also run out of space, and as more and more people find what they want, the undesirable stuff (that no one wants) will just keep growing. There does need to be regular culling, I think, to keep weeding out the duplicates and the books that no one wants. There also needs to be some effort to discover and sell the really valuable books, which could produce occasional windfalls to keep the endeavor going.
"The Book Thing" in Baltimore (https://bookthing.org/) which I have visited many times seems to be tackling this problem. It's basically a "free book" exchange. Massive. In a warehouse. It is a fairly popular place and is run by an interesting eccentric fellow with very particular ideals. I would recommend see how they do this stuff.
As for thoughts, I think that regardless of what he does, he will need one or more employees (or dedicated volunteers) to actually perform the indexing and physically organize the books.
by profsnuggles on 5/5/19, 11:43 AM
Cataloging a large number of books is not going to be an easy process unless they are all relatively new, popular books. According to LibraryThing my library is 439 books; every few years I delete my catalog and re-import them, and it takes about a full weekend. Older books don't have barcodes, and old paperbacks have the ISBN barcode on the inside cover. Some books have neither an ISBN nor a Library of Congress number. So you will still end up doing a fair amount of manual entry and searching.
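When ISBNs are typed or scanned by hand, a check-digit test catches most mis-keyed numbers before a lookup is wasted on them. A minimal Python sketch using the standard rules: ISBN-10 is a weighted sum (weights 10 down to 1) that must be divisible by 11, with "X" standing for 10 in the last position, and ISBN-13 uses alternating weights 1 and 3 and must be divisible by 10. The function name is just illustrative.

```python
DIGITS = set("0123456789")

def is_valid_isbn(raw: str) -> bool:
    """Return True if raw is a well-formed ISBN-10 or ISBN-13 (hyphens/spaces allowed)."""
    s = raw.replace("-", "").replace(" ", "").upper()
    if len(s) == 10:
        # ISBN-10: nine digits plus a check character that may be 'X' (= 10).
        if not (set(s[:9]) <= DIGITS and (s[9] in DIGITS or s[9] == "X")):
            return False
        total = sum((10 - i) * int(c) for i, c in enumerate(s[:9]))
        total += 10 if s[9] == "X" else int(s[9])
        return total % 11 == 0
    if len(s) == 13 and set(s) <= DIGITS:
        # ISBN-13 (EAN-13): weights alternate 1, 3, 1, 3, ... and the sum must be divisible by 10.
        total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(s))
        return total % 10 == 0
    return False
```

`is_valid_isbn("978-0-306-40615-7")` returns True; changing the last digit to 8 makes it False. A valid checksum only says the number is well-formed, not that any database knows the book, and it does nothing for the pre-ISBN books that still need manual entry.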
by flurdy on 5/5/19, 10:50 AM
I worked at a small startup in the early 2000s that somehow got a massive contract to digitalise a Middle Eastern oil & gas company's very, very extensive documentation library. We had an e-learning product where you could use a scanner to turn a printed book into online documentation.
Demos of scanning a book or two were really impressive. So surely scanning more than a million books/manuals/charts would be just as easy. Not quite.
I think we calculated it would take years, as the bottleneck was the manual unbinding before scanning and re-binding afterwards. Scaled to a million items, it was not the two-month project initially forecast. Buying more scanners and hiring more local staff scaled that part horizontally and improved the speed, but it was still a long project.
However, the client "forgot" to pay us for a few months, the bank and our accountants forgot to check, and we went bankrupt shortly before we really got started. At least I got a trip to the Middle East for a few weeks.
by Freak_NL on 5/5/19, 8:50 AM
If a book has an ISBN, Zotero will often manage to find it using the magic lookup button. Just enter the ISBN (DOIs work too!) and it will usually find the book you meant. That covers about 90% of books with an ISBN.
The rest would have to be entered manually.
Zotero is not a full-blown inventory manager, but it may suit your needs.
by 1k on 5/5/19, 12:43 PM
Some scenarios:
1. Generally the lookup should return something. Store these books by category, e.g. business, children, fiction, etc., on their own shelves or in their own containers for physical browsing by your customers. The more subcategories you can do, the better.
2. If price is bigger than some threshold then store these books privately and list for sale directly in an online marketplace. There’s an industry around book scalping (forgot the actual term) where traders buy books from fairs and sell online based solely on margin.
3. The lookup returns nothing - these books are probably very valuable or worthless. Some manual action required.
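The three scenarios map onto a small decision function. A minimal Python sketch; `LookupResult`, its fields, and the default threshold of 20 are hypothetical placeholders for whatever your lookup service actually returns and whatever price cut-off makes sense for him.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LookupResult:
    """Hypothetical shape of a successful lookup."""
    title: str
    category: str             # e.g. "business", "children", "fiction"
    price: Optional[float]    # estimated resale price, if the service reports one

def triage(result: Optional[LookupResult], price_threshold: float = 20.0) -> str:
    """Route a book: manual review, sell online, or shelve by category."""
    if result is None:
        return "manual"                     # scenario 3: lookup returned nothing
    if result.price is not None and result.price > price_threshold:
        return "sell_online"                # scenario 2: worth listing individually
    return f"shelve:{result.category}"      # scenario 1: shelve for physical browsing
```

The price check runs before the category check so that a valuable business book ends up online rather than on the business shelf.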
I was actually considering doing something like this for remainders before, but never got it going. I’d love to know more about your eventual solution.
by nsomaru on 5/5/19, 8:43 AM
The API is severely rate-limited (1 request per second), uses non-standard OAuth and is badly documented, but you should be able to get some XML out of it and parse that however you'd like.
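Staying under a one-request-per-second limit and pulling fields out of the XML can both be done with the Python standard library. The comment doesn't say which API it means, so the `<book>`/`<title>`/`<author><name>` element layout below is a hypothetical stand-in to adjust to the real response schema; OAuth signing is left out entirely.

```python
import time
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional, Tuple

class RateLimiter:
    """Sleeps as needed so that successive wait() calls are at least min_interval seconds apart."""

    def __init__(self, min_interval: float = 1.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        now = self.clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self._last = now

def parse_book(xml_text: str) -> Tuple[str, List[str]]:
    """Extract (title, authors) from a hypothetical <book> XML document."""
    root = ET.fromstring(xml_text)
    title = (root.findtext(".//title") or "").strip()
    authors = [el.text.strip() for el in root.findall(".//author/name") if el.text]
    return title, authors
```

Call `limiter.wait()` immediately before every HTTP request. The clock and sleep functions are parameters only so the limiter can be tested without real waiting.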
by emmanueloga_ on 5/5/19, 8:16 PM
It may be worth your neighbor checking for that sort of thing too. Apparently there are dogs trained to sniff out bedbugs... those furry guys can sniff anything :-) [1]
1: https://www.nytimes.com/2012/12/06/garden/bedbugs-hitch-a-ri...
by bloak on 5/3/19, 12:11 PM
by MayeulC on 5/5/19, 11:01 AM
Otherwise, as stated elsewhere in this thread, Zotero can usually find books with very little information: an ISBN or a title. It might be worth trying to set up OCR to feed it.
In any case, if you go to the lengths of taking a picture of each book, you might as well save them and make the dataset public, for OCR training purposes (and a second pass). There is also the Mechanical Turk option if you go this way.
And as someone stated already, you should plan the physical layout in advance.
by cconcepts on 5/5/19, 9:16 PM
Again, thanks HN for the overwhelming response.
by ghr on 5/5/19, 10:36 AM
by mikepurvis on 5/5/19, 1:09 PM
I'm not sure what options there are for hardware integrations, but Abe provides at least online inventory and ordering capabilities. I assume if you had a barcode scanner capable of acting as a USB keyboard and entering ISBNs, it would go pretty quickly.
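A scanner in keyboard mode simply types the digits followed by Enter, so the capture side can be as simple as reading lines. A minimal Python sketch, assuming the scanner is configured to send a newline after each code; counting repeats also surfaces the stacks of duplicates as they are scanned.

```python
from collections import Counter
from typing import Iterable

def collect_scans(lines: Iterable[str]) -> Counter:
    """Count each scanned code; a USB scanner in keyboard mode sends one code per line."""
    counts: Counter = Counter()
    for line in lines:
        code = line.strip().replace("-", "")
        if code:  # ignore blank lines from accidental Enter presses
            counts[code] += 1
    return counts
```

Run it as `collect_scans(sys.stdin)` with the terminal focused, end input with Ctrl-D (Ctrl-Z then Enter on Windows), and `counts.most_common()` lists the codes with the most copies first.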
by rdl on 5/5/19, 11:04 AM
by good-idea on 5/5/19, 6:03 PM
by 8_hours_ago on 5/5/19, 11:23 AM
by achenatx on 5/5/19, 12:13 PM
https://www.collectorz.com/book/isbn_database.php
https://bookriot.com/2016/01/14/8-reasons-catalog-books/
https://www.goodreads.com/blog/show/913-goodreads-hack-scan-...
by GnarfGnarf on 5/5/19, 2:52 PM
Sort the boxes by height, line them up in a row. Put slats of wood between the rows to distribute and stabilize the load.
I wrote a program to automatically generate simple HTML files to display the images. See sample:
Use OCR to digitize the title and author.
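A program like that can be quite small. A minimal Python sketch, assuming the cover photos sit as .jpg/.jpeg/.png files in one folder: it writes an index.html next to them, using each file name as the caption, which is where an OCR'd title and author could go later. The names and layout are illustrative, not the commenter's actual program.

```python
import html
from pathlib import Path
from typing import Iterable

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}

def gallery_html(image_files: Iterable[str], title: str = "Book covers") -> str:
    """Build one HTML page showing each image with its file stem as the caption."""
    figures = []
    for name in image_files:
        stem = html.escape(Path(name).stem)
        figures.append(
            f'  <figure><img src="{html.escape(name)}" alt="{stem}" width="200">'
            f"<figcaption>{stem}</figcaption></figure>"
        )
    body = "\n".join(figures)
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
        f"<title>{html.escape(title)}</title></head>\n<body>\n{body}\n</body></html>\n"
    )

def write_gallery(folder: str, out_name: str = "index.html") -> Path:
    """Write an index page next to the photos in `folder` and return its path."""
    folder_path = Path(folder)
    images = sorted(p.name for p in folder_path.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES)
    out_path = folder_path / out_name
    out_path.write_text(gallery_html(images), encoding="utf-8")
    return out_path
```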
by callmeal on 5/5/19, 7:18 PM
It was a simple process of having enough boxes and labels, and I did it any time I had some free time. Scan a bunch of books, drop them in a box, slap a label on the box, wait for Booxter to find and fetch the metadata, update the label in Booxter, and repeat.
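The box-and-label workflow boils down to remembering which box each scanned book went into. Booxter handles this itself; for anyone rolling their own, here is a minimal Python sketch (class and method names are hypothetical) that records ISBNs per box label, answers "which box is this in?", and exports a CSV for printing labels or importing elsewhere.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List

class BoxInventory:
    """Tracks which labelled box each scanned ISBN was dropped into."""

    def __init__(self) -> None:
        self.boxes: Dict[str, List[str]] = defaultdict(list)

    def add(self, box_label: str, isbn: str) -> None:
        self.boxes[box_label].append(isbn)

    def locate(self, isbn: str) -> List[str]:
        """All boxes holding a copy of this ISBN (duplicates can be in several)."""
        return sorted(label for label, isbns in self.boxes.items() if isbn in isbns)

    def to_csv(self) -> str:
        out = io.StringIO()
        writer = csv.writer(out)
        writer.writerow(["box", "isbn"])
        for label in sorted(self.boxes):
            for isbn in self.boxes[label]:
                writer.writerow([label, isbn])
        return out.getvalue()
```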
Will take time, but is easily doable.
by 52-6F-62 on 5/5/19, 5:09 PM
That story also reminds me of this fellow (who actually might get me to the middle of nowhere SK): https://www.macleans.ca/news/canada/canadas-most-inconvenien...
(Edited to add Apple News links without ads if anyone uses it:
Free version: https://apple.news/Ar3trUQ-cR7C-c9L3YzPYjg
Issue version: https://apple.news/AQD4nDgB4SKi6yTcZy60tPw)
There seems to be a ton of relevant help in this thread and that seems exciting.
As someone pointed out, something like OCR might be the best first step, since at a glance this looks like a data-entry task.
It does sound like there may be a significant amount of physical, tedious work involved no matter what software solution you find. Sometimes you have to accept that aspect and push through. Your best bet might be to recruit some physical help: start a fund or a labour drive or something. Recruit book lovers, etc. Seems worthwhile. Maybe he would donate books to helpers.
by Adamantcheese on 5/5/19, 9:26 AM
That being said, if you do want to do image comparison for covers, books without covers usually have a copyright page with most of the info on it. Use that to determine what a book is when the other method fails. Throwing together some cheap bookshelves with plywood and 2x4s will greatly help with the finding part, but while scanning, use some big bins to do a rough sort.
And I can't stress enough you HAVE to throw out books. It's clear that there's a space issue and if he's willing to get them for free but has a hard time getting rid of them, that's hoarder behavior, not just eccentricity.
by tsjq on 5/5/19, 12:38 PM
once you've started this sorting / cataloging work: ask visitors not to reshelve the books. Have a central location (table / bins) for them to put the books in, so the volunteers / barn-man can put them back on the right shelves.
also, what exactly is the barn-man's objective? just collect books and not bother about further? or, be the most helpful to book-lovers? or, make good money from these books ?
by teddyr009 on 5/5/19, 8:47 AM
by cik on 5/5/19, 6:35 PM
If you have a Mac, get a copy of Delicious Library (https://www.delicious-monster.com/) and a compatible barcode scanner, like the Flic.
If you have an Android phone, and you're happy with dealing with your phone and CSV export, you'll probably be okay with Libib (https://play.google.com/store/apps/details?id=com.libib.app).
The biggest issue is that there are tonnes of books (especially if like me you have older ones) that predate ISBN. That kinda sucks - but it's life.
by bartimus on 5/5/19, 5:37 PM
by jccalhoun on 5/5/19, 2:14 PM
A couple times a year the local Half Price Books Outlet does a "fill a bag for $20" event and every time there are at least a couple people there with shopping carts full of bags of books.
They have dedicated bar code scanners attached to their phones and will scan books at around 1 a second. I don't know what software they are using but clearly they are looking up prices to see what they can get to sell for a profit.
I use goodreads to keep track of my own book collection and using the camera and the goodreads app usually takes 30 seconds plus to focus on the bar code and then to look it up. So whatever they are using is much faster than that.
by thaumaturgy on 5/5/19, 8:55 AM
If they really do need to be cataloged, then the next thing is to forget all about trying to inventory the entire thing. Instead, you're going to partition the collection into "easy to catalog" and "hard to catalog": pick a section of the barn and make this the organized area. Get a barcode scanner (https://www.newegg.com/Barcode-Scanner/SubCategory/ID-583) and throw together a quick API client that'll take an ISBN and display a title, author, edition, and picture. If it comes up correct, great: book goes into the cataloged section. If it doesn't, it goes somewhere else. Make it really simple, so that a single keystroke can accept that book into inventory.
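A quick API client along those lines fits in a few dozen lines of Python. This sketch swaps in the Open Library Books API as the "free online book database" (an assumption; the comment doesn't name one), shows title / authors / publish date / cover URL, and treats Enter as the single accept keystroke. Plain `input()` needs Enter; a true one-key prompt would need platform-specific terminal handling.

```python
import json
import urllib.request
from typing import Callable, Optional

# Open Library Books API (an assumption: any ISBN database with a JSON API would do).
API_URL = "https://openlibrary.org/api/books?bibkeys=ISBN:{isbn}&format=json&jscmd=data"

def fetch_openlibrary(isbn: str) -> dict:
    """Fetch the raw JSON response, keyed by "ISBN:<isbn>" when the book is found."""
    with urllib.request.urlopen(API_URL.format(isbn=isbn), timeout=10) as resp:
        return json.load(resp)

def describe(isbn: str, fetch: Callable[[str], dict] = fetch_openlibrary) -> Optional[str]:
    """One-line summary (title / authors / date / cover URL), or None if not found."""
    record = fetch(isbn).get(f"ISBN:{isbn}")
    if record is None:
        return None
    authors = ", ".join(a.get("name", "?") for a in record.get("authors", []))
    cover = record.get("cover", {}).get("medium", "no cover")
    return f"{record.get('title', '?')} / {authors} / {record.get('publish_date', '?')} / {cover}"

def review(isbn: str, fetch: Callable[[str], dict] = fetch_openlibrary,
           ask: Callable[[str], str] = input) -> str:
    """Show the lookup; Enter accepts into the catalog, anything else sets the book aside."""
    summary = describe(isbn, fetch)
    if summary is None:
        return "set_aside"  # not found: goes to the hard-to-catalog pile
    answer = ask(f"{summary}\n[Enter] = catalog, anything else = set aside: ")
    return "cataloged" if answer.strip() == "" else "set_aside"
```

Calling `review(isbn)` for each scanned code hits the network and prompts on the terminal. The `fetch` and `ask` parameters let the loop be tested offline or pointed at a different service.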
Grocery stores have to regularly inventory everything on the shelves. I worked for an outfit once that wanted to do it all in-house, so we bought the commercial Telxon handheld wireless devices and I set about figuring out their software. Turned out that they just wanted to speak basic telnet to a server at a pre-configured IP address, so I put together a sloppy little telnet server interface and staff were able to count the entire store right on the devices in a few hours. That's way more complicated than what you'll need to do, so, y'know, your thing is doable. You'll have the added benefit of free online book databases and better hardware and easier-to-hack-together software.
Also might not be a bad idea to talk to your local librarian. They're book nerds too and he or she might have an actual library science degree. This would be right up their alley.
by westondeboer on 5/5/19, 7:17 PM
I am grading them by reading level A-Z. Currently I am googling the book with "reading level" added to the end; if it has one, it shows up, or I can find the Lexile number and use that as a grade instead. I use the speech-to-text command in Google, so it doesn't take that much time.
This is a hassle, and I am looking into other ways to speed up this process, and/or to get other parents to volunteer if it were easier.
by sandreas on 5/5/19, 2:04 PM
Perhaps there are users with experience...
by influx on 5/5/19, 12:30 PM
There are apps that let you scan UPC codes and look up a price on Amazon. I'd personally sort the books by market value: sell the books that are profitable, trash the ones that are not, save the ones that are very rare or have no UPC code, and use the money to grow the storage space.
by niedzielski on 5/5/19, 2:33 PM
by xiconfjs on 5/5/19, 1:12 PM
by qubex on 5/5/19, 5:47 PM
by ryanmarsh on 5/5/19, 5:05 PM
1. In order to get a good scan (back then) we had to lay each page flat against a piece of glass (no matter the orientation). This tended to damage or destroy the binding by the time scanning was complete.
2. An average of ten seconds per scan (from page flip to page flip) is blazing fast (including rescans). For a 200 page book this is 33 minutes. To scan a library of 200 books at this rate requires 3.2 man years of work (normal 40 hour work week + holidays).
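The per-book figure follows directly from the stated rate (200 pages × 10 s = 2,000 s, about 33 minutes), and total effort scales linearly with page count and library size. A small calculator; the 1,920 working hours per year (40 hours a week for 48 weeks) is an assumption standing in for "normal 40 hour work week + holidays".

```python
SECONDS_PER_PAGE = 10.0    # average from the comment, rescans included
HOURS_PER_YEAR = 40 * 48   # assumption: 40 h/week for 48 working weeks

def minutes_per_book(pages: int, seconds_per_page: float = SECONDS_PER_PAGE) -> float:
    return pages * seconds_per_page / 60

def scanning_hours(books: int, pages: int = 200, seconds_per_page: float = SECONDS_PER_PAGE) -> float:
    return books * pages * seconds_per_page / 3600

def person_years(hours: float, hours_per_year: float = HOURS_PER_YEAR) -> float:
    return hours / hours_per_year

print(round(minutes_per_book(200), 1))  # 33.3
```

For a library of N books of about 200 pages, `person_years(scanning_hours(N))` gives the full-time effort under those assumptions.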
One way to speed this process drastically is to use a bulk scanner. This requires slicing the binding off the book and feeding in the book as a stack of pages, scanning the cover separately. Obviously this completely destroys the book.
Good luck.
by juskrey on 5/5/19, 12:07 PM
by Floralegeium on 5/11/19, 2:34 AM
While at this meeting they were displaying a book scanner: you would place the book in the machine, it would flip each page and take a high-resolution photo, and it had options for OCR software that would read the entire page and present any questionable words or characters the OCR could not identify. This machine and software were pitched to museums and large libraries. I would highly suggest asking a local museum or library if they have any hardware that would be able to archive the books you're describing.
I tried searching for the exact machine but I could not locate it, I want to say Canon was the vendor.
I wish you success with your endeavor.
by thecupisblue on 5/5/19, 11:38 AM
It also shouldn't be hard to increase the speed by taking pictures of a stack of books. If you take an image of a stack and crop it book by book, training models to recognise books shouldn't be that hard, but you could also use a CV solution (Firebase, Amazon, Azure again) and then ping the API for each book it found in the stack. This could probably be the fastest way if you can take a panorama and have it search from that.
Anyway, if you do it, try to get the price, ISBN and editions from the results.
by jcelerier on 5/5/19, 9:55 AM
by swayvil on 5/5/19, 6:00 PM
(This isn't my idea. Either Rudy Rucker, Vernor Vinge or Cory Doctorow thought it. I forget exactly who.)
by 80mph on 5/5/19, 9:11 AM
by mongol on 5/5/19, 8:20 AM
by zimpenfish on 5/5/19, 12:53 PM
(Although I suspect the range of book covers is somewhat larger than the range of wine labels...)
by Grustaf on 5/5/19, 12:17 PM
by jonsen on 5/5/19, 5:08 PM
I once entered a used-book store looking for an old math book. No one was there except a grumpy-looking man with wild hair and a big beard sitting at a desk in the far corner.
I start browsing the shelves.
“WHAT ARE YOU LOOKING FOR?”
“Um, eh, Play with Eternity.”
“WE DON’T HAVE IT.”
by anoncow on 5/5/19, 8:18 AM
by GBiT on 5/5/19, 8:43 PM
by bshep on 5/5/19, 5:48 PM
by szafranek on 5/5/19, 6:22 PM
Yes, some older books, especially in languages other than English, are not in its database, leaving you with the manual option, but it will let you index the books that are there in no time.
by andylynch on 5/5/19, 8:33 AM
by georgespencer on 5/5/19, 10:03 AM
by krekligit on 5/5/19, 3:25 PM
by giarc on 5/5/19, 6:07 PM
by BigBalli on 5/10/19, 4:10 PM
by dredmorbius on 5/5/19, 1:26 PM
Numerous queryable catalogues of books and other materials exist, with WorldCat arguably the most developed of those:
https://www.oclc.org/developer/develop/web-services/classify...
The US Library of Congress also has a huge (if intimidating) amount of information available.
http://fortune.com/2017/05/17/library-of-congress-free-recor...
Figuring out what you hope to accomplish, how, with what resources (people, software, equipment, space, etc.), in what timeframe, and with what throughput (how fast materials are arriving and leaving, and what the current backlog is) are all considerations. What end this will serve (book sales, in person or online) and what is sufficient to that end also matter.
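The throughput question comes down to one line of arithmetic: the backlog only shrinks if books are processed faster than they arrive, and then it takes backlog / (processed per week - arrived per week) weeks to clear. A minimal sketch; the rates are placeholders for whatever is actually measured.

```python
import math

def weeks_to_clear(backlog: int, arrivals_per_week: float, processed_per_week: float) -> float:
    """Weeks until the backlog reaches zero; infinite if intake keeps pace with or outruns processing."""
    net = processed_per_week - arrivals_per_week
    if net <= 0:
        return math.inf
    return backlog / net
```

For example, `weeks_to_clear(10_000, 500, 1_500)` returns `10.0`: 10,000 books shrinking by a net 1,000 a week. Those figures are made up for illustration; with whole containers still arriving, the arrivals rate may well be the number that matters most.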
by __initbrian__ on 5/5/19, 3:58 PM
by rajkpal on 5/5/19, 11:29 AM
Though this is for scanning whole books, in case there are rare ones.
by ejdanderson on 5/5/19, 8:50 AM
by anonu on 5/5/19, 5:12 PM
Then you'd have to run some algo to match the name with the isbn and tag it with the general location.
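One way to do that matching step, assuming OCR yields one rough title string per spine and you already have a title-to-ISBN list to match against (both assumptions): Python's `difflib.get_close_matches` picks the closest title, and anything below the similarity cutoff is left for a human. The catalog contents and the 0.6 cutoff are illustrative.

```python
import difflib
from typing import Dict, List, Optional

def match_spine(ocr_text: str, catalog: Dict[str, str], cutoff: float = 0.6) -> Optional[str]:
    """Return the catalog title closest to the OCR text, or None if nothing is similar enough."""
    by_lower = {title.lower(): title for title in catalog}
    hits = difflib.get_close_matches(ocr_text.lower().strip(), list(by_lower), n=1, cutoff=cutoff)
    return by_lower[hits[0]] if hits else None

def index_frame(ocr_lines: List[str], catalog: Dict[str, str], location: str) -> List[Dict[str, str]]:
    """Tag every confidently matched spine with its ISBN and the location it was filmed at."""
    rows = []
    for text in ocr_lines:
        title = match_spine(text, catalog)
        if title is not None:
            rows.append({"title": title, "isbn": catalog[title], "location": location})
    return rows
```

A garbled read such as "THE HOBBlT" still scores about 0.9 against "The Hobbit", while an unreadable spine falls below the cutoff and simply isn't indexed on that pass.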
Once you get the process down you could run a new video every few weeks.
This is kind of like Google Street view for the book barn.
Am I dreaming? Could this work in practice?
by secfirstmd on 5/5/19, 10:31 AM
by emmelaich on 5/5/19, 1:50 PM
At a first parse, just take a picture of a whole bunch of books spines. Some will be ok, some not.
by elcomet on 5/5/19, 4:01 PM
by paulcarroty on 5/5/19, 11:41 AM
by sandwall on 5/5/19, 1:22 PM
2nd, barcode scanning would certainly be the most effective method.
by kmfrk on 5/5/19, 4:19 PM
by FZ_BA on 5/6/19, 7:14 AM
by thegabriele on 5/5/19, 1:09 PM
by oflebbe on 5/5/19, 11:59 AM
by viraptor on 5/5/19, 8:20 AM
by libguy on 5/5/19, 5:40 PM
I’d like to have a look. Thanks!
by Theodores on 5/5/19, 12:45 PM
What Hay on Wye is known for is a literary festival. So there was a pivot to this a few decades ago that has worked.
This is the guy that started it:
https://en.wikipedia.org/wiki/Richard_Booth
Note the way it started, buying library stock from America that was available due to libraries closing.
Some of the Hay book shops are really good, there is a former cinema that you could get lost for hours in. Some other book shops are more like 'extras'. They might have books in cases on the pavement with prices being pennies. This stuff could be fairly pulped, but, collectively it gives the whole town this aura of literature that is way beyond what the local sheep farmers necessarily go in for.
Now if a tourist visits Hay for the festival, then they spend £££ on books, but they also spend a lot more on cups of tea, admission fees to see performances, accommodation and whatever else. A given tourist might spend pennies on books but in so doing spend many pounds in the town. They might not even read the books they buy; the books might be more souvenirs than reading material, and far from generic souvenirs.
The reputation from the festival is enough to bring a respectable amount of tourists to Hay throughout the rest of the year.
It also works with a sponsor; normally the Hay Festival works with people who have a vested interest in it being successful, so you get a lot of coverage on BBC Radio 4.
Hay also has splendid scenery going for it as well as it being in Wales, proper. There are towns nearby that are just as pretty with similar scenic backdrops but nobody remembers the names of those places. The books thing - which is the effort and inspiration of one man - has put the place on the map, literally.
So, rather than the high tech solution, maybe the preinternet solution has some pointers. Get some local store fronts that are closed premises to become book shops. Segment the collection so that some shops are more specialist than others. Have some shops in less prime locations so that collectively there is the same thing going on as with Hay on Wye. Create a fake literary hub and then make it into a real literary hub by putting on the ten day festival.
If you get the council and local businesses in on the act, then you might be able to get the whole thing started. "Build it and they will come" works for Hay, even though it is in the middle of nowhere with just sheep for a local population.
Trying to shift the product online for a pittance is no fun at all, the festival and tourist location thing could be much more exciting. Try and twin the town with Hay to get started...