from Hacker News

Ask HN: How can I automatically scan and catalog a mountain of books?

by cconcepts on 5/3/19, 10:56 AM with 135 comments

This really kind, eccentric guy in my neighbourhood is stockpiling books and has been doing so for years. He has an enormous barn that he is obsessively filling with whatever reasonable-quality books he can get his hands on, but he is completely overwhelmed when it comes to cataloging/indexing them, so customers have to go through his barn sifting through cartons full of books. He charges $1 or $2 for whatever book you dig out.

He buys bulk lots from deceased estates and bookstores that are closing down. Entire shipping containers are being gifted to him and showing up at his barn. The barn is full and he is now storing in shipping containers outside.

There are great-quality books among this quagmire, but it takes hours of searching to find them. I figured HN might be able to point me to a solution where I could quickly photograph the front cover and have a script/Google Images compare the image to online info to index the title and author, and then perhaps list them online...

I dunno, it just seems like such a treasure trove of books that he will sell for practically nothing because he loves books and hopes that they will find their way to people who want them - the barrier is allowing customers to find what they are looking for.

Thoughts?

by walterbell on 5/5/19, 2:57 PM
Please, please tag each batch of books with a unique lot number, so they can be associated with a specific estate or deceased bookstore. One or more humans spent a lot of time curating those collections. If one of the lots was well curated, then anyone who finds a book that they like will want to see the other books in that lot.
Source: people who have spent years trying to find the names of the 8,000 books in R.A. Lafferty's personal library, lost after he died. About 300 title names have been recovered, https://www.ralafferty.org/tulsa-books/
by pjc50 on 5/5/19, 9:06 AM
One of these days I need to write my essay titled "Rubbish has no SKU".
I've seen a few of these, and the basic minimum difference between "pulp waiting to happen" and "bookshop" is basic shelving. Different shelves by category: fiction vs non-fiction and their subdivisions. Within the shelves, alphabetise. Now it's possible for browsers to actually find things. When you put them on Amazon this will also help find them for shipping.
This process will also help you find the stacks of duplicates. You'll have a crate of 50 Shades and Twilight and Stephen King. The Stephen King will eventually resell; the others won't.
This page from the excellent Barter Books on their acceptance policy may be of some help: https://www.barterbooks.co.uk/html/About%20Us/Incoming%20Boo...
by vessenes on 5/5/19, 2:01 PM
Oh, I am very pleased to see this request, and I may have some actual help for you.
A number of years ago a west coast startup spent quite a lot of time on a product that could identify books by their spine image, which I think is what you want here; finding ISBNs and barcode scanning them is totally impractical at this scale.
A few months before they closed up shop, I introduced them to Brewster Kahle at the Internet Archive and convinced them to leave a copy of their database with the Internet Archive.
I have no idea what happened after that, but I believe they did send the data over. Machine learning is vastly different today than when they launched, and even back then they had enough data that they could get 10/12 of my books in a single photo; I really encourage you to get in touch with the Archive.
The company was called BitLit, then Shelfie.
As a side note, I got interested because I thought it would be great to get a spine image as an API for rendering my ebooks as a library in VR/AR. I still think this would be cool.
by crispyambulance on 5/5/19, 11:53 AM
```
    >  the barrier is allowing customers to find what they are looking for.
```
I am really glad that there are people like the old man who are willing to do stuff like this and people like yourself who are willing to help.
The real barrier, I think, is a bit more complicated than just being able to find stuff. It is also the fact he will be running out of space and that as more and more people find what they want the undesirable stuff (that no one wants) will just keep growing. There does need to be regular culling, I think, to keep weeding out the duplicates or books that no one wants. Also, there needs to be some effort to discover and sell the really valuable books which could produce occasional windfall funds to keep the endeavor going.
"The Book Thing" in Baltimore (https://bookthing.org/), which I have visited many times, seems to be tackling this problem. It's basically a "free book" exchange. Massive. In a warehouse. It is a fairly popular place and is run by an interesting, eccentric fellow with very particular ideals. I would recommend seeing how they do this stuff.
As for thoughts, I think that regardless of what he does, he will need one or more employees (or dedicated volunteers) to actually perform the indexing and physically organize the books.
by profsnuggles on 5/5/19, 11:43 AM
I would check out https://www.librarything.com/ also. They have a decent app for scanning barcodes, and it retrieves data from multiple sources: their own database (which, I believe, consists of lots of MARC records imported from university libraries), the Library of Congress, Amazon, etc. They also have another project, librarycat, where you can set up your books as a lending library.
Cataloging a large number of books is not going to be an easy process unless they are all relatively new, popular books. According to LibraryThing my library is 439 books; every few years I delete my catalog and re-import them, and it takes about a full weekend. Older books don't have barcodes, and old paperbacks have the ISBN barcode on the inside cover. Some books don't have ISBNs or Library of Congress numbers. So you will still end up doing a fair amount of manual entry and searching.
by flurdy on 5/5/19, 10:50 AM
It may be very expensive in time and resources to scan it all, even if it is just the covers. You need to work out how long it takes to fetch a book from the barn or container, flatten/unbind if necessary, scan the cover, rebind, and put back. Then multiply by how many books...
I worked at a small startup in the early 2000s that somehow got a massive contract to digitalise a Middle Eastern oil & gas company's very, very extensive documentation library. We had an e-learning product where you could use a scanner to digitalise a printed book into online documentation.
Demos of scanning a book or two were really impressive. So surely scanning more than a million books/manuals/charts would be just as easy. Not quite.
I think we calculated it would take years, as the bottleneck was the manual unbinding and re-binding before and after scanning. Scale that to a million and it was not the 2-month project initially forecast. Buying more scanners and hiring more local staff scaled that part horizontally and improved the speed, but it was still a long project.
However, the client "forgot" to pay us for a few months, the bank and our accountants forgot to check, and we went bankrupt before we really got started. Though at least I got a trip to the Middle East for a few weeks.
by Freak_NL on 5/5/19, 8:50 AM
Zotero (the free software reference manager) hooks into a bunch of online catalogues. You can use Zotero to manage books (I manage my own collection with it, but that's just a small personal home library of around 1000 books).
If a book has an ISBN, often Zotero will manage to find it using the magic lookup button. Just enter the ISBN (DOIs work too!) and it will usually find the book you meant. That covers about 90% of books with an ISBN.
The rest would have to be entered manually.
Zotero is not a full-blown inventory manager, but it may suit your needs.
by 1k on 5/5/19, 12:43 PM
Get a barcode scanner, scan the ISBN and use that to do an API query on Amazon to retrieve title, author, category, price, etc. Store this info in a DB.
Some scenarios:
1. Generally the lookup should return something. Store these books by category (e.g. business, children, fiction, etc.) in their shelves/containers for physical browsing by your customers. The more subcategories you can do the better.
2. If price is bigger than some threshold then store these books privately and list for sale directly in an online marketplace. There’s an industry around book scalping (forgot the actual term) where traders buy books from fairs and sell online based solely on margin.
3. The lookup returns nothing - these books are probably very valuable or worthless. Some manual action required.
I was actually considering doing something like this for remainders before, but never got it going. I’d love to know more about your eventual solution.
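The three scenarios above can be sketched as a small triage function. This is only an illustration: the function name `triage`, the `price` and `category` keys, and the 20.0 threshold are all made up here, and no real Amazon API call is shown (the lookup result is assumed to have been fetched already).

```python
from typing import Optional

PRICE_THRESHOLD = 20.0  # hypothetical cut-off, tune to taste


def triage(record: Optional[dict], threshold: float = PRICE_THRESHOLD) -> str:
    """Decide what to do with a scanned book.

    `record` is whatever the ISBN lookup returned (None if nothing came back).
    The 'price' and 'category' keys are assumed names, not a real API schema.
    """
    if record is None:
        # Scenario 3: the lookup returned nothing, so a human has to look at it.
        return "manual-review"
    price = record.get("price")
    if price is not None and price > threshold:
        # Scenario 2: worth storing privately and listing online.
        return "list-online"
    # Scenario 1: shelve by category for in-person browsing.
    return "shelf:" + record.get("category", "uncategorised")
```

Something like `triage({"price": 2.0, "category": "children"})` would send the book to the children's shelf, while a missing lookup result lands in the manual pile.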
by nsomaru on 5/5/19, 8:43 AM
Goodreads has a scanner in their app (on iOS/Android) that can scan covers although for some reason it automagically adds those books into a "to-read" shelf but I guess this isn't a problem for you if you create an account for the purpose.
The API is severely rate-limited (1 request per second), uses non-standard OAuth and is badly documented, but you should be able to get some XML out of it and parse that however you'd like.
by emmanueloga_ on 5/5/19, 8:16 PM
A barn full of old books sounds like the perfect breeding ground for all sorts of bugs... I had an acquaintance that had a problem with bed bugs that apparently started when she got books off of those "Take one book" boxes people put on the front of their houses.
It may be worth your neighbor checking for that sort of thing too. Apparently there are dogs that are trained to sniff out bedbugs... those furry guys can sniff anything :-) [1]
1: https://www.nytimes.com/2012/12/06/garden/bedbugs-hitch-a-ri...
by bloak on 5/3/19, 12:11 PM
If they are recent books (from about 1980) then they probably have a barcode on the back cover, so use that. My guess is that it won't be worth trying to automatically recognise older books from the cover: a lot of them had a dust jacket, that goes missing, and a cover under the dust jacket that is not at all distinctive. The title might be on the spine, but how many online images show the spine clearly?
by MayeulC on 5/5/19, 11:01 AM
This sounds like a use-case inventaire.io ought to support. I'll try to ask them about it. They use wikidata for filling up book metadata.
Otherwise, as stated elsewhere in this thread, Zotero can usually find books with very little information: ISBN or title. It might be worth trying to set up OCR with it.
In any case, if you go to the length of taking a picture of each book, you might as well save them and make the dataset public, for OCR training purposes (and a second pass). There is also the Mechanical Turk option if you go this way.
And as someone stated already, you should plan the physical layout in advance.
by cconcepts on 5/5/19, 9:16 PM
Wow, judging by the response this is a problem a lot of people think about. Am overwhelmed by the helpful info. Obviously have to start at the low hanging fruit as I am working with non-technical people and am relatively non-technical myself. I just tested LibraryThing and it seems very fast and accurate so will give it a whirl.
Again, thanks HN for the overwhelming response.
by ghr on 5/5/19, 10:36 AM
https://www.reddit.com/r/DataHoarder/ and https://www.reddit.com/r/datacurator/ are good resources for this kind of thing.
by mikepurvis on 5/5/19, 1:09 PM
Surprised to see no mention of AbeBooks yet. We have an indie bookstore in Waterloo which is integrated with them and it seems to work pretty well. He tells me he still does most of his business in the IRL shop, but there's a steady stream of people buying online as well. Plus, it's nice for him to be able to quickly check how many of something he already has before committing to buying a bunch more of them. See: https://www.abebooks.com/old-goat-books-waterloo-on-canada/1...
I'm not sure what options there are for hardware integrations, but Abe provides at least online inventory and ordering capabilities. I assume if you had a barcode scanner capable of acting as a USB keyboard and entering ISBNs, it would go pretty quickly.
by rdl on 5/5/19, 11:04 AM
My plan for books is to pull the rare/valuable ones, then subscribe for the $100/mo 100 book/mo plan at http://1dollarscan.com/ and send them all the rest, produce PDFs, and pulp the books. I have maybe 3000 books in storage and this would be preferable to anything else I've found, as I ultimately would rather consume them electronically.
by good-idea on 5/5/19, 6:03 PM
A lot of people are suggesting querying Amazon for ISBN data - another option is the ISBNdb API: https://isbndb.com/ There's also the OpenLibrary API (from the Internet Archive) which may include some more info https://openlibrary.org/
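For the Open Library route, a minimal sketch could look like the following. The helper names are made up, and the per-ISBN JSON URL shape (`https://openlibrary.org/isbn/<isbn>.json`) is an assumption to verify against the Open Library docs, as are the fields of the returned record. The check-digit validation is the standard ISBN-13 rule.

```python
import json
import urllib.request


def is_valid_isbn13(isbn: str) -> bool:
    """Check the ISBN-13 check digit (weights alternate 1, 3)."""
    digits = isbn.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not digits.isdigit():
        return False
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits[:12]))
    return (10 - total % 10) % 10 == int(digits[12])


def openlibrary_url(isbn: str) -> str:
    """Build the per-ISBN JSON URL (endpoint shape assumed, check the docs)."""
    return "https://openlibrary.org/isbn/" + isbn.replace("-", "").replace(" ", "") + ".json"


def fetch_book(isbn: str, timeout: float = 10.0) -> dict:
    """Fetch the raw JSON record; urlopen raises HTTPError for unknown ISBNs."""
    if not is_valid_isbn13(isbn):
        raise ValueError(f"not a valid ISBN-13: {isbn!r}")
    with urllib.request.urlopen(openlibrary_url(isbn), timeout=timeout) as resp:
        return json.load(resp)
```

Validating the check digit first means a misread barcode gets caught locally instead of costing a network round trip.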
by 8_hours_ago on 5/5/19, 11:23 AM
Don’t forget about the Dewey decimal system. For the books with ISBNs, you can sort them into boxes by their Dewey decimal number. If you don’t have time to manually categorize the books without ISBNs, they can be put into “other” boxes and left unsorted.
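A rough sketch of the box sorting, assuming your ISBN lookup hands back a Dewey number as a string such as "813.54" (that is an assumption; not every catalogue returns one). The function name `dewey_box` and the "Other" label are made up for illustration; the ten main classes are the standard Dewey ones.

```python
from typing import Optional

# The ten Dewey Decimal main classes, keyed by hundreds digit.
DEWEY_MAIN_CLASSES = {
    0: "000 Computer science, information & general works",
    1: "100 Philosophy & psychology",
    2: "200 Religion",
    3: "300 Social sciences",
    4: "400 Language",
    5: "500 Science",
    6: "600 Technology",
    7: "700 Arts & recreation",
    8: "800 Literature",
    9: "900 History & geography",
}

OTHER_BOX = "Other (unsorted)"


def dewey_box(ddc: Optional[str]) -> str:
    """Map a Dewey number string to its main-class box, or to the 'other' box."""
    if not ddc:
        return OTHER_BOX
    try:
        number = float(ddc.strip())
    except ValueError:
        return OTHER_BOX
    if not 0 <= number < 1000:
        return OTHER_BOX
    return DEWEY_MAIN_CLASSES[int(number // 100)]
```

Ten boxes plus "other" is coarse, but it is enough to get started, and the same number can be used for finer shelving later.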
by achenatx on 5/5/19, 12:13 PM
For books with an ISBN, some of these can also scan the cover:
https://www.collectorz.com/book/isbn_database.php
https://bookriot.com/2016/01/14/8-reasons-catalog-books/
https://www.goodreads.com/blog/show/913-goodreads-hack-scan-...
by GnarfGnarf on 5/5/19, 2:52 PM
Photograph the books, a dozen at a time. Put a box number label next to the books. Put the books in their box, glue the label to the box. Stack the boxes.
Sort the boxes by height, line them up in a row. Put slats of wood between the rows to distribute and stabilize the load.
I wrote a program to automatically generate simple HTML files to display the images. See sample:
http://kyber.ca/b/index.html
Use OCR to digitize title & author.
by callmeal on 5/5/19, 7:18 PM
When I did this for my (admittedly medium-sized) collection, I used Booxter (https://www.deepprose.com/) and a CueCat scanner to catalog all those books.
It was a simple process of having enough boxes and labels, and I did it anytime I had some free time. Scan a bunch of books, drop them in a box, slap a label on the box, wait for Booxter to find and fetch the metadata, update the label in Booxter and repeat.
Will take time, but is easily doable.
by 52-6F-62 on 5/5/19, 5:09 PM
Where is that? I really would like to pay a visit...
That story also reminds me of this fellow (who actually might get me to the middle of nowhere SK): https://www.macleans.ca/news/canada/canadas-most-inconvenien...
(Edited to add Apple News links without ads if anyone uses it:
Free version: https://apple.news/Ar3trUQ-cR7C-c9L3YzPYjg
Issue version: https://apple.news/AQD4nDgB4SKi6yTcZy60tPw)
There seems to be a ton of relevant help in this thread and that seems exciting.
Like someone pointed out—something like OCR might be a best first step as it seems like a data entry task at a glance.
It does sound like there may be a significant amount of physical, tedious work involved no matter what software solution you find. Sometimes you have to accept that aspect and push through. Your best bet might be to recruit some physical help there—start a fund or a labour drive or something. Recruit book lovers, etc. Seems worthwhile. Maybe he would donate books to helpers.
by Adamantcheese on 5/5/19, 9:26 AM
Use one of the solutions listed below, but you HAVE to do sorting on the fly. You need to have places to put books and sort them by some general genres and you HAVE to throw out books that aren't worth the time due to damage or any other reason a book would be deemed a recyclable. With that many books, a proper library style cataloging system may be your best bet.
That being said, if you do want to do image comparison for covers, books without covers usually have a copyright page with most of the info on it. Use that to determine what a book is when the other method fails. Throwing together some cheap bookshelves with plywood and 2x4's will greatly help with the finding part, but while scanning use some big bins to do a rough sort.
And I can't stress enough you HAVE to throw out books. It's clear that there's a space issue and if he's willing to get them for free but has a hard time getting rid of them, that's hoarder behavior, not just eccentricity.
by tsjq on 5/5/19, 12:38 PM
that'll need a bunch of volunteers / friends to help with this work. also, check this podcast episode. might get some info / contacts https://www.npr.org/sections/money/2014/11/10/363103753/text...
once you've started this sorting / cataloging work: request visitors not to reshelve the books. have a central location (table / bins) for them to put the books, so the volunteers / barn-man can put them back on the right shelves.
also, what exactly is the barn-man's objective? just collect books and not bother about further? or, be the most helpful to book-lovers? or, make good money from these books ?
by teddyr009 on 5/5/19, 8:47 AM
A simple approach would be to ask book lovers in the locality to volunteer for this task. Borrow some barcode scanners and computers. Give 'em whatever books they like, and it becomes a kind of get-together for bibliophiles.
by cik on 5/5/19, 6:35 PM
Been there... since my library is now a little over a thousand physical books, and in multiple languages.
If you have a Mac, get a copy of Delicious Library (https://www.delicious-monster.com/) and a compatible barcode scanner, like the Flic.
If you have an Android phone, and you're happy with dealing with your phone and CSV export, you'll probably be okay with Libib (https://play.google.com/store/apps/details?id=com.libib.app).
The biggest issue is that there are tonnes of books (especially if like me you have older ones) that predate ISBN. That kinda sucks - but it's life.
by bartimus on 5/5/19, 5:37 PM
Sell 5 random surprise books for $15 (ex shipping). Include a box to send back any books they don't want (or any other book). Process returned books (take pictures, put barcode, register title+author). For the next order give $3 discount for every book they sent back previously.
by jccalhoun on 5/5/19, 2:14 PM
It can definitely be done but I don't know the details.
A couple times a year the local Half Price Books Outlet does a "fill a bag for $20" event and every time there are at least a couple people there with shopping carts full of bags of books.
They have dedicated bar code scanners attached to their phones and will scan books at around 1 a second. I don't know what software they are using but clearly they are looking up prices to see what they can get to sell for a profit.
I use Goodreads to keep track of my own book collection, and using the camera and the Goodreads app usually takes 30 seconds plus to focus on the bar code and then to look it up. So whatever they are using is much faster than that.
by thaumaturgy on 5/5/19, 8:55 AM
First, is this really a problem that needs to be solved? Personally, his place sounds like my favorite kind of book shop. A lot of bookworms prefer wandering through dense forests of precariously-balanced piles of books. Is he getting those people, or is he getting people that are expecting Barnes & Noble?
If they really do need to be cataloged, then the next thing is to forget all about trying to inventory the entire thing. Instead, you're going to partition the collection into "easy to catalog" and "hard to catalog": pick a section of the barn and make this the organized area. Get a barcode scanner (https://www.newegg.com/Barcode-Scanner/SubCategory/ID-583) and throw together a quick API client that'll take an ISBN and display a title, author, edition, and picture. If it comes up correct, great: book goes into the cataloged section. If it doesn't, it goes somewhere else. Make it really simple, so that a single keystroke can accept that book into inventory.
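A minimal sketch of that accept/set-aside loop, assuming the scanner acts as a keyboard and types one ISBN per line. The `lookup` function is passed in because which book API you use is up to you; `catalog_session` and `ask_on_console` are names made up for this sketch.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def catalog_session(
    scans: Iterable[str],
    lookup: Callable[[str], Optional[Dict]],
    accept: Callable[[Dict], bool],
) -> Tuple[List[Dict], List[str]]:
    """Split scanned ISBNs into cataloged records and a set-aside pile."""
    cataloged: List[Dict] = []
    set_aside: List[str] = []
    for raw in scans:
        isbn = raw.strip()
        if not isbn:
            continue  # blank line from the scanner, ignore it
        record = lookup(isbn)
        if record is not None and accept(record):
            cataloged.append({"isbn": isbn, **record})
        else:
            # Lookup failed or the human rejected it: goes "somewhere else".
            set_aside.append(isbn)
    return cataloged, set_aside


def ask_on_console(record: Dict) -> bool:
    """Show the record; plain Enter accepts, anything else rejects."""
    print(record.get("title"), "|", record.get("author"), "|", record.get("edition"))
    return input("Enter = accept, anything else + Enter = reject: ") == ""
```

In a real session `scans` could be `iter(input, "q")`, which keeps reading scanner lines until someone types q, with `ask_on_console` as the single-keystroke confirmation.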
Grocery stores have to regularly inventory everything on the shelves. I worked for an outfit once that wanted to do it all in-house, so we bought the commercial Telxon handheld wireless devices and I set about figuring out their software. Turned out that they just wanted to speak basic telnet to a server at a pre-configured IP address, so I put together a sloppy little telnet server interface and staff were able to count the entire store right on the devices in a few hours. That's way more complicated than what you'll need to do, so, y'know, your thing is doable. You'll have the added benefit of free online book databases and better hardware and easier-to-hack-together software.
Also might not be a bad idea to talk to your local librarian. They're book nerds too and he or she might have an actual library science degree. This would be right up their alley.
by westondeboer on 5/5/19, 7:17 PM
I am also grading books in my kids library. They don't have a librarian and they have a stack of 1,000 books that have been donated.
I am grading them by reading level A-Z. Currently I am googling the book and then adding "reading level" to the end, and then if it has it, it will show up, or I can find the Lexile number and use that as a grade also. I am using the speech-to-text command in Google, so it doesn't take that much time.
This is a hassle, and I am looking into other ways to speed up this process, and/or to get other parents to volunteer if it were an easier process.
by sandreas on 5/5/19, 2:04 PM
You could also ask at the forum of
http://diybookscanner.org/
Perhaps there are users with experience...
by influx on 5/5/19, 12:30 PM
There’s a whole universe of folks selling used books on Amazon FBA. I suggest starting with a Google search of exactly that.
There are apps which allow you to scan UPC codes and look up a price on Amazon. I’d personally sort the books by market value. Sell the books that are profitable, trash the ones which are not, save the ones which are very rare or have no UPC code, and use the money to grow the storage space.
by niedzielski on 5/5/19, 2:33 PM
I have a related problem on a much smaller scale (only a few hundred books) in that I wish to make full digital copies of my books. I reached out to Archive.org but they can't use them due to copyrights. I'm looking into https://1dollarscan.com/ but it's a destructive scanning technique.
by xiconfjs on 5/5/19, 1:12 PM
While we are a bit on this topic: is there an alternative to calibre [1] for managing a shitload (50000+) of ebooks which is still performant? Especially for ebooks which have no ISBN (PDFs, whitepapers, etc.), where the only information about them is in their EXIF file data.
[1] https://calibre-ebook.com
by qubex on 5/5/19, 5:47 PM
I raised the same question several years ago on this very forum. I got some good answers but none that ultimately satisfied my needs, but they could be useful for you: https://news.ycombinator.com/item?id=9631362
by ryanmarsh on 5/5/19, 5:05 PM
I worked for a company that scanned and catalogued many books in the ‘00s. There are two primary challenges to solve: nondestructive scanning and speed.
1. In order to get a good scan (back then) we had to lay each page flat against a piece of glass (no matter the orientation). This tended to damage or destroy the binding by the time scanning was complete.
2. An average of ten seconds per scan (from page flip to page flip) is blazing fast (including rescans). For a 200-page book this is 33 minutes. At this rate a library of 200 books takes about 111 hours of scanning, and the total reaches 3.2 man-years (normal 40-hour work week + holidays) at roughly 10,000 books.
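A quick way to check figures like these for your own pile. The ten seconds per page and 200 pages per book come from the numbers above; the 1,750 working hours per person-year is my own assumption for "40-hour week minus holidays", so adjust it as needed.

```python
SECONDS_PER_PAGE = 10        # page flip to page flip, including rescans
PAGES_PER_BOOK = 200         # the example book size used above
WORK_HOURS_PER_YEAR = 1750   # assumption: 40 h/week minus holidays


def minutes_per_book(pages: int = PAGES_PER_BOOK,
                     seconds_per_page: float = SECONDS_PER_PAGE) -> float:
    """Scanning time for one book, in minutes."""
    return pages * seconds_per_page / 60


def scanning_hours(books: int) -> float:
    """Total scanning time for a collection, in hours."""
    return books * minutes_per_book() / 60


def person_years(books: int, hours_per_year: float = WORK_HOURS_PER_YEAR) -> float:
    """Total scanning time expressed in working years for one person."""
    return scanning_hours(books) / hours_per_year


for n in (200, 10_000):
    print(f"{n} books: {scanning_hours(n):.0f} hours, {person_years(n):.2f} person-years")
```

Changing the book count or the hours-per-year assumption shows how quickly full-page scanning stops being a weekend project.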
One way to speed this process drastically is to use a bulk scanner. This requires slicing the binding off the book and feeding in the book as a stack of pages, scanning the cover separately. Obviously this completely destroys the book.
Good luck.
by juskrey on 5/5/19, 12:07 PM
This would be my MVP: I'd implement simple inventory app based on ISBN scanning and simply enumerated boxes with, say, 50 books each. Scan ISBN - put in the next empty box, take another box when full, and so on. Then based on title demand, I'll sort popular titles in their own boxes.
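Roughly what I mean, as a sketch: the class name and the in-memory dict are just for illustration, and a real app would persist this somewhere. Box numbers are handed out sequentially and a new box starts every 50 scans.

```python
from typing import Dict, List


class BoxAllocator:
    """Hand out sequential box numbers, starting a new box every `capacity` books."""

    def __init__(self, capacity: int = 50) -> None:
        self.capacity = capacity
        self.inventory: Dict[str, List[int]] = {}  # ISBN -> boxes holding a copy
        self._count = 0  # total books scanned so far

    def add(self, isbn: str) -> int:
        """Record a scanned ISBN and return the box it should go into."""
        box = self._count // self.capacity + 1
        self._count += 1
        self.inventory.setdefault(isbn, []).append(box)
        return box

    def where(self, isbn: str) -> List[int]:
        """Return every box number that holds a copy of this ISBN."""
        return self.inventory.get(isbn, [])
```

Because duplicates are recorded per box, `where()` also tells you how many copies of a title you already hold, which is what you need when deciding which titles deserve their own boxes.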
by Floralegeium on 5/11/19, 2:34 AM
I work with scanning documents for business purposes. I went to a sales meeting with a company called Biels, which has since been bought and is now called Instream.
At this meeting they were displaying a book scanner: you placed the book in the machine, it would flip each page and take a high-resolution photo, and it had options for OCR software which would read the entire page and present any questionable words or characters the OCR could not identify. This machine and software were pitched to museums and large libraries. I would highly suggest asking a local museum or library if they have any hardware that would be able to archive the books you're describing.
I tried searching for the exact machine but I could not locate it, I want to say Canon was the vendor.
I wish you success with your endeavour.
by thecupisblue on 5/5/19, 11:38 AM
Use an OCR service such as Firebase ML Kit's text recognition or Amazon's similar offering, and take pictures cover by cover. Then ping an API (even Amazon or eBay might do) to see if the book exists and what it sells for on average.
It also shouldn't be hard to up the speed by taking pictures of a stack of books. If you take an image of a stack and crop it book by book, training a model to recognise books shouldn't be that hard, but you could also use a CV solution (Firebase, Amazon, Azure again) and then ping the API for each book it found in the stack. This could probably be the fastest way if you can take a panorama and have it search from that.
Anyway, if you do it, try to get the price, ISBN and editions from the results.
by jcelerier on 5/5/19, 9:55 AM
I used Tellico to scan my library. It can automatically look up the books on Amazon by their ISBN if you have a barcode scanner (otherwise you have to type them in by hand...)
http://tellico-project.org/
by swayvil on 5/5/19, 6:00 PM
Dump them all in a shredder. Blow the shreds through a well-lit tunnel full of digital cameras. Assemble the books from the images. Now it's just a software problem.
(This isn't my idea. Either Rudy Rucker, Vernor Vinge or Cory Doctorow thought it. I forget exactly who.)
by 80mph on 5/5/19, 9:11 AM
LibraryThing has an app, or you can order a CueCat scanner. https://wiki.librarything.com/index.php/Adding_and_importing...
by mongol on 5/5/19, 8:20 AM
Google has a Books API. Look into that. There are smartphone apps that solve the problem of books that have barcodes. No matter what, this will be a huge task to complete. I scanned my small library (2-3 shelves) and was quite tired of it in the end.
by zimpenfish on 5/5/19, 12:53 PM
Surprised that no-one has produced a Vivino-alike scanner for books.
(Although I suspect the range of book covers is somewhat larger than the range of wine labels...)
https://www.vivino.com
by Grustaf on 5/5/19, 12:17 PM
My thought is that the entire point of an old-fashioned second-hand book shop is to be able to wander around and explore. If everything is catalogued, his barn instantly turns into a very bleak version of Amazon.
by jonsen on 5/5/19, 5:08 PM
Tangential anecdote:
I once entered a used-book store looking for an old math book. No one was there except a grumpy-looking man with wild hair and a big beard, sitting at a desk in the far corner.
I start browsing the shelves.
“WHAT ARE YOU LOOKING FOR?”
“Um, eh, Play with Eternity.”
“WE DON’T HAVE IT.”
by anoncow on 5/5/19, 8:18 AM
Not sure if this will help - https://aws.amazon.com/rekognition/
by GBiT on 5/5/19, 8:43 PM
I have experience working in the document storage business. You need to barcode and index all the books. Barcode all the locations and scan each book into the container it is in. It will be like an Excel table with 3 columns: book name, barcode (ISBN or something) and location barcode. If someone looks a book up in the catalog, you will know its location. If the books have ISBNs, it's possible to write a script to pull book info by ISBN.
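The three-column table can be sketched with SQLite instead of a spreadsheet. The table and function names below are made up for illustration, and the box labels ("BOX-0042") are just an example of what a location barcode might encode.

```python
import sqlite3


def open_catalog(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the catalog with the three columns described above."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS books (
               title    TEXT NOT NULL,
               barcode  TEXT,            -- ISBN or a locally printed label
               location TEXT NOT NULL    -- barcode of the box or shelf
           )"""
    )
    return conn


def add_book(conn: sqlite3.Connection, title: str, barcode, location: str) -> None:
    """Record one book at the location whose barcode was just scanned."""
    conn.execute("INSERT INTO books VALUES (?, ?, ?)", (title, barcode, location))
    conn.commit()


def find_locations(conn: sqlite3.Connection, title_fragment: str) -> list:
    """Return (title, location) pairs whose title contains the fragment."""
    rows = conn.execute(
        "SELECT title, location FROM books WHERE title LIKE ? ORDER BY title",
        ("%" + title_fragment + "%",),
    )
    return rows.fetchall()
```

Pass a file path to `open_catalog` to keep the data between sessions; the default in-memory database is only useful for trying it out.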
by bshep on 5/5/19, 5:48 PM
A while back I wrote an ISBN barcode scanner which would look up the item on Amazon and find the price. The script uses your webcam as the source for the image. I'm sure you could adapt it to your needs; it's very simple and has minimal error checking, so beware.
https://github.com/bshep/ISBNbarcodescanner
by szafranek on 5/5/19, 6:22 PM
I'm surprised nobody mentioned https://www.libib.com/. It comes with a mobile app that has a barcode scanner with an option for manual entry.
Yes, some older books, especially in languages other than English, are not in its database, leaving you with the manual option, but it will let you index the books that are there in no time.
by andylynch on 5/5/19, 8:33 AM
For books that are not that old, you will often find the info you need on the copyright page. For US publications, the Library of Congress CIP info is there; see http://www.loc.gov/publish/cip/ . Other countries have similar programmes, e.g. the British Library does the same.
by georgespencer on 5/5/19, 10:03 AM
You might try Delicious Library: https://delicious-monster.com
by krekligit on 5/5/19, 3:25 PM
Internet archives? https://www.atlasobscura.com/articles/marion-stokes-televisi...
by giarc on 5/5/19, 6:07 PM
My local library has a 24/7 return system. You simply put the books on a conveyor belt and it takes them in and scans them. I imagine it reads the RFID but you could get a similar system to scan the barcode, you just have to insert them back side up. Would be quicker than scanning with a handheld barcode scanner.
by BigBalli on 5/10/19, 4:10 PM
Please do let me know more regarding your pain points. I released http://mybooklist.club Obviously it already includes manual insert and barcode scanning, but now I'm working to implement adding by image recognition (of the cover).
by dredmorbius on 5/5/19, 1:26 PM
This is a complex project, though also a well-developed space -- it's much of what libraries do.
Numerous queryable catalogues of books and other materials exist, with Worldcat arguably the most developed of those:
https://www.oclc.org/developer/develop/web-services/classify...
The US Library of Congress also has a huge (if intimidating) amount of information available.
http://eresources.loc.gov
http://fortune.com/2017/05/17/library-of-congress-free-recor...
Figuring out what you hope to accomplish, how, with what resources (people, software, equipment, space, etc.), in what timeframe, and with what throughput (how fast are materials arriving and leaving, what is the current backlog) are all considerations. And what end this will serve (book sales, in-person or online) and what is sufficient to that end is also significant.
by __initbrian__ on 5/5/19, 3:58 PM
Maybe look into how public libraries get books back onto shelves https://www.bibliotheca.com/library-return-sorting/
by rajkpal on 5/5/19, 11:29 AM
http://www.k2.t.u-tokyo.ac.jp/vision/BFS-Auto/
though this is to scan whole books in case there are rare ones..
by ejdanderson on 5/5/19, 8:50 AM
The Amazon seller app does exactly this. I think eBay might have it built in as well, but I’m fairly certain with Amazon you can scan the barcode or cover of a book.
by anonu on 5/5/19, 5:12 PM
Ideally you'd just do a pass through with a high-res video camera, making sure the book spines are facing out to capture their names, and run it through some image filter to pick up the book names.
Then you'd have to run some algorithm to match the name with the ISBN and tag it with the general location.
Once you get the process down you could run a new video every few weeks.
This is kind of like Google Street view for the book barn.
Am I dreaming? Could this work in practice?
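The "match the name with the ISBN" step could start as plain fuzzy string matching, since OCR of spines will be noisy. A sketch using Python's standard `difflib`; the `catalog` dict (title to ISBN) is assumed to come from some existing list of known books, and `match_title` is a made-up name.

```python
import difflib
from typing import Dict, Optional


def match_title(ocr_text: str, catalog: Dict[str, str], cutoff: float = 0.6) -> Optional[str]:
    """Return the ISBN whose title is the closest fuzzy match, or None."""
    # Normalise case and whitespace so OCR capitalisation quirks don't matter.
    by_normalised = {title.lower().strip(): isbn for title, isbn in catalog.items()}
    hits = difflib.get_close_matches(
        ocr_text.lower().strip(), list(by_normalised), n=1, cutoff=cutoff
    )
    return by_normalised[hits[0]] if hits else None
```

With a catalog containing "The Hobbit", a misread spine like "Tbe Hobbit" still resolves to the right entry, while text that resembles nothing returns None and can go to a manual pile.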
by secfirstmd on 5/5/19, 10:31 AM
The easiest way by far is to download Goodreads and use the barcode scanner in their app and their lists feature.
by emmelaich on 5/5/19, 1:50 PM
Google Docs will OCR.
As a first pass, just take a picture of a whole bunch of book spines. Some will be OK, some not.
by elcomet on 5/5/19, 4:01 PM
You should ask on reddit on r/DataHoarder. They are the best for this kind of stuff.
by paulcarroty on 5/5/19, 11:41 AM
Maybe a light hardware scanner would be helpful, something like what workers use in warehouses.
by sandwall on 5/5/19, 1:22 PM
1st, where is this treasure trove?
2nd, barcode scanning would certainly be the most effective method.
by kmfrk on 5/5/19, 4:19 PM
Photograph the spines of the books on the shelves and OCR the titles and authors?
by FZ_BA on 5/6/19, 7:14 AM
You could scan the barcodes and use something like Tellico to make a catalogue.
by thegabriele on 5/5/19, 1:09 PM
Yep, charge each customer $1 plus at least 5 random books indexed into a database.
by oflebbe on 5/5/19, 11:59 AM
I would recommend checking with archive.org whether the books are already available online and, if not, whether they are interested in scanning them or getting the scans.
by viraptor on 5/5/19, 8:20 AM
Ask a professional. And by that I mean - get in touch with Jason Scott: https://twitter.com/textfiles
by libguy on 5/5/19, 5:40 PM
So where is this barn of books?
I’d like to have a look. Thanks!
by Theodores on 5/5/19, 12:45 PM
Hay on Wye is a town in Wales known for its book shops. Maybe this is the business model that you need to look into.
What Hay on Wye is known for is a literary festival. So there was a pivot to this a few decades ago that has worked.
This is the guy that started it:
https://en.wikipedia.org/wiki/Richard_Booth
Note the way it started, buying library stock from America that was available due to libraries closing.
Some of the Hay book shops are really good, there is a former cinema that you could get lost for hours in. Some other book shops are more like 'extras'. They might have books in cases on the pavement with prices being pennies. This stuff could be fairly pulped, but, collectively it gives the whole town this aura of literature that is way beyond what the local sheep farmers necessarily go in for.
Now if a tourist visits Hay for the festival then they spend £££ on books, but they also spend a lot more on cups of tea, admission fees to see performances, accommodation and whatever else. A given tourist might spend pennies on books but in so doing spend many pounds in the town. They might not even read the books purchased; the books might serve more as souvenirs, and far from generic ones.
The reputation from the festival is enough to bring a respectable amount of tourists to Hay throughout the rest of the year.
It also works with a sponsor, normally the Hay festival works with people who have a vested interest in it being successful, so you get a lot of coverage on BBC's Radio 4.
Hay also has splendid scenery going for it as well as it being in Wales, proper. There are towns nearby that are just as pretty with similar scenic backdrops but nobody remembers the names of those places. The books thing - which is the effort and inspiration of one man - has put the place on the map, literally.
So, rather than the high-tech solution, maybe the pre-internet solution has some pointers. Get some local store fronts that are closed premises to become book shops. Segment the collection so that some shops are more specialist than others. Have some shops in less prime locations so that collectively there is the same thing going on as with Hay on Wye. Create a fake literary hub and then make it into a real literary hub by putting on the ten-day festival.
If you get the council and local businesses in on the act then you might be able to get the whole thing started. Build it and they will come works for Hay even though it is the middle of nowhere with just sheep for local population.
Trying to shift the product online for a pittance is no fun at all, the festival and tourist location thing could be much more exciting. Try and twin the town with Hay to get started...