from Hacker News

Show HN: ReadToMe (iOS) turns paper books into audio

by kolchinski on 2/4/24, 11:56 PM with 47 comments

Today I'm publicly launching something that started as a side project: ReadToMe, an iPhone app that turns paper books and other printed text into audio.

Originally this was a Christmas present for my fiancée, who loves books but has an eye problem that makes it hard for her to read more than a few pages at a time. She mostly listens to audiobooks while following along with the paper book, but some books aren't available in audiobook or even e-book form, and all of the existing apps we tried were surprisingly bad at scanning paper books into audio — they make lots of mistakes, include footnotes and page numbers, etc., in a way that really degrades the experience.

Being an AI-oriented engineer by training, I had a crack at solving the problem myself, and was pleasantly surprised at how well the proof of concept worked. I then had some time free while shutting down my previous company (Mezli, YC W21), during which I polished up the app to the point you see it at now.

The way it works:

On the front end, it's a SwiftUI app (mostly written by ChatGPT!) that consists mainly of a document scanner (VNDocumentCameraViewController) and a custom-built audio player.

The back end is more complex — book photos are first sent to an OCR API, then some custom code I wrote does a first pass at stitching together and correcting the results. Then, the corrected OCR results are sent to GPT-3.5-turbo for further post-processing and re-stitching together, and finally to a text-to-speech API for conversion to audio.
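As a rough illustration of the flow described above (not the app's actual code), the four stages can be sketched in Python. The `ocr`, `llm_cleanup`, and `tts` callables are hypothetical stand-ins for the external APIs, and the stitching heuristics (dropping bare page numbers, rejoining words hyphenated across line breaks) are assumptions about what a first pass might do.

```python
import re
from typing import Callable, List

# A line consisting only of a short number is assumed to be a page number.
PAGE_NUMBER = re.compile(r"^\s*\d{1,4}\s*$")


def stitch_pages(pages: List[str]) -> str:
    """Join per-page OCR text into one running text (illustrative heuristics)."""
    lines = []
    for page in pages:
        for line in page.splitlines():
            line = line.strip()
            if not line or PAGE_NUMBER.match(line):
                continue  # drop blank lines and bare page numbers
            lines.append(line)

    text = ""
    for line in lines:
        if text.endswith("-"):
            # Rejoin a word split across a line break. This is naive: it also
            # merges genuine hyphenated compounds that happen to end a line.
            text = text[:-1] + line
        elif text:
            text += " " + line
        else:
            text = line
    return text


def scan_to_audio(
    images: List[bytes],
    ocr: Callable[[bytes], str],          # hypothetical OCR API wrapper
    llm_cleanup: Callable[[str], str],    # hypothetical LLM post-processing call
    tts: Callable[[str], bytes],          # hypothetical text-to-speech wrapper
) -> bytes:
    """Run the OCR -> stitch -> LLM cleanup -> TTS stages in order."""
    pages = [ocr(image) for image in images]
    draft = stitch_pages(pages)
    cleaned = llm_cleanup(draft)
    return tts(cleaned)
```

Passing the API wrappers in as plain callables keeps the sketch runnable without any network access and makes each stage easy to swap or test in isolation.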

The hardest part of this process was actually getting the GPT calls right — I ended up writing a custom LLM eval framework for making sure the LLM wasn't making edits relative to the true text of the book.
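The post does not describe how the eval framework works. One simple way to check that an LLM pass has not altered the text, offered here only as an assumption-laden sketch, is to diff the LLM output against a trusted reference transcription word by word and report what changed. The function names and the idea of a per-word edit rate are hypothetical; only the standard-library `difflib` module is used.

```python
import difflib
from typing import List, Tuple


def word_edits(reference: str, candidate: str) -> List[Tuple[str, str, str]]:
    """Return (operation, reference words, candidate words) for each difference."""
    ref, cand = reference.split(), candidate.split()
    matcher = difflib.SequenceMatcher(a=ref, b=cand, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            edits.append((tag, " ".join(ref[i1:i2]), " ".join(cand[j1:j2])))
    return edits


def edit_rate(reference: str, candidate: str) -> float:
    """Fraction of reference words touched by an edit (0.0 means identical)."""
    ref_words = reference.split()
    changed = sum(
        max(len(old.split()), len(new.split()))
        for _, old, new in word_edits(reference, candidate)
    )
    return changed / max(len(ref_words), 1)


# Example: deliberately non-standard dialogue that a model "corrects".
true_text = "I ain't got no money, he said."
llm_text = "I don't have any money, he said."
for op, old, new in word_edits(true_text, llm_text):
    print(f"{op}: {old!r} -> {new!r}")
print(f"edit rate: {edit_rate(true_text, llm_text):.2f}")
```

A check like this could be run over a set of pages with known-good text, flagging any prompt or model change that pushes the edit rate above a chosen threshold.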

A few issues remain, which I'll work on fixing if the app gets a significant amount of traction, including:

1) It can take multiple minutes to get audio back from a scan, especially if it's on the longer side (10+ pages). I'll be able to bring this down by spinning up dedicated servers for the OCR and TTS back-end.

2) The LLM sometimes does TOO good of a job at correcting "mistakes" in book text. This issue crops up particularly often when an author deliberately uses improper grammar, e.g. in dialogue.

The app is priced at $9.99/month for up to 250 pages/month right now, which I estimate will just about cover the costs of API calls. I'll be bringing the price point down as the pricing of the required AI APIs comes down. There's also a 3-day free trial if you want to try it out.

If you do find this useful, or know somebody who might, I'd appreciate you giving it a try or letting them know! And please let me know if you have any feedback, including issues or feature requests.

by spacemanspiff01 on 2/7/24, 1:22 AM
It seems to me that there are 3 independent issues.
1. Scanning the books to text.
2. Reading text to the user.
3. Having a good interface.
Number 1 seems to be where you put the most effort, along with 3.
I guess at least for me, there are often digital copies of books, either in epub or Kindle. When those are available, they should be used.
And if one is not available, wouldn't it make more sense to have a document scanner that outputs epub?
I guess I'm just thinking that it is relatively rare that you really need document scanning in order to get an audiobook. Since most of the cost seems to come from the document scanning side, it seems worthwhile to split them up.
It also seems like it would make sense to think of these as 2 separate products: specialized document scanning and audio generation. I can definitely see uses for one without the other.
by LeoNatan25 on 2/7/24, 11:27 AM
“Scan up to 250 printed pages per month for $9.99/mo”
I’m sorry, but LOL. Not even a full book.
That has to be one of the most terrible business models. I guess it’s in line with most app subscription models these days, only much worse. And if the excuse is “well it costs me too much on Azure and the phone native APIs are not good enough”, perhaps the answer is “don’t do it then”. No thanks.
by broth on 2/6/24, 11:57 PM
Love this but I have concerns about the price. You can usually find an audiobook corresponding to a paper book for relatively cheap. Services like Audible are a little more per month but you get more audiobooks. Given the 250-pages-per-month limit at $9.99, how will this compete?
by moritz64 on 2/7/24, 9:20 AM
Is there something like this for epubs or pdfs with a truly high-quality TTS?
All apps that I know of use iOS internal TTS (sounds awful, not as good as Siri). Then there is also Voice Dream Reader, but even with the paid premium voices it is still not pleasant to listen to. Siri-grade TTS or ElevenLabs would be pleasant enough, though.
by ummonk on 2/6/24, 11:47 PM
Were the onboard text recognition and speech synthesis APIs not good enough for this task?
by ssttoo on 2/7/24, 9:37 AM
Next step: turn the book into a 3D video.
I recently read an Isaac Asimov book where he was describing a device that takes a book and acts it out for you. Made me think we’re probably pretty close.
by closetkantian on 2/5/24, 8:24 AM
Could you make a video showing how it works? I don't have any iOS devices but would love to recommend it to friends/family. Thanks.
by carbone_12 on 2/9/24, 5:50 AM
OP - this is an incredible project! I worked on something similar (https://oration.app) and really love your idea of using CV/OCR. I'll certainly be giving your app a try.
by rickcarlino on 2/7/24, 1:46 PM
I have been looking for a product like this for years. I hope you can bring the price down eventually. In the past I used one of those OCR pens that you can find on Amazon, but I found that they were too slow to be of practical use.
Very excited to see all the cool things people publish once LLM pricing drops.
by aryamaan on 2/5/24, 7:32 AM
If you don’t mind me asking, what do you use for TTS?
by blatherard on 2/7/24, 2:23 AM
Sounds cool, have you looked into potential copyright issues?
by Gys on 2/6/24, 11:30 PM
Love it! But should be for all languages
by tamimio on 2/5/24, 12:24 AM
> Turn any book into an audiobook
English book.
by quickthrower2 on 2/7/24, 9:50 AM
Funny. Felt like another (eyeroll) AI thing, until I read your story here. So definitely use this story in your marketing too! Also the story gives the impression of attention to detail because of why you did it, which is good to know.