by dvcoolarun on 2/5/24, 7:24 PM with 43 comments
[Edit] Sample PDF :: https://drive.google.com/file/d/1n7M1TKOptSsYiibrbvV_Yojx53T...
by ComputerGuru on 2/5/24, 7:50 PM
by jackconsidine on 2/5/24, 8:24 PM
If there's any interest I might OSS the pipeline
by nacho2sweet on 2/6/24, 5:16 PM
by dvcoolarun on 2/5/24, 7:59 PM
by dvcoolarun on 2/6/24, 3:22 AM
I'm on mobile, so I can't add a Google Drive file screenshot to the readme, and iframes are not supported.
by pavs on 2/5/24, 10:30 PM
sudo apt install pandoc wkhtmltopdf
npm install -g readability-cli
pandoc -s https://www.paulgraham.com/avg.html -o output.html && readable output.html -o readable.html && wkhtmltopdf readable.html output.pdf && open output.pdf
Going even further, a bash script can prompt for the URL:
#!/bin/bash
# Prompt the user for a URL
read -r -p "Enter the URL: " URL
# Use the URL in the pandoc command (variable names are case-sensitive, so this must be $URL)
pandoc -s "$URL" -o output.html && readable output.html -o readable.html && wkhtmltopdf readable.html output.pdf && open output.pdf
chmod +x web2pdf.sh
# add an alias to bashrc
alias web2pdf='/path/to/your/web2pdf.sh'
source ~/.bashrc
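For anyone who would rather drive the same three tools from Python (the language of the project under discussion), here is a minimal sketch. It assumes pandoc, readable (from readability-cli) and wkhtmltopdf are on the PATH, and it reuses the flags from the one-liner above as-is. The function names build_commands and run_pipeline are made up for illustration.

```python
import subprocess


def build_commands(url, pdf_path="output.pdf"):
    """Return the three commands from the shell one-liner as argument lists."""
    return [
        # Fetch the page and write standalone HTML
        ["pandoc", "-s", url, "-o", "output.html"],
        # Strip the page down to its readable content
        ["readable", "output.html", "-o", "readable.html"],
        # Render the cleaned HTML to PDF
        ["wkhtmltopdf", "readable.html", pdf_path],
    ]


def run_pipeline(url, pdf_path="output.pdf"):
    """Run each step in order; check=True stops at the first failure, like && in the shell."""
    for cmd in build_commands(url, pdf_path):
        subprocess.run(cmd, check=True)


# Example call (not executed here):
# run_pipeline("https://www.paulgraham.com/avg.html")
```

Passing argument lists instead of a shell string means the URL never needs quoting, which avoids the class of bug that unquoted shell variables invite.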
by seabass-labrax on 2/5/24, 7:30 PM
by adrian_b on 2/6/24, 11:11 AM
It would be really nice if there existed a utility able to produce a PDF file where the Web pages are rendered as well as the browsers render them on the screen, without becoming confused even by complex scripts loaded by the page.
The alternatives to "Print" (producing a PDF) are even worse. A screenshot has limited resolution and it loses the text. In the past "Save as ..." was the normal solution, but now even if you save a "complete" page, it will still frequently include scripts that will no longer work offline. What I want to save are the pages perfectly rendered as they were at that instant, without any scripts that could make them appear differently in the future.
by Someone on 2/5/24, 10:04 PM
pipenv shell
pipenv install
python main.py https://www.paulgraham.com/avg.html, https://www.paulgraham.com/determination.html
“Just add the webpage URLs separated by commas”
What’s the rationale for “separated by commas”? The convention for CLI arguments is to use one argument per input file.
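A minimal sketch of what the one-argument-per-URL convention could look like with argparse. This is not the project's actual main.py. The function name parse_urls is hypothetical, and tolerating trailing commas is an assumption added only so the old comma-separated invocation keeps working.

```python
import argparse


def parse_urls(argv=None):
    """Parse one or more URLs given as separate command-line arguments."""
    parser = argparse.ArgumentParser(
        description="Convert web pages to PDF (hypothetical interface)."
    )
    # nargs="+" collects every positional argument into a list and requires at least one
    parser.add_argument("urls", nargs="+", metavar="URL", help="page URL to convert")
    args = parser.parse_args(argv)
    # Assumption: strip a trailing comma so "url1, url2" from the old README still works
    return [u.rstrip(",") for u in args.urls if u.rstrip(",")]
```

With this shape, `python main.py URL1 URL2` works directly, and shell features such as globbing or xargs compose with it in the usual way.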
by jll29 on 2/5/24, 8:59 PM
% python main.py https://www.paulgraham.com/avg.html
Traceback (most recent call last):
  File "/Users/bill/web2pdf/main.py", line 7, in <module>
    from readability import Document
ImportError: cannot import name 'Document' from 'readability' (/Users/bill/.local/share/virtualenvs/web2pdf-gXeVRXKg/lib/python3.9/site-packages/readability/__init__.py)
But according to your Pipfile.lock, the readability module needed is 0.3.1:
"readability": {
    "hashes": [
        "sha256:f9030df8bc31aad45baffa9a2d9ce1fdd8051833e5b5bda3027df32fdec00fad"
    ],
    "index": "pypi",
    "version": "==0.3.1"
},
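A quick way to check what an installed module actually exposes, sketched here with only the standard library. The helper name module_exposes is hypothetical; "readability" and "Document" are the names from the traceback above.

```python
import importlib


def module_exposes(module_name, attr):
    """Return True or False if the module imports; None if it cannot be imported at all."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return hasattr(module, attr)


# Example call for the case in the traceback (not executed here):
# module_exposes("readability", "Document")
```

A result of False would mean a module named readability is installed but has no Document attribute, which matches the ImportError.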
Version 0.3.1 of the module "readability" exists, but does not appear to have a class "Document".
by OhMeadhbh on 2/5/24, 8:21 PM
pdfpage() {
convert -resize 0x1000^ "${1}"[${2}] -background white -flatten sixel:-
}
You can probably deduce that it assumes you have ImageMagick installed and that you're in a terminal with sixel support.
by fishywang on 2/6/24, 5:47 AM
For e-ink readers, EPUBs are generally better than PDFs for URLs anyway: an EPUB is basically packaged HTML, and its reflowable text works better on smaller screens.
by Throw73747 on 2/5/24, 8:31 PM
by rahimnathwani on 2/6/24, 1:25 AM
One benefit of using a Chrome extension (vs. CLI) is that it's easy to 'print' things that require authentication.
by jll29 on 2/5/24, 8:56 PM
by sn0n on 2/6/24, 7:27 AM
by harry8 on 2/5/24, 11:58 PM
I'm sure I'm missing something, but what is a CLI buying me here?
by K2h on 2/5/24, 8:00 PM
by codeonline on 2/6/24, 2:23 AM
by skanga on 2/5/24, 8:08 PM