by agamble on 6/27/17, 11:46 AM with 100 comments
by JackC on 6/27/17, 1:11 PM
Webrecorder is by a former Internet Archive engineer, Ilya Kreymer, who now captures online performance art for an art museum. What he's doing with capture and playback of JavaScript, web video, streaming content, etc. is state of the art as far as I know.
(Disclaimer - I use bits of Webrecorder for my own archive, perma.cc.)
For OP, I would say consider building on and contributing back to Webrecorder -- or alternatively figure out what Webrecorder is good at and make sure you're good at something different. It's a crazy hard problem to do well and it's great to have more ideas in the mix.
by smoyer on 6/27/17, 12:44 PM
by Piskvorrr on 6/27/17, 11:50 AM
(Yes, yes, `wget --convert-links`, I know. Not quite as convenient, though.)
by j_s on 6/27/17, 5:15 PM
I believe the only way to incentivise participation in such a system is by paying for timestamped signatures, e.g. "some subset of downloaded [content] from [url] at [time] hashed to [hash]", all tucked into a Bitcoin transaction or something. There are services that will do this with user-provided content[1]; I am looking for something that will pull a url and timestamp the content.
This would also be a way to detect when different users are being served different content at the same url, thus the need for a global network of validators.
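The record the commenter describes could be sketched like this. This is a hypothetical shape for the attestation, not any existing service's format; the resulting `record_hash` is what you would embed in a Bitcoin transaction (e.g. via OP_RETURN) or hand to a timestamping service:

```python
import hashlib
import json
import time

def attestation(url: str, content: bytes, timestamp: float) -> dict:
    """Build a record asserting that `url` served `content` at `timestamp`.

    Committing the record's hash to a public ledger later proves the
    content existed in this form no later than that time.
    """
    digest = hashlib.sha256(content).hexdigest()
    record = {"url": url, "time": int(timestamp), "sha256": digest}
    # Hash the canonical JSON form so the commitment covers all fields.
    record["record_hash"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()
    return record

rec = attestation("https://example.com", b"<html>hello</html>", time.time())
```

Comparing `sha256` values from independent validators fetching the same url at the same time is exactly the check that would expose users being served different content.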
by unicornporn on 6/27/17, 1:01 PM
If you really want to create your own archive, set up a Live Archiving HTTP Proxy[1], run SquidMan [2] or check out WWWOFFLE[3].
If you want something simpler, have a look at Webrecorder[4] or a paid Pinboard account with the “Bookmark Archive”[5].
[1] http://netpreserve.org/projects/live-archiving-http-proxy/
[2] http://squidman.net/squidman/index.html
by rahiel on 6/27/17, 1:10 PM
[1]: http://blog.archive.is/post/72136308644/how-much-does-it-cos...
by venning on 6/27/17, 12:59 PM
I like the look. Very clean. I like how fast it's responding; better than archive.org (though, obviously, they have different scaling problems).
"Your own internet archive" might be overselling it, as other commenters have pointed out; the "Your" feels a bit misleading. I think "Save a copy of any webpage." gives a better impression, which you use on the site itself.
The "Archive!" link probably shouldn't work if there's nothing in the URL box. It just gives me an archive link that errors. Example: [1]
Using it on news.YC as a test gave me errors with the CSS & JS [2]. This might be because HN uses query parameters in its CSS and JS URLs, which repeat in the Tesoro URL and may not be getting parsed correctly.
Maybe have something in addition to an email link for submitting error reports like the above, just because I'd be more likely to file a GitHub issue (even if the repo is empty) than send a stranger an email.
As other commenters have pointed out, archive.is also does this, and their longevity helps me feel confident that they'll still be around. Perhaps, if you wish to differentiate, offer some way for me to "own" the copy of the page, like downloading it or emailing it to myself or sharing it with another site (like Google Docs or Imgur) to leverage redundancy, or something like that. Just a thought.
All in all, nice Show HN.
EDIT: You also may want to adjust the header to work properly on mobile devices. Still though, nice job. Sorry if I'm sounding critical.
[1] https://archive.tesoro.io/320b55cc9b78e271c94716ee23554da8
[2] https://archive.tesoro.io/a7bf03e247224bc3b4e5a7c1f2ad42b1
by bfirsh on 6/27/17, 3:33 PM
I know a lot of these sites have archiving features, but I want something centralised and automatic.
by akerro on 6/27/17, 12:34 PM
They will love it!
by zippoxer on 6/27/17, 4:59 PM
This got me thinking about how a decentralized p2p internet archive could solve the trust problem that exists in centralized internet archives. Such a solution could also increase the capacity of archived pages and the frequency at which archived pages are updated.
It is true that keeping the entire history of the internet on your local drive is likely impossible, but a solution similar to what Sia is doing could solve this problem: split each page into 20 pieces, distribute them across different peers, and encode them so that any 10 pieces can recover the original page. So you only have to trust that 10 of the 20 peers storing a page are still alive to get the complete page.
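The k-of-n idea above can be shown in miniature with a toy 2-of-3 XOR parity code (real systems like Sia use Reed-Solomon codes for general k-of-n; this sketch only illustrates that losing any one piece is survivable):

```python
def encode(data: bytes) -> list:
    """Split `data` into two halves plus an XOR parity piece."""
    half = (len(data) + 1) // 2
    p1 = data[:half]
    p2 = data[half:].ljust(half, b"\0")  # pad so both halves match in length
    parity = bytes(a ^ b for a, b in zip(p1, p2))
    return [p1, p2, parity]

def decode(pieces, length: int) -> bytes:
    """Reconstruct the original data from any two of the three pieces."""
    p1, p2, parity = pieces
    if p1 is None:
        p1 = bytes(a ^ b for a, b in zip(p2, parity))
    if p2 is None:
        p2 = bytes(a ^ b for a, b in zip(p1, parity))
    return (p1 + p2)[:length]  # strip the padding added in encode

page = b"<html>archived page</html>"
pieces = encode(page)
pieces[0] = None  # one peer goes offline
assert decode(pieces, len(page)) == page
```

Scaling this to "any 10 of 20" needs a proper erasure code, but the storage overhead stays at 2x rather than the 10x that full replication across 20 peers would cost.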
The main problem I can see right now would be lack of motivation to contribute to the system -- why would people run nodes? Just because it would feature yet another cryptocurrency? Sure, this could hold now, but when the cryptocurrency craze quiets down and people stop buying random cryptocurrencies just for the sake of trading them, what then? Who would run the nodes and why?
by j_s on 6/27/17, 5:02 PM
extensions: Firefox "Print Edit" Addon / Firefox Scrapbook X / Chrome Falcon / Firefox Recoll
open source: Zotero / WorldBrain / Wallabag
commercial: Pinboard / InstaPaper / Pocket / Evernote / Mochimarks / Diigo / PageDash / URL Manager Pro / Save to Google / OneNote / Stash / Fetching
public: http://web.archive.org / https://archive.is/
by idlewords on 6/27/17, 3:44 PM
I (obviously) think personal archives are a great idea, but republishing is a hornets' nest.
by Retr0spectrum on 6/27/17, 12:53 PM
If I want my own archive, Ctrl+S in Firefox usually works fine for me.
by crispytx on 6/27/17, 1:34 PM
by zichy on 6/27/17, 12:08 PM
by CM30 on 6/27/17, 2:31 PM
As it is, while it's a nice service, it still has all the issues of other archiving services:
1. It's online only, so one failed domain renewal or hosting payment takes everything offline.
2. It being online also means I can't access any saved pages if my connection goes down or has issues.
3. The whole thing is wide open to having content taken down by websites wanting to cover their tracks. I mean, what do you do if someone tells you to remove a page? What about with a DMCA notice?
It's a nice alternative to archive.is, but still doesn't really do what the title suggests if you ask me.
by jpalomaki on 6/27/17, 1:09 PM
Instead of hosting this directly on my computer, it would be interesting to have a setup where the archiving is done via the service and I would just provide storage space somewhere that the content would end up being mirrored to (just to guarantee that my valuable things are saved at least somewhere, should the other nodes decide to remove the content).
I would prefer this setup because it would be easily accessible for me from any device and I would not need to worry about running an always-available system. With a suitable P2P setup, my storage node would have less strict uptime requirements.
by dbz on 6/27/17, 12:50 PM
[1] https://chrome.google.com/webstore/detail/cmmlgikpahieigpccl...
by prirun on 6/27/17, 1:09 PM
by gorbachev on 6/27/17, 1:43 PM
I also second the need for user accounts. If I am to use your site as my personal archive, then I would need to log in and create a collection of my own archived sites.
by arkenflame on 6/27/17, 11:28 PM
by lozzo on 6/27/17, 12:23 PM
by jdc0589 on 6/27/17, 3:03 PM
I'm confused. It looks like image sources in "archived" pages on Tesoro still point back to the origin domain.
Edit: it works as expected. I just didn't notice the relative paths.
by salmonfamine on 6/27/17, 3:38 PM
by NicoJuicy on 6/27/17, 1:44 PM
I wonder what this site uses
by pbhjpbhj on 6/27/17, 1:07 PM
by skdotdan on 6/27/17, 5:42 PM