CF Archiver webapp/bot (for linked material -- reddit, tiktok, etc)

I have written a webapp to archive linked material from posts to this forum. It’s really annoying to read a discussion and find that the linked reddit post or tiktok video has since been deleted. This should solve that problem. It also archives YT videos and some other things. Feature requests are welcome.

The site also allows users to submit URLs for archival. To do so, you need to register an account and become an ‘approved’ user. The easiest way to become one is to copy the string from your profile page (after logging in) and paste it as the first thing in a public message on the forum. See the post after this one for an example where I do that. If the automatic way doesn’t work, you can DM me and I can do it manually. A forum account can only be linked once, so make sure you save the generated username and pw in your password manager.

If you find that you can do anything that seems like it should be admin-only, please report it as a bug (e.g. deleting an archived item, forcing a re-archive, or accessing https://cf-archiver.xk.io/admin/user/1).

Please note: this has not been tested very much. I have made sure that reddit and YT work, but it might not work for other sites. Some sites are skipped, like the Wayback Machine and archive.today (and its aliases). You are welcome to submit links for archival to test it, and to report bugs here (and/or make feature requests). YT videos over 1hr won’t be archived, but support can be added if we want it.
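For illustration, here is a rough Python sketch of how the skip rules described above could look. This is not the site's actual code. The host list (including the archive.today aliases archive.ph and archive.is) and the names `SKIPPED_HOSTS`, `is_skipped`, and `video_too_long` are my assumptions; only the 1hr limit and the two named sites come from this post.

```python
from urllib.parse import urlparse

# Assumed skip list. The post only names the Wayback Machine and
# archive.today "and aliases"; the exact hosts here are illustrative.
SKIPPED_HOSTS = {
    "web.archive.org",
    "archive.org",
    "archive.today",
    "archive.ph",
    "archive.is",
}

# From the post: YT videos over 1hr are not archived.
MAX_VIDEO_SECONDS = 60 * 60


def is_skipped(url: str) -> bool:
    """Return True if the URL points at a site that is already an archive."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host in SKIPPED_HOSTS


def video_too_long(duration_seconds: int) -> bool:
    """Return True if a video exceeds the 1hr limit."""
    return duration_seconds > MAX_VIDEO_SECONDS
```

With these assumptions, `is_skipped("https://web.archive.org/web/2020/https://example.com")` is true and a 59 minute video passes the length check.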

Currently, all approved users can toggle the NSFW status of a post. Please don’t abuse this. I figure this is okay for now since only people who post here (or long time community members who unlurk to DM me) will be approved. The NSFW feature is mostly there as a safeguard. Visibility of NSFW posts can be toggled via the button in the top right. The only reliable NSFW detection is for reddit posts (poorly tested; bugs aside, it should be easy to do reliably).

The site was 99% vibe coded via various claude models. I do the deployment stuff manually, but I wonder whether claude could do all that if I ran claude code on a fresh ubuntu server.

Links to content will usually 307 redirect to the bucket via public URL.
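As a sketch of what that redirect amounts to (not the real handler): the app answers with status 307 and a Location header pointing at the object's public bucket URL. The base URL `https://bucket.example.com`, the object key layout, and the function name `content_redirect` are all made up for the example.

```python
from http import HTTPStatus
from urllib.parse import quote

# Hypothetical public base URL of the bucket; the real one is not given here.
BUCKET_PUBLIC_BASE = "https://bucket.example.com"


def content_redirect(object_key: str) -> tuple[int, dict[str, str]]:
    """Build the status code and headers for a 307 redirect to the bucket."""
    # quote() percent-encodes unsafe characters but leaves "/" alone,
    # so nested keys keep their path structure.
    location = f"{BUCKET_PUBLIC_BASE}/{quote(object_key)}"
    return int(HTTPStatus.TEMPORARY_REDIRECT), {"Location": location}
```

A 307 (rather than a 301) means browsers won't cache the redirect permanently, so the target could change later without stale links.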



Note: the Archived Page Preview is open on page load by default. I collapsed it in the above image so you could see more of the page.

Videos are saved for YT, tiktok, twitter*, and some other providers (search for streamable on the github repo to find a list).

Comments are saved for YT, tiktok, and twitter*.

* Most of the twitter stuff is untested; I need to set up an account and log in on the server, and probably debug a bit after that.

Also, a comments section on /archive/X pages might appear at some point. I instructed claude to code it, but it doesn’t show up. I am not sure if a comments section is actually useful enough to bother fixing.


link_archive_account:BrightSphinx8636

The above is how you can link an archive account to your forum account (find it on the “Profile” page). At the moment this just sets your display name on the archive site (which isn’t visible anywhere yet), but maybe we can do something in future.
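Roughly, the auto-link check boils down to "does the post start with the link string?". A minimal Python sketch, assuming the generated usernames are plain letters and digits (as in the example above); the regex and the name `extract_link_username` are mine, not taken from the actual code:

```python
import re
from typing import Optional

# Assumed format: the literal prefix followed by an alphanumeric username.
# match() anchors at the start, so the string must be the first thing
# in the message (leading whitespace is tolerated).
LINK_RE = re.compile(r"\s*link_archive_account:([A-Za-z0-9]+)")


def extract_link_username(post_text: str) -> Optional[str]:
    """Return the archive username if the post starts with a link string."""
    match = LINK_RE.match(post_text)
    return match.group(1) if match else None
```

So a post beginning with `link_archive_account:BrightSphinx8636` yields `BrightSphinx8636`, while a post that only mentions the string partway through yields `None`.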

Misc notes:

The account auto link feature seems to work. Please report any bugs. It should set your display name to @Username exactly. Also it just occurred to me that anon accounts probably work with it too, so you can link an anon account if you like.

There is a small chance of some interruption from the VPS host, related to account verification. If it happens, it should be within the next few days, but ideally it won’t happen at all.

Ongoing costs are ~4 euro / mo via Hetzner + Cloudflare R2 ($0.015 / GB-month; first 10 GB is free).
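To make the R2 part of that concrete, here is the storage arithmetic as a small Python sketch. It only covers storage at the quoted rate and free tier; request/operation charges are ignored, and the function name is just for illustration.

```python
R2_PRICE_PER_GB_MONTH = 0.015  # USD, as quoted above
R2_FREE_GB = 10                # first 10 GB is free


def r2_storage_cost_usd(stored_gb: float) -> float:
    """Monthly R2 storage cost in USD for a given amount of stored data."""
    billable_gb = max(0.0, stored_gb - R2_FREE_GB)
    return billable_gb * R2_PRICE_PER_GB_MONTH
```

For example, 110 GB of archives would be 100 billable GB, i.e. about $1.50 / mo, and anything under 10 GB costs nothing for storage.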

Also, another link that you should not be able to access: https://cf-archiver.xk.io/admin/forum-user/@Max%20Max

For testing, here is an archived page marked NSFW that is SFW.

Also, latency for auto-archiving is maybe 10 to 70 seconds if there’s nothing in the queue. It polls the forum’s RSS feed once per minute, and then archiving starts pretty quickly after that. There is some rate limiting on parallel archiving, so if there are 20 links in a post it will be a little while before they’re all processed, especially if some are videos. Under some views of All Archives you can see the pending, in-progress, and failed archives.
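Where the 10 to 70 seconds comes from, plus the parallelism limit, as a rough Python sketch. The ~10 second start delay is inferred from the figures above (70 minus the 60 second poll interval), and `MAX_PARALLEL = 3` is a made-up number; I'm not quoting the real limit or the real worker code here.

```python
from concurrent.futures import ThreadPoolExecutor

POLL_INTERVAL_S = 60  # RSS feed is polled once per minute
START_DELAY_S = 10    # assumed: time from "seen in feed" to "archiving"
MAX_PARALLEL = 3      # hypothetical cap on simultaneous archive jobs


def latency_bounds(poll_s: int = POLL_INTERVAL_S,
                   start_s: int = START_DELAY_S) -> tuple[int, int]:
    """Best case: posted just before a poll. Worst case: just after one."""
    return start_s, poll_s + start_s


def archive_all(urls, archive_one):
    """Archive every URL, running at most MAX_PARALLEL jobs at a time."""
    with ThreadPoolExecutor(max_workers=MAX_PARALLEL) as pool:
        # map() preserves input order; extra URLs wait for a free worker.
        return list(pool.map(archive_one, urls))
```

With a cap like this, a post with 20 links is worked through a few at a time, which is why big posts (especially ones with videos) take a while to finish.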

Update: twitter threads are harder than anticipated. yt-dlp has a thing for it but my account got blocked pretty quickly. I’m just going to ignore twitter replies and threads for now.

link_archive_account:AncientCockatrice7366

link_archive_account:WildManticore6243

link_archive_account:GrimChimera3928