
Download newsgroup contents?

danb35

FreeNAS Wizard
Joined
Aug 16, 2011
Messages
10,909
Not very closely related to FreeNAS at all (except that it involves storing potentially-large amounts of data), but...

I want to set up a local Discourse instance and import the contents of some newsgroups into it, just for the sake of having a nicer interface for the group archive. The Discourse setup itself is easy enough, as is the message import given a .mbox file. But getting the messages, well...

One option is to download them from Google Groups. The Discourse folks provide a script that will do this, and it mostly works, but (1) it's really slow (multiple seconds per message), (2) it doesn't work for all the groups I'm interested in for some reason, and (3) since I'm not the "owner" of any of the groups, it mangles email addresses.

Then I remembered that I have an account at an NNTP server for, well, other reasons. Surely it couldn't be too difficult to fire up $SOFTWARE and tell it to download everything from comp.sys.something to a .mbox file, right? Except that I'm not sure what $SOFTWARE would be here. Any suggestions? Bonus points if it will work with more than one server on a priority basis, downloading everything from server1, and getting anything from server2 that isn't on server1.
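For what it's worth, the "server1 first, then whatever server2 has that server1 doesn't" part can be sketched independently of the download tool, by de-duplicating on the Message-ID header. This is only a rough sketch: it assumes the articles have already been fetched and parsed into `email.message.Message` objects, and the function names `merge_by_message_id` and `write_mbox` are made up for illustration, not part of any existing tool.

```python
import mailbox
from email.message import Message
from typing import Iterable


def merge_by_message_id(sources: Iterable[Iterable[Message]]) -> list[Message]:
    """Merge articles from several sources; earlier sources take priority.

    An article from a later source is kept only if its Message-ID has not
    already been seen. Articles without a Message-ID are kept as-is,
    since there is nothing to de-duplicate them on.
    """
    seen: set[str] = set()
    merged: list[Message] = []
    for articles in sources:  # e.g. [from_server1, from_server2]
        for msg in articles:
            msg_id = (msg.get("Message-ID") or "").strip()
            if msg_id:
                if msg_id in seen:
                    continue  # a higher-priority source already supplied it
                seen.add(msg_id)
            merged.append(msg)
    return merged


def write_mbox(path: str, messages: Iterable[Message]) -> None:
    """Append messages to an mbox file (assumes no other writer is active)."""
    box = mailbox.mbox(path)  # creates the file if it does not exist
    try:
        for msg in messages:
            box.add(msg)
    finally:
        box.close()  # flushes pending writes
```

Calling `write_mbox("comp.sys.something.mbox", merge_by_message_id([from_server1, from_server2]))` would then produce a single file, where `from_server1` and `from_server2` are placeholders for however the articles were obtained.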
 

danb35

Well, a bit more digging suggests that slrn, or more specifically its companion program slrnpull, could do the job. Both are included in the FreeBSD slrn package, and slrnpull exists to do pretty much what I want to do. It downloads each message as its own file rather than combining them into large .mbox files, but the import script I'll be using for Discourse can handle those as well. Looks like there are only two issues left:
  • slrnpull uses directory structure, rather than dot separators, to handle the newsgroup hierarchy, so comp.sys.apple2 ends up in news/comp/sys/apple2. The Discourse import script uses the top-level directory as the "category" for the imported messages, so the nested directories need to be flattened first. This is an easy fix.
  • My NNTP server "only" has about 12 years' retention. I'd like to find an archive going back farther if I could.
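The directory fix from the first point could look something like the sketch below, which copies a nested spool into one dot-named directory per group. It makes a few assumptions that should be checked against the real spool: that article files sit directly in their group's directory and have purely numeric names, and that the destination lies outside the spool. The name `flatten_spool` is hypothetical.

```python
import shutil
from pathlib import Path


def flatten_spool(spool_root, dest_root):
    """Copy spool_root/comp/sys/apple2/N to dest_root/comp.sys.apple2/N.

    Returns a dict mapping each group name to the number of files copied.
    Assumption: article files have all-digit names; anything else
    (index or overview files, for instance) is skipped.
    """
    spool_root, dest_root = Path(spool_root), Path(dest_root)
    copied = {}
    # sorted() collects the full listing before any copying starts
    for path in sorted(spool_root.rglob("*")):
        if not path.is_file() or not path.name.isdigit():
            continue
        # Directory components below the spool root form the group name
        group = ".".join(path.parent.relative_to(spool_root).parts)
        if not group:
            continue  # file sits directly in the spool root, not in a group
        target_dir = dest_root / group
        target_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target_dir / path.name)
        copied[group] = copied.get(group, 0) + 1
    return copied
```

Something like `flatten_spool("news", "import")` would then leave `import/comp.sys.apple2/` as a top-level directory for the import script to treat as a category. Copying rather than moving keeps the original spool intact in case slrnpull is run again later.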
 