by invisiblerobot on 9/11/20, 9:24 PM with 2 comments
Just to be clear: given the HTML of a WaPo (Washington Post) article, I want to discard all the affiliate links and comments and keep only the article text. I want a general solution that works across many blogs and news sites.
Any tips?
by tlack on 9/11/20, 9:48 PM
It's a very practical start.
I thought the science of it was called "envelope detection" but I'm not getting any relevant hits on that keyword. Will report back if I recall the name.
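For reference, one common heuristic for this kind of extraction (not necessarily what the term above refers to) is to keep only paragraphs that are reasonably long and contain little link text, since affiliate blocks and navigation tend to be short or link-heavy. Below is a minimal sketch using only Python's standard library. `ParagraphCollector`, `extract_article_text`, the `SKIP_TAGS` list, and the two thresholds are illustrative assumptions that would need tuning per site; none of them come from a particular library.

```python
from html.parser import HTMLParser

# Tags whose contents are assumed never to be article text (tune per site).
SKIP_TAGS = {"script", "style", "nav", "aside", "footer", "form"}


class ParagraphCollector(HTMLParser):
    """Collect the text of each <p> and how many of its characters sit inside links."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []   # list of (text, link_chars)
        self._skip_depth = 0   # > 0 while inside a SKIP_TAGS element
        self._link_depth = 0   # > 0 while inside an <a>
        self._in_p = False
        self._buf = []
        self._link_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "p" and not self._skip_depth:
            self.flush_paragraph()  # a new <p> implicitly closes the previous one
            self._in_p = True
        elif tag == "a":
            self._link_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "p":
            self.flush_paragraph()
        elif tag == "a" and self._link_depth:
            self._link_depth -= 1

    def handle_data(self, data):
        if self._in_p and not self._skip_depth:
            self._buf.append(data)
            if self._link_depth:
                self._link_chars += len(data.strip())

    def flush_paragraph(self):
        # Collapse whitespace and store the paragraph if it has any text.
        text = " ".join("".join(self._buf).split())
        if self._in_p and text:
            self.paragraphs.append((text, self._link_chars))
        self._buf, self._link_chars, self._in_p = [], 0, False


def extract_article_text(html, min_chars=80, max_link_density=0.3):
    """Return paragraphs that are long enough and not dominated by link text."""
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    parser.flush_paragraph()  # pick up a trailing unclosed <p>
    kept = [
        text
        for text, link_chars in parser.paragraphs
        if len(text) >= min_chars and link_chars / len(text) <= max_link_density
    ]
    return "\n\n".join(kept)
```

Calling `extract_article_text(page_html)` returns the surviving paragraphs joined by blank lines. This only looks at `<p>` elements, so sites that put body text in bare `<div>`s would need a different block definition.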
by nmstoker on 9/11/20, 9:56 PM