by hecticjeff on 1/19/15, 9:08 AM with 31 comments
by boie0025 on 1/19/15, 3:01 PM
Once I had a good pattern in place, I could easily create subclasses of the data type I was trying to scrape, pointing each of the modeled data methods at an XPath specific to that page.
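A minimal sketch of the subclass-per-page pattern described above (all class and field names are hypothetical, not from the comment). A base scraper declares named fields; each subclass points the same fields at the XPaths specific to one site's markup. To keep the example self-contained, the "document" is stubbed as a Hash from XPath to text; in a real scraper `lookup` would call something like `@doc.at_xpath(xpath)&.text` on a parsed page.

```ruby
class BaseScraper
  class << self
    def fields
      @fields ||= {}
    end

    # Declare a named field and the XPath used to locate it.
    def field(name, xpath)
      fields[name] = xpath
      define_method(name) { lookup(self.class.fields[name]) }
    end
  end

  def initialize(doc)
    @doc = doc
  end

  private

  # Stand-in for `@doc.at_xpath(xpath)&.text` in a real scraper.
  def lookup(xpath)
    @doc.fetch(xpath, nil)
  end
end

# Each subclass points the same modeled fields at page-specific XPaths.
class BlogPostScraper < BaseScraper
  field :title,  "//h1[@class='entry-title']"
  field :author, "//span[@class='byline']"
end

class NewsArticleScraper < BaseScraper
  field :title,  "//header/h1"
  field :author, "//p[@class='author-name']"
end

blog = BlogPostScraper.new(
  "//h1[@class='entry-title']" => "Scraping in Ruby",
  "//span[@class='byline']"    => "hecticjeff"
)
puts blog.title   # => "Scraping in Ruby"
puts blog.author  # => "hecticjeff"
```

The payoff of the pattern is that code consuming `title` and `author` never cares which page the data came from; adding support for a new site is just another subclass.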
by Doctor_Fegg on 1/19/15, 11:39 AM
Mechanize allows you to write clean, efficient scraper code without all the boilerplate. It's the nicest scraping solution I've yet encountered.
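To illustrate the point, here is a short Mechanize sketch (not from the comment; the URL and CSS selector are illustrative assumptions). Mechanize handles cookies, redirects, and HTML parsing for you, so a scrape reduces to a fetch and a query. Requires the `mechanize` gem and network access, so it is a sketch rather than a tested snippet.

```ruby
require 'mechanize'

agent = Mechanize.new
agent.user_agent_alias = 'Mac Safari'   # identify yourself explicitly

# Fetch a page; Mechanize returns it already parsed.
page = agent.get('https://example.com/articles')

# `search` delegates to Nokogiri, so CSS or XPath both work.
page.search('article h2 a').each do |link|
  puts "#{link.text.strip} -> #{link['href']}"
end

# Navigation is equally terse, e.g.:
#   next_page = page.link_with(text: 'More').click
```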
by wnm on 1/19/15, 4:15 PM
I've spent a lot of time working on web scrapers for two of my projects, http://themescroller.com (dead) and http://www.remoteworknewsletter.com, and I think the holy grail is to build a Rails app around your scraper. You can write your scrapers as libs, and then make them executable as rake tasks, or even cronjobs. And because it's a Rails app you can save all scraped data as actual models and have it persisted in a database. With Rails it's also super easy to build an API around your data, or build a quick backend for it via Rails scaffolds.
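A sketch of the setup described above, as a rake task fragment (hypothetical names throughout: `JobScraper` is the scraper lib, `Job` the ActiveRecord model). Dropped into `lib/tasks/scrape.rake` in a Rails app, it runs as `rake scrape:jobs` and can be scheduled from cron; it assumes a Rails environment, so it is shown as a fragment rather than a runnable script.

```ruby
# lib/tasks/scrape.rake
namespace :scrape do
  desc "Fetch listings via the scraper lib and persist them as models"
  task jobs: :environment do
    # JobScraper lives under lib/ and returns an array of attribute hashes.
    JobScraper.new.fetch_all.each do |attrs|
      Job.find_or_create_by!(url: attrs[:url]) do |job|
        job.title = attrs[:title]
      end
    end
  end
end
```

A crontab line like `0 6 * * * cd /srv/app && bin/rake scrape:jobs` then gives you the daily refresh, with the data already sitting in models your API or scaffolded backend can serve.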