from Hacker News

Sentiment Analysis on Web-Scraped Data

by shrig94 on 12/17/14, 8:50 PM with 8 comments

  • by dingdingdang on 12/17/14, 11:50 PM

    This is a very interesting and well-written article. I must admit that the fully online nature of the tools discourages rather than encourages me: why take the time to learn the complexities of something as ephemeral as what seems to be a brand-new web service, especially when even large players like Google routinely retire whole platforms that are not popular enough?

    All the same, the tech itself seems solid and the article is, as mentioned, superb, so I'm really just beating the proverbial drum for proper distributed services here (or plain old offline-capable apps).

  • by Profan on 12/18/14, 12:16 AM

    If you haven't yet attempted to build some sort of sentiment analysis yourself, be it rule-based or based on statistical analysis, you should. Even a rudimentary rule-based one is a lot of fun to implement, and it works surprisingly well [0].

    One of the harder parts of making a decent one based on statistical analysis, however, is the lack of good training data, other than the analyzed Twitter dataset [1] and a movie reviews one [2].

    [0] http://fjavieralba.com/basic-sentiment-analysis-with-python....

    [1] http://help.sentiment140.com/for-students/

    [2] http://www.cs.cornell.edu/people/pabo/movie-review-data/
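
    As a rough illustration of the rudimentary rule-based approach mentioned above, here is a minimal sketch using only the Python standard library. It is not the code from [0]; the word lists, the two-token look-back window, and the function names (tokenize, sentiment_score, label) are all assumptions made up for this example.

```python
import re

# Tiny illustrative lexicons (assumed for this sketch); a real system
# would load a much larger, curated word list.
POSITIVE = {"good", "great", "fun", "solid", "superb", "love"}
NEGATIVE = {"bad", "awful", "boring", "broken", "hate"}
NEGATORS = {"not", "never", "no"}
INTENSIFIERS = {"very": 2.0, "really": 2.0, "slightly": 0.5}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Sum word polarities, adjusted by modifiers in the two preceding tokens."""
    tokens = tokenize(text)
    score = 0.0
    for i, tok in enumerate(tokens):
        if tok in POSITIVE:
            polarity = 1.0
        elif tok in NEGATIVE:
            polarity = -1.0
        else:
            continue
        # Rule: a negator flips the polarity, an intensifier scales it.
        for prev in tokens[max(0, i - 2):i]:
            if prev in NEGATORS:
                polarity = -polarity
            elif prev in INTENSIFIERS:
                polarity *= INTENSIFIERS[prev]
        score += polarity
    return score


def label(text):
    """Map the numeric score to a coarse label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

    Calling label("This is not very good") returns "negative", because "not" flips the polarity of "good" and "very" doubles it. Rules like these miss sarcasm and long-range negation, which is where the statistical approaches and the training data in [1] and [2] come in.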

  • by hnriot on 12/18/14, 4:28 AM

    this is cool, but you can do the same with BeautifulSoup and TextBlob in far fewer lines of code, and you wouldn't need any web services. if TextBlob isn't your thing, there are plenty of SVM implementations out there.

    for more interesting sentiment analysis approaches, check out sentence vectors; that's the current bleeding edge of research in this area.

    most sentiment analysis systems need to use an ensemble classifier because the domain of the text matters so much: the system has to identify the domain and then apply the appropriate domain-specific model.
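
    To make the domain point concrete, here is a minimal sketch of domain routing: guess the domain first, then score with a domain-specific lexicon. It is a simplified stand-in for a real ensemble of trained classifiers, and the domains, keyword sets, lexicons, and function names are all hypothetical, chosen only for illustration.

```python
import re

# Hypothetical keyword sets used to guess the domain of a text.
DOMAIN_KEYWORDS = {
    "movies": {"movie", "film", "plot", "actor", "director"},
    "electronics": {"battery", "screen", "phone", "laptop", "charger"},
}

# Hypothetical per-domain lexicons. Note that "unpredictable" is praise
# for a movie plot but a complaint about a device.
DOMAIN_LEXICONS = {
    "movies": {"unpredictable": 1, "gripping": 1, "predictable": -1, "dull": -1},
    "electronics": {"unpredictable": -1, "reliable": 1, "fast": 1, "laggy": -1},
    "general": {"good": 1, "great": 1, "bad": -1, "terrible": -1},
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def detect_domain(tokens):
    """Pick the domain with the most keyword hits, else fall back to 'general'."""
    counts = {
        domain: sum(tok in keywords for tok in tokens)
        for domain, keywords in DOMAIN_KEYWORDS.items()
    }
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else "general"


def classify(text):
    """Return (domain, label) using the general lexicon plus the domain's own."""
    tokens = tokenize(text)
    domain = detect_domain(tokens)
    # Domain-specific entries override general ones on conflict.
    lexicon = {**DOMAIN_LEXICONS["general"], **DOMAIN_LEXICONS[domain]}
    score = sum(lexicon.get(tok, 0) for tok in tokens)
    if score > 0:
        return domain, "positive"
    if score < 0:
        return domain, "negative"
    return domain, "neutral"
```

    With these toy lexicons, the same word lands on opposite sides depending on the detected domain: "The plot of this movie was unpredictable" is classified as positive, while "The battery life on this phone is unpredictable" is classified as negative.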

  • by silentrob on 12/18/14, 3:15 AM

    Very cool. MonkeyLearn looks promising. It would be nice if their docs were a little clearer about uploading CSVs and the expected data structure.

    It would also be cool if it did unsupervised learning.