from Hacker News

Prophet: forecasting at scale

by benhamner on 2/27/17, 5:29 AM with 110 comments

  • by confounded on 2/27/17, 6:49 AM

    Worth noting that Prophet is a set of R/Python wrappers around some models with reasonable defaults, written in and fit by Stan, a probabilistic programming language and Bayesian estimation framework.

    Stan is amazing in that you can fit pretty much any model you can describe in an equation (given enough time and compute, of course)!

    More on Stan here: http://mc-stan.org/

  • by rodionos on 2/27/17, 7:30 AM

    I didn't know wikipedia page view counters are available for public usage.

    The wikipediatrend R package relies on http://stats.grok.se/, which in turn relies on https://dumps.wikimedia.org/other/pagecounts-raw/ which has been deprecated.

    The new dump is located at https://dumps.wikimedia.org/other/pageviews/

    Data is available in hourly intervals.

    * pageviews-20170227-050000

      en Peyton_Manning 58 0
    
    [edit] There is a wikipedia-hosted OSS viewer for these logs, e.g. Swedish crime stats:

    https://tools.wmflabs.org/pageviews/?project=en.wikipedia.or...
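
    A minimal sketch of reading one line of such an hourly dump. The field meanings are an assumption not stated in the comment above: project code, page title, view count, and response size in bytes. The names PageviewRecord and parse_pageviews_line are hypothetical.

```python
from typing import NamedTuple


class PageviewRecord(NamedTuple):
    project: str         # assumed: wiki project code, e.g. "en"
    title: str           # assumed: page title, underscores instead of spaces
    views: int           # assumed: number of views within the hour
    response_bytes: int  # assumed: total response size in bytes


def parse_pageviews_line(line: str) -> PageviewRecord:
    """Split one dump line into its four space-separated fields."""
    project, title, views, response_bytes = line.split()
    return PageviewRecord(project, title, int(views), int(response_bytes))


record = parse_pageviews_line("en Peyton_Manning 58 0")
print(record.title, record.views)
```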

  • by saosebastiao on 2/27/17, 3:46 PM

    This is an interesting project, and in one of the areas where almost all businesses could do better. Anecdotally, there is a ton of money left on the table by established businesses that do it poorly, which also leaves lots of room for resume-padding technical experience. So anything that claims to improve the state of the art of automated forecasting is definitely worth watching.

    That being said this claim in point #1 baffles me:

    > Prophet makes it much more straightforward to create a reasonable, accurate forecast. The forecast package includes many different forecasting techniques (ARIMA, exponential smoothing, etc), each with their own strengths, weaknesses, and tuning parameters. We have found that choosing the wrong model or parameters can often yield poor results, and it is unlikely that even experienced analysts can choose the correct model and parameters efficiently given this array of choices.

    The forecast package contains an auto.arima function that does full parameter optimization using AIC, which is just as hands-free as is claimed of Prophet. I have been using it commercially and successfully for years now. Maybe Prophet produces better models (I'll definitely take a look myself), but to claim that it's not possible to get good results without experience seems a bit disingenuous.

    As an aside, anybody interested in a great introductory book on time series forecasting should check out Rob Hyndman's book which is freely available online. https://www.otexts.org/fpp
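
    To illustrate the idea of AIC-driven model selection mentioned above, here is a deliberately simplified sketch. It is not auto.arima: it only compares pure autoregressive orders, fits them with the Levinson-Durbin recursion, and uses a Gaussian AIC up to a constant. The simulated series and all function names are made up for the example.

```python
import math
import random


def autocovariances(series, max_lag):
    """Biased sample autocovariances for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    return [
        sum(centered[t] * centered[t + lag] for t in range(n - lag)) / n
        for lag in range(max_lag + 1)
    ]


def innovation_variances(acov, max_order):
    """Levinson-Durbin recursion: residual variance of AR(p), p = 0..max_order."""
    coeffs = []
    variances = [acov[0]]
    for order in range(1, max_order + 1):
        partial = (
            acov[order]
            - sum(coeffs[j] * acov[order - 1 - j] for j in range(order - 1))
        ) / variances[-1]
        coeffs = [
            coeffs[j] - partial * coeffs[order - 2 - j] for j in range(order - 1)
        ] + [partial]
        variances.append(variances[-1] * (1.0 - partial ** 2))
    return variances


def select_ar_order(series, max_order=5):
    """Return the AR order with the lowest AIC, plus all AIC values."""
    n = len(series)
    variances = innovation_variances(autocovariances(series, max_order), max_order)
    # Gaussian AIC up to a constant: n * log(sigma^2) + 2 * (parameter count)
    aic = [n * math.log(v) + 2 * (p + 1) for p, v in enumerate(variances)]
    best = min(range(len(aic)), key=aic.__getitem__)
    return best, aic


# Simulated AR(2) data, purely for demonstration.
random.seed(0)
series = [0.0, 0.0]
for _ in range(500):
    series.append(0.6 * series[-1] + 0.3 * series[-2] + random.gauss(0.0, 1.0))

best_order, aic_values = select_ar_order(series)
print("selected AR order:", best_order)
```

    The point is only that the analyst never picks the order by hand: every candidate is scored and the lowest AIC wins.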

  • by schlarpc on 2/27/17, 6:47 AM

  • by techno_modus on 2/27/17, 9:46 AM

    It seems that they have developed a model for only univariate forecasts and only numeric regular time series which is a classical use case in statistics. Yet, most data sources have many dimensions (for example, energy consumption, temperature, humidity etc.) as well as categorical data like current state (On, Off). The situation is even more difficult if the data is not a regular time series but is more like asynchronous event stream. It would be interesting to find a good forecasting model for some of these use cases. In particular, it is interesting if this Prophet model can be generalized and applied to multivariate data.
  • by cardosof on 2/27/17, 6:25 AM

    That's very cool, congrats and thank you to the Facebook guys!

    A few days ago I was asked to do some forecasting with a daily revenue series for a client. Due to the nature of her business, the series was really tricky, with weekdays and months/semesters having some specific effects on the data. Like many, I use Hyndman's forecast package, but I threw this data at Prophet and it delivered a nice plot with the (correct) overall trend and seasonalities. Very cool and easy to use.

  • by yoghurtio on 2/27/17, 7:42 AM

    We at https://yoghurt.io/ have been working towards a similar forecasting solution. So far the feedback has been that automated solutions can also bring good results at a far lower cost compared to hiring an expert analyst.

    It's a completely managed solution. No need to set up anything yourself. Just upload the data and predict next week's data today. There is a free trial, and if anyone here is looking for an extended trial, they can reach out to me.

  • by anacleto on 2/27/17, 9:15 AM

    This is so great!

    I've been using CausalImpact by Google [0] for months. This seems pretty straightforward.

    [0] https://google.github.io/CausalImpact/CausalImpact.html

  • by jl6 on 2/27/17, 6:39 AM

    I wonder what Sungard/FIS think of the name, which is the same as their commercial financial modelling/forecasting tool.
  • by pacifika on 2/27/17, 7:30 AM

    The more Facebook grows, the more its tooling aligns with that of intelligence services.
  • by asafira on 2/27/17, 6:02 AM

    So... how well will this do at forecasting stock prices? =)

    Very cool though --- I would be interested to dive into the methods they've implemented sometime in the near future!

  • by hnarayanan on 2/27/17, 6:49 AM

    Is there a way to extend these models to handle spatial variation (e.g. weather forecasting, property price estimation etc.) as well?
  • by dmichulke on 2/27/17, 1:01 PM

    I have been working for a few years on a similar project using evolutionary algorithms on top of other models (linear / ann). It works quite well (e.g., for equidistant energy demand / supply forecasts) but there's still lots of stuff to do.

    Its major benefit is that it figures out the relationship to the target time series by itself, so you can just throw in all time series and see what comes out.

    Language is Clojure, 20kloc, incanter, encog. If anyone is interested in working for/with it, let me know. I am currently developing a REST API for it and plan to release it as open source once the major code smells are dealt with.

  • by Steeeve on 2/27/17, 8:21 AM

    This actually looks incredibly useful and pretty simple to learn.

    Between this and Stan I think my free time for the next week is gone.

  • by zebrafish on 2/27/17, 2:57 PM

    So.... I don't understand how this is better or worse than using forecast.

    You talk about having to choose the best algorithm but it seems like Prophet is just another algorithm to choose from. Is there some kind of built in grid-search or are you just stating that results from your AM have been more accurate than ARIMA?

  • by hn_username on 2/27/17, 4:37 PM

    This is a nice piece of work - thanks for sharing with the community!

    Some feedback: it'd be nice to see you actually quantify how accurate Prophet's forecasts are on the landing page for the project. In the Wikipedia page view example, you go as far as showing a Prophet forecast, but it'd be nice to have you take it one step further and quantify its performance. Maybe withhold some of the data you use to fit the model and see how it performs on that out of sample data. It's nice that you show qualitatively that it captures seasonality, but you make bold claims about its accuracy and the data to back those claims up is conspicuously absent. Related, it might be worth benchmarking its performance against existing automated forecasting tools.

    I'll definitely be checking it out!
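
    The hold-out check suggested above can be sketched as follows. This is an illustration under stated assumptions: the series is synthetic, and a seasonal-naive forecaster stands in for whichever model is being evaluated (it is not Prophet). All names are hypothetical.

```python
def mean_absolute_percentage_error(actual, predicted):
    """MAPE in percent; assumes no actual value is zero."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


def seasonal_naive_forecast(history, horizon, period=7):
    """Repeat the last observed season; a stand-in for any forecaster."""
    last_season = history[-period:]
    return [last_season[i % period] for i in range(horizon)]


# Hypothetical daily series: upward trend plus a weekly pattern.
weekly_pattern = [0, 2, 4, 3, 1, -5, -6]
series = [100 + 0.5 * day + weekly_pattern[day % 7] for day in range(70)]

# Withhold the last two weeks; fit only on what comes before.
holdout = 14
train, test = series[:-holdout], series[-holdout:]

forecast = seasonal_naive_forecast(train, horizon=holdout)
mape = mean_absolute_percentage_error(test, forecast)
print(f"hold-out MAPE: {mape:.2f}%")
```

    Running the same split against several forecasters gives the kind of side-by-side benchmark the comment asks for.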

  • by SmellTheGlove on 2/27/17, 4:25 PM

    For us insurance/financial services folks, I would like to simply clarify that this is not the Sungard/FIS risk management platform that is also called Prophet! :D

    I got really excited for a second. Actually, I'm still pretty excited about this even though it is something else entirely.

  • by nickfzx on 2/27/17, 11:16 AM

    This looks amazing, congratulations.

    We're planning to add forecasting to our SaaS analytics product (https://chartmogul.com) later this year. I'm going to look and see if we can use this in our product now.

  • by minimaxir on 2/27/17, 6:54 AM

    Interesting definition of "scale" in this context, as it does not imply "big data" like every other usage of the word scale in data science. The tool works on, and is optimized for, day-to-day, mundane data.

    See also the R vignette, which shows that the data is returned per-column which gives it a lot of flexibility if you only want certain values: https://cran.r-project.org/web/packages/prophet/vignettes/qu...

  • by paulvs on 2/27/17, 3:49 PM

    For a corporate credit analyst working at a bank, what is some good introductory material for getting into forecasting using tools like these?

    I see this being applicable to analysts when deciding on a company's creditworthiness.

  • by syntaxing on 2/27/17, 4:47 PM

    The fact that Prophet follows the "sklearn model API" and that it's very well integrated with pandas makes it super appealing and usable!
  • by monkeydust on 3/4/17, 8:15 AM

    Very cool, got loads of sensor data from around my house, over a year's worth, so curious to throw it at Prophet.

    Has anyone managed to get this working on Windows with Jupyter (Anaconda build)? I'm struggling with PyStan errors. Any guidance welcome.

  • by eternalban on 2/27/17, 1:36 PM

    /please ignore: Oracle & Prophet. Oracle sifts through signs but Prophet has a line to the larger picture. I suppose the next 'product' will be called Messiah to complete the picture.
  • by elwell on 2/27/17, 8:59 PM

    Why do we need Prophet when we already have Temple OS (http://www.templeos.org/)?
  • by nodesocket on 2/27/17, 8:35 AM

    Are there any startups/services where you pass it a series and it returns forecast models? That's something I'd be willing to pay for.
  • by alexpetralia on 2/27/17, 2:28 PM

    Slightly inconvenient that the main image <figure> needs to be replaced by an <img> tag just to have the image appear in printouts.
  • by poppingtonic on 2/27/17, 9:42 AM

    This is very interesting. Forecasters who participate in the Good Judgment Project, such as myself, will find this useful.
  • by recurser on 2/27/17, 9:28 AM

    Very cool. Could this be re-purposed for detecting anomalies/outliers in time series data?
  • by ayayecocojambo on 2/27/17, 11:20 AM

    Can we use other features (like temperature?), or does it have to be only time-based?
  • by agounaris on 2/27/17, 6:38 AM

    How different is this framework from statsmodels?
  • by hubot on 2/27/17, 2:49 PM

    Can someone explain the meaning of this line?

    > df['y'] = np.log(df['y'])
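
    For context: that line replaces each value in the y column with its natural logarithm. A common reason, offered here as an assumption since the thread does not give the authors' rationale, is that logs turn multiplicative growth into additive steps and compress large spikes. A sketch with made-up numbers, using only the standard library:

```python
import math

# Hypothetical series that grows 10% per step (multiplicative growth).
values = [100.0 * 1.1 ** step for step in range(6)]
logged = [math.log(v) for v in values]

# Raw step sizes keep growing; on the log scale every step is log(1.1).
raw_steps = [b - a for a, b in zip(values, values[1:])]
log_steps = [b - a for a, b in zip(logged, logged[1:])]
print(raw_steps)
print(log_steps)

# Anything forecast on the log scale is mapped back with exp().
restored = [math.exp(v) for v in logged]
```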

  • by Helmet on 2/27/17, 5:42 PM

    Just wanted to point out to potential Windows users: this will only run on Python 3.5 due to dependencies (PyStan only works on Python 3.5 on Windows).
  • by fagnerbrack on 2/27/17, 7:02 AM

    Facebook...