from Hacker News

Ask HN: Where do you get your financial data?

by sdcoffey on 5/5/22, 7:20 PM with 17 comments

Hey Hacker News,

We're working on an investment aggregator that tracks the value of our customers' portfolios over time. We've faced a bunch of challenges in getting public stock market data (live/historical prices, splits, etc.). We're currently cobbling together a dataset from a few different sources (Polygon, IEX, etc.), but it's been a massive pain.*

I'm wondering if this is the case for other fintech devs. Does everyone face the pain of assembling their own financial datasets? Or do we have unique needs/a bad solution?

So HN, where do you all source your financial data?

* Our main challenges:

Data quality:
- Ticker symbol changes (and CUSIP/ISIN changes/challenges)
- Missing or wrong values for some days
- Missing or incorrect splits

Speed:
- Tens of API calls that would be necessary to render one screen
- Historical data syncing that would take days of API calls

Burden:
- Enterprise sales contracts instead of self-serve
- Building and maintaining your own ETL pipeline to ingest data
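The data-quality items above (missing days, missing or incorrect splits) can be partly caught with cheap sanity checks before the data reaches users. What follows is a minimal sketch, not anything the poster describes using: the function names, the list of common split ratios, and the 5% tolerance are all assumptions chosen for illustration, and the weekday check will also flag exchange holidays unless a real trading calendar is supplied.

```python
from __future__ import annotations

from datetime import date, timedelta


def missing_weekdays(closes: dict[date, float]) -> list[date]:
    """Weekdays between the first and last date that have no close.

    Assumption: weekdays approximate trading days, so exchange
    holidays are reported as gaps too.
    """
    days = sorted(closes)
    missing = []
    d = days[0]
    while d <= days[-1]:
        if d.weekday() < 5 and d not in closes:
            missing.append(d)
        d += timedelta(days=1)
    return missing


def suspected_splits(
    closes: dict[date, float],
    known_split_dates: set[date],
    ratios: tuple[int, ...] = (2, 3, 4, 5, 10),  # hypothetical list of common ratios
    tol: float = 0.05,  # hypothetical tolerance
) -> list[date]:
    """Dates where the close-to-close move looks like an unrecorded split."""
    days = sorted(closes)
    flagged = []
    for prev, cur in zip(days, days[1:]):
        if cur in known_split_dates:
            continue  # the vendor already reports a split here
        r = closes[prev] / closes[cur]
        for n in ratios:
            forward = abs(r - n) / n <= tol   # price fell by about n-for-1
            reverse = abs(r * n - 1) <= tol   # price rose by about 1-for-n
            if forward or reverse:
                flagged.append(cur)
                break
    return flagged


# Made-up prices: one missing Wednesday and one unexplained halving.
closes = {
    date(2022, 5, 2): 100.0,
    date(2022, 5, 3): 101.0,
    date(2022, 5, 5): 50.2,
    date(2022, 5, 6): 50.0,
}
print(missing_weekdays(closes))
print(suspected_splits(closes, known_split_dates=set()))
```

A flagged date is only a prompt to cross-check a second source, since a genuine crash or rally can look the same as a split.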

  • by Kon-Peki on 5/5/22, 9:12 PM

    You have to agree to serious licenses, and then you pay serious money for it.

    Here's a good start to the list of vendors (HINT: click on "Data Dictionary" for things that look interesting):

    https://wrds-www.wharton.upenn.edu/pages/about/data-vendors/

    For linking across different data sets and tracking companies over their entire history (check out the video starting around 6 minutes in):

    https://wrds-www.wharton.upenn.edu/pages/about/data-vendors/...

    EDIT - FYI, WRDS is the financial data platform that almost all large universities use (worldwide). They handle all the management of it, and your school just has to write a few checks. Many of these data sets are licensed to all students, not just business school students. So if you are a CS student and want this kind of data for building an ML model or something like that, you should be able to get it by requesting an account on the WRDS page linked above. They might push back on you a little, in which case you'll have to go over to the business school at your university to get things ironed out. They have a web interface that is friendly to non-technical users, but they also offer a Postgres interface so you can connect directly from Python or R or whatever with your account credentials.

  • by tdubhro1 on 5/5/22, 9:59 PM

    Everyone faces this pain. There is no single good source for all the data any nontrivial app needs. The main vendors are predatory and will move the goalposts to charge you as much as you can bear, and more as you grow. In my experience this is a huge obstacle and probably a major reason there is limited innovation in anything like derived market data analytics. It’s not just you, it’s everyone.
  • by dvasdekis on 5/5/22, 10:29 PM

    FWIW, I went deep with Interactive Brokers for my last fintech. It was the only place we could source real-time currency options data, and we managed to reduce our total latency from market to database to about 10ms by using a NYC datacenter, which was enough for us. They had historic ticks too, but a built-in rate limiter made it a multi-month project to pull serious volumes of data.

    ib_insync was the Python client library, and I open-sourced the market gateway I built: https://github.com/dvasdekis/ib-gateway-docker-gcp
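To see how a request rate limit turns a historical tick backfill into a multi-month project, a back-of-the-envelope estimate helps. The sketch below uses purely hypothetical numbers: the 60-requests-per-10-minutes window, the instrument count, and the requests needed per instrument-day are illustrative assumptions, not Interactive Brokers' documented limits or the commenter's actual workload.

```python
def backfill_days(
    instruments: int,
    trading_days: int,
    requests_per_instrument_day: int,
    max_requests: int,
    window_seconds: int,
) -> float:
    """Wall-clock days needed if every request waits its turn under the limit."""
    total_requests = instruments * trading_days * requests_per_instrument_day
    # Sustained throughput is max_requests per window, so each request
    # "costs" window_seconds / max_requests seconds on average.
    total_seconds = total_requests * window_seconds / max_requests
    return total_seconds / 86_400  # seconds per day


# Hypothetical workload: 50 instruments, one year of ticks,
# 80 paged requests to cover a single instrument-day.
estimate = backfill_days(
    instruments=50,
    trading_days=250,
    requests_per_instrument_day=80,
    max_requests=60,       # assumed limit, for illustration only
    window_seconds=600,    # assumed 10-minute window
)
print(f"{estimate:.1f} days")
```

Under these assumptions the pull takes roughly 116 days of continuous requesting, which is why such backfills tend to be planned as long-running jobs with checkpointing rather than one-off scripts.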

  • by olkyts on 5/6/22, 7:38 AM

    Interesting. I didn't think it was a challenge until I came across this discussion.

    Recently I saw this project on HN: https://openbb.co/ A VentureBeat press release states that "The platform gleans its investment data via publicly available sources, among others that require an API key — these include Alpha Vantage, Financial Modeling Prep, Finnhub, Reddit, Twitter, Coinbase, the SEC, and many more." I haven't checked their git repo for data sources.

    My question then is: how do you build a data provider? Where do data providers like Bloomberg get the data that they sell?

  • by vamega on 5/6/22, 2:37 AM

    What asset classes are you looking for?

    I’ve used Bloomberg Backoffice files in the past, and later went to work at Bloomberg to try and make that data more easily usable.

    MarketQA had a product that can give you historical data as well, but tied more into the Reuters world.

    Corporate actions are a complete pain. Bloomberg’s back office file data for the adjustment factors isn’t consistent with the data you can pull from a Bloomberg terminal.

    The wider your coverage the harder it is to do this correctly.

    If you then want historical intraday prices as well, this gets much more expensive and much more complicated to set up. My last job had an entire team trying to get all this right, and still got it wrong a lot.
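For readers unfamiliar with the adjustment factors mentioned in this comment, here is a minimal sketch of split back-adjustment in its simplest form. It is not Bloomberg's method: the function name and data layout are hypothetical, dividends and other corporate actions are ignored, and vendors differ on conventions such as whether the factor applies on or before the ex-date, which is one way two feeds can end up disagreeing.

```python
from __future__ import annotations

from datetime import date


def back_adjust(
    closes: dict[date, float],
    splits: list[tuple[date, float]],
) -> dict[date, float]:
    """Put raw closes on the latest share basis.

    splits holds (ex_date, ratio) pairs, where ratio 2.0 means 2-for-1.
    Assumed convention: every close strictly before an ex-date is
    divided by that split's ratio; closes on or after it are untouched.
    """
    adjusted = {}
    for d, px in closes.items():
        factor = 1.0
        for ex_date, ratio in splits:
            if d < ex_date:
                factor *= ratio  # cumulative factor across later splits
        adjusted[d] = px / factor
    return adjusted


# Made-up prices around a hypothetical 2-for-1 split effective 2022-05-04.
raw = {
    date(2022, 5, 2): 100.0,
    date(2022, 5, 3): 102.0,
    date(2022, 5, 4): 51.5,
}
adjusted = back_adjust(raw, splits=[(date(2022, 5, 4), 2.0)])
print(adjusted)
```

With the split applied, the series reads 50.0, 51.0, 51.5 instead of showing a false 50% drop. A missing or misdated split in the input produces exactly the kind of wrong portfolio values the original post describes.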

  • by melony on 5/5/22, 10:59 PM

    Have you tried the obvious sources? Bloomberg, S&P, Nasdaq. Fintech isn't a cheap game. If you get your data from small data vendor startups, you risk getting poor fundamentals data. You don't want to waste time debugging the calculations when you could be iterating. Some brokers like IB also have a data vending side business.

    If you are really tight on cash, try Intrinio. No idea about their quality but they have been around for a while.

  • by blakbelt78 on 5/6/22, 3:11 AM

    I’ve tried many APIs and there are always gaps in the data. I was working on a stock market API to scratch an itch (hotstoks.com), but now it’s in stealth mode because of all the data issues I was having. I’m using IEX and Yahoo Finance.
  • by ecesena on 5/6/22, 12:26 AM

    Check out Pyth. If it has the data you’re looking for, it should be very high quality and timely.

    https://pyth.network/

  • by gaws on 5/7/22, 12:21 AM

    Bloomberg Terminal. Free account. For life. God help me if I ever lose my login.
  • by snake_doc on 5/6/22, 12:21 AM

    Bloomberg Terminal