by sdcoffey on 5/5/22, 7:20 PM with 17 comments
We're working on an investment aggregator that tracks the value of our customers' portfolios over time. We've faced a bunch of challenges in getting public stock market data (live/historical prices, splits, etc.). We're currently cobbling together a dataset from a few different sources (Polygon, IEX, etc.), but it's been a massive pain.*
I'm wondering if this is the case for other fintech devs. Does everyone face the pain of assembling their own financial datasets? Or do we have unique needs/a bad solution?
So HN, where do you all source your financial data?
* Our main challenges:
Data quality:
- Ticker symbol changes (and CUSIP/ISIN changes/challenges)
- Missing or wrong values for some days
- Missing or incorrect splits

Speed:
- Tens of API calls that would be necessary to render one screen
- Historical data syncing that would take days of API calls

Burden:
- Enterprise sales contracts instead of self-serve
- Building and maintaining your own ETL pipeline to ingest data
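For the data-quality items above, here is a rough sketch of the kind of sanity checks one might run on a daily close series after ingesting it. Everything in it is an assumption for illustration: the function names `missing_weekdays` and `suspect_splits` are hypothetical, the sample prices are made up, and the calendar check only knows about weekends (a real pipeline would need an exchange holiday calendar). The split check is just a heuristic, since a genuine large price drop would be flagged too.

```python
from datetime import date, timedelta


def missing_weekdays(prices: dict[date, float], start: date, end: date) -> list[date]:
    """Return weekdays in [start, end] that have no price.

    Assumption: every Monday-Friday is a trading day (holidays are ignored).
    """
    missing = []
    day = start
    while day <= end:
        if day.weekday() < 5 and day not in prices:
            missing.append(day)
        day += timedelta(days=1)
    return missing


def suspect_splits(closes: list[float], tol: float = 0.05) -> list[int]:
    """Return indexes where the previous close divided by the current close
    is within `tol` of an integer >= 2, which may hint at an unrecorded
    N-for-1 split. Heuristic only: a real crash looks the same.
    """
    flagged = []
    for i in range(1, len(closes)):
        ratio = closes[i - 1] / closes[i]
        nearest = round(ratio)
        if nearest >= 2 and abs(ratio - nearest) <= tol:
            flagged.append(i)
    return flagged


# Made-up sample data: Wednesday 2022-05-04 is absent from the series.
sample_prices = {
    date(2022, 5, 2): 100.0,
    date(2022, 5, 3): 101.5,
    date(2022, 5, 5): 99.8,
    date(2022, 5, 6): 100.2,
}
gaps = missing_weekdays(sample_prices, date(2022, 5, 2), date(2022, 5, 6))

# Made-up closes with a drop that looks like an unadjusted 2-for-1 split.
flags = suspect_splits([100.0, 101.0, 50.4, 50.9])
```

Checks like these do not fix bad data, but they tell you which days to cross-check against a second source.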
by Kon-Peki on 5/5/22, 9:12 PM
Here's a good start to the list of vendors (HINT: click on "Data Dictionary" for things that look interesting):
https://wrds-www.wharton.upenn.edu/pages/about/data-vendors/
For linking across different data sets and tracking companies over their entire history (check out the video starting around 6 minutes in):
https://wrds-www.wharton.upenn.edu/pages/about/data-vendors/...
EDIT - FYI, WRDS is the financial data platform that almost all large universities use (worldwide). They handle all the management of it, and your school just has to write a few checks. Many of these data sets are licensed to all students, not just business school students. So if you are a CS student and want this kind of data for building an ML model or something like that, you should be able to get it by requesting an account on the WRDS page linked above. They might push back on you a little, in which case you'll have to go over to the business school at your Uni to get things ironed out. They have a non-techie-friendly interface, but also offer a Postgres interface so you can connect directly from Python or R or whatever with your account credentials.
by tdubhro1 on 5/5/22, 9:59 PM
by dvasdekis on 5/5/22, 10:29 PM
ib_insync was the Python client library, and I open-sourced the market gateway I built: https://github.com/dvasdekis/ib-gateway-docker-gcp
by olkyts on 5/6/22, 7:38 AM
Recently I saw this project on HN: https://openbb.co/ Venturebeat press release states that "The platform gleans its investment data via publicly available sources, among others that require an API key — these include Alpha Vantage, Financial Modeling Prep, Finnhub, Reddit, Twitter, Coinbase, the SEC, and many more." I haven't checked their git for data sources.
My question then is: how do you build a data provider? Where do data providers like Bloomberg get the data that they sell?
by vamega on 5/6/22, 2:37 AM
I’ve used Bloomberg Backoffice files in the past, and later went to work at Bloomberg to try and make that data more easily usable.
MarketQA had a product that can give you historical data as well, but tied more into the Reuters world.
Corporate actions are a complete pain: Bloomberg’s back office file data for the adjustment factors isn’t consistent with the data you can pull from a Bloomberg terminal.
The wider your coverage the harder it is to do this correctly.
If you then want historical intraday prices as well, this gets much more expensive and much more complicated to set up. My last job had an entire team trying to get all this right, and still got it wrong a lot.
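To make the adjustment-factor point concrete, here is a minimal sketch of back-adjusting closes for splits. This is not Bloomberg's method or file format. It assumes a split is given as an ex-date plus a ratio (2.0 for a 2-for-1), that only splits matter (no dividends or spin-offs), and the name `back_adjust` and the sample numbers are made up for illustration.

```python
from datetime import date


def back_adjust(
    closes: list[tuple[date, float]],
    splits: list[tuple[date, float]],
) -> list[tuple[date, float]]:
    """Divide every close dated before a split's ex-date by that split's ratio.

    Assumptions: `splits` holds (ex_date, ratio) pairs, ratio 2.0 means
    2-for-1, and closes on or after the ex-date are already on the new basis.
    """
    adjusted = []
    for day, close in closes:
        factor = 1.0
        for ex_date, ratio in splits:
            if day < ex_date:
                factor *= ratio  # cumulative factor across all later splits
        adjusted.append((day, close / factor))
    return adjusted


# Made-up series with a 2-for-1 split going ex on 2022-05-04.
raw = [
    (date(2022, 5, 2), 200.0),
    (date(2022, 5, 3), 202.0),
    (date(2022, 5, 4), 101.5),
]
adjusted = back_adjust(raw, [(date(2022, 5, 4), 2.0)])
```

If two sources disagree on the ex-date or the ratio, every earlier price in the adjusted series shifts, which is one way a single bad factor ends up corrupting years of history.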
by melony on 5/5/22, 10:59 PM
If you are really tight on cash, try Intrinio. No idea about their quality but they have been around for a while.
by blakbelt78 on 5/6/22, 3:11 AM
by ecesena on 5/6/22, 12:26 AM
by gaws on 5/7/22, 12:21 AM
by snake_doc on 5/6/22, 12:21 AM