This is the second post in a series about data-driven investing.
In the first post I described Hetty, a collection of tools for monitoring stocks and doing financial data analysis. Naturally, Hetty needs a steady inflow of financial information, and I prefer not to input that information by hand, so that’s where online data sources come in.
In this post I want to discuss the sources of financial data that I reviewed as potential inputs for Hetty. I found many Quora questions and Medium posts about this, but most of the “review articles” that discuss data brokers seem to be written by developer evangelists (i.e. marketing people) of the data brokers themselves. Hmm.
Spoiler: I did not find a service that has all these properties.
Quandl is a marketplace for data. Third parties can use it to sell subscriptions to financial datasets. Quandl seems to target institutional investors, but they offer a set of ‘core financial data’ for individual investors.
The platform was clearly built with automation in mind. Upon registration, you immediately receive an API key for interacting with the platform. Quandl also provides sample data for most of their datasets, which is a handy feature for developers.
However, Quandl is not for me. Even very basic information can only be found in ‘premium’ datasets, which cost about $30 per month per dataset. This seems a bit steep, since you can get the same data for free from Yahoo Finance. For example, a dataset with end-of-day stock prices for US equities was $39.99 a month for personal use, or $299 annually. Since I’d need to combine various datasets to get information for all the (European and American) companies I invest in, this would quickly become too expensive. It’s also a shame that you need to sign up for a Quandl account just to see the pricing for each dataset.
Alpha Vantage is a financial data broker startup. They cater to individual investors who need financial data in open formats such as JSON and CSV. Their website specifically mentions the cryptocurrency and foreign exchange (FX) crowd, two areas of investment where many people do algorithmic trading. The standard API is free, and limited to 5 requests per minute and 500 requests per day. This would not be enough for Hetty’s large-scale stock screening, which I run once a month. If you want more requests per minute and no daily limit, you need to buy a subscription to their premium API, which starts at $30 per month.
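To put those limits in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes one API request per company and a hypothetical screen of 1,000 companies; the function name and the example size are my own illustration, not part of Alpha Vantage's API.

```python
import math


def screening_time(n_requests: int, per_minute: int = 5, per_day: int = 500) -> dict:
    """Estimate how long n_requests take under a per-minute and a per-day cap.

    Assumption: every company in the screen costs exactly one request.
    """
    # Time needed if only the per-minute cap applied.
    minutes_at_full_speed = math.ceil(n_requests / per_minute)
    # Number of calendar days needed because of the daily cap.
    calendar_days = math.ceil(n_requests / per_day)
    return {
        "minutes_at_full_speed": minutes_at_full_speed,
        "calendar_days": calendar_days,
    }


# Hypothetical screen of 1,000 companies on the free tier.
budget = screening_time(1000)
print(budget)
```

Under these assumptions the daily cap is the real bottleneck: 500 requests take only 100 minutes at 5 requests per minute, after which you have to wait for the next day.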
Unfortunately, I got the feeling that Alpha Vantage’s back-end is not completely ready yet. At first, the website kept going offline, so I couldn’t get an API key. When I finally got some data out of the API, the stock prices contained many outliers: weird gaps and spikes in the data that did not reflect the real-world stock price movement.
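One simple way to spot such gaps and spikes is to compare each price with the median of its neighbours. The sketch below is only an illustration of that idea, not the check I actually ran; the function name, the window size, the 20% threshold and the sample prices are all made up.

```python
from statistics import median


def flag_spikes(prices, window=5, threshold=0.2):
    """Return indices of prices that deviate more than `threshold`
    (as a fraction) from the median of the surrounding window."""
    flagged = []
    half = window // 2
    for i, price in enumerate(prices):
        lo = max(0, i - half)
        hi = min(len(prices), i + half + 1)
        # Neighbouring prices, excluding the price under inspection.
        neighbours = prices[lo:i] + prices[i + 1:hi]
        if not neighbours:
            continue
        reference = median(neighbours)
        if reference > 0 and abs(price - reference) / reference > threshold:
            flagged.append(i)
    return flagged


# Made-up closing prices with one spike (250.0) and one gap (0.0).
closes = [100.0, 101.5, 99.8, 250.0, 100.7, 102.1, 101.2, 0.0, 101.9, 102.4, 101.1]
suspicious = flag_spikes(closes)
print(suspicious)
```

Using the median rather than the mean keeps a single bad value from dragging the reference price along with it. A real stock can of course move more than 20% in a day, so anything flagged this way still needs a manual check against another source.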
If Alpha Vantage’s technology becomes a bit more stable, I might revisit it later to see if it fits my needs, especially since Yahoo Finance is reportedly going to disappear at some point. In terms of product offering, Alpha Vantage came closest to being a solution for my problem, but I don’t trust the data quality yet. If I am going to pay for a subscription to financial data, I want to be 100% sure of the data quality.
Atom.finance is another financial data startup, which offers a platform for tracking your investment portfolio, a stock screener, a social network for investors, etc. Unfortunately, they currently do not offer any information about securities traded in European markets, and they also don’t have an API.
I registered to view the platform, but I quickly found out that Atom was not what I was looking for. Their user management back-end still needs a bit of work. After I deleted my account, I kept receiving marketing emails with an unsubscribe link to a non-existing user account. Their customer support fixed it for me manually after I sent them an email.
Bloomberg is a big name in the financial world since they also make Bloomberg Terminals, which are used by professional traders for getting financial information. They offer a clear, minimalist website with financial news and general information about equities. The website even has up-to-date information for many European companies.
I like that they report earnings for each company in a consistent format, unlike the Yahoo website, where the reported financials can be very different across companies. Unfortunately, I ran into problems when I wanted to collect company information in bulk. The Bloomberg website has a very strict anti-bot policy, which makes it impossible to scrape, and access to the Bloomberg API probably does not fit my budget.
Yahoo Finance is a great source of information for me. I’ve read multiple reports that the service will disappear in the future, but so far, it doesn’t look like that will happen any time soon.
Apparently, Yahoo Finance used to have an API, but it’s gone now (or only available for premium members). However, it doesn’t matter that the API is gone, as I can grab all the data I need from the JSON data embedded in each webpage! Unlike Bloomberg, the Yahoo Finance website has no strict anti-scraping measures, so I can easily scrape and compare data for 1000 companies in one go. As an added bonus, Yahoo allows users to download historical stock prices for each company. Just go to a specific equity, go to ‘historical prices’, and click ‘Download’.
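To give an idea of what grabbing embedded JSON looks like, here is a minimal sketch using only Python's standard library. The page markup is an assumption: I use a made-up sample page with a `<script type="application/json">` tag, and the symbol and field names are invented. The real Yahoo Finance pages are structured differently and change over time, so treat this as the general technique rather than a working Yahoo scraper.

```python
import json
from html.parser import HTMLParser


class EmbeddedJsonParser(HTMLParser):
    """Collect the text inside <script type="application/json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_json_script = False
        self.payloads = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/json":
            self._in_json_script = True
            self.payloads.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_json_script = False

    def handle_data(self, data):
        # Script content may arrive in several chunks, so append.
        if self._in_json_script:
            self.payloads[-1] += data


def extract_embedded_json(html):
    """Return every embedded JSON payload in the page as a Python object."""
    parser = EmbeddedJsonParser()
    parser.feed(html)
    parser.close()
    return [json.loads(payload) for payload in parser.payloads]


# Hypothetical page: markup, symbol and field names are invented.
SAMPLE_PAGE = """<html><body><h1>ACME</h1>
<script type="application/json">{"symbol": "ACME", "price": 12.34, "currency": "EUR"}</script>
</body></html>"""

quotes = extract_embedded_json(SAMPLE_PAGE)
print(quotes[0]["symbol"], quotes[0]["price"])
```

In practice you would first download the page (and be polite about request frequency), then find out by inspection where the JSON lives in that particular page before adapting the parser.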
As for accuracy: stock prices, market information and general company information are high quality for every listed equity. If you also want recent fundamental data (like balance sheets and income statements), the data quality differs per company. The data for companies traded on a US market is generally high quality, but the information might be incomplete, inaccurate or missing for companies outside the US.
So far, Yahoo Finance is my best source of information outside of official company filings. I only use it for up-to-date stock prices, as fundamental information is not accurate for European companies.
I have not found a service yet that has all the properties I mentioned in my data wishlist above. I’d love to know about any alternatives to the websites I’ve discussed in this post, so if you know another data source that I should investigate, let me know!
Judith van Stegeren is a Dutch computer scientist specialized in natural language processing, machine learning and data mining. Previously she researched natural language generation for the video games industry at the University of Twente.