Interact with the yahoo finance API using python's requests library

Interact with the yahoo finance API using python's requests library

2018, Apr 04    

When you want to start using financial data for a side project or to get started with Data Science it can quickly become tedious to scrape all the relevant information from the web.
No more! Here are three small functions which let you interact with the yahoo finance website in order to download historical stock data.

For these functions to work, we need to import some modules first -

import requests                  # [handles the http interactions](http://docs.python-requests.org/en/master/) 
from bs4 import BeautifulSoup    # beautiful soup handles the html to text conversion and more
import re                        # regular expressions are necessary for finding the crumb (more on crumbs later)
from datetime import datetime    # string to datetime object conversion
from time import mktime          # mktime transforms datetime objects to unix timestamps

Now, lets talk a little about cookies 🍪 and crumbs 🍴🍪

Cookie Monster:
Two wrongs not make right.
But two 🍪 . . . make everything right.

Twitter

Cookies are small text snippets websites usually save on your computer, they can contain for example identifiers such as session IDs.
The yahoo finance website uses cookies and restricts access to users (scripts) unless they are sending the proper cookie to their server.

Requests handles cookies in a cookiejar object, which essentially is a python dictionary but cookiejar sounds way cooler, right?!
in my case, when connecting to the website, the cookiejar
<RequestsCookieJar[<Cookie B=7a6fe1pdcbvsa&b=3&s=c6 for .yahoo.com/>]>
contains a single cookie element (B), which we will send with our request.
Additionally, they use something called a crumb, which is another identifier, but this one is send in the url, when requesting the historical csv file!

Coding

The first function calls the website of a selected stock and collects the cookies and crumb.
We reuse the headers in the subsequent function to mimic the same browser.

def _get_crumbs_and_cookies(stock):
    """
    get crumb and cookies for historical data csv download from yahoo finance
    
    parameters: stock - short-handle identifier of the company 
    
    returns a tuple of header, crumb and cookie
    """
    
    url = 'https://finance.yahoo.com/quote/{}/history'.format(stock)
    with requests.session():
        header = {'Connection': 'keep-alive',
                   'Expires': '-1',
                   'Upgrade-Insecure-Requests': '1',
                   'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) \
                   AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36'
                   }
        
        website = requests.get(url, headers=header)
        soup = BeautifulSoup(website.text, 'lxml')
        crumb = re.findall('"CrumbStore":{"crumb":"(.+?)"}', str(soup))

        return (header, crumb[0], website.cookies)

The second function just converts the provided start/stop dates into unix timestamps, because yahoo finance uses these in the request url.

def convert_to_unix(date):
    """
    converts date to unix timestamp
    
    parameters: date - in format (dd-mm-yyyy)
    
    returns integer unix timestamp
    """
    datum = datetime.strptime(date, '%d-%m-%Y')
    
    return int(mktime(datum.timetuple()))

While the last function does the actual requesting of the historical stock data.
This function returns a list of the individual daily values, containing the header (row 0) and subsequently one line per day.

def load_csv_data(stock, interval='1d', day_begin='01-03-2018', day_end='28-03-2018'):
    """
    queries yahoo finance api to receive historical data in csv file format
    
    parameters: 
        stock - short-handle identifier of the company
        
        interval - 1d, 1wk, 1mo - daily, weekly monthly data
        
        day_begin - starting date for the historical data (format: dd-mm-yyyy)
        
        day_end - final date of the data (format: dd-mm-yyyy)
    
    returns a list of comma seperated value lines
    """
    day_begin_unix = convert_to_unix(day_begin)
    day_end_unix = convert_to_unix(day_end)
    
    header, crumb, cookies = _get_crumbs_and_cookies(stock)
    
    with requests.session():
        url = 'https://query1.finance.yahoo.com/v7/finance/download/' \
              '{stock}?period1={day_begin}&period2={day_end}&interval={interval}&events=history&crumb={crumb}' \
              .format(stock=stock, day_begin=day_begin_unix, day_end=day_end_unix, interval=interval, crumb=crumb)
                
        website = requests.get(url, headers=header, cookies=cookies)
       
        return website.text.split('\n')[:-1]
    

Thats it, all the magic done and you can now work with the stock data and visualize it beautifully!

If you like and/or use this code, drop me a tweet or dm!