Here at The Data Incubator, our Fellows deploy their own fully functional, public-facing web app to showcase their data science skills to employers. This not only gives them valuable experience dynamically fetching and displaying data, but also encourages them to think about end user interaction. To demo the process, we decided to marry together some of our favorite technologies:
- Flask, a slick web framework for Python
- Heroku for cloud-based app deployment
- Bokeh for interactive, D3.js-style visualizations
- Git for version control and distributing code
The goal is to create some distant ancestor of Google Finance: a form capable of accepting a stock ticker as input and producing a plot of the daily close price. Here’s the finished product. So how do we get there?
Building the app
We’re going to be building our app in Flask, and all we need are some barebones forms, redirects, and HTML templates to collect the user input and display the desired information. We can even go the extra mile and create a custom error page – how fancy! If you’re new to Flask, try this great tutorial or the official one from Flask and you’ll be up to speed in no time.
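The skeleton described above might look something like this — a minimal sketch, where the route names, the `ticker` form field, and the inline strings (standing in for real `render_template` calls) are all illustrative, not taken from the original repo:

```python
from flask import Flask, request, redirect, url_for

app = Flask(__name__)

@app.route('/')
def index():
    # In the real app this renders an index.html template with the form.
    return '<form method="post" action="/graph"><input name="ticker"></form>'

@app.route('/graph', methods=['POST'])
def graph():
    # Collect the user's input and normalize it.
    ticker = request.form.get('ticker', '').strip().upper()
    if not ticker:
        # No ticker submitted: bounce back to the form.
        return redirect(url_for('index'))
    # In the real app: fetch the data, build the Bokeh plot, render graph.html.
    return 'Plot for %s' % ticker

@app.errorhandler(404)
def page_not_found(e):
    # The custom error page hook -- the real app renders a template here.
    return 'Page not found', 404
```

The form posts to `/graph`, which is where the data fetching and plotting will eventually live.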
We need to get data from somewhere, and Quandl makes this easy with a robust API that can pull from open datasets and output in multiple formats. We can grab the time-series data using Python’s requests library and throw it into a pandas DataFrame from there. Pandas provides a plethora of tools for exploring, cleaning, and analyzing data, and its DataFrames plug right into Bokeh. To get an idea of the kinds of plots you can generate, check out this series of step-by-step IPython notebooks. A couple of tricks came in handy, like adding retries to the API call:
import requests

api_url = 'https://www.quandl.com/api/v1/datasets/WIKI/%s.json' % stock
session = requests.Session()
# Mount the retrying adapter on the scheme we actually use (https).
session.mount('https://', requests.adapters.HTTPAdapter(max_retries=3))
raw_data = session.get(api_url)
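Getting the response into a DataFrame is then a couple of lines. The v1 API returns the column names and rows separately; the sample payload below is illustrative (in the app it would come from `raw_data.json()`):

```python
import pandas as pd

# Illustrative payload shaped like Quandl's v1 JSON response;
# in the app this would be: payload = raw_data.json()
payload = {
    'column_names': ['Date', 'Open', 'High', 'Low', 'Close', 'Volume'],
    'data': [
        ['2016-07-05', 95.39, 95.40, 94.46, 94.99, 27705210],
        ['2016-07-01', 95.49, 96.47, 95.33, 95.89, 26026540],
    ],
}

df = pd.DataFrame(payload['data'], columns=payload['column_names'])
df['Date'] = pd.to_datetime(df['Date'])   # parse dates for the DatetimeIndex
df = df.set_index('Date').sort_index()    # Bokeh likes a sorted time axis
```

With a proper DatetimeIndex in place, the DataFrame’s columns can be handed straight to Bokeh.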
and making sure to let Bokeh know it’s dealing with a DateTime index:
from bokeh.plotting import figure

# TOOLS is assumed to be defined earlier, e.g. 'pan,box_zoom,wheel_zoom,reset'
plot = figure(tools=TOOLS, title='Data from Quandl WIKI set',
              x_axis_label='date', x_axis_type='datetime')
Once the plot is built, Bokeh’s components function hands back a script and a div that we pass along to the template:

from bokeh.embed import components

script, div = components(plot)
return render_template('graph.html', script=script, div=div)
In graph.html, we’ll load BokehJS on the fly from a Content Delivery Network and use Jinja2 templating to include the plot components without worrying about escaping characters:
*Note*: In the template below, make sure the CSS and JS includes specify the version of Bokeh you’re using (here 0.12.0).
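A minimal sketch of what graph.html might look like — the CDN URLs assume Bokeh 0.12.0, and the `| safe` filters keep Jinja2 from escaping the embedded script and div:

```html
<!DOCTYPE html>
<html>
<head>
  <title>Stock plot</title>
  <link href="https://cdn.pydata.org/bokeh/release/bokeh-0.12.0.min.css"
        rel="stylesheet">
  <script src="https://cdn.pydata.org/bokeh/release/bokeh-0.12.0.min.js"></script>
</head>
<body>
  {{ div | safe }}
  {{ script | safe }}
</body>
</html>
```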
The great thing about Bokeh is that it produces malleable plots: even though the entire date range is displayed at first, you can easily box- or wheel-zoom your way into whatever specifics you’re looking for.
Publishing to the web
Heroku fully supports Python apps, but we still want to make sure the remote environment is set up properly. We can specify a Python runtime (take note, those who’ve made the switch to Python 3) as well as take advantage of gunicorn's concurrent request processing by modifying the Procfile according to what process types we want to use:
web: gunicorn app:app
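Pinning the Python runtime is just a one-line runtime.txt alongside the Procfile (the exact version string below is illustrative — use one Heroku currently supports):

```
python-3.6.1
```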
Of course, we’ll also need to manage our dependencies so that Heroku knows where to find gunicorn, Jinja2, and everything else we’re using. As we’ve done more development in Python, we’ve come to appreciate Conda as an alternative package manager to the ubiquitous pip. Rather than compiling from source, Conda installs from binaries, which can be noticeably faster, especially when pushing a build to Heroku. Luckily all we have to do to take advantage of Conda is add the buildpack:
heroku config:add BUILDPACK_URL=https://github.com/kennethreitz/conda-buildpack.git
and add a conda-requirements.txt to handle the install dependencies. Not everything can be installed with Conda, so we still need a requirements.txt file to instruct pip to take care of the rest.
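For example, the split might look like this — the package lists are illustrative, with the heavy binary packages going to Conda and the rest to pip:

```
# conda-requirements.txt -- installed by Conda from binaries
numpy
pandas
bokeh

# requirements.txt -- installed by pip afterwards
Flask
gunicorn
requests
```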
Finally, for hosting the app on Heroku, port 33507 is reserved for Flask and that worked well for us:
if __name__ == '__main__':
    app.run(port=33507)
It’s often useful to run the app locally in order to iterate rapidly through changes, by pointing your web browser at localhost:33507. However, if you’re developing on a cloud server like Digital Ocean where you can’t do that, you can make the app publicly available by using app.run(host='0.0.0.0'). Just be careful not to have debug mode on when you do this, as it will allow anyone on the internet to execute arbitrary code on your computer.
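One way to make that mistake hard to commit is to decide the host and debug flag together. This is just a sketch — the helper and its name are ours, not from the original app:

```python
def run_config(public, want_debug):
    """Choose Flask run() settings, refusing debug mode on a public interface."""
    host = '0.0.0.0' if public else '127.0.0.1'
    # Never combine debug mode with a publicly reachable interface:
    # the Werkzeug debugger allows arbitrary code execution.
    debug = want_debug and not public
    return host, debug
```

Then `app.run(host=host, debug=debug, port=33507)` can never accidentally expose the debugger.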
Heroku and Git already have tight integration. Once all the pieces are in place, deploying is as simple as:
git init
git add .
git commit -m 'initial commit'
heroku login
heroku create cleverappname
git push heroku master
and voilà, your app is live at the URL heroku create printed out.
Distributing the framework
Part of the beauty of Git is that it’s not just about keeping track of code revisions; it’s also about participating in GitHub’s open-source community, where ideas and expertise flow freely. It’s easy to get started, too. Before packaging this repository up and making it public, we’ll strip out the logic so that the Fellows have a working framework when they clone the repository, but have to figure out the interactions themselves. Have a look for yourself.
Many Fellows this session actually went above and beyond, both stylistically and in terms of adding functionality like candlestick charts and time controls. Later in the course, we talk in detail about how to seamlessly overlay differently-scaled data like stock volume onto plots like this, how to add sliders for fine-tuned control, and much more. But for now, this is a good starting point.
If you’ve come up with a particularly cool web app or, better yet, one that pushes the limits of Heroku, send me a link! And if this strikes your fancy, consider applying for our fellowship program.
Editor’s Note: The Data Incubator is a data science education company. We offer a free eight-week Fellowship helping candidates with PhDs and masters degrees enter data science careers. Companies can hire talented data scientists or enroll employees in our data science corporate training.