Push pandas DataFrames to Tableau Server/Online
Introduction
My stack at the time was Python for blending and cleaning data, then pushing the output to Tableau Online (the SaaS version of Tableau Server). Most dashboards needed daily refreshes, some hourly. Here’s how to automate the refresh with Tableau Server Client and pandleau.
Getting started
-
Standard Python data stack (pandas, numpy). Easiest path is Anaconda.
-
Install Tableau’s Extract API 2.0. Download the .zip, unzip, and on *nix run
sudo python setup.py install. We only need.hyperextracts. -
Install Tableau Server Client:
pip install tableauserverclient -
Install pandleau:
pip install pandleau --no-depsOne of its deps had issues —
--no-depsbypasses it.
Automating the refresh
Imports:
import pandas as pdimport tableauserverclient as TSCfrom pandleau import *Load the data, wrap it in a pandleau object, write a .hyper. For spatial data, specify the column — see pandleau’s README.
df = pd.read_csv('yourdata.csv')df = pandleau(df)df.to_tableau('/mydir/mydata.hyper', add_index=False)Authenticate to Tableau Server:
# Don't hard-code credentials in real usetableau_auth = TSC.TableauAuth('you@example.com', 'password123', site_id='mysite')server = TSC.Server('https://us-east-1.online.tableau.com/', use_server_version=True)server.auth.sign_in(tableau_auth)Find the project id. Use TSC.Pager because .get() only returns the first 100 records:
projectname = 'Business Intelligence'print([(p.name, p.id) for p in TSC.Pager(server.projects) if p.name.startswith(projectname)])Publish the .hyper to that project. If a datasource with the same name already exists, Overwrite replaces it:
mydatasourceitem = TSC.DatasourceItem('myprojectid_12345', name='myfirstdatasource')item = server.datasources.publish(mydatasourceitem, '/mydir/mydata.hyper', 'Overwrite')print("{} published with id: {}".format(item.name, item.id))
server.auth.sign_out()Wrapping up
Schedule with crontab -e to refresh on a cadence.
Wrapped up as a function:
import pandas as pdimport tableauserverclient as TSCfrom pandleau import *
def publish_to_tableau(df, folder_path, projectid, datasource_name, auth_list, site='yoursite'): """Publish a pandas DataFrame to Tableau Online.
Args: df: DataFrame to publish folder_path: where to stage the temp .hyper projectid: Tableau Server project id datasource_name: name of the published datasource auth_list: [username, password] site: Tableau site """ pandleau(df).to_tableau(folder_path + 'temp.hyper', add_index=False)
tableau_auth = TSC.TableauAuth(auth_list[0], auth_list[1], site_id=site) server = TSC.Server('https://us-east-1.online.tableau.com/', use_server_version=True)
with server.auth.sign_in(tableau_auth): mydatasourceitem = TSC.DatasourceItem(projectid, name=datasource_name) item = server.datasources.publish(mydatasourceitem, folder_path + 'temp.hyper', 'Overwrite') print("{} published with id: {}".format(item.name, item.id))← Back to blog