Push pandas DataFrames to Tableau Server/Online Push pandas DataFrames to Tableau Server/Online

Push pandas DataFrames to Tableau Server/Online

Introduction

My stack at the time was Python for blending and cleaning data, then pushing the output to Tableau Online (the SaaS version of Tableau Server). Most dashboards needed daily refreshes, some hourly. Here’s how to automate the refresh with Tableau Server Client and pandleau.

Getting started

  • Standard Python data stack (pandas, numpy). Easiest path is Anaconda.

  • Install Tableau’s Extract API 2.0. Download the .zip, unzip, and on *nix run sudo python setup.py install. We only need .hyper extracts.

  • Install Tableau Server Client:

    pip install tableauserverclient
  • Install pandleau:

    pip install pandleau --no-deps

    One of its deps had issues — --no-deps bypasses it.

Automating the refresh

Imports:

import pandas as pd
import tableauserverclient as TSC
from pandleau import *

Load the data, wrap it in a pandleau object, write a .hyper. For spatial data, specify the column — see pandleau’s README.

df = pd.read_csv('yourdata.csv')
df = pandleau(df)
df.to_tableau('/mydir/mydata.hyper', add_index=False)

Authenticate to Tableau Server:

# Don't hard-code credentials in real use
tableau_auth = TSC.TableauAuth('you@example.com', 'password123', site_id='mysite')
server = TSC.Server('https://us-east-1.online.tableau.com/', use_server_version=True)
server.auth.sign_in(tableau_auth)

Find the project id. Use TSC.Pager because .get() only returns the first 100 records:

projectname = 'Business Intelligence'
print([(p.name, p.id) for p in TSC.Pager(server.projects) if p.name.startswith(projectname)])

Publish the .hyper to that project. If a datasource with the same name already exists, Overwrite replaces it:

mydatasourceitem = TSC.DatasourceItem('myprojectid_12345', name='myfirstdatasource')
item = server.datasources.publish(mydatasourceitem, '/mydir/mydata.hyper', 'Overwrite')
print("{} published with id: {}".format(item.name, item.id))
server.auth.sign_out()

Wrapping up

Schedule with crontab -e to refresh on a cadence.

Wrapped up as a function:

import pandas as pd
import tableauserverclient as TSC
from pandleau import *
def publish_to_tableau(df, folder_path, projectid, datasource_name, auth_list, site='yoursite'):
"""Publish a pandas DataFrame to Tableau Online.
Args:
df: DataFrame to publish
folder_path: where to stage the temp .hyper
projectid: Tableau Server project id
datasource_name: name of the published datasource
auth_list: [username, password]
site: Tableau site
"""
pandleau(df).to_tableau(folder_path + 'temp.hyper', add_index=False)
tableau_auth = TSC.TableauAuth(auth_list[0], auth_list[1], site_id=site)
server = TSC.Server('https://us-east-1.online.tableau.com/', use_server_version=True)
with server.auth.sign_in(tableau_auth):
mydatasourceitem = TSC.DatasourceItem(projectid, name=datasource_name)
item = server.datasources.publish(mydatasourceitem, folder_path + 'temp.hyper', 'Overwrite')
print("{} published with id: {}".format(item.name, item.id))

← Back to blog