
Grow a Treemap with Python and Plotly Express | by Lee Vaughan

Turn a government PDF into a financial planning tool

A photo looking up the trunk of a tree in autumn.
Photo by Robert Murray on Unsplash

Hierarchical data is data in which items are linked to each other through parent-child relationships, forming a tree structure. Familiar examples are family trees and corporate organization charts.

A treemap is a diagram that represents hierarchical data using nested rectangles. The area of each rectangle corresponds to its numerical value. Treemaps have been around for about 30 years. An early application was to visualize hard drive usage, as demonstrated in the figure below.

A treemap of nested rectangles in yellows and oranges.
Allocation of hard disk space visualized with a treemap (Carnivore1973 via Wikimedia Commons)
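To make the area rule concrete, here is a minimal sketch built on a made-up monthly budget. The category names, dollar amounts, and the helper names `branch_totals` and `draw` are illustrative assumptions, not data or code from the BLS report. Each branch's value is the sum of its leaves, and a rectangle's share of the total is what sets its area.

```python
# Hypothetical monthly budget, stored as name -> (parent, amount in dollars).
# An empty parent string marks the root; branch amounts start at 0 and are
# filled in from the leaves below them.
budget = {
    "Budget": ("", 0),
    "Housing": ("Budget", 0),
    "Food": ("Budget", 0),
    "Rent": ("Housing", 1500),
    "Utilities": ("Housing", 300),
    "Groceries": ("Food", 500),
    "Dining out": ("Food", 200),
}


def branch_totals(tree):
    """Return {name: total}, where each branch total is the sum of its leaves."""
    totals = {name: amount for name, (_, amount) in tree.items()}
    for name, (parent, amount) in tree.items():
        # Add this item's own amount to every ancestor up to the root.
        while parent:
            totals[parent] += amount
            parent = tree[parent][0]
    return totals


def draw(tree):
    """Build a Plotly Express treemap from the parent-child dictionary."""
    # Imported here so the arithmetic above runs even without Plotly installed.
    import plotly.express as px

    totals = branch_totals(tree)
    names = list(tree)
    return px.treemap(
        names=names,
        parents=[tree[name][0] for name in names],
        values=[totals[name] for name in names],
        branchvalues="total",  # branch values already include their children
    )


totals = branch_totals(budget)
rent_share = totals["Rent"] / totals["Budget"]  # fraction of the full area
```

Calling `draw(budget).show()` should open the interactive figure, with the Rent rectangle taking up the largest share of the plot.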

Treemaps let you capture both the value of individual categories and the structure of the hierarchy. They’re useful for:

  • Displaying hierarchical data when the number of categories overwhelms a bar chart.
  • Highlighting proportions between individual categories and the whole.
  • Distinguishing categories using different sizes and colors.
  • Spotting patterns, primary contributors, and outliers.
  • Bringing a fresh look to data visualization.

In this Quick Success Data Science project, we’ll use Python to create a treemap to help people budget for expenses. We’ll first use the tabula-py library to turn a web-based Bureau of Labor Statistics PDF into a pandas DataFrame. Then we’ll use the Plotly Express library to turn the DataFrame into an attractive and interactive area-based visualization.
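The overall workflow can be sketched in two small functions. This is only an outline under stated assumptions: the function names, the column names, and the placeholder URL in the usage comment are hypothetical, and the real BLS table will need cleaning before it can be plotted. Note that tabula-py requires a Java runtime.

```python
def pdf_tables(pdf_source, pages="all"):
    """Extract the tables tabula-py finds in a PDF (local path or URL)."""
    # Imported here so this sketch loads even where tabula-py is not installed.
    import tabula  # provided by the tabula-py package

    # read_pdf returns a list of pandas DataFrames, one per detected table.
    return tabula.read_pdf(pdf_source, pages=pages)


def expenses_treemap(df, path_cols, value_col):
    """Turn a tidy DataFrame into a treemap with Plotly Express."""
    import plotly.express as px

    # path lists the hierarchy columns from root to leaf;
    # values names the numeric column that sets each rectangle's area.
    return px.treemap(df, path=path_cols, values=value_col)


# Hypothetical usage (placeholder URL and column names, not the article's):
# tables = pdf_tables("https://example.com/expenditures.pdf")
# fig = expenses_treemap(tables[0], ["category", "item"], "amount")
# fig.show()
```

Keeping extraction and plotting separate makes it easy to inspect and clean the DataFrame between the two steps.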

With the pandemic and subsequent inflationary surge, consumer spending has drawn a lot of attention. Individuals need to know how to budget effectively, and policymakers need to understand which sectors place the greatest financial burden on potential voters.

To help track income and spending, the Census Bureau uses Consumer Expenditure Surveys to collect information on US consumers’ expenditures, income, and demographic characteristics. The Bureau of Labor Statistics (BLS) then compiles these statistics into annual
