2. Transforming deeply nested JSON from APIs into Pandas DataFrame
The `json_normalize` function and the `explode` method in Pandas can be used to transform deeply nested JSON data from APIs into a Pandas DataFrame.
Feb 27, 2025 ยท Samuel Chan
In one of the past workshops we did for Bank Rakyat Indonesia (BBRI), we provided a clean dataset that participants use to construct their analysis on. This article expands on how we transformed the raw JSON data from the API into a Pandas DataFrame.
For context, here is how the data looks like in tabular form:
By comparing the historical P/E ratios of different banks to their peers and their own historical averages, an analyst can get a sense of whether a bank is trading at a premium or discount to its intrinsic value, and whether it is a good value investment relative to its historical norm.
The data, of course, is sourced from Sectors Financial Data API, which is a product incubated by Supertype and powering thousands of financial workflows for the leading financial institutions in Indonesia. In the following sections, we will walk through the process of transforming the raw JSON data from the API into a Pandas DataFrame.
requests
on a Financial API endpoint
To actually use fetch_data
we need to specify a url. This can be constructed from
reading the correponding API documentation. For example, to fetch the historical P/E ratios of BBRI, one could do:
pd.json_normalize
and explode
method
The one line that you should pay attention to from the section above is pd.json_normalize(response['valuation']).T
.
This line of code is responsible for transforming the deeply nested JSON data into a Pandas DataFrame, and then transposing it.
If we were to print bbri
, we would get the following output:
An interesting observation here is that with the json_normalize
function, each key in the JSON object is
flattened into a column in the DataFrame, but the historical_valuation
key is still a list of dictionaries.
To "explode out" the list of dictionaires into separate, individual columns, we can use the explode
method in Pandas.
Let's see how .explode()
works in action:
Notice that each of the nested dictionaries in the historical_valuation
column has been exploded out -- this is the
essence of the explode
method in Pandas, and when we combine it with json_normalize
, we can transform even deeply nested
JSON data into a Pandas DataFrame:
Putting it all together
As a quick recap, whenever an API service (like the one from Sectors Financial API) returns deeply nested JSON data, you can call on these two tools in Pandas to transform it into a flattened, tabular DataFrame:
json_normalize
to flatten the JSON data into a DataFrameexplode
to expand out lists of dictionaries into separate into columns
To arrive at the full code used for the workshop at BBRI, one would just have to perform the above iteratively
for each bank in a defined banks
list before concatenating them into a single DataFrame.
Read More
Tags: api, python, data analysis, pandas