Visualizing Musical Genres, Part 2

My last post gave a brief look at the musical genre data structure in the MusicBrainz database. In this post we’ll expand our view to include all genres and sub-genres, and look at a few visualization approaches using Flourish.

Flourish provides several options for visualizing hierarchical data; in this post we’ll look at some of the advantages and disadvantages of each approach. Ultimately, my goal is to categorize all my CDs and vinyl using this approach, but for now we’ll work with the MusicBrainz genre data.

We had a quick look at a sunburst chart in the prior post, so we’ll begin there with the much larger dataset we now have. How well does it work?

Sunburst chart displaying all genres and sub-genres

Hmmm…it’s a little challenging to see the data beyond the first few genres (these are the ones with the most sub-genres). We can narrow our focus by using the filter or by clicking on one of the inner circle genres. Let’s look at the rock genre:

Sunburst chart for the rock genre

That’s a bit better; note that each sub-genre has an identical size here, something that will change once I feed my own music collection into Flourish. At least we can now identify all the sub-genres in the data.

What about a treemap approach? Treemaps can be useful in showing categories and sub-categories, sized by count or some other value (revenue, sales, profit, etc.). Here’s a look at all the data:

Treemap with all genres

Once again, it’s a challenge to see anything beyond the most frequently occurring genres; even if we provide a pop-up label it’s not very user-friendly. Let’s filter down, this time to the electronic genre:

Treemap filtered by electronic genre

Here we get a similar result to the sunburst, albeit in a different layout. Again, this could be more interesting with an actual record collection, where each sub-genre would potentially be sized differently, with some not even appearing (i.e., sub-genres with no recordings).
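To make the sizing idea concrete, here is a minimal sketch of how a record collection could be turned into sized rows for a hierarchical chart. Everything in it is an assumption for illustration: the sample recordings, the list of known sub-genres, and the function name `size_rows` are all made up, and the output is just a generic (genre, sub-genre, count) table rather than any particular Flourish format.

```python
from collections import Counter

# Hypothetical sample data: one (genre, sub-genre) pair per recording owned.
collection = [
    ("electronic", "ambient"),
    ("electronic", "ambient"),
    ("electronic", "techno"),
    ("rock", "post-punk"),
]

# Hypothetical list of known sub-genres under each parent genre.
known_subgenres = {
    "electronic": ["ambient", "house", "techno"],
    "rock": ["post-punk"],
}


def size_rows(collection, known_subgenres):
    """Return (genre, sub_genre, count) rows, skipping empty sub-genres."""
    counts = Counter(collection)
    rows = []
    for genre, subs in known_subgenres.items():
        for sub in subs:
            n = counts[(genre, sub)]
            if n > 0:  # a sub-genre with no recordings does not appear
                rows.append((genre, sub, n))
    return rows


rows = size_rows(collection, known_subgenres)
for genre, sub, n in rows:
    print(f"{genre},{sub},{n}")
```

In this made-up collection, "house" has no recordings, so it drops out of the table entirely, while "ambient" would be drawn twice as large as "techno".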

Our next example will use circles, an approach sometimes known as circular packing. All genres will be arranged in a somewhat random layout, rather than the radial or rectangular formats we have just seen. Here is a look at all genres:

All genres in a circle layout

Once more, we have a similar issue to the sunburst and treemap displays, although it is fairly easy to see the highest frequency genres in the center. Filtering on the pop genre yields a series of identically sized circles for all pop sub-genres:

Circle chart for the pop genre

The circle approach is perhaps my least favorite of the three we have seen thus far, due to the seemingly more random placement of the individual circles.

At the opposite end of the spectrum we can use bars to view the same data. Here we are able to clearly see the rank order and relative frequency for each genre:

Partial view of all genres using bars

This looks really good for the high-frequency genres: clear labels with easy-to-distinguish relative frequencies. The downside appears when we have hundreds of genres, as the bar chart becomes incredibly tall from top to bottom. In short, this approach will be effective for a limited number of genres, although the same could be said for the other methods.
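One way to keep a bar chart to a manageable height is to rank the genres and keep only the top few before charting. The sketch below assumes we already have a count of sub-genres per genre; the counts shown are invented for illustration and are not the real MusicBrainz numbers, and `top_genres` is a hypothetical helper name.

```python
# Hypothetical sub-genre counts per genre (not real MusicBrainz figures).
subgenre_counts = {
    "rock": 40,
    "electronic": 35,
    "pop": 12,
    "reggae": 12,
    "jazz": 20,
}


def top_genres(counts, n):
    """Rank genres by count (descending), breaking ties alphabetically."""
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]


top3 = top_genres(subgenre_counts, 3)
for genre, count in top3:
    # A crude text bar: one '#' per five sub-genres.
    print(f"{genre:<12}{'#' * (count // 5)} {count}")
```

Limiting the chart to the top N rows trades completeness for readability, which is the same compromise the filters make in the other chart types.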

Our final approach uses a radial tree option in Flourish. This method most closely mimics the sunburst option, with results laid out in a circle; genres can then be clicked on or filtered to get to the sub-genre level. Here are all genres:

Radial tree with all genres

Not exactly helpful, is it? There are simply too many genres and sub-genres to display; even the sunburst chart provided more information at first glance. But what about when we select a single genre, such as reggae?

Radial tree for the reggae genre

That’s better! We now have a clear, concise display to work with. This could prove to be useful when we have different size values for each sub-genre; in essence it will merge the best aspects of the sunburst and bar displays. Once my music collection data is complete, I’ll be interested to see how well this display handles differing sizes.

So which approach is best? I’m going to say that it depends on the underlying data; none of these charts was great when we attempted to view all genres at once, but they do appear to offer potential when the data has fewer categories (genres). Personally I like the sunburst and radial methods for the clarity of their display coupled with the visible connection between the sub-genres and the parent genre. I’m eager to see how they work with a more typical dataset.

That’s it for now – hope you enjoyed this, and thanks for reading!

Miles Davis Song Plots

In this post we’re going to use Flourish with more MusicBrainz data to plot the length of Miles Davis songs on a range of vinyl releases. This type of data often suggests the use of a scatter plot with an x-y axis to best visualize the information. For instance, we could place record labels on the x-axis, and the length of each song (in seconds) on the y-axis. However, with record labels being a categorical variable (i.e., discrete values such as Sony or Columbia), there are better options for understanding the data than a true scatter plot.

The first of these is a boxplot, which provides the ability to see the distribution of data (song lengths) by record label. Let’s take a look at this data in Flourish:

Here we have limited the data display to a single label (showing all was quite messy!). Select CBS or Columbia to see labels with many Miles Davis releases. We now see the median length of a recording, as well as the 25th percentile (bottom of the box) and the 75th percentile (top of the box). It’s also easy to see individual songs that lie below or above the typical range; in statistical terms, these are called outliers. On our plot, they represent songs that are either much shorter than normal (below the extended line) or longer than normal (above the extended line).
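For readers who want to see where those numbers come from, here is a minimal sketch of the boxplot summary for one label. The song lengths are made up, `box_stats` is a hypothetical helper, and the 1.5 × IQR rule used for flagging outliers is a common convention that I am assuming here; the chart itself may use a different rule or quartile method.

```python
import statistics

# Hypothetical song lengths in seconds for a single record label.
lengths = [180, 200, 210, 220, 240, 250, 260, 900]


def box_stats(values):
    """Compute quartiles and flag outliers with the 1.5 * IQR convention."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1                 # height of the box
    low_fence = q1 - 1.5 * iqr    # values below this count as outliers
    high_fence = q3 + 1.5 * iqr   # values above this count as outliers
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}


stats = box_stats(lengths)
print(stats)
```

With this sample, the 900-second track sits far above the upper fence and is reported as an outlier, while the other seven songs fall within the typical range.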

This is all useful information, but presents some limitations. Boxplots are very good at doing the aggregations for us while obscuring the individual data values, especially values that lie inside the box. To improve our ability to see those values we turn to a violin plot, which excels at showing the shape of a distribution, rather than the fixed shape provided by the boxplot. We have also combined a beeswarm plot with the violin plot so we can see every individual value:

Again, select CBS or Columbia to view a label with many releases/songs to understand why we elected to use this approach. Hover over individual points to learn more about an individual song: its length, release, artist, label, and song title. For me, this approach is best if I’m trying to explore the data; the boxplot is great when I’m interested in overall patterns. Both are powerful tools suited to their individual strengths.
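The outline of a violin plot is typically a smoothed density estimate of the values. As a rough sketch of that idea, here is a plain Gaussian kernel density estimate; the song lengths, the bandwidth of 30 seconds, and the function name `gaussian_kde` are all assumptions for illustration, and I don’t know which smoothing method the chart actually uses.

```python
import math

# Hypothetical song lengths in seconds for a single record label.
lengths = [200, 210, 220, 600]


def gaussian_kde(values, bandwidth, grid):
    """Estimate the density at each grid point using Gaussian kernels."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - v) / bandwidth) ** 2) for v in values)
        for g in grid
    ]


# Evaluate the density inside the main cluster, in the gap, and at the long song.
grid = [210, 400, 600]
density = gaussian_kde(lengths, bandwidth=30, grid=grid)
for g, d in zip(grid, density):
    print(f"{g} s: {d:.5f}")
```

The violin is widest where the density is highest, so this sample would bulge around 210 seconds, pinch to almost nothing near 400, and show a small bump at 600. A boxplot of the same data would show none of that shape.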

I’ll be using Flourish to interrogate the MusicBrainz data further in future posts, but that’s it for now. Thanks for reading!