Tag: Gephi

Getz, Parker, Henderson Networks Updated

I’ve added three more jazz artist networks following the same template I used for the initial group I added earlier in the week.

https://jazzgraphs.com/graphs/charlie_parker/

https://jazzgraphs.com/graphs/stan_getz

https://jazzgraphs.com/graphs/joe_henderson

Each of these network graphs show the connections between an artist, his releases, and the songs on each release. The graphs are easy to zoom, pan, and click, with an information panel opening on the left side of the screen. Enjoy!

Shorter, Coltrane, Gordon Networks deployed

After a weekend trying multiple approaches at incorporating images of jazz musicians as backgrounds to their network graphs, I finally hit on a good solution – create icon-like images that display in the upper left corner of the graph background. This allows the network to be easily viewed on multiple devices ranging from phones to widescreen monitors, without interfering with any of the network elements.

Here’s the solution, as shown for the Dexter Gordon network:

Dexter Gordon musical network

Nice and tidy, with an icon-like image created in Powerpoint; the outer circle around the image is colored to match the artist node at the center of the graph for a consistent visual appearance. Even when the sidebar is shown and the graph is zoomed in, this approach works well:

Dexter Gordon musical network zoomed

Note that the musician image is combined with the background color and the circle shape in Powerpoint, and is then saved as a picture we can use in our CSS file for the network. Here’s a look at the styling elements for this graph:

#carte {    
    position: absolute;  
    background-image: url('../img/Gordon_20230306.png');
    background-repeat: no-repeat;
    background-size: cover;
    left: 0px;
    width: 100%;
    height: 100%;
}

We ensure that the background image is set to cover the entire screen by using the background-size attribute and setting it to ‘cover’. This enables the background to adjust to larger screen sizes seamlessly – no awkward edges to be seen!

From my perspective, this approach solves two issues in a very nice way:

  • First, it provides a consistent look & feel to the networks, regardless of which artist we select
  • Second, it is very production-friendly; I can use the same background and circle while adding in a new artist picture. This provides a very efficient solution for creating future graph networks

I’ll be adding more permanent links to all of these networks, but for now, here are the initial three:

https://jazzgraphs.com/graphs/dexter_gordon

https://jazzgraphs.com/graphs/wayne_shorter

https://jazzgraphs.com/graphs/john_coltrane

Thats’ it for now – have fun with the networks and thanks for reading!

Refining the Musician Networks, Part 1

I recently posted about the six new saxophonist networks I created using MusicBrainz data and Gephi, and have subsequently created another four, including networks for two of the acknowledged giants of the instrument – Charlie Parker and John Coltrane. However, as I was digging deeper into the data I realized that there are a lot of redundancies in the data due to a couple of grammatical issues. There are two major issues I have now addressed that will make for cleaner networks.

Here’s what these networks currently look like:

Snapshot of Dexter Gordon graph network

I expect to replace these with cleaner, more logical graphs after making this pair of changes. The end result will have fewer nodes and fewer edges crossing each other to connect nodes.

The first change to address is the subtle but important difference between an apostrophe character ( ‘ ) and the similar yet slightly different grave ( ` ) character. Each one is used to represent an apostrophe in the MusicBrainz release.name field, leading to duplicate entries that are actually the same song. For example, ‘Round Midnight versus `Round Midnight. Subtle difference, right? But one that my postgres queries and ultimately Gephi see as two unique songs, cluttering the network graph unnecessarily. So how do we fix this issue in the data?

I first created a new version of the releases table, just in case something went wrong as I tried to make any updates. We now have an empty table with all the same attributes as the original. Step 2 is to populate the new table with a simple SELECT INTO statement:

select * into public.release_new  from public.release r

The next step is a bit trickier since it involves an apostrophe character, which postgres treats as quotation marks for other characters. We have to use some additional formatting to convince postgres that we really do want to replace all our ` characters with ‘ characters. Here’s the code I used (there are several ways to do this):

UPDATE 
   public.release_new
SET 
   "name"  = REPLACE(name,'`',E'\'')
WHERE 
"name" > '0'

Without going into too much detail, we are telling our query to find all ` characters and replace them with an apostrophe. I recognize this might mess up some cases where there is an actual grave accent on a song name, but we will now have a consistent approach rather than two slightly different characters throwing us off. The goal is to ensure that we recognize a given song as a single node for our graph as much as humanly possible.

We have a second issue to correct for, but this one can be done within our query rather than updating the database. In this case, some songs are listed in upper case in one place, and then as lower case in another. We could force all text to lower case for a match, but that is less than ideal. The same holds true for upper case; we don’t want our graph labels to be all caps. A third solution is to use the INITCAP function in postgres, like this:

select b.id, b.name, b.label, b.type, SUM(b.size)as size
FROM
(SELECT DISTINCT INITCAP(a.id) as id, INITCAP(a.name) as name, INITCAP(a.label) as label, a.type, a.size

INITCAP forces all first letters to upper case while leaving the other letters as they were. It’s not a perfect solution; apostrophes cause us a problem here too, but it’s perhaps a 99% solution. By correcting the apostrophe format and then using INITCAP, we now have a much cleaner query result for Gephi. As an example, the nodes query for Joe Henderson now returns 269 records, versus the original 285, an improvement of > 5%. This should certainly help clean up our graphs, as it will also reduce the number of edges connecting the nodes.

In part 2 of this series I’ll show the impact these changes have on our network graphs. The beauty of these changes is that I can apply the logic to all future graphs. Thanks for reading, and see you soon.

Six New Saxophonist Networks

Decided to play with the latest version of Gephi by creating a new musician network, and wound up creating six using MusicBrainz data. This was a fun project and will be followed by additional work covering more great jazz musicians. Here’s a quick screenshot of one of the graphs showing the artist, album releases, and songs associated with those releases:

Dexter Gordon music network

I’ll post the links to each network below and then take a walk through the creation process. Note that the graphs are all interactive, with panning, zooming, edge removal, and other features all available. More on those features later in the post. Here are the links to each network graph:

Note that there are at least two major omissions among saxophonists – Charlie Parker and John Coltrane, and of course some other notable names. I’ll plan to address those omissions in a future post.

To create each of these graphs I followed a simple set of steps and then used the same settings to create graphs with a consistent look & feel. The goal is to have users focus on the structure and content of the network as opposed to having to deal with changing shapes, sizes, and colors. Perhaps I’ll alter this for different instruments – piano or trumpet, for example may have a different color palette. For now, the color palette I have used conveys an appropriately jazzy aura, with the dark background and contrasting pastel-like node colors and subtle gray edges connecting the nodes.

The data for this project is sourced from the impressive MusicBrainz database. Note that MusicBrainz data covers many genres beyond jazz, but for my current purposes the focus is on jazz. I have created a local version of the data using DBeaver for writing and running SQL queries to retrieve data for ingestion by Gephi. DBeaver is a great solution for me – all of my other databases are in MySQL, while the MusicBrainz data is in PostgreSQL format. No problem, as DBeaver can handle both types (as well as many other data formats) with ease.

Here’s an example of the code used for node creation for Sonny Rollins:

SELECT a.* 
FROM
((SELECT CONCAT(ac.name, ' (Artist)') AS id, ac.name AS name, ac.name AS label, 'Artist' AS type, 50 AS size

FROM public.artist_credit ac

WHERE ac.id = 21832)

UNION ALL

(SELECT CONCAT(r.name, ' (Release)') AS id, r.name AS name, r.name AS label, 'Release' AS type, COUNT(DISTINCT rl.release) AS size

FROM public.release r
INNER JOIN public.artist_credit ac
ON r.artist_credit = ac.id
INNER JOIN public.medium m 
ON r.id = m.release
INNER JOIN public.medium_format mf 
ON m.format = mf.id
INNER JOIN public.release_label rl
ON r.id = rl.release
INNER JOIN public.label l
ON rl.label = l.id

WHERE r.artist_credit = 21832 
GROUP BY r.name
)

UNION ALL 

(SELECT CONCAT(ta.name, ' (Song)') AS id, ta.name AS name, ta.name AS label, 'Song' AS type, COUNT(DISTINCT rl.release)
AS Size
FROM public.release r
INNER JOIN public.artist_credit ac
ON r.artist_credit = ac.id
INNER JOIN public.medium m 
ON r.id = m.release
INNER JOIN public.medium_format mf 
ON m.format = mf.id
INNER JOIN public.release_label rl
ON r.id = rl.release
INNER JOIN public.label l
ON rl.label = l.id
INNER JOIN public.track t
ON m.id = t.medium
INNER JOIN public.track_aggregate ta
ON t.name = ta.name

WHERE r.artist_credit = 21832 
GROUP BY ta.name
)) a

While the code may appear complex, it’s goal is simple – retrieve all releases and songs for the artist Sonny Rollins, who has the ‘21832’ id. This code creates nodes for the artist (first section), all releases (second section) and all songs (third section). It uses the UNION ALL statement to combine the three sections into a single output file.

We then run similar code to create an edges source file:

SELECT a.*
FROM
((SELECT CONCAT(ac.name, ' (Artist)') AS source, CONCAT(r.name, ' (Release)') AS Target, 'Artist' AS source_type, 'Release' AS target_type
FROM public.artist_credit ac
INNER JOIN public.release r
ON ac.id = r.artist_credit
INNER JOIN public.release_label rl
ON r.id = rl.release
WHERE ac.id = 21832)

UNION ALL

(SELECT CONCAT(r.name, ' (Release)') AS Source, CONCAT(ta.name, ' (Song)') AS Target, 'Release' AS source_type, 'Song' AS target_type
FROM public.release r
INNER JOIN public.release_label rl
ON r.id = rl.release
INNER JOIN public.medium m
ON r.id = m.release
INNER JOIN public.track t
ON m.id = t.medium
INNER JOIN public.track_aggregate ta
 ON t.name = ta.name
 
WHERE r.artist_credit = 21832)) a
GROUP BY a.source, a.target, a.source_type, a.target_type

This output will instruct Gephi to use the artist as a source node and all releases as target nodes (first section) and then to use all releases as source nodes with songs as target nodes. Think of this as a hierarchy of Artist –> Releases –> Songs where individual songs are associated with the release they appeared on. Of course, in jazz, many of the most popular songs will appear connected to multiple releases, ultimately making for a more interesting graph.

Now that we have created the source files, let’s shift to Gephi to see how we use them.

Gephi allows us to pull in spreadsheet files as long as they meet certain criteria. Node files should have a name, label, id, and preferably a size attribute, although this can be created within Gephi based on the data. Edge files must have source and target fields, and ideally a weight value corresponding to the strength of network connections.

Here’s our data after ingestion, starting with the nodes:

Dexter Gordon nodes in Gephi

I forget to mention the usefulness of having a ‘type’ column; this will make it simple to set node colors in Gephi. Now the edges file:

Dexter Gordon edges in Gephi

You can see the source and target values, which are critical to how the graph will be displayed. Our edge weights are all set to 1 in this network, but frequently we will have varying numbers to indicate stronger versus weaker connections.

Here’s our completed graph in Gephi, after using a number of settings:

  • Setting the node colors by type in the Partition tab
  • Sizing the nodes in the Ranking tab
  • Choosing a layout algorithm – Force Atlas 2 is a popular choice
  • Scaling the graph to an appropriate size
  • Preventing overlap of nodes

This process can be very iterative, playing with different settings until you are pleased with the results. For graphs like this with hundreds of nodes, different options can be tried very quickly.

Dexter Gordon graph in Gephi

The next step is to export the underlying data as a graph file – .gexf is my choice for the web template I use. Here’s a small subset of the Dexter Gordon .gexf file showing the name, type, and size associated with each node.

Data from the Dexter Gordon .gexf file

Next, we can update settings in the config.js file; These will adjust the display parameters for the nodes and edges; note that there is also a .css (Cascading Style Sheet) file where many more modifications can be made.

Set graph options in the config.js file

Finally, we have the index.html file that contains links to several scripts as well as the config file. This is where we can also add a title and small bits of information about the graph content.

The index.html file is used for titles, script references, and other display options

I’ll be creating additional network graphs using this same end to end approach. The process becomes easier once the code has been tested and validated, and the settings have been standardized in Gephi and the resulting output files; much of the effort will simply involve copying and pasting existing settings. Watch this space for new graphs, and thanks for reading!

The Music of Charles Mingus

In honor of the recently released Mingus live recording from 1973, Jazz in Detroit/Strata Concert Gallery/46 Selden, I have created a Mingus network graph detailing his many recordings and songs, using data from the MusicBrainz database. This new recording is not part of the network, but many other classic Mingus works are detailed.

This graph shows Mingus at the center, surrounded by his many recordings, with individual songs connected to the recordings where they appear. Songs are sized based on the number of recordings they appeared on, providing a quick glimpse into Mingus’ most notable tunes. Use the zoom and pan controls as well as the search box to navigate the graph. The data is sourced from the MusicBrainz database, while the graph is built using Gephi and sigma.js.

Here’s a link to the live version – enjoy!

Building JazzGraphs, Part 1

As the initial versions of my JazzGraphs networks are being tested, I thought this would be an appropriate time to walk through the process I use for moving the data from raw database form to the eventual network graph output shared on this site. Mind you, I am still iterating through the data, testing what works best for graph creation, and otherwise seeking to create some memorable output that pays homage to many of the great jazz artists of the last 100 years. Ultimately, the process could change a bit, but it feels like things are pretty stable for the moment. So here goes with a basic outline for the steps currently being taken to create the graphs.

Of course it all starts with the data. Without a good data source, these graphs would be tremendously challenging to create. One could do some web scraping, endless Google searches, or even go old school with trips to the local library. None of these approaches are efficient nor would they yield the amount of data I would like to have for this project. Enter MusicBrainz, a quite remarkable site devoted to building the most comprehensive source of all things musical. Their stated mission is to be:

    1. The ultimate source of music information by allowing anyone to contribute and releasing the data under open licenses.
    2. The universal lingua franca for music by providing a reliable and unambiguous form of music identification, enabling both people and machines to have meaningful conversations about music.

While the content in any such site is never perfect (think Wikipedia), it nonetheless provides a fantastic starting point for projects such as JazzGraphs.

Fortunately for me (and other SQL coders), the MusicBrainz database is available in a PostgreSQL format, which is very similar to the MySQL databases more familiar to me. So the learning curve has been anything but steep in technical terms. The greater challenge has come in the sheer number of tables and fields in the database, and figuring out which ones are highly populated versus the ones intended for future growth. Another interesting aspect is the multiple spelling variations available for a single song or even an album release. Such are the challenges with public contributions; fortunately they are more than offset by the impressive level of detail available for many artists.

To add some visual perspective, here’s a view of the MusicBrainz schema I am tapping into for the JazzGraphs data:

So now you have a basic idea for what the database looks like, how joins are implemented, and how the general naming conventions work. Much of the artist information is found in the upper left of the diagram, with release details in the lower left. Works (songs) can be seen in the lower right quadrant, while the upper right is largely concerned with location and label information. Note that this is but a small subset of the more than 300 tables in the database.

Once the data is available, the challenge is to understand all the relationships within the data, as well as where to find the most robust tables and fields. As I noted earlier, not all tables are populated equally at this point, so it is critical to work within the framework of what’s currently in the database. Fortunately, for a prominent artist such as Miles Davis, we find plenty of data in some of the key tables. This enabled me to begin testing SQL code, and to also envision how I might wish to display the data in a network graph. After a few rounds of testing, I arrived at the conclusion that using a tri-modal approach might be the best approach. In other words, the artist (Miles Davis, in this case) connects directly to specific releases (an album or CD), which in turn connect directly to the songs on the release. What makes this approach appealing is the fact that many songs show up on multiple releases; otherwise, we would wind up with a simple hierarchical (think organization chart) output, and that wouldn’t be much fun 🙂 .

The next step is to create output for use in Gephi. Recall that in most cases, network graphs require both node and edge inputs. What do these represent in our current case? Since nodes are the graphical representation of specific objects or entities, we have three types in this analysis:

    1. The artist (Miles Davis)
    2. All releases (Kind of Blue, Bitches Brew, etc.)
    3. All songs (So What, Milestones, etc.)

Each of these elements should have a single node; we can later adjust for frequency by sizing the nodes in our graph. For example, if ‘So What’ shows up on 20 releases, we want to display a single node that connects to each of those releases, and then size the node based on frequency. More on that in a bit. So our initial code will focus on creating these nodes, as shown below, using a UNION query to place all of our nodes in a single result, which can then be exported to a .csv file for Gephi to ingest. Here’s the code:

SELECT a.*
FROM
((SELECT CONCAT(ac.name, ‘ (Artist)’) AS id, ac.name AS name, ac.name AS label, ‘Artist’ AS type, 50 AS size

FROM public.artist_credit ac

WHERE ac.id = 1954)

UNION ALL

(SELECT CONCAT(r.name, ‘ (Release)’) AS id, r.name AS name, r.name AS label, ‘Release’ AS type, COUNT(DISTINCT rl.release) AS size

FROM public.release r
INNER JOIN public.artist_credit ac
ON r.artist_credit = ac.id
INNER JOIN public.medium m
ON r.id = m.release
INNER JOIN public.medium_format mf
ON m.format = mf.id
INNER JOIN public.release_label rl
ON r.id = rl.release
INNER JOIN public.label l
ON rl.label = l.id

WHERE r.artist_credit = 1954
GROUP BY r.name
)

UNION ALL

(SELECT CONCAT(ta.name, ‘ (Song)’) AS id, ta.name AS name, ta.name AS label, ‘Song’ AS type, COUNT(DISTINCT rl.release)
AS Size
FROM public.release r
INNER JOIN public.artist_credit ac
ON r.artist_credit = ac.id
INNER JOIN public.medium m
ON r.id = m.release
INNER JOIN public.medium_format mf
ON m.format = mf.id
INNER JOIN public.release_label rl
ON r.id = rl.release
INNER JOIN public.label l
ON rl.label = l.id
INNER JOIN public.track t
ON m.id = t.medium
INNER JOIN public.track_aggregate ta
ON t.name = ta.name

WHERE r.artist_credit = 1954
GROUP BY ta.name
)) a

As you may have gathered, the ‘1954’ value for the artist_credit corresponds with Miles Davis’ unique id in the database. The beauty of this code is that we need only modify that id in each of the 3 code sections to pull the same type of detail for another artist. So our code is highly reusable. Once our results have been returned, we simply export the results to a .csv format.

Now that the node code has been completed, it’s time to focus on the creation of edges, the connections between nodes. In this case, we want releases to connect only to the artist, and songs to connect only to releases. Once again we’ll use SQL UNION logic to merge results from each section into a single set of results for use in Gephi. Here’s a look at the code for edge creation:

SELECT a.*
FROM
((SELECT CONCAT(ac.name, ‘ (Artist)’) AS source, CONCAT(r.name, ‘ (Release)’) AS Target, ‘Artist’ AS source_type, ‘Release’ AS target_type
FROM public.artist_credit ac
INNER JOIN public.release r
ON ac.id = r.artist_credit
INNER JOIN public.release_label rl
ON r.id = rl.release
WHERE ac.id = 1954)

UNION ALL

(SELECT CONCAT(r.name, ‘ (Release)’) AS Source, CONCAT(ta.name, ‘ (Song)’) AS Target, ‘Release’ AS source_type, ‘Song’ AS target_type
FROM public.release r
INNER JOIN public.release_label rl
ON r.id = rl.release
INNER JOIN public.medium m
ON r.id = m.release
INNER JOIN public.track t
ON m.id = t.medium
INNER JOIN public.track_aggregate ta
ON t.name = ta.name

WHERE r.artist_credit = 1954)) a
GROUP BY a.source, a.target, a.source_type, a.target_type

Pretty simple – in the first section, the artist represents the source node, and all releases become the target nodes (source and target values are essential to Gephi edge creation). The second section adds releases as the source nodes, and songs as target nodes. This will give us the tri-modal network structure I spoke of earlier.

Next comes the fun part (not to say that coding can’t be fun 🙂 ) where we start to use Gephi to create our network.The next post will examine how we pull the data into Gephi and start creating our network. Hope you found this informative and helpful, and thanks for reading!

Visualizing Miles

I’ve been spending some time working with data from the MusicBrainz site, and exploring various ways to create networks using Gephi. My initial explorations focus on the vast musical network of Miles Davis, as seen through album releases and the many songs Miles recorded. Here’s a look at one such iteration, wherein Miles is connected to releases, which are in turn connected to songs. Of course, many of the songs are associated with multiple releases, making for an interesting graph displaying all the connections between artist, releases, and songs.

Miles can be found to the far right of this graph, with dozens of connections flowing outbound to his many releases. In the web-based version below (built using sigma.js), we can see a bit more detail and structure in the network:

Now Miles can be seen clearly, as we have enlarged his node to draw attention to him as the focal point of the network. We can also begin to see some of the most frequently released songs as larger pink circles. Tunes like ‘So What’ and ‘Milestones’ appear on many releases, and are sized accordingly. Of course, one of the best aspects of deploying the network to the web is the ability to offer interactivity, where users can zoom, pan, click, and otherwise navigate the network to learn more. If you wish to do so, click here to open the network in a new tab.

Note that this is an unfinished product at this point, despite being several iterations in the making. I have yet to resolve spelling differences that make one song appear to be many different tunes (‘Round Midnight is a classic example), and I also plan to make some other modifications. Having said that, it feels like we’re close to a working template that will allow for depicting the networks of so many of the heroes of jazz – Coltrane, Monk, Ellington, Parker, Mingus, and many more.

So stay tuned for periodic updates and improvements, and thanks for reading!