I really dislike StackOverflow. While I acknowledge that it is sometimes useful, I really don’t like the negativity, the showboating, and the pile-on mentality toward people who deviate at all from a perfectly asked question.

So when I recently realized that a giant archive of StackOverflow comments is available to download, I saw an opportunity to dig deeper into questions of sentiment in SO comments.

This is not a small dataset. There are 84,369,024 comments in the archive. Unzipped, it took up 24 GB of disk space. This was too much for my little laptop, so I used an external hard drive to store the unzipped file. The file size also makes processing this data a challenge: the XML file is much too big to store in memory, so I had to do some contortions to process the file in much smaller bits.
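For anyone curious what processing the file "in much smaller bits" can look like, here is a minimal sketch using the standard library's streaming XML parser. It is not necessarily the exact code I ran. It assumes the archive is one large XML file whose root element contains one `<row>` element per comment, with `CreationDate` and `Text` attributes. The function name `iter_comments` is made up for this illustration.

```python
import xml.etree.ElementTree as ET


def iter_comments(source):
    """Yield (creation_date, text) pairs one comment at a time.

    `source` is a filename or a binary file object. The <row> tag and the
    CreationDate / Text attribute names are assumptions about the archive.
    """
    # Ask for "start" events too, so the first event hands us the root
    # element before any rows have been read.
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)

    for event, elem in context:
        if event == "end" and elem.tag == "row":
            yield elem.get("CreationDate"), elem.get("Text")
            # Detach rows that have already been processed from the root,
            # so memory use stays flat no matter how big the file is.
            root.clear()
```

Because this is a generator, something like `for date, text in iter_comments("Comments.xml")` only ever holds one comment in memory at a time (the filename is just a placeholder).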

Ultimately, I was able to use Python’s TextBlob library to trace sentiment in StackOverflow comments over time:

[Chart: sentiment in StackOverflow comments from 2008 to 2022]
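The aggregation behind a chart like this can be sketched roughly as follows. This is an illustration, not necessarily the exact code I ran. It assumes the comments arrive as `(creation_date, text)` pairs with ISO-style dates such as `2008-09-06T08:07:10.730`, and that "sentiment" means the mean TextBlob polarity per calendar year. The helper names `textblob_polarity` and `mean_polarity_by_year` are made up for this example.

```python
from collections import defaultdict


def textblob_polarity(text):
    """Score one comment with TextBlob: -1.0 (negative) to 1.0 (positive)."""
    # Imported here so the aggregation below can be tried out with a
    # different scorer even when TextBlob is not installed.
    from textblob import TextBlob
    return TextBlob(text).sentiment.polarity


def mean_polarity_by_year(comments, score=textblob_polarity):
    """Average a sentiment score per year over (creation_date, text) pairs."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for creation_date, text in comments:
        if not creation_date or not text:
            continue  # skip rows with a missing date or an empty comment
        year = int(creation_date[:4])  # assumes dates start with YYYY
        totals[year] += score(text)
        counts[year] += 1
    # Return the years in chronological order, ready for plotting.
    return {year: totals[year] / counts[year] for year in sorted(totals)}
```

Only two running totals per year are kept, so this works on a stream of comments without ever loading the whole dataset.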

First of all, I was surprised that StackOverflow comments scored positively in sentiment at all. I was expecting them to be deeply negative.

Second, there are two interesting trends here. From 2008 to 2017, sentiment on StackOverflow was getting increasingly negative. Then in 2018, things turned around and started improving mildly. I don’t think that this is accidental. In a 2018 blog post, StackOverflow acknowledged that they had a negativity problem and made it clear that they were going to take action to address it. Judging by the data, the steps they have taken seem to have had an effect: sentiment has improved since its 2017 low.

This suggests that StackOverflow has had some success in improving the tone of its community, which is interesting.
