Bob Poekert's Web-Log

Feb 05, 2016

Twitter Star Chart

Twitter Star Map

Shown here is a chart of the top ten thousand websites linked to on twitter. You can play with an interactive version of it here.

Before I get into the details of how I made the chart, here are some interesting things that fall out from it:

  • Spambot networks are really obvious. It's pretty embarrassing for twitter that running such blatant spambot networks is viable
  • Conservative twitter is much more densely connected than liberal twitter. That means that people who tweet links to places like The Drudge Report are less likely to tweet links to other sorts of sites than people who tweet links to places like The Daily Kos
  • Liberal twitter is closer to mainstream twitter than conservative twitter is. This might just be a consequence of conservative twitter being more densely connected
  • Japanese social media twitter (which I'm labelling as "2ch", though it's not just 2ch) is almost completely distinct from what I'm calling "upstanding japanese twitter" (links to mainstream news sites like news24)
  • Hacker News is in the 2ch cluster. It's much closer to nicovideo.jp and pixiv.net than it is to techcrunch
  • Quran quotes are very popular on twitter, but the quote sites are frequented by a relatively tight-knit corner of twitter
  • archive.org (not including the wayback machine, which is web.archive.org) is in the cluster with the Quran quote sites
  • What I'm calling "mommy blog twitter" (containing sites like bayareamommy.net and everythingmommyhood.com) is pretty isolated from the rest of twitter. People who tweet links to mommy blogs are less likely to tweet links to other things.
  • Livejournal is (or was) a popular place to host spam blogs to link to from your twitter spam bots

Here's how to make the chart for those wanting to play along at home:

  • Record the twitter sample stream over the course of about a year (in 2013 and 2014 in my case)
  • For each tweet that contains a link, pull out the website that the link was pointing to and the user id of the tweet author
  • For each pair of websites, count the number of people that tweeted at least one link to both of them
  • Take that co-occurrence matrix, treat each row as a point in a high-dimensional space, and run it through t-SNE to generate a 2d embedding that tries to keep points that are neighbors in the high-dimensional space close together in two dimensions
  • That embedding is what you're looking at
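
For anyone who wants something concrete to start from, here is a minimal sketch of the counting and embedding steps above. It is not the code I used to make the chart. It assumes you have already pulled the links out of the recorded tweets as (user id, url) pairs; the names site_of, cooccurrence and embed are made up for this example, treating "website" as the hostname minus a leading "www." is one possible choice, and the t-SNE step assumes scikit-learn and numpy are installed.

```python
from collections import defaultdict
from itertools import combinations
from urllib.parse import urlparse


def site_of(url):
    """Reduce a URL to a lowercase hostname, dropping a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def cooccurrence(records):
    """Build the site-by-site co-occurrence matrix.

    records: iterable of (user_id, url) pairs, one per tweeted link.
    Returns (sites, matrix), where matrix[i][j] is the number of distinct
    users who tweeted at least one link to both sites[i] and sites[j].
    The diagonal is left at zero (an assumption of this sketch).
    """
    sites_by_user = defaultdict(set)
    for user_id, url in records:
        site = site_of(url)
        if site:
            sites_by_user[user_id].add(site)

    sites = sorted({s for group in sites_by_user.values() for s in group})
    index = {s: i for i, s in enumerate(sites)}
    matrix = [[0] * len(sites) for _ in sites]
    # Each user contributes one count to every pair of sites they linked to.
    for group in sites_by_user.values():
        for a, b in combinations(sorted(group), 2):
            matrix[index[a]][index[b]] += 1
            matrix[index[b]][index[a]] += 1
    return sites, matrix


def embed(matrix, perplexity=30.0, seed=0):
    """Project each row of the matrix to 2D with scikit-learn's t-SNE.

    Imports are local so the counting code above runs without scikit-learn.
    perplexity must be smaller than the number of rows.
    """
    import numpy as np
    from sklearn.manifold import TSNE

    rows = np.asarray(matrix, dtype=float)
    tsne = TSNE(n_components=2, perplexity=perplexity, random_state=seed)
    return tsne.fit_transform(rows)


# Toy input: three users and three sites (placeholder domains).
records = [
    (1, "http://www.example.com/a"),
    (1, "https://news.example.org/story"),
    (2, "http://example.com/b"),
    (2, "https://news.example.org/other"),
    (2, "http://blog.example.net/"),
    (3, "http://example.com/c"),
]
sites, matrix = cooccurrence(records)
```

With real data you would keep only the ten thousand most-linked sites before building the matrix, then call embed(matrix) and plot the resulting points. A dense list-of-lists is fine for a toy example; at ten thousand sites a numpy array or a sparse matrix is the more practical container.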