Measuring Wikipedia Signpost popularity
01/17/2016
I briefly touched on my involvement with the Wikipedia Signpost in an earlier essay on the negotiation between Openness and Quality in the Wikimedian movement. That involvement has tapered off somewhat lately: I've been busy with other projects and haven't had as much time as I used to for writing articles. I still sit on the editorial board and still read every issue, of course, but more of my work now goes into the secondary technical elements of the newspaper. You can see the list of every article I've contributed here.
The Signpost, like the English Wikipedia that hosts it, is an organic entity that has come about as a result of over a decade of steady innovation. By the time I became active there again in early 2015, its internal organization had descended into an unfortunate maelstrom of half-active pages, unused template code, and decade-old discussions nested in hidden talk pages that hadn't been visited by an actual human in years. I took it upon myself to refresh the project's technical organization, a lengthy solo effort that took months of slow progress to complete. I created content guidelines, fixed the layout (with guidelines!), reworked the default templates we use for creating articles, constructed a new submission queue, wrote coordination guidelines, wrote a technology report republication script, wrote a featured content publication script (saving us at least an hour of work every week), wrote a Blog Importer webtool on Wikimedia Labs (saving us at least 20 minutes every time we republish something), and introduced interactive polls and indexing into our stories. All a labor of love, usually during school hours no less...
Continuing along with that theme, this week I put together a Jupyter notebook analyzing page view information for Signpost stories using the new (incredibly long overdue) pageview API. We on the Signpost board have for ages now wanted to run some analysis on our stories to see what it is that readers like the most and the least so that we can better target our publication efforts, and now I've finally gone and done it. The key takeaways are:
- The average Signpost story gets 1550 pageviews, but with a crazy amount of variance, from low 900s to 5000 or more. My own comparison: though my school newspaper probably has higher average circulation, it sure as heck has fewer readers. Sorry Ticker.
- The most popular stories are, as expected, in-depth special reports and community-written op-eds, which often advance interesting or controversial points of view and which reliably get over 2000 hits per publication.
- The regular "News and notes" and "In the media" sections are most popular when they focus on one big story, rather than the amalgamation of small ones we used to run.
- Surprisingly, 20% of hits happen outside of the current news cycle.
- The Blog section (republications of reports from the Wikimedia Blog) is surprisingly poorly read. Galleries (photo compositions) are also not popular. Nor is the Technology report, but this makes sense: it is just a script-based copy-paste job.
- Readers want controversy!
- Jupyter notebooks are amazing and I now want to do everything in them.
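For anyone curious how numbers like these can be pulled, here is a minimal sketch, not the notebook's actual code. It builds a daily per-article request URL for the Wikimedia pageview API and computes the share of views that land after the news cycle from a response's `items` list. The function names, the example page title, and the seven-day definition of a "news cycle" are my assumptions for illustration.

```python
from datetime import date, datetime, timedelta
from urllib.parse import quote

API_ROOT = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageview_url(title, start, end, project="en.wikipedia",
                 access="all-access", agent="user"):
    """Build a daily per-article pageview request URL (hypothetical helper)."""
    # Titles use underscores for spaces; slashes and colons must be percent-encoded.
    article = quote(title.replace(" ", "_"), safe="")
    return (f"{API_ROOT}/{project}/{access}/{agent}/{article}"
            f"/daily/{start:%Y%m%d}/{end:%Y%m%d}")


def share_outside_cycle(items, published, cycle_days=7):
    """Fraction of views recorded on or after `published + cycle_days`.

    `items` mirrors the API response: dicts with a "timestamp" string
    (YYYYMMDDHH) and an integer "views" count. The seven-day cycle is an
    assumption, not a definition taken from the original analysis.
    """
    cutoff = published + timedelta(days=cycle_days)
    total = sum(item["views"] for item in items)
    late = sum(
        item["views"]
        for item in items
        if datetime.strptime(item["timestamp"][:8], "%Y%m%d").date() >= cutoff
    )
    return late / total if total else 0.0


# Made-up sample data, shaped like the "items" list of an API response.
sample_items = [
    {"timestamp": "2016011300", "views": 600},
    {"timestamp": "2016011400", "views": 200},
    {"timestamp": "2016012500", "views": 200},
]
example_url = pageview_url(
    "Wikipedia:Wikipedia Signpost/2016-01-13/Op-ed",  # hypothetical page title
    date(2016, 1, 13),
    date(2016, 1, 20),
)
late_share = share_outside_cycle(sample_items, published=date(2016, 1, 13))
```

The URL can be fetched with any HTTP client, and the `items` list from the JSON response passed straight to `share_outside_cycle`. Keeping the network call out of the helpers makes them easy to check against hand-made data first.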
I presented a lightning talk on this topic at yesterday's NYC Wikipedia Day 2016 celebratory conference: the video is below. You can see the data for yourself on GitHub.
Addendum: I extended this analysis a bit further by looking at spikes in Signpost viewership due to links from popular websites. You can read all about it here.