
In this report, we extract information about published JOSS papers and generate
graphics as well as a summary table that can be downloaded and used for further analyses.

Collect information about papers

Pull down paper info from Crossref and citation information from OpenAlex

We get the information about published JOSS papers from Crossref, using the rcrossref R package. The openalexR R package is used to extract citation counts from OpenAlex.
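As an illustration of how records from the two sources could be combined, the sketch below joins a Crossref-style table with an OpenAlex-style table on the DOI. It is not the report's actual code: the data frames, column names and DOIs are made up, and it assumes that one source stores bare DOIs while the other stores them as full `https://doi.org/` URLs, so both are normalized before joining.

```r
# Toy stand-ins for the two sources; all names and DOIs are hypothetical.
crossref_demo <- data.frame(
  doi = c("10.0000/example.1", "10.0000/EXAMPLE.2", "10.0000/example.3"),
  title = c("Paper A", "Paper B", "Paper C"),
  stringsAsFactors = FALSE
)
openalex_demo <- data.frame(
  doi = c("https://doi.org/10.0000/example.1",
          "https://doi.org/10.0000/example.2"),
  cited_by_count = c(12L, 3L),
  stringsAsFactors = FALSE
)

# Strip an optional resolver prefix and lower-case, so that both tables
# use the same key format.
normalize_doi <- function(x) {
  tolower(sub("^https?://(dx\\.)?doi\\.org/", "", x))
}
crossref_demo$doi <- normalize_doi(crossref_demo$doi)
openalex_demo$doi <- normalize_doi(openalex_demo$doi)

# Left join: keep every Crossref record; papers without a match in the
# citation table get NA as citation count.
papers_demo <- merge(crossref_demo, openalex_demo, by = "doi", all.x = TRUE)
print(papers_demo)
```

Keeping all rows from the Crossref side means that a failed citation lookup shows up as a missing value rather than silently dropping the paper.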

Pull down info from Whedon API

For each published paper, we use the Whedon API to get information about the pre-review and review issue numbers, the corresponding software repository, and related metadata.

Combine with info from GitHub issues

From each pre-review and review issue, we extract information about review times and assigned labels.
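A minimal sketch of how a review time could be derived from issue timestamps is shown below. It assumes that each issue comes with ISO 8601 creation and closing timestamps in UTC and defines the time in review as the difference between the two; the data frame and its column names are hypothetical.

```r
# Toy issue table; issue numbers and timestamps are made up.
issues_demo <- data.frame(
  issue = c(101L, 102L),
  created_at = c("2020-01-01T00:00:00Z", "2020-02-01T12:00:00Z"),
  closed_at = c("2020-01-11T00:00:00Z", "2020-03-02T12:00:00Z"),
  stringsAsFactors = FALSE
)

# Parse ISO 8601 timestamps given in UTC.
parse_ts <- function(x) {
  as.POSIXct(x, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
}

# Number of days between opening and closing each issue.
issues_demo$days_open <- as.numeric(difftime(
  parse_ts(issues_demo$closed_at),
  parse_ts(issues_demo$created_at),
  units = "days"
))
print(issues_demo[, c("issue", "days_open")])
```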

Add information from software repositories

## [1] 2690   12
papers <- papers %>% dplyr::left_join(df, by = "repo_url")
## [1] 2704   69
source_track <- c(source_track, 
                  structure(rep("sw-github", length(setdiff(colnames(papers),

Clean up a bit

Tabulate number of missing values

In some cases, fetching information from an external source (for example, the GitHub API) fails for a subset of the publications. Missing values can also have other causes; for example, the earliest submissions do not have an associated pre-review issue. The table below lists the number of missing values for each variable in the data frame.
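Counting missing values per variable can be sketched in base R as follows. The small data frame is a made-up stand-in for the full table of papers.

```r
# Toy data frame with some missing entries.
na_demo <- data.frame(
  a = c(1, NA, 3),
  b = c(NA, NA, "x"),
  c = 1:3,
  stringsAsFactors = FALSE
)

# is.na() gives a logical matrix; column sums count the TRUE entries.
na_counts <- colSums(is.na(na_demo))
na_table <- data.frame(
  variable = names(na_counts),
  n_missing = as.integer(na_counts)
)
print(na_table)
```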

Number of published papers per month

Number of published papers per year

Fraction of rejected papers

The plots below illustrate the fraction of pre-review and review issues closed during each month that have the 'rejected' label attached.
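The monthly fraction can be sketched as below, assuming each closed issue has a closing date and a comma-separated string of labels. Both the data and the column names are hypothetical.

```r
# Toy table of closed issues.
closed_demo <- data.frame(
  closed_at = as.Date(c("2021-01-05", "2021-01-20", "2021-02-03",
                        "2021-02-10", "2021-02-25")),
  labels = c("accepted", "rejected", "rejected,withdrawn",
             "accepted", "accepted"),
  stringsAsFactors = FALSE
)

# Flag issues whose label set contains 'rejected'.
closed_demo$rejected <- vapply(
  strsplit(closed_demo$labels, ",", fixed = TRUE),
  function(l) "rejected" %in% trimws(l),
  logical(1)
)

# Group by closing month; the mean of a logical vector is the fraction TRUE.
closed_demo$month <- format(closed_demo$closed_at, "%Y-%m")
frac_rejected <- tapply(closed_demo$rejected, closed_demo$month, mean)
print(frac_rejected)
```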

Citation distribution

Papers with 20 or more citations are grouped in the ">=20" category.
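The grouping can be sketched as follows, using a made-up vector of citation counts. Fixing the factor levels up front keeps counts with no papers in the table and places the ">=20" category last.

```r
# Toy citation counts.
citations_demo <- c(0, 1, 1, 5, 19, 20, 37, 250)
cap <- 20
top_label <- paste0(">=", cap)

# Levels "0", "1", ..., "19", ">=20", in plotting order.
levels_all <- c(as.character(0:(cap - 1)), top_label)

# Counts at or above the cap are collapsed into one category.
binned <- factor(
  ifelse(citations_demo >= cap, top_label, as.character(citations_demo)),
  levels = levels_all
)
citation_table <- table(binned)
print(citation_table)
```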

Most cited papers

The table below sorts the JOSS papers in decreasing order by the number of citations in OpenAlex.

Citation count vs time since publication

Power law of citation count within each half year

Here, we plot the citation count for all papers published within each half year, sorted in decreasing order.
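One way to prepare such a plot is to assign each paper to a half year and rank it by citation count within that group, as in the sketch below. The data and the `H1`/`H2` labelling are assumptions made for illustration; plotting citation count against rank on logarithmic axes is a common way to check for power-law-like behaviour.

```r
# Toy publication dates and citation counts.
pubs_demo <- data.frame(
  published = as.Date(c("2019-02-10", "2019-05-01", "2019-06-30",
                        "2019-07-01", "2019-11-15")),
  citations = c(4, 40, 11, 7, 2)
)

# January-June is labelled H1, July-December H2.
pub_year <- as.integer(format(pubs_demo$published, "%Y"))
pub_month <- as.integer(format(pubs_demo$published, "%m"))
pubs_demo$half_year <- paste0(pub_year, "H", ifelse(pub_month <= 6, 1, 2))

# Rank within each half year; rank 1 is the most cited paper.
pubs_demo$rank <- ave(
  -pubs_demo$citations, pubs_demo$half_year,
  FUN = function(v) rank(v, ties.method = "first")
)
print(pubs_demo)
```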

Pre-review/review time over time

In these plots, we investigate whether the time a submission spends in the pre-review or review stage (or their sum) has changed over time. The blue curve is a rolling median of this time, computed over a 120-day window of submissions.
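A rolling median over a time window can be sketched as below. The text does not say how the window is positioned, so this example assumes a trailing window: for each submission, the median is taken over all submissions made in the 120 days up to and including its own date. The data are made up.

```r
# Toy submission dates and review durations (in days).
subs_demo <- data.frame(
  submitted = as.Date("2020-01-01") + c(0, 30, 60, 150, 200, 400),
  review_days = c(10, 20, 30, 40, 50, 60)
)
window_days <- 120

# For each submission, take the median over the trailing window.
subs_demo$rolling_median <- vapply(seq_len(nrow(subs_demo)), function(i) {
  # Age (in days) of every submission relative to submission i.
  age <- as.numeric(subs_demo$submitted[i] - subs_demo$submitted)
  median(subs_demo$review_days[age >= 0 & age < window_days])
}, numeric(1))
print(subs_demo)
```

A centered window would only change the condition on `age`; the rest of the computation stays the same.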

Next, we consider the languages used by the submissions, both as reported by Whedon and based on the information encoded in available GitHub repositories (for the latter, we also record the number of bytes of code written in each language). Note that a given submission can use multiple languages.
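Because a submission can use several languages, tabulating them requires splitting the per-paper language field first. The sketch below assumes the languages are stored as a comma-separated string per paper, which is an assumption about the format; the data are made up.

```r
# Toy per-paper language strings.
langs_demo <- c(
  paper1 = "Python, C++",
  paper2 = "R",
  paper3 = "Python,R, Shell"
)

# Split on commas and strip surrounding whitespace.
lang_list <- lapply(strsplit(langs_demo, ",", fixed = TRUE), trimws)

# Count each language at most once per submission, most common first.
lang_counts <- sort(
  table(unlist(lapply(lang_list, unique))),
  decreasing = TRUE
)
print(lang_counts)
```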

Association between number of citations and number of stars of the GitHub repo

Distribution of time between GitHub repo creation and JOSS submission

Distribution of time between JOSS acceptance and last commit

Number of authors per paper

Here, we list the papers with the largest number of authors and display the distribution of the number of authors per paper, restricted to papers with at most 20 authors.

Number of authors vs number of contributors to the GitHub repo

Note that points are slightly jittered to reduce the overlap.

Number of reviewers per paper

Submissions associated with rOpenSci and pyOpenSci are not considered here, since they are not explicitly reviewed at JOSS.

Most active reviewers

Submissions associated with rOpenSci and pyOpenSci are not considered here, since they are not explicitly reviewed at JOSS.

Number of papers per editor and year

Distribution of software repo licenses

Most common GitHub repo topics

Citation analysis

Here, we take a more detailed look at the papers that cite JOSS papers, using data from the OpenCitations Corpus.

Get citing papers for each submission

Summary statistics

Most citing journals

Save object

The tibble object with all data collected above is serialized to a file that can be downloaded and reused.

To read the current version of this file directly from GitHub, use the following code:
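The download URL is not given in this section, so the sketch below takes it as an argument. It assumes the file was written with `saveRDS()`; the helper `read_remote_rds()` is a hypothetical name introduced here for illustration. The second part shows the same save and restore round trip offline, using a temporary file and made-up data.

```r
# Hypothetical helper: `rds_url` must point to the raw serialized file.
# Assumes the file is in RDS format.
read_remote_rds <- function(rds_url) {
  con <- gzcon(url(rds_url, open = "rb"))
  on.exit(close(con))
  readRDS(con)
}

# Offline round trip with a toy table and a temporary file.
toy_papers <- data.frame(
  doi = c("10.0000/example.1", "10.0000/example.2"),
  citations = c(3L, 0L),
  stringsAsFactors = FALSE
)
tmp_file <- tempfile(fileext = ".rds")
saveRDS(toy_papers, tmp_file)
restored <- readRDS(tmp_file)
print(restored)
```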

Session info

