The GitHub API already provides plenty of information about the pull requests of open-source projects hosted on its platform. However, researchers and developers sometimes need combined information about pull requests that is not directly available through a single API call. PuReX is an effort to provide an API on top of the GitHub API for retrieving such combined information about the pull requests of open-source projects. The tool can be used to build datasets and benchmarks for researchers working on software supply chain security.
PuReX can be installed from PyPI.
Using pip:
    pip install purex
Using uv (recommended):
    uv add purex
To install the documentation as well, install purex[doc] instead of purex:
    uv add purex[doc]
To install from source, clone this repository, cd into the directory, and run the following command:
    pip install -e .
The first thing to do after installation is to set the PUREX_TOKEN environment variable. This is your GitHub token, which is used when sending requests to the GitHub REST API. The token is optional, but it helps speed up extraction, especially for larger projects, because authenticated requests have a higher rate limit than unauthenticated ones.
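To illustrate why the token matters, here is a minimal sketch of how a client could build headers for the GitHub REST API, adding an Authorization header only when PUREX_TOKEN is set. The function name `build_headers` is hypothetical and this is not necessarily how PuReX handles the token internally; it only shows the general pattern of optional authentication.

```python
import os


def build_headers(env=os.environ):
    """Return GitHub REST API headers, authenticating only if a token is set.

    `build_headers` is a hypothetical helper for illustration; PuReX's own
    internals may differ.
    """
    headers = {"Accept": "application/vnd.github+json"}
    token = env.get("PUREX_TOKEN", "").strip()
    if token:
        # Authenticated requests get a higher rate limit from GitHub.
        headers["Authorization"] = f"Bearer {token}"
    return headers


if __name__ == "__main__":
    mode = "authenticated" if "Authorization" in build_headers() else "anonymous"
    print(f"Requests would be sent in {mode} mode.")
```

Running this before a long extraction is a quick way to check whether the variable is visible to the current shell session.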
On UNIX-like operating systems (GNU/Linux, macOS):
    export PUREX_TOKEN="YOUR_TOKEN"
On Windows (Command Prompt, where quotes would become part of the value):
    set PUREX_TOKEN=YOUR_TOKEN
To get help with PuReX, run it without any command or pass the help option:
    purex --help
This shows the general help of the tool:
    Usage: purex [OPTIONS] COMMAND [ARGS]...
    Options:
      --version  Show the version and exit.
      --help     Show this message and exit.
    Commands:
      get  Get pull-request data of a repository.
The help option is also available for every subcommand. For example, for the get command:
    purex get --help
This outputs:
    Usage: purex get [OPTIONS] OWNER REPOSITORY
      GET pull-request data for REPOSITY from OWNER.
      OWNER is the account name that hosts the repository (e.g., torvalds).
      REPOSITORY is the name of the repository (e.g., linux).
    Options:
      -t, --token TEXT         GitHub Token
      -u, --base_url TEXT      REST API url of GitHub.
      --start_date [%m-%d-%Y]  Inclusive starting date (MM-DD-YYYY) for pulling
                               the pull-request data.
      --help                   Show this message and exit.
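The `--start_date` option in the help text above takes a date in `%m-%d-%Y` form and is described as inclusive. The sketch below shows one way such a value could be parsed and applied using only the standard library. The helper names and the choice to compare against a pull request's creation timestamp are assumptions for illustration, not PuReX's confirmed behavior.

```python
from datetime import datetime


def parse_start_date(text):
    """Parse an MM-DD-YYYY string (the %m-%d-%Y format) into a date."""
    return datetime.strptime(text, "%m-%d-%Y").date()


def on_or_after(timestamp_iso, start_date):
    """Return True if a GitHub-style UTC timestamp falls on or after start_date.

    The comparison is inclusive, matching the wording of the help text.
    Which timestamp PuReX actually filters on is an assumption here.
    """
    day = datetime.strptime(timestamp_iso, "%Y-%m-%dT%H:%M:%SZ").date()
    return day >= start_date


start = parse_start_date("01-01-2024")
print(on_or_after("2024-01-01T00:00:00Z", start))  # inclusive boundary
print(on_or_after("2023-12-31T23:59:59Z", start))  # one second too early
```

A value in another order, such as 2024-01-01, raises a `ValueError` in `parse_start_date`, which is a useful reminder that the month comes first.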
Example: suppose we want the pull-request information of the furo package by pradyunsg, starting from 01-01-2024 until the current date. We can use PuReX like this:
    purex get pradyunsg furo --start_date 01-01-2024
PuReX extracts the pull-request information of the requested repository within the selected time range, finds the maintainers responsible for closing or merging those PRs, and returns the results in JSON format:
{
'pradyunsg': {'closed': 7, 'merged': 36},
'dependabot[bot]': {'closed': 3, 'merged': 0},
'ferdnyc': {'closed': 1, 'merged': 0},
'M-ZubairAhmed': {'closed': 1, 'merged': 0}
}
The results show the number of PRs closed and merged by each maintainer.
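To make the shape of these results concrete, here is a minimal sketch of how per-maintainer counts like the ones above could be tallied from a list of pull-request records. The record fields `merged_by` and `closed_by` are simplified, hypothetical stand-ins for data gathered from the GitHub API, and `tally_by_maintainer` is an illustrative name. This is not PuReX's actual implementation.

```python
from collections import defaultdict


def tally_by_maintainer(pull_requests):
    """Count merged and closed-without-merge PRs per user.

    Each record is a dict with hypothetical 'merged_by' and 'closed_by'
    keys holding a login name or None.
    """
    counts = defaultdict(lambda: {"closed": 0, "merged": 0})
    for pr in pull_requests:
        if pr.get("merged_by"):
            # A merged PR is credited as a merge, not as a plain close.
            counts[pr["merged_by"]]["merged"] += 1
        elif pr.get("closed_by"):
            counts[pr["closed_by"]]["closed"] += 1
        # Records with neither field set are still open and are skipped.
    return dict(counts)


# Made-up sample records, not real repository data.
sample_prs = [
    {"number": 1, "merged_by": "alice", "closed_by": "alice"},
    {"number": 2, "merged_by": None, "closed_by": "bob"},
    {"number": 3, "merged_by": "alice", "closed_by": "alice"},
    {"number": 4, "merged_by": None, "closed_by": None},
]
result = tally_by_maintainer(sample_prs)
print(result)
```

Treating "merged" and "closed" as mutually exclusive is a design choice in this sketch, so that each pull request contributes to exactly one counter.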
For more info and tutorials, please refer to the documentation.
If you use PuReX in your research, please cite it as follows:
@software{PuReX,
author = {Mokhtari Koushyar, Javad},
doi = {10.5281/zenodo.15851126},
month = {2},
title = {{PuReX, Pull-Request Extractor}},
url = {https://github.com/j0m0k0/PuReX},
year = {2025}
}