
j0m0k0/PuReX


Motivation

The GitHub API already provides plenty of information about the pull requests of open-source projects hosted on its platform. However, researchers and developers sometimes need combined information about pull requests that is not accessible through a single API call. PuReX is an effort to provide an API on top of the GitHub API for retrieving such combined information about the pull requests of open-source projects. The tool can be used to build datasets and benchmarks for researchers working in software supply chain security.

Installation

PuReX can be installed from PyPI.

Using pip:

pip install purex

Using uv (recommended):

uv add purex

To include the documentation extras, install purex[doc] instead of purex:

uv add purex[doc]

To install from source, clone this repository, cd into its directory, and run the following command:

pip install -e .

Basic Usage

The first thing to do after installation is to set the PUREX_TOKEN environment variable. This is your GitHub token, which PuReX uses when sending requests to the GitHub REST API. The token is optional, but it speeds up extraction, especially for larger projects, because authenticated requests have a higher rate limit than unauthenticated ones.

On UNIX-like operating systems (GNU/Linux, macOS):

export PUREX_TOKEN="YOUR_TOKEN"

On Windows (Command Prompt; the quotes are omitted because they would otherwise become part of the value):

set PUREX_TOKEN=YOUR_TOKEN
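To illustrate how a token set this way might be consumed, here is a minimal sketch. It is not PuReX's actual implementation: the helper name `build_github_headers` is hypothetical, and the only assumptions are the `PUREX_TOKEN` variable described above and the standard `Authorization: Bearer` and `Accept: application/vnd.github+json` headers accepted by the GitHub REST API.

```python
import os
from typing import Dict, Optional


def build_github_headers(token: Optional[str] = None) -> Dict[str, str]:
    """Build request headers for the GitHub REST API.

    Hypothetical helper for illustration; PuReX's internals may differ.
    """
    # Fall back to the PUREX_TOKEN environment variable when no token is passed.
    if token is None:
        token = os.environ.get("PUREX_TOKEN")

    headers = {"Accept": "application/vnd.github+json"}
    if token:
        # Only authenticated requests carry the Authorization header.
        headers["Authorization"] = f"Bearer {token}"
    return headers
```

Without a token the function still returns usable headers, which mirrors the fact that the token is optional.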

To get help with PuReX, run it without any command or pass the help option:

purex --help

This prints the general help for the tool:

Usage: purex [OPTIONS] COMMAND [ARGS]...

Options:
  --version  Show the version and exit.
  --help     Show this message and exit.

Commands:
  get  Get pull-request data of a repository.

Getting Data from a Repository

The help option is also available for every subcommand. For example, for the get command:

purex get --help

Output:

Usage: purex get [OPTIONS] OWNER REPOSITORY

  GET pull-request data for REPOSITY from OWNER.

  OWNER is the account name that hosts the repository (e.g., torvalds).

  REPOSITORY is the name of the repository (e.g., linux).

Options:
  -t, --token TEXT         GitHub Token
  -u, --base_url TEXT      REST API url of GitHub.
  --start_date [%m-%d-%Y]  Inclusive starting date (MM-DD-YYYY) for pulling
                           the pull-request data.
  --help                   Show this message and exit.

Example: Suppose we want the pull-request information for the furo package by pradyunsg, from 01-01-2024 (January 1, 2024) until the current date. We can use PuReX like this:

purex get pradyunsg furo --start_date 01-01-2024
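When the start date comes from a script rather than being typed by hand, the MM-DD-YYYY format is easy to get wrong. The following sketch builds the same command from a `datetime.date`. The helper `build_purex_command` is hypothetical and not part of PuReX; it relies only on the `get` subcommand and the `--start_date` option shown in the help output above.

```python
from datetime import date
from typing import List


def build_purex_command(owner: str, repository: str, start: date) -> List[str]:
    """Assemble a `purex get` invocation as an argument list (hypothetical helper)."""
    # --start_date expects MM-DD-YYYY, which is the strftime pattern %m-%d-%Y.
    start_text = start.strftime("%m-%d-%Y")
    return ["purex", "get", owner, repository, "--start_date", start_text]


command = build_purex_command("pradyunsg", "furo", date(2024, 1, 1))
print(" ".join(command))  # prints: purex get pradyunsg furo --start_date 01-01-2024
```

With PuReX installed, the resulting list can be passed to `subprocess.run(command, check=True)`.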

PuReX extracts the information for the requested repository within the selected time range, identifies the maintainers responsible for closing or merging those PRs, and returns the results in JSON format:

{
  'pradyunsg': {'closed': 7, 'merged': 36},
  'dependabot[bot]': {'closed': 3, 'merged': 0},
  'ferdnyc': {'closed': 1, 'merged': 0},
  'M-ZubairAhmed': {'closed': 1, 'merged': 0}
}

The results show the number of PRs closed or merged by each maintainer.
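For post-processing, the result above can be loaded into Python and summarized. This is a sketch under one assumption: the output is captured exactly as printed above, with single-quoted keys, so it is parsed with `ast.literal_eval` rather than a strict JSON parser. The variable names are illustrative only.

```python
import ast

# The example output from above, captured as text.
raw_output = """{
  'pradyunsg': {'closed': 7, 'merged': 36},
  'dependabot[bot]': {'closed': 3, 'merged': 0},
  'ferdnyc': {'closed': 1, 'merged': 0},
  'M-ZubairAhmed': {'closed': 1, 'merged': 0}
}"""

# literal_eval safely parses Python-style literals, including single-quoted strings.
stats = ast.literal_eval(raw_output)

total_closed = sum(entry["closed"] for entry in stats.values())
total_merged = sum(entry["merged"] for entry in stats.values())

# Rank maintainers by the total number of PRs they closed or merged.
ranked = sorted(
    stats.items(),
    key=lambda item: item[1]["closed"] + item[1]["merged"],
    reverse=True,
)

print(f"closed={total_closed}, merged={total_merged}")  # prints: closed=12, merged=36
print(ranked[0][0])  # prints: pradyunsg
```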

For more info and tutorials, please refer to the documentation.


Publications

If you use PuReX in your research, please cite it as follows:

@software{PuReX,
  author = {Mokhtari Koushyar, Javad},
  doi = {10.5281/zenodo.15851126},
  month = {2},
  title = {{PuReX, Pull-Request Extractor}},
  url = {https://github.com/j0m0k0/PuReX},
  year = {2025}
}
