Research Software Sharing, Publication, & Distribution Checklists

Warning: work in progress!

The checklists are developed in this GitLab repository.

Who is this for? These checklists are for you if you are publishing a research paper with associated analysis code, developing a software tool that will be used by researchers, or deploying a web service that will be used by researchers.

How do I use the checklists? Copy the markdown file suited to your output type (see below) into your repo and check the boxes that apply to your project, or better still, have someone else use the checklist to score your project.

For more, see the FAQ.

(Optionally, and not yet implemented: in the future you will be able to add a badge to your repo's README with your medal and score.)

Research Software Output Types

There are three broad types of research code:

  • Code that is effectively a one-off record of a specific analysis.
  • Code that runs as a web-based service to be interacted with by researchers.
  • Code that is intended to be re-used by others as a tool for their research.

The latter type can be further split into two sub-types:

  • Individual software packages, typically in a single programming language and using that language's standard format for distributing a software package.
  • Multi-part workflows or pipelines. These typically consist of a number of different tools chained together; they might all be in different languages, and each performs some step in a complex, and often computationally intensive, process which might be distributed across multiple compute nodes in a high performance compute (HPC) cluster or cloud environment. These are typically held together by a domain-specific 'glue' language with affordances suited to pipeline management, such as caching the results of completed steps.

Reviewing, publishing, sharing, and distributing each of these types of research code requires slightly different considerations and approaches, though some things are common to all of them.

Here we provide a checklist for each of these types of output.

These checklists also aim to accommodate a range of scales, from small individual projects to massive projects with many contributors, so it is important to gauge what is possible and worth doing at your scale. It may be inappropriate, inefficient, and inconvenient to adopt some of the practices of large projects at the beginning of your workflow, though it can pay to leave the path open to adopting them later by laying some groundwork. Similarly, running a massive project as though it were a solo hobby project, just with more people, does not go well either. Which practices to adopt, and at which stage of your project's maturity, will depend on the specifics of your project. By presenting practices suitable to a range of scales, we hope to let growing projects see what is available to those working at a larger scale and plan for their adoption at a suitable time.

Summary of Types

Generic Parts of the Checklists

There are differences in how some of these are executed depending on the type of output, but all outputs share these general features; each feature has a motivating question.

  • 📒 Source control
    • How can you keep track of the history of your project and collaborate on it?
  • © Licencing
    • On what terms can others use your code, and how can you communicate this?
  • 📖 Documentation
    • How do people know what your project is, how to use it and how to contribute?
  • 🔗 Making Citable
    • How should people make reference to your project and credit your work?
  • ✅ Testing
    • How can you test your project so you can be confident it does what you think it does?
  • 🤖 Automation
    • What tasks can you automate to increase consistency and reduce manual work?
  • 👥 Peer review / Code Review
    • How can you get third party endorsement of and expert feedback on your project?
  • 📦 Distribution
    • How can people install or access the software emerging from your project?
  • 💽 Environment Management / Portability
    • How can people get specific versions of your software running on their systems?
  • 🌱 Energy Efficiency
    • How can you and your users minimise wasted energy?
  • ⚖ Governance, Conduct, & Continuity
    • How can you be excellent to each other, make good decisions well, and continue to do so?

Project Scoring & Medal System

Checking the box for each of these items should be attainable for any project. To facilitate this, you can check each box at one of four tiers: Bronze, Silver, Gold, and Platinum.

Whilst Bronze aims to be highly attainable, Platinum is highly aspirational: essentially no project should expect to achieve Platinum across the board, and indeed if you do, you are probably overdoing it. Do not be put off if you do not yet understand what the Silver and above items mean; the difficulty curve is quite steep! Check out the expandable details sections for resources on steps you can take to start ticking boxes. To achieve an overall Bronze rating you must achieve at least Bronze in all categories.

Points are assigned to the tiers, from 1 point for Bronze up to 4 for Platinum in each category. The overall project tier is determined by the mean of the scores across all categories, rounded down to the nearest integer. It is possible to achieve a high score but no overall 'medal' because of important deficiencies in some of the key characteristics of the research software.

| Category | Ex. 1 | Ex. 2 | Ex. 3 | Ex. 4 | Ex. 5 | Ex. 6 | Ex. 7 | Ex. 8 |
|---|---|---|---|---|---|---|---|---|
| 📒 Source control | 1 | 1 | 4 | 1 | 2 | 3 | 3 | 4 |
| © Licencing | 1 | 1 | 1 | 1 | 2 | 4 | 3 | 4 |
| 📖 Documentation | 1 | 1 | 1 | 1 | 2 | 3 | 3 | 4 |
| 🔗 Making Citable | 1 | 1 | 4 | 1 | 2 | 4 | 3 | 4 |
| ✅ Testing | 1 | 2 | 1 | 1 | 2 | 2 | 3 | 4 |
| 🤖 Automation | 1 | 1 | 1 | 1 | 2 | 1 | 3 | 4 |
| 👥 Peer review / Code Review | 1 | 1 | 4 | 1 | 2 | 3 | 3 | 4 |
| 📦 Distribution | 1 | 1 | 1 | 1 | 2 | 3 | 3 | 4 |
| 💽 Environment Management / Portability | 1 | 1 | 1 | 1 | 2 | 2 | 3 | 4 |
| 🌱 Energy Efficiency | 1 | 1 | 4 | 1 | 2 | 3 | 3 | 4 |
| ⚖ Governance, Conduct, & Continuity | 0 | 0 | 0 | 1 | 2 | 4 | 3 | 4 |
| Total Score | 10 | 11 | 22 | 11 | 22 | 32 | 33 | 44 |
| Scaled Score (floor(total score / 11)) | 0 | 1 | 2 | 1 | 2 | 2 | 3 | 4 |
| Overall Medal | NA | NA | NA | 🥉 | 🥈 | 🥈 | 🥇 | 🏆 |
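The scoring rule described above can be sketched in a few lines of Python. This is an illustration only; the function name and the representation of tier points as a plain list are choices made here, while the mean-rounded-down rule and the at-least-Bronze-everywhere requirement come from the text above.

```python
import math

# Tier points per category: 0 = unmet, 1 = Bronze, 2 = Silver, 3 = Gold, 4 = Platinum.
MEDALS = {1: "Bronze", 2: "Silver", 3: "Gold", 4: "Platinum"}

def overall_medal(category_scores):
    """Return (total, scaled, medal) for a list of per-category tier points."""
    total = sum(category_scores)
    # Overall tier is the mean across all categories, rounded down.
    scaled = math.floor(total / len(category_scores))
    # A project only medals if it reaches at least Bronze in *every* category.
    if min(category_scores) < 1 or scaled < 1:
        return total, scaled, "NA"
    return total, scaled, MEDALS[scaled]

# Third example project from the table: a high total score (22) still earns
# no medal because Governance, Conduct, & Continuity scores 0.
print(overall_medal([4, 1, 1, 4, 1, 1, 4, 1, 1, 4, 0]))  # (22, 2, 'NA')
```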

Suggested Workflow(s)

If you are doing a self-assessment:

  • Pick the checklist appropriate to the type of software of your project
  • Download the checklist file and commit it to your repo
  • As you make changes to your project that allow you to check off boxes in the list, commit those changes along with the corresponding checked box. This way the box-checking change is in the same commit as the change that implements it
  • When you've got an initial score, generate a repo badge and add it to the README of your project, and update it along with subsequent changes to the checklist (badge generator)

Not all changes to the checklist will be accompanied by changes to the code; that's fine. However, if you have something like a changelog, release notes, or other documentation which might be a suitable place to document changes that allow you to check off a box, you might want to include these in the commit.

Ideally, you would get an independent third party to assess your project using the checklist.

If you are collaborating on a project, you will probably be following its usual contribution model and doing something like opening a merge/pull request with the proposed changes and the corresponding checked box(es) for each change. This may also give you a record of your checklist-related changes in the issue tracker on your code forge.