Research Software Sharing, Publication, & Distribution Checklists
WARNING: work in progress!
The checklists are developed in this GitLab repository.
Who is this for? If you are publishing a research paper that has any analysis code associated with it, developing a software tool which will be used by researchers, or deploying a web service which will be used by researchers, then these checklists are for you.
How do I use the checklists? Copy the markdown file suited to your output type (see below) into your repo and check the boxes that apply to your project; better still, have someone else use the checklist to score your project.
For more, see the FAQ.
(Optionally, add a badge to your repo's README with your medal and score; this is planned but not yet implemented.)
Research Software Output Types
There are three broad types of research code:

- Code that is effectively a one-off record of a specific analysis.
- Code that runs as a web-based service to be interacted with by researchers.
- Code that is intended to be re-used by others as a tool for their research.

The last type can be further split into two:

- Individual software packages, typically in a single programming language and using that language's standard format for distributing a software package.
- Multi-part workflows or pipelines. These typically consist of a number of different tools chained together; the tools might all be in different languages, and each performs some step in a complex, and often computationally intensive, process which might be distributed across multiple compute nodes in a high-performance compute (HPC) cluster or cloud environment. Such workflows are typically written in a domain-specific "glue" language with affordances suited to pipeline management, such as caching the results of completed steps (see the sketch below).
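To illustrate that caching idea outside any particular workflow language, here is a minimal sketch in plain Python; the step names and files are hypothetical, and real workflow managers such as Make, Snakemake, or Nextflow handle this (and much more) for you:

```python
import os

def run_step(name, inputs, output, action):
    """Run one pipeline step, skipping it if a cached result is fresh.

    The step is re-run only when the output file is missing or older
    than any of its inputs, mimicking the result-caching behaviour of
    workflow managers.
    """
    fresh = os.path.exists(output) and all(
        os.path.getmtime(output) >= os.path.getmtime(p) for p in inputs
    )
    if fresh:
        print(f"[{name}] cached result is up to date, skipping")
        return
    print(f"[{name}] running")
    action(inputs, output)

def concatenate(inputs, output):
    # Stand-in for a real analysis step: just concatenate the inputs.
    with open(output, "w") as out:
        for path in inputs:
            with open(path) as f:
                out.write(f.read())

# Hypothetical two-step pipeline; re-running the script skips both
# steps until raw_data.txt changes.
if not os.path.exists("raw_data.txt"):
    with open("raw_data.txt", "w") as f:
        f.write("example input\n")
run_step("clean", ["raw_data.txt"], "clean_data.txt", concatenate)
run_step("summarise", ["clean_data.txt"], "summary.txt", concatenate)
```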
Reviewing, publishing, sharing, and distributing each of these types of research code requires slightly different considerations and approaches, though some things are common to all of them.
Here we provide a checklist for each of these types of output.
These checklists also aim to accommodate a range of scales, from small individual projects to massive projects with many contributors, so it is important to gauge what is possible and worth doing at your scale. It may be inappropriate, inefficient, and inconvenient to adopt some of the practices of large projects at the beginning of your work, though it can pay to leave the path open to using them later by laying some groundwork. Similarly, running a massive project as though it were a solo hobby project is not a good time either, just for more people. Which practices to adopt at which stage of your project's maturity will depend on its specifics. By presenting practices suitable to a range of scales, we hope to let growing projects see what is available to those working at a larger scale and plan for their adoption at a suitable time.
Summary of Types
Generic Parts of the Checklists
There are differences in how some of these are executed depending on the type of output, but all share these general features. Each feature has a motivating question:
- 📜 Source control
  - How can you keep track of the history of your project and collaborate on it?
- © Licensing
  - On what terms can others use your code, and how can you communicate this?
- 📖 Documentation
  - How do people know what your project is, how to use it, and how to contribute?
- 📑 Making Citable
  - How should people make reference to your project and credit your work?
- ✅ Testing
  - How can you test your project so you can be confident it does what you think it does?
- 🤖 Automation
  - What tasks can you automate to increase consistency and reduce manual work?
- 👥 Peer review / Code Review
  - How can you get third-party endorsement of, and expert feedback on, your project?
- 📦 Distribution
  - How can people install or access the software emerging from your project?
- 💽 Environment Management / Portability
  - How can people get specific versions of your software running on their systems?
- 🌱 Energy Efficiency
  - How can you and your users minimise wasted energy?
- ⚖ Governance, Conduct, & Continuity
  - How can you be excellent to each other, make good decisions well, and continue to do so?
Project Scoring & Medal System
Checking the box for each of these items should be attainable for any project. To facilitate this, you can check each box at one of four tiers: Bronze, Silver, Gold, and Platinum. Whilst Bronze aims to be highly attainable, Platinum is highly aspirational: essentially no project should expect Platinum across the board, and if you achieve it, you are probably overdoing it. Do not be scared if you do not even understand what the Silver-and-above items mean; the difficulty curve is quite steep! Check out the expandable details sections for resources on steps you can take to start ticking boxes. To achieve an overall Bronze rating you must achieve at least Bronze in all categories.
Points are assigned to the tiers: 1 point for Bronze, 2 for Silver, 3 for Gold, and 4 for Platinum in each category. The overall project tier is the mean of the scores across all categories, rounded down to the nearest integer. A project can achieve a high score and still not "medal" because of important deficiencies in some of the key characteristics of the research software. The table below scores eight example projects to illustrate:
| Category | Ex. 1 | Ex. 2 | Ex. 3 | Ex. 4 | Ex. 5 | Ex. 6 | Ex. 7 | Ex. 8 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 📜 Source control | 1 | 1 | 4 | 1 | 2 | 3 | 3 | 4 |
| © Licensing | 1 | 1 | 1 | 1 | 2 | 4 | 3 | 4 |
| 📖 Documentation | 1 | 1 | 1 | 1 | 2 | 3 | 3 | 4 |
| 📑 Making Citable | 1 | 1 | 4 | 1 | 2 | 4 | 3 | 4 |
| ✅ Testing | 1 | 2 | 1 | 1 | 2 | 2 | 3 | 4 |
| 🤖 Automation | 1 | 1 | 1 | 1 | 2 | 1 | 3 | 4 |
| 👥 Peer review / Code Review | 1 | 1 | 4 | 1 | 2 | 3 | 3 | 4 |
| 📦 Distribution | 1 | 1 | 1 | 1 | 2 | 3 | 3 | 4 |
| 💽 Environment Management / Portability | 1 | 1 | 1 | 1 | 2 | 2 | 3 | 4 |
| 🌱 Energy Efficiency | 1 | 1 | 4 | 1 | 2 | 3 | 3 | 4 |
| ⚖ Governance, Conduct, & Continuity | 0 | 0 | 0 | 1 | 2 | 4 | 3 | 4 |
| Total Score | 10 | 11 | 22 | 11 | 22 | 32 | 33 | 44 |
| Scaled Score (floor(total / 11)) | 0 | 1 | 2 | 1 | 2 | 2 | 3 | 4 |
| Overall Medal | NA | NA | NA | 🥉 | 🥈 | 🥈 | 🥇 | 🏆 |
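The scoring rule above is simple enough to sketch in a few lines of Python; the function below is illustrative rather than part of the checklists, and its name is made up:

```python
MEDALS = {1: "Bronze 🥉", 2: "Silver 🥈", 3: "Gold 🥇", 4: "Platinum 🏆"}

def overall_medal(category_scores):
    """Score a project from its 11 per-category tiers: 0 = unmet,
    1 = Bronze, 2 = Silver, 3 = Gold, 4 = Platinum.

    The overall tier is the floored mean of the scores, but a project
    only medals at all if every category is at least Bronze.
    """
    if min(category_scores) < 1:
        return "NA"
    return MEDALS[sum(category_scores) // len(category_scores)]

# Example 3 from the table: a high total (22) that does not medal
# because Governance scored 0.
print(overall_medal([4, 1, 1, 4, 1, 1, 4, 1, 1, 4, 0]))  # NA
# Example 5: Silver across the board.
print(overall_medal([2] * 11))  # Silver 🥈
```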