Web-based service

Research Software Sharing, Publication, & Distribution Checklist

A database, API (application programming interface), or other web-based tool which is, generally, hosted on an ongoing basis and offers some service, or access to some resource, to researchers. In many ways the key considerations here lean more organisational than technical. Do you have the resources to operate the service on an ongoing basis? A record of a specific analysis is a snapshot in time that is only expected to run in its specified environment, and once done is done. Operating an online service requires continuous ongoing work to keep up with security updates and to monitor the status of your server(s) to ensure that your service is still up and working as intended. You can also expect to take on some degree of user support from people having trouble using your service. Depending on the nature of the project it may not make sense for others to deploy instances of the server, but at a minimum other developers will need a test deployment to work on, if not now then in the future.

Most of the suggestions here would be the same as those in the software packages checklist, and indeed software of this type generally consists of one or more packages, so that checklist also applies here. This checklist focuses on things that are in addition to general packages and more specific to web-based services.

📒Source control

How can you keep track of the history of your project and collaborate on it?

  • Unlike many of the other research software output types, this sort of output tends to consist of multiple separate components whose sources may be managed separately; for example, your front-end and back-end codebases might live in their own git repositories. It can be useful to group these repositories together within a group or organisation on your git forge so that their relationship to one another is clear.
  • It is also valuable for this sort of project to document how to deploy a local testing and development environment and/or a minimal deployment of the service. This might take the form of a Dockerfile / docker-compose file, an Ansible playbook, or a similar automation tool for easily deploying a test / example environment. Taking the ‘infrastructure as code’ approach to the deployment of your tool, and versioning these examples, is most useful when your service is of a sort where it makes sense for others to host their own instances. If it is only deployed by you as a central resource these practices may be useful for you internally, but they are less impactful for the rest of the community.
  • If your project is backed by a curated database then documentation of / code from the data collection and cleaning process which led to the current content of the database is valuable for the provenance of that dataset. If you are taking new additions to that database, such tools are very valuable resources for any collaborators wanting to add data. Even if you are not adding new data, these tools can also be very useful to researchers wanting to use data from your database alongside data they have themselves generated or curated, as the ability to process it in the same way as your data may be essential for valid comparisons.
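As a concrete sketch of the ‘infrastructure as code’ approach to a test environment, a docker-compose file along these lines can describe a minimal local deployment (all service names, images, ports, and credentials here are illustrative assumptions, not a real configuration):

```yaml
# Hypothetical minimal development deployment: a web app plus its database.
# Not production-ready: credentials are placeholders and TLS is not handled.
services:
  web:
    build: .                      # build the application from this repo's Dockerfile
    ports:
      - "8000:8000"               # expose the app on localhost:8000
    environment:
      DATABASE_URL: postgres://dev:dev@db:5432/app
    depends_on:
      - db
  db:
    image: postgres:16
    environment:
      POSTGRES_USER: dev
      POSTGRES_PASSWORD: dev
      POSTGRES_DB: app
```

Committing a file like this alongside the code gives anyone picking up the project a one-command (`docker compose up`) route to a local test instance.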

©Licensing

On what terms can others use your code, and how can you communicate this?

  • If you want to apply a copyleft license to a piece of software that is to be accessed over a network, and not necessarily run on end-users’ own computers, then you would want to adopt a license such as the AGPL to ensure that your end users still have the right to run, study, modify and redistribute the code of the server-side part of the tool.
  • All software needs a license if you want to permit others to reuse it. It is important to give some thought to the type of license which best suits your project; it is a choice which can have significant long-term implications. Check out The Turing Way chapter on licensing for an introduction to the subject. If you have no time, some pretty safe choices are: for a permissive license, the Apache 2.0, which allows the re-use of your work in closed commercial code; for a ‘copyleft’ license, the GPLv3 (AGPL for server-side apps), which requires that anyone distributing software containing your code, or derivatives of it, share the source code with the people they distributed it to.
  • If you are including external code in your service then you should check that the licenses are compatible and that you are legally allowed to distribute your code together in this way. Check out this resource on license compatibility.
  • REUSE.software is a tool that can help you keep track of licenses in complex multi-license projects. It identifies the license of the code in individual files with SPDX license identifiers, and has an approach to doing so for binary assets.
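For illustration, REUSE-style machine-readable licensing information takes the form of per-file header comments like the following (the name and email are hypothetical placeholders):

```text
# SPDX-FileCopyrightText: 2025 Jane Researcher <jane@example.org>
# SPDX-License-Identifier: Apache-2.0
```

The `reuse lint` command can then check that every file in the repository carries such a tag and that the corresponding license texts are present in a LICENSES/ directory.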

📖Documentation

How do people know what your project is, how to use it and how to contribute?

  • README / Manual
    • What your project is and what it does
    • Installation instructions; in the case of a web tool, at least what is needed for a minimal local development deployment
    • Contribution guidelines (varying from ‘please open an issue before working on a merge request’ to detailed style guides, review processes and other requirements)
    • A description of the project structure so that the user knows which directories to find things in
  • In a project such as this it is generally particularly useful to split documentation by target audience: users, developers, sysadmins
    • Users: using the website graphically; this might include admin options if you have administrative users of some kind, and admin users may need their own section
    • Developers: API docs, how to contribute, and how to set up a development environment, tooling used
    • Sysadmins: how to deploy an instance of the service, how to configure it, what you might want to do differently from the development environment (for example a more secure config, and considerations that might affect backups), and managing updates/upgrades.
  • Note that almost all of the recommendations for software package documentation also apply here

🔗Making Citable

How should people make reference to your project and credit your work?

  • Adopting a stable, consistent and human-readable naming schema for the URLs on your web service, including the ability to reproduce the state of dynamically generated pages with parameters in the URL, makes referring to specific items much easier for users citing the website according to the conventions for citing websites. (If pertinent, be sure to apply proper authentication and authorisation practices so that sensitive information cannot be accessed just because it is on a page with a predictable URL, and so that the mere existence of a page with a predictable URL does not itself leak information.) Minimise content that cannot be captured by archival tools like the Internet Archive’s Wayback Machine and ArchiveBox; researchers and others can use these to take a snapshot of a website at the time at which they cite it, avoiding the problems of updates to the website altering the content, and of linkrot.
  • Including a CITATION.cff (Citation File Format) file in your project repo is a simple way of making your code citable. The format is human-readable YAML and provides the metadata needed for citation.
  • Zenodo permits you to mint a digital object identifier (DOI) for your code; this is a persistent identifier which can be used to refer to it. You can tie the minting of versioned DOIs to the releases of your project. Using a DOI permits the existing ecosystem of academic software, e.g. Zotero, to use APIs to retrieve citation metadata about your project. Zenodo also hosts a snapshot of your source code, so that if your main code repository ever went down it is still possible to retrieve it there. Citation metadata can be imported from a .cff file or a .zenodo.json file in your repository. This makes it pretty easy to manage updates, as you can just edit these files and have a platform integration, or a step in your CI, push them to Zenodo the next time you do a release.
  • Software Heritage is an expansive archive of open source software operated by a non-profit organisation in collaboration with UNESCO (see: how to reference and archive code in Software Heritage). SWHIDs have the advantage that they are content-based identifiers, meaning that you can check whether the content you get back when you retrieve it is what you expected to get based on its identifier. The Software Heritage API permits you to automate the archiving of your project repository via a webhook from popular git forges like GitHub, GitLab and others. Unlike Zenodo, which only preserves a snapshot of your repository at the time of deposition and at subsequent manual time points and/or tagged releases, Software Heritage archives the whole repository.
  • Further reading on the ethics of CROTs (contributor roles ontologies or taxonomies), and their evolution and adoption, is potentially useful in selecting a CROT suitable for your project
  • Nix and Guix
    • General software repositories may not make specific provision for citation of software packages in the academic fashion. However, some provide what is, for some use cases, a superior form of ‘citation’ of their own sources, i.e. a complete ‘software bill of materials’ (SBOM). This is a list of all the code used in another piece of code: its dependencies, and their dependencies recursively, along with all of their versions. Nix can do this, for example, but Guix is perhaps the most comprehensive in its approach. It not only provides all the information necessary for a complete SBOM but can bootstrap the software packages in its repository from source with an extremely minimal fixed set of binaries, an important capability for creating somewhat trustworthy builds. This creates a compute environment which is not only reproducible but verifiable, meaning the source of all of an environment’s dependencies can in theory be scrutinised. It also adopts an approach to commit signing and authorisation of signers that gives it a currently uniquely complete supply-chain security architecture. Packages or ‘derivations’ are ‘pure functions’ in the sense that only their inputs affect their outputs and they have no side effects: package builds are sandboxed to prevent dependencies on any external source not explicitly provided as an input, and inputs are hashed to ensure that they cannot differ from the values expected when they were packaged. This gives these technologies an unrivalled ability to readily demonstrate the reproducibility and provenance of compute environments specified using them.
    • Whilst not yet fully implemented and adopted, these technologies also afford some fascinating opportunities for seamless access to archival versions of software in the future. Due to the similarities in the content-based addressing used by Git, Nix, Guix, IPFS (InterPlanetary File System) and Software Heritage’s IDs, it may be possible to construct an approach to archiving, distributing and caching the sources of packages in a way that would ensure that low-demand archived software sources and high-demand current packages can be distributed transparently through the same mechanism. This would in theory permit the reconstruction of any historically specified compute environment that had been archived, with no changes to the normal workflow other than perhaps a longer build time. This approach also makes the creation of ‘mirrors’ of the archive relatively simple, and requires no client-side changes, as an IPFS resource will be resolved irrespective of the node on which it is stored. See: NLnet Software Heritage and IPFS, Tweag - software heritage and Nixpkgs, John Ericson - Nix x IPFS Gets a New Friend: SWH (SoN2022 - public lecture series)
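A minimal CITATION.cff might look like the following (all names, identifiers, dates, and URLs are placeholder values for illustration):

```yaml
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
type: software
title: "Example Web Service"
authors:
  - family-names: "Researcher"
    given-names: "Jane"
    orcid: "https://orcid.org/0000-0000-0000-0000"
version: "1.2.0"
date-released: "2025-01-15"
repository-code: "https://example.org/example-group/example-service"
license: AGPL-3.0-or-later
```

Validators exist for the format, and the integrations mentioned above (Zenodo, git forges) can read this file directly.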

✅Testing

How can you test your project so you can be confident it does what you think it does?

  • Services and graphical interfaces may require integration tests, which check that the different components of your system work together as expected, and UI-based testing frameworks, which simulate user interaction in a web browser; you might consider adding these to the sorts of tests you would run for a simpler library.
  • A good test suite allows you to refactor your code without fear of breaking its functionality. Good tests are agnostic to the implementation details of the action that you are testing, so that you can change how you implemented something without needing to change the tests. Automated testing frameworks are especially useful for software that is under ongoing development, as they allow developers to catch the unintended consequences of a change made in one place on some other part of the code that they did not anticipate.
  • Examples of automated testing frameworks include {testthat} for R and unittest for Python. Tools like Codecov or Coveralls, in conjunction with language-specific tools such as covr, can help with code coverage monitoring and insights.
  • Unit tests allow you to spell out in detail what you expect the behaviour of your software to be under a particular circumstance and test if it conforms to these expectations. Automatically running tests like this can be added to CI/CD pipelines on git forges.
  • Test coverage does not necessarily need to be 100%, or even especially high, but code coverage tools can allow you to spot gaps in test coverage over important parts of your codebase and ensure that you cover them, and give you an indication when you have added new, poorly covered code to your codebase that you may want to add tests for.
  • Try to make sure that your test suite runs fast so that you can run it regularly and quickly iterate.
  • Test Driven Development (TDD) is the practice of writing your tests first and then developing the code which conforms to these tests. It works well if you have an extremely well-defined idea of what exactly you want your code to do and not do.
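As a minimal sketch of the implementation-agnostic unit testing described above (the `slugify` function and its expected behaviour are hypothetical examples, not part of any real project):

```python
# A pytest-style unit test: the tests state expected behaviour without
# depending on how slugify is implemented internally.
import re

def slugify(title: str) -> str:
    """Turn a page title into a URL-safe slug (one possible implementation)."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")

def test_slugify_is_lowercase_and_hyphenated():
    # Only inputs and outputs are constrained, not the implementation
    assert slugify("My Dataset, v2!") == "my-dataset-v2"

def test_slugify_handles_empty_input():
    assert slugify("") == ""
```

Because the tests only constrain inputs and outputs, `slugify` could be rewritten entirely and the same tests would still verify that it behaves as intended.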

🤖 Automation

What tasks can you automate to increase consistency and reduce manual work?
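As one example of such automation, a CI pipeline on a git forge can run the test suite on every push. The sketch below uses GitHub Actions syntax; the file path, Python version, and commands are assumptions about a typical Python project, not a prescription:

```yaml
# .github/workflows/test.yml (hypothetical): run the test suite on every push
name: tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements.txt  # install project dependencies
      - run: pytest                           # run the tests
```

Similar pipelines can automate linting, building documentation, and publishing releases.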

👥Peer review / Code Review

  • If you are building a database of some kind then you might want the processes by which you process, collect or curate the data which go into this database to be subject to an academic style review, and papers about the creation of such resources are not uncommon.
  • Seeking an external technical review may be trickier for your core code, but a review of how easy your system is to deploy is perhaps more accessible from the community of amateur self-hosters, who may be quite willing to try deploying your tool in many and varied homelabs if it offers them something and/or you ask nicely and in the right places.

📦Distribution

Distribution for a web based service covers both hosting the service and distributing the software to sysadmins who may want to run their own instance of the service.

  • Your general audience is users of your web service, and there’s a smaller but important audience of sysadmins and developers who may need to run your server software on their own systems, not just use it. So ‘distribution’ splits into two slightly different problems.
    • Operating your website, things like:
      • making sure that your TLS certificates stay up to date and you have enough compute resources for the service to run well for users.
      • Having a sensible URL, potentially registering any look-alike URLs that malicious actors might try to typosquat
      • Distributing your service to developers who may want to build tools on top of it or query it in an automated fashion via an API. The API should be well documented and conform to open standards.
      • Take some simple measures to ensure the reliability of your site under elevated load, such as: using a reverse proxy; enabling content caching so that your proxy can serve requests for the same content without hitting your application server(s) again; limiting concurrent connections to the maximum number of sessions your server can handle at once, so that if traffic spikes the service gets slower but doesn’t completely fall over; and load balancing across multiple application servers.
      • Consider DDoS protection for your site if your traffic grows over a certain threshold.
      • Be wary of ‘denial of wallet’ attacks when hosting on automatically (horizontally) scaling platforms: set limits to prevent malicious parties from spamming your site in such a fashion as to cause you to incur massive hosting bills.
    • Distributing your server software to sysadmins, devops people, and potentially general IT staff, developers, and amateur self-hosters.
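The reliability measures above (reverse proxy, caching, connection limiting) can be sketched in a reverse-proxy configuration. This nginx-style fragment is illustrative only; all names, paths, and limits are assumptions to be tuned to your own service, and TLS termination is omitted for brevity:

```nginx
# Illustrative reverse-proxy sketch: cache repeated responses and cap
# concurrent connections so that load spikes degrade gracefully.
proxy_cache_path /var/cache/nginx keys_zone=appcache:10m max_size=1g;
limit_conn_zone $binary_remote_addr zone=perip:10m;

server {
    listen 80;                            # TLS termination omitted for brevity
    server_name example.org;

    location / {
        limit_conn perip 20;              # max concurrent connections per client
        proxy_cache appcache;             # serve repeat requests from the cache
        proxy_cache_valid 200 5m;         # cache successful responses for 5 minutes
        proxy_pass http://127.0.0.1:8000; # the application server behind the proxy
    }
}
```

A fragment like this sits in front of the application server(s); adding an `upstream` block listing several servers extends it to load balancing.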

💽Environment Management / Portability

  • Depending on the infrastructure that you choose to deploy on you might use a different management tool, but it is best if you do use such a tool as part of your development and deployment: done right, this provides an easy ‘run a couple of commands’ development environment setup for anyone picking up the project, be that a future maintainer, someone wanting to play with a local test deployment, or someone wanting to contribute to the project. Examples of such tools include: Ansible, Terraform, Docker / Docker Compose, Nix, Helm charts, or a combination of some of these that fits your needs and experience.

🌱 Energy Efficiency


Efficiency in a program deployed as a web service can be as much or more about how it is configured than about the efficiency of the underlying code; the larger the number of users a service has, the larger the overall impact of small efficiency gains. In many research contexts the number of users is small, though in some cases their computational demands may be high, and it may be uneconomical to spend the time optimising as heavily as one might in applications operating at larger scale. If possible, profile the resource usage of your service in use and optimise the low-hanging fruit.

Appropriate caching, management of sessions on your web server(s), load balancing between multiple nodes (when you have them), and so on can have a huge performance and efficiency impact. An underpowered web server can, for example, often serve many more people if the number of concurrent sessions is limited to the number that can comfortably be handled with that box’s resources, and it is allowed to handle many requests in rapid sequence rather than becoming overwhelmed by an excessive number of concurrent sessions. Consulting a professional systems administrator and/or devops professional (for larger deployments on modern cloud stacks) about how to optimise your service’s deployment is likely a good idea if you are inexperienced in this domain.

The same considerations that apply to other software packages also apply to web services, so refer to that section for additional suggestions.

Get the most out of the resources that you have provisioned for your service; over- or under-provisioning can lead to inefficiencies, so aim to match demand. Automatically scaling services out can still carry considerable technical overhead, as well as computational overhead in monitoring and responding to load, and is unlikely to be worth the trouble in small deployments.

It may be valuable to share application specific optimisation tips for deployments of your service on community fora and/or as case studies in your documentation.

Academic user-bases are often scattered around the world. If you have a particular concentration of users in one location it may make sense to locate your physical infrastructure near them to minimise latency; however, if you have a global user-base anyway, you might consider siting server infrastructure in a location where the electricity supply has the lowest carbon intensity.

⚖ Governance, Conduct, & Continuity

How can you be excellent to each other, make good decisions well, and continue to do so?

  • If you are the Benevolent Dictator For Life (BDFL) of your project and the Code of Conduct (CoC) is “Don’t be a Dick”, that’s fine; for many individual hobby projects this is a functional reality. Becoming a BDFL tends to be the default unless you take steps to avoid it and cultivate community governance as your project begins to grow; failing to do this and being stuck in charge can become quite the burden in successful projects. It is also easy for a power vacuum to form at the top of successful projects if they lack either community governance or a singular motivated leader. Be wary of adopting policies that you lack the resources, time, interest, skill, or inclination to actively enforce, mediate, and moderate disputes concerning. It is helpful to be clear about what you can and cannot commit to doing in community management; only by communicating this might you be able to find community members to help you with setting and enforcing community norms, if or when your community attains a scale where this becomes relevant. Community management is its own skill set. If you can’t moderate them, avoid creating and/or continuing ungoverned community spaces that can become a liability for you and your project’s reputation. Just as there are off-the-shelf licenses there are off-the-shelf codes of conduct; the Contributor Covenant is perhaps the best known and most widely used, though it may need some customisation to your needs. Adopting such a CoC gives you some guidance to follow if there is bad behaviour in your project’s community, and communicates that you as the project leadership take seriously the responsibility of creating a respectful environment for collaboration. It can also signal that your project is a place where everyone is welcome but expected to treat one another with respect, and that failing to do so will result in penalties, potentially including exclusion from the community.
The Turing Way provides quite a nice example of a CoC developed specifically for their project. You will need to provide contact information for the person(s) responsible for the enforcement of the CoC in the appropriate place, and be able to follow up in the event it is used. Git forges often recognise files named CODE_OF_CONDUCT.md in the root of a project and provide a link to them on the project home page, so this is a good place to document such policies. If you are the BDFL of a small project then interpretation and enforcement of such a CoC tends to fall solely on you; game out some courses of action for what you’d do if faced with some common moderation challenges.
    • Once a project attracts a larger community there is greater scope for disputes and therefore for the need for dispute resolution mechanisms. Free/Libre and Open Source Software development and maintenance can be thought of as a commons so I would refer you to the work of Elinor Ostrom on how commons have been successfully (or unsuccessfully) governed when thinking about what processes to adopt for your project. More recently Nathan Schneider’s Governable Spaces: Democratic Design for Online Life tackles some of these issues as applied to online spaces.
    • This is summarised in the 8 Principles for Managing a Commons
      1. Define clear group boundaries.
      2. Match rules governing use of common goods to local needs and conditions.
      3. Ensure that those affected by the rules can participate in modifying the rules.
      4. Make sure the rule-making rights of community members are respected by outside authorities.
      5. Develop a system, carried out by community members, for monitoring members’ behavior.
      6. Use graduated sanctions for rule violators.
      7. Provide accessible, low-cost means for dispute resolution.
      8. Build responsibility for governing the common resource in nested tiers from the lowest level up to the entire interconnected system.
    • An informal do-ocracy in the fiefdom of a BDFL is often the default state of projects that have not given much deliberate thought to how they want to be governed; whilst this model is not without its strengths, because it is common many projects are subject to some of its failure modes. How are decisions made in your project? Do you need the mechanisms of governance used by community and civil society organisations: by-laws, a committee and/or working groups, general meetings, votes, minutes? A version of these may be necessary to avoid The Tyranny of Structurelessness. How can you map these onto your development infrastructure and make the decisions of your governing bodies enactable and enforceable?
  • Continuity planning: what happens to your project if something happens to you? The code will likely live on due to the distributed nature of git, but what about the issue tracker, the website, etc.? Who else has the highest level of privilege on your project, or a mechanism to attain it? The principle of least privilege dictates that you keep the number of people with this level of access to a minimum, but you may then create a single point of failure. Password managers like Bitwarden have a feature where designated people can be given access to your vault if they request it and you do not deny it within a certain time-frame. This could provide a lower-level admin with a mechanism to escalate their privileges if you are unable to do this for them. However, this delay might be an issue for continuity of operations if administrator action is needed within the waiting period. Game it out, have a plan, write it down, let people know you have a plan.
  • Software Management Plans
  • Does your project take donations? Does it have a trademark? Does it need a legal entity to hold these? Who is on the paperwork and who has signing authority? Who keeps track of expenditures? Tools & Organisations like OpenCollective can help with some of these issues.
  • If your project has potential cybersecurity implications, what procedures do you have in place for people to disclose vulnerabilities in the project so that they can be patched before they are made public? What systems do you have in place to disclose a vulnerability once it has been patched and to ensure that users know that they need to update?
  • Whole project data longevity - what plans do you have in place to backup and archive materials pertaining to your project that are not under source control?
  • User support
    • What support can users expect, or not expect?
    • Where can they ask for it?
    • Is there somewhere where users can provide support to other members of the user community, such as a forum?
    • Can they pay for more support?

Research Software Sharing, Publication, & Distribution Checklists by Richard J. Acton is licensed under CC BY 4.0