
Architecture

SCHEMA api implements a service for submitting and monitoring containerized task execution requests. It acts as a gateway between users and the task execution environment: it performs the necessary authentication and authorization checks and records every submitted task request. Task requests that are valid and authorized can then be scheduled for execution and monitored through the endpoints SCHEMA api exposes.
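The request flow described above (authenticate, authorize, record, then hand off for execution and allow monitoring) can be sketched as follows. This is a minimal, in-memory Python illustration, not SCHEMA api's implementation: the token table, the set of authorized users, the status names `SUBMITTED`/`SCHEDULED`, and the `forward` callback (standing in for the hand-off to the execution environment) are all hypothetical.

```python
import uuid
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical token -> user table standing in for real authentication.
TOKENS: Dict[str, str] = {"token-alice": "alice", "token-bob": "bob"}
# Hypothetical set of users allowed to submit tasks (stands in for authorization rules).
ALLOWED_USERS = {"alice"}


@dataclass
class TaskRecord:
    task_id: str
    user: str
    image: str
    command: List[str]
    status: str  # hypothetical status names: SUBMITTED, SCHEDULED


class Gateway:
    """Sketch of the gateway flow: authenticate, authorize, validate, record, forward."""

    def __init__(self, forward: Callable[[TaskRecord], None]) -> None:
        self.records: Dict[str, TaskRecord] = {}  # stands in for the relational database
        self.forward = forward  # stands in for the hand-off to the execution environment

    def submit(self, token: str, image: str, command: List[str]) -> TaskRecord:
        user = TOKENS.get(token)
        if user is None:
            raise PermissionError("authentication failed")
        if user not in ALLOWED_USERS:  # authenticated but not authorized (e.g. "bob")
            raise PermissionError(f"user {user!r} may not submit tasks")
        if not image or not command:
            raise ValueError("a task request needs an image and a command")
        record = TaskRecord(str(uuid.uuid4()), user, image, list(command), "SUBMITTED")
        self.records[record.task_id] = record  # record the request before scheduling it
        self.forward(record)
        record.status = "SCHEDULED"
        return record

    def status(self, token: str, task_id: str) -> str:
        """Monitoring: only the user who submitted a task can see it."""
        user = TOKENS.get(token)
        record = self.records.get(task_id)
        if user is None or record is None or record.user != user:
            raise PermissionError("task not found or not accessible")
        return record.status


if __name__ == "__main__":
    forwarded: List[TaskRecord] = []
    gateway = Gateway(forward=forwarded.append)
    task = gateway.submit("token-alice", "alpine:3.19", ["echo", "hello"])
    print(gateway.status("token-alice", task.task_id))  # SCHEDULED
```

Storing the record before the hand-off is one reasonable ordering: every request that reaches the execution environment already has a stored entry that the monitoring path can look up.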

Dependencies

  • TESK: SCHEMA api is designed to work over a Kubernetes cluster on which it schedules tasks shipped in Docker containers. Currently, it does not talk to Kubernetes directly; instead, it schedules task executions through TESK, an implementation of the GA4GH Task Execution Service (TES) API specification. SCHEMA api therefore requires a TESK deployment reachable over HTTP.

    Figure: SCHEMA api components
  • Filesystem: SCHEMA api is limited to providing an API for executing tasks and enforcing rules that grant access only to authorized users; it does not offer any means of managing the input and output files needed during execution. This keeps SCHEMA api decoupled from domain-specific file handling operations. Moreover, the underlying file system is mostly a dependency of TESK rather than of SCHEMA api itself, since TESK carries out the actions needed to mount input files inside the execution container and to retrieve output files from it.

    Currently, TESK supports S3 and FTP file storage. As a result, for SCHEMA api to run tasks that use files, such a file storage is expected to be set up and configured with the TESK deployment. File management operations themselves (for example, uploading inputs or downloading outputs) should be handled by a separate solution.

  • RDBMS: To reliably track task submissions, SCHEMA api is designed to work with a relational database. The following relational database management systems are supported:

    • PostgreSQL (recommended)

    • MariaDB

    • MySQL

    • Oracle

      Note

      Although an RDBMS is expected, if none is configured SCHEMA api falls back to a file-based SQLite database. While this is convenient and easy to manage during development, SQLite lacks several features that are crucial in production environments; for example, it allows only one writer at a time.

      It is therefore strongly advised to use one of the supported RDBMSs listed above.
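The SQLite fallback described in the note can be illustrated with a small configuration helper that returns the configured RDBMS URL, or an SQLite file URL when none is set. This is a sketch only: the `SCHEMA_DB_URL` variable name, the SQLAlchemy-style URL schemes and the `schema_api.db` file name are assumptions, not SCHEMA api's documented configuration.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlparse

# URL schemes for the supported RDBMSs (SQLAlchemy-style prefixes; an assumption,
# SCHEMA api's actual configuration format may differ).
SUPPORTED_SCHEMES = {"postgresql", "mariadb", "mysql", "oracle"}
# Hypothetical SQLite file used when no RDBMS is configured (development only).
SQLITE_FALLBACK = "sqlite:///schema_api.db"


def resolve_database_url(env: Mapping[str, str]) -> str:
    """Return the configured RDBMS URL, or an SQLite file URL if none is set."""
    url: Optional[str] = env.get("SCHEMA_DB_URL")  # hypothetical variable name
    if not url:
        return SQLITE_FALLBACK
    # Drop an optional driver suffix, e.g. "postgresql+psycopg2" -> "postgresql".
    scheme = urlparse(url).scheme.split("+", 1)[0]
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported database scheme: {scheme!r}")
    return url


if __name__ == "__main__":
    print(resolve_database_url(os.environ))
```

Failing fast on an unrecognized scheme, rather than silently falling back to SQLite, keeps a misconfigured production deployment from quietly running on a development database.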
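To make the TESK and file-storage dependencies more concrete, the sketch below builds a task body following the GA4GH TES v1 schema (`executors`, `inputs`, `outputs`) and posts it to a TESK deployment over HTTP at the TES task collection path `/ga4gh/tes/v1/tasks`. It is an illustration, not SCHEMA api's code: the helper names `build_tes_task` and `submit_to_tesk`, the TESK base URL and the S3 bucket are hypothetical, and the storage URLs must point to a storage that the TESK deployment is actually configured to access.

```python
import json
import urllib.request
from typing import Dict, List, Optional

TES_TASKS_PATH = "/ga4gh/tes/v1/tasks"  # task collection path in the GA4GH TES v1 spec


def build_tes_task(name: str, image: str, command: List[str],
                   inputs: Optional[Dict[str, str]] = None,
                   outputs: Optional[Dict[str, str]] = None) -> dict:
    """Build a TES task body; inputs/outputs map container paths to storage URLs."""
    return {
        "name": name,
        "executors": [{"image": image, "command": command}],
        # TESK mounts each input at `path` and uploads each output from `path` to `url`.
        "inputs": [{"url": url, "path": path, "type": "FILE"}
                   for path, url in (inputs or {}).items()],
        "outputs": [{"url": url, "path": path, "type": "FILE"}
                    for path, url in (outputs or {}).items()],
    }


def submit_to_tesk(tesk_base_url: str, task: dict, timeout: float = 10.0) -> str:
    """POST a task to a TESK deployment and return the task id it assigns."""
    request = urllib.request.Request(
        tesk_base_url.rstrip("/") + TES_TASKS_PATH,
        data=json.dumps(task).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["id"]  # TES create-task response: {"id": "..."}


if __name__ == "__main__":
    task = build_tes_task(
        "count-lines", "alpine:3.19", ["sh", "-c", "wc -l /data/in.txt > /data/out.txt"],
        inputs={"/data/in.txt": "s3://example-bucket/in.txt"},     # hypothetical bucket
        outputs={"/data/out.txt": "s3://example-bucket/out.txt"},
    )
    print(json.dumps(task, indent=2))
```

Against a live deployment, `submit_to_tesk("http://tesk.example.org", task)` (hypothetical URL) would return the id that TESK assigns to the task. Note that SCHEMA api itself never touches the files: it only passes storage URLs that TESK resolves.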

Modules

SCHEMA api can be understood as a pair of modules: the Authentication and Authorization module, which is responsible for all auth-related operations, and the Tasks module, which encapsulates the task submission and monitoring logic. Correspondingly, SCHEMA api exposes two sets of endpoints, the Tasks API and the Auth API. This distinction between Tasks and Auth is used throughout the rest of this documentation to organize the description of the corresponding architecture, data and API endpoints.
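The split into two modules and two endpoint sets can be sketched as a small dispatcher in which every Tasks API call is checked by the Auth module before the Tasks module handles it. The endpoint paths (`/auth/login`, `/tasks`), the token format and the class names are hypothetical and only illustrate the separation of concerns.

```python
from typing import Dict, Optional, Tuple

Response = Tuple[int, str]  # (HTTP status code, body)


class AuthModule:
    """Hypothetical Auth module: issues tokens and resolves them to users."""

    def __init__(self) -> None:
        self.tokens: Dict[str, str] = {}

    def login(self, user: str) -> str:
        # Placeholder: no credential check; a real module would verify credentials
        # and issue a signed token.
        token = f"token-{user}"
        self.tokens[token] = user
        return token

    def user_for(self, token: Optional[str]) -> Optional[str]:
        return self.tokens.get(token) if token else None


class TasksModule:
    """Hypothetical Tasks module: records task submissions per user."""

    def __init__(self) -> None:
        self.owners: Dict[str, str] = {}  # task id -> submitting user

    def create(self, user: str) -> str:
        task_id = str(len(self.owners) + 1)
        self.owners[task_id] = user
        return task_id


class Api:
    """Routes Auth API and Tasks API requests to their modules."""

    def __init__(self) -> None:
        self.auth = AuthModule()
        self.tasks = TasksModule()

    def handle(self, method: str, path: str, token: Optional[str] = None,
               body: str = "") -> Response:
        if path.startswith("/auth/"):  # Auth API
            if method == "POST" and path == "/auth/login" and body:
                return 200, self.auth.login(body)
            return 404, "not found"
        if path == "/tasks" or path.startswith("/tasks/"):  # Tasks API
            user = self.auth.user_for(token)  # every Tasks call goes through Auth first
            if user is None:
                return 401, "unauthenticated"
            if method == "POST" and path == "/tasks":
                return 201, self.tasks.create(user)
            return 404, "not found"
        return 404, "not found"


if __name__ == "__main__":
    api = Api()
    _, token = api.handle("POST", "/auth/login", body="alice")
    print(api.handle("POST", "/tasks"))               # (401, 'unauthenticated')
    print(api.handle("POST", "/tasks", token=token))  # (201, '1')
```

Keeping the auth check in a single place in front of the Tasks module means the task logic never has to re-implement authentication, which mirrors the Tasks/Auth split described above.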