Annotation
Bakta
All assembled ATB sample genomes were annotated with Bakta v1.9.4 using its full database v5.1. Bakta was run in a Conda environment installed from its public Bioconda package.
Result files are available on OSF in the Bakta component. However, due to the huge amount of raw annotation data (>35 TB), several measures were taken to handle it. First, only JSON files are provided, since all other Bakta output files can be restored from them (see below). Second, sample result files are packed in taxonomic batches, just like the assembled FASTA files, to achieve better compression ratios. Third, since OSF limits data size to 50 GB per component and 5 GB per file, the taxonomic batches were distributed across several components, and some batches had to be split into separate files to stay under the 5 GB file limit. In this way, the annotation data were reduced to a total of ~1.3 TB for r0.2 and ~0.3 TB for incr_release.202408.
For each release, there is a status file in the File_Lists folder, e.g. atb.bakta.r0.2.status.tsv.gz, providing the following information:
sample = the INSDC sample accession
status = the status of the Bakta run (PASS or FAIL)
file_name = the name of the Bakta JSON result file, e.g. SAMN38372697.bakta.json
file_md5 = MD5 sum of file_name
tar_xz = the name of the tar.xz file containing this sample's JSON, following the fixed schema atb.<analysis>.<release>.<batch>.tar.xz, e.g. atb.bakta.r0.2.batch.1.tar.xz
tar_xz_md5 = MD5 sum of tar_xz
tar_xz_size_MB = size of the tar_xz file in MB
Example entry for SAMN38372697:
sample SAMN38372697
status PASS
file_name SAMN38372697.bakta.json
file_md5 008d86ad046e0d152b8cc22d7452be24
tar_xz atb.bakta.incr_release.202408.batch.29.tar.xz
tar_xz_md5 7da90ac7650de2c2e0b821569ae2a602
tar_xz_size_MB 1201.0
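The status file can be used to work out which batch archives have to be downloaded for a set of samples. The following Python sketch assumes the file is a gzip-compressed, tab-separated table with a header row using the column names listed above; load_status and archives_to_download are hypothetical helper names, not part of ATB or Bakta.

```python
import csv
import gzip


def load_status(status_path, samples):
    """Return the status rows of the requested samples, keyed by accession.

    Assumes a gzip-compressed, tab-separated file whose header row uses the
    column names listed above (sample, status, file_name, ...).
    """
    wanted = set(samples)
    rows = {}
    with gzip.open(status_path, "rt", newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            if row["sample"] in wanted:
                rows[row["sample"]] = row
    return rows


def archives_to_download(rows):
    """Map each tar.xz archive needed for the given rows to its size in MB.

    Many samples share one taxonomic batch, so each archive is listed once.
    Samples whose Bakta run did not PASS are skipped (assumption: they have
    no usable JSON result file).
    """
    archives = {}
    for row in rows.values():
        if row["status"] != "PASS":
            continue
        archives[row["tar_xz"]] = float(row["tar_xz_size_MB"])
    return archives
```

Summing the values of the dictionary returned by archives_to_download gives an estimate of the total download size in MB before fetching anything from OSF.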
To restore all output files for a given sample from its JSON file, use the following command:
bakta_io --output <output-path> --prefix <file-prefix> sample.json
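Putting these pieces together, a single sample can be recovered by checking the downloaded archive against tar_xz_md5, pulling the sample's JSON out of the batch archive, checking it against file_md5, and then calling bakta_io. The Python sketch below assumes the archive has already been downloaded from OSF into archive_dir and that the JSON may sit in a sub-directory inside the archive; md5sum, extract_sample_json and restore_outputs are hypothetical helper names, and only the bakta_io invocation comes from this section.

```python
import hashlib
import shutil
import subprocess
import tarfile
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def extract_sample_json(row, archive_dir, out_dir):
    """Verify a batch archive, extract one sample's JSON and verify it as well.

    `row` is one status-file entry: a dict with the columns described above.
    """
    archive_path = Path(archive_dir) / row["tar_xz"]
    if md5sum(archive_path) != row["tar_xz_md5"]:
        raise ValueError(f"MD5 mismatch for {archive_path}")
    target = Path(out_dir) / row["file_name"]
    with tarfile.open(archive_path, "r:xz") as archive:
        # Assumption: the JSON may live in a sub-directory, so match on the basename.
        for member in archive:
            if member.isfile() and Path(member.name).name == row["file_name"]:
                with archive.extractfile(member) as source, open(target, "wb") as sink:
                    shutil.copyfileobj(source, sink)
                break
        else:
            raise KeyError(f"{row['file_name']} not found in {archive_path}")
    if md5sum(target) != row["file_md5"]:
        raise ValueError(f"MD5 mismatch for {target}")
    return target


def restore_outputs(json_path, output_dir, prefix):
    """Run the bakta_io command shown above to regenerate all output files."""
    cmd = ["bakta_io", "--output", str(output_dir), "--prefix", prefix, str(json_path)]
    subprocess.run(cmd, check=True)
```

The scan stops at the first matching member, but because xz streams are decompressed sequentially, extracting a sample near the end of a large batch still means reading through most of the archive. When many samples from the same batch are needed, extracting the whole archive once is likely cheaper.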
For any questions regarding the Bakta genome annotation, please contact Oliver Schwengers.