Structure of gtf-files
September 27, 2023 · View on GitHub
Updates:
- 09.2023 - added data from MirGeneDB 2.1, duplicate records removed
In this document you can find short instructions for using databases of small non-coding RNAs published in article ITAS: integrated transcript annotation for small RNA (https://doi.org/10.3390/ncrna8030030).
Integrated annotation was created for precursors and mature miRNA, piRNA, rRNA, tRNA and tRNA fragments for human, mouse, rat, C.elegans and D.melanogaster.
Archived databases gtf-files can be found in the directory https://github.com/EpiEpiMSU/ITAS/tree/main/Integrated_annotation
Scripts, which were used to obtain this databases located in the directory https://github.com/EpiEpiMSU/ITAS_scripts
Structure of gtf-files
One small noncoding RNA often has many loci across the genome. Because of that we have used “exon” in the feature field, the same transcript_id, but different transcript_copy_id for genomic loci of the certain transcript.
For example:
chr2 pirnadb_v1_7_6 exon 10486586 10486608 . + . transcript_id "mmu-piR-10017"; transcript_copy_id "mmu-piR-10017_1"; sequence "AGTTGTGTGTGCATGTTCATGT"
chr2 pirnadb_v1_7_6 exon 10481682 10481704 . + . transcript_id "mmu-piR-10017"; transcript_copy_id "mmu-piR-10017_2"; sequence "AGTTGTGTGTGCATGTTCATGT"
chr2 pirnadb_v1_7_6 exon 10484140 10484162 . + . transcript_id "mmu-piR-10017"; transcript_copy_id "mmu-piR-10017_3"; sequence "AGTTGTGTGTGCATGTTCATGT"
Download and uncompress the database
#downloading whole database
git clone https://github.com/EpiEpiMSU/ITAS
#for unziping files we recommend you to use 7z tool (commands unzip/gzip don`t work directly with multi-part archives)
sudo apt install p7zip-full
#extracting whole database
7z x -y -r ITAS -o./whole_database
#extracting specific database (mouse, for example)
7z x -y -r ITAS/Integrated_annotation/mouse/ -o ./mouse_snRNA_databases