#############################################################
README for ftp://ncbi.nlm.nih.gov/refseq/H_sapiens/historical/
Last updated: June 23, 2023
#############################################################

This directory provides alignments and annotation for current and
historical RefSeq transcripts, including both replaced and suppressed
RefSeq transcript versions. Only "known" RefSeq transcripts with NM_ or
NR_ accession prefixes are included.

The data comprises alignments collected from current and previous
annotation releases of the GRCh38 assembly, favoring the most recent
annotation release for any given transcript accession.version. The set
is then supplemented with alignments of older transcripts that were
never annotated on GRCh38. To remove some historical errors, the
alignments are then filtered to exclude the rare transcripts that are
not aligned to the expected location for their gene.

Genome annotation in GFF3 format is then generated from the set of all
alignments. The GFF3 output differs from the GFF3 provided as part of
regular annotation releases in that it does not use short 1-2 bp
"microintrons" to represent indels. The annotation GFF3 is provided to
indicate CDS locations on the genome, but remapping from transcript to
genome coordinates should always rely directly on the alignment data.

Current annotation releases are named using the format:
    -RS_YYYY_MM
The historical alignment set is anchored on the most recent annotation
release.

Provided files:

1. GCF_000001405.40-RS_YYYY_MM_knownrefseq_alns.gff.gz
   alignment files in GFF3 format
   see: https://www.ncbi.nlm.nih.gov/datasets/docs/v2/reference-docs/file-formats/annotation-files/about-ncbi-gff3/#alignments

2. GCF_000001405.40-RS_YYYY_MM_knownrefseq_alns.bam
   GCF_000001405.40-RS_YYYY_MM_knownrefseq_alns.bam.bai
   alignment files in BAM format

3. GCF_000001405.40-RS_YYYY_MM_genomic.gff.gz
   annotation files for the corresponding alignments
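
Example (illustrative only):

The following Python sketch shows one possible way to build the file
names described in this README and to pull the alignment records for a
single transcript accession.version out of the alignment GFF3 file. It
is not an NCBI tool. The function names (file_names, parse_gff3_line,
alignments_for) are hypothetical, and the sketch assumes that each
alignment record carries a "Target" attribute in the form defined by the
GFF3 specification ("Target=<id> <start> <end> [strand]"), which should
be checked against the actual files.

```python
import gzip
from urllib.parse import unquote

# Assembly accession.version as it appears in the file names in this README.
ASSEMBLY = "GCF_000001405.40"


def file_names(release_tag):
    """Build the four file names for a release tag of the form RS_YYYY_MM.

    The caller supplies the tag; this sketch does not know which
    releases actually exist in the directory.
    """
    prefix = f"{ASSEMBLY}-{release_tag}"
    return {
        "alignments_gff3": f"{prefix}_knownrefseq_alns.gff.gz",
        "alignments_bam": f"{prefix}_knownrefseq_alns.bam",
        "alignments_bai": f"{prefix}_knownrefseq_alns.bam.bai",
        "annotation_gff3": f"{prefix}_genomic.gff.gz",
    }


def parse_gff3_line(line):
    """Parse one GFF3 feature line into a dict.

    Returns None for blank lines, comment/directive lines, and lines
    that do not have exactly nine tab-separated columns.
    """
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    cols = line.split("\t")
    if len(cols) != 9:
        return None
    attributes = {}
    for field in cols[8].split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            # GFF3 percent-encodes reserved characters in attribute values.
            attributes[key] = unquote(value)
    return {
        "seqid": cols[0],
        "source": cols[1],
        "type": cols[2],
        "start": int(cols[3]),
        "end": int(cols[4]),
        "score": cols[5],
        "strand": cols[6],
        "phase": cols[7],
        "attributes": attributes,
    }


def alignments_for(path, accession_version):
    """Yield parsed records whose Target id equals accession_version.

    Assumption: the transcript id is the first space-separated token of
    the Target attribute, as in the GFF3 specification.
    """
    with gzip.open(path, "rt") as handle:
        for line in handle:
            record = parse_gff3_line(line)
            if record is None:
                continue
            target = record["attributes"].get("Target", "")
            if target.split(" ")[0] == accession_version:
                yield record
```

For instance, file_names("RS_2023_03") would return names beginning with
"GCF_000001405.40-RS_2023_03"; the tag here is only a placeholder value
in the RS_YYYY_MM pattern, not a statement about which release is
current. The sketch only locates alignment records. Converting
transcript coordinates to genome coordinates additionally requires
interpreting the alignment details in each record, as described at the
GFF3 alignments link above.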