Download FASTA file with protein accession (Biopython)
03 - Parsing GenBank files
# Biopython's SeqIO module handles sequence input/output
from Bio import SeqIO


def get_cds_feature_with_qualifier_value(seq_record, name, value):
    """Look for a CDS feature by annotation value in a sequence record.

    For example, you can use this to find features by locus tag, gene ID, or protein ID.
    """
    # Loop over the features of the record passed in (not a global variable)
    for feature in seq_record.features:
        if feature.type == "CDS" and value in feature.qualifiers.get(name, []):
            return feature
    # Could not find it
    return None


genome_record = SeqIO.read("NC_004547.gbk", "genbank")

old_tags = ["ECA0662", "ECA1451", "ECA1871", "ECA2166",
            "ECA3646", "ECA4387", "ECA4407", "ECA4432"]

with open("nucleotides.fasta", "w") as nt_output, open("proteins.fasta", "w") as aa_output:
    for tag in old_tags:
        print("Looking at " + tag)
        cds_feature = get_cds_feature_with_qualifier_value(genome_record, "old_locus_tag", tag)
        if cds_feature is None:
            raise ValueError("No CDS feature found with old_locus_tag " + tag)
        gene_sequence = cds_feature.extract(genome_record.seq)
        # Bacterial genetic code (NCBI table 11); cds=True checks for a complete CDS
        # and drops the terminal stop codon, matching the /translation qualifier
        protein_sequence = gene_sequence.translate(table=11, cds=True)
        # This is asking Python to halt if the translation does not match:
        assert str(protein_sequence) == cds_feature.qualifiers["translation"][0]
        # Output FASTA records - note \n means insert a new line.
        # This is a little lazy as it won't line wrap the sequence:
        nt_output.write(">%s\n%s\n" % (tag, gene_sequence))
        aa_output.write(">%s\n%s\n" % (tag, protein_sequence))
print("Done")
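The comment above notes that writing each record with a single `">%s\n%s\n"` string won't line wrap the sequence. A minimal sketch of a wrapping writer follows, assuming 60 characters per line (the width Biopython's own FASTA writer uses by default). The helper name `format_fasta_record` is hypothetical and not part of Biopython; it accepts a plain string or a `Seq` object, since it calls `str()` on the sequence.

```python
def format_fasta_record(identifier, sequence, width=60):
    """Return one FASTA record as a string, wrapping the sequence every `width` characters.

    Hypothetical helper for illustration; not part of Biopython.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    sequence = str(sequence)  # accept Bio.Seq.Seq objects as well as plain strings
    lines = [">" + identifier]
    # Slice the sequence into consecutive chunks of at most `width` characters
    lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # A made-up 70-residue sequence: prints a header, a 60-character line, then a 10-character line
    print(format_fasta_record("ECA0662", "M" + "A" * 69), end="")
```

Inside the loop, `nt_output.write(format_fasta_record(tag, gene_sequence))` and `aa_output.write(format_fasta_record(tag, protein_sequence))` would then replace the unwrapped writes. Alternatively, building `SeqRecord` objects and passing them to `SeqIO.write(records, handle, "fasta")` lets Biopython handle the wrapping for you.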
Source: https://widdowquinn.github.io/2018-03-06-ibioic/01-introduction/03-parsing.html