The KEGG database, often called an encyclopedia of genomes, is a comprehensive database covering genes, pathways, and other data. To make KEGG data easier to query, KEGG provides an official API.
In Biopython, the Bio.KEGG module wraps the official KEGG API so that it can be used from within Python. The correspondence between KEGG API requests and Python code is as follows:
/list/hsa:10458+ece:Z5100 -> REST.kegg_list(["hsa:10458", "ece:Z5100"])
/find/compound/300-310/mol_weight -> REST.kegg_find("compound", "300-310", "mol_weight")
/get/hsa:10458+ece:Z5100/aaseq -> REST.kegg_get(["hsa:10458", "ece:Z5100"], "aaseq")
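Each call above corresponds to a REST URL in which the operation and its arguments are joined with `/`, and several IDs passed as a list are joined with `+`. The minimal sketch below rebuilds those URLs offline to make the mapping explicit. `build_kegg_url` is a hypothetical helper written for illustration (it is not part of Biopython and sends no request), and the base URL `https://rest.kegg.jp` is assumed to be the KEGG REST endpoint.

```python
# Hypothetical helper (not part of Biopython): rebuild the KEGG REST URL
# that a Bio.KEGG.REST call corresponds to, without sending any request.
BASE_URL = "https://rest.kegg.jp"  # assumed KEGG REST endpoint


def build_kegg_url(operation, *args):
    """Join the operation and its arguments into a KEGG REST URL.

    A list or tuple argument (several IDs) is joined with "+",
    mirroring e.g. REST.kegg_list(["hsa:10458", "ece:Z5100"]).
    """
    parts = [operation]
    for arg in args:
        if isinstance(arg, (list, tuple)):
            parts.append("+".join(arg))
        else:
            parts.append(str(arg))
    return BASE_URL + "/" + "/".join(parts)


if __name__ == "__main__":
    print(build_kegg_url("list", ["hsa:10458", "ece:Z5100"]))
    print(build_kegg_url("find", "compound", "300-310", "mol_weight"))
    print(build_kegg_url("get", ["hsa:10458", "ece:Z5100"], "aaseq"))
```

Run as a script, this prints the three URL paths listed above, each prefixed with the assumed base URL.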
The REST module can download any type of data the API supports. Taking a pathway as an example:
>>> from Bio.KEGG import REST
>>> pathway = REST.kegg_get('hsa00010')
The object returned by a query is a file-like handle; calling its read method returns the content as plain text, for example:
>>> pathway = REST.kegg_get('hsa00010')
>>> res = pathway.read().split("\n")
>>> res[0]
'ENTRY hsa00010 Pathway'
>>> res[1]
'NAME Glycolysis / Gluconeogenesis - Homo sapiens (human)'
>>> res[2]
'DESCRIPTION Glycolysis is the process of converting glucose into pyruvate and generating small amounts of ATP (energy) and NADH (reducing power). It is a central pathway that produces important precursor metabolites: six-carbon compounds of glucose-6P and fructose-6P and three-carbon compounds of glycerone-P, glyceraldehyde-3P, glycerate-3P, phosphoenolpyruvate, and pyruvate [MD:M00001]. Acetyl-CoA, another important precursor metabolite, is produced by oxidative decarboxylation of pyruvate [MD:M00307]. When the enzyme genes of this pathway are examined in completely sequenced genomes, the reaction steps of three-carbon compounds from glycerone-P to pyruvate form a conserved core module [MD:M00002], which is found in almost all organisms and which sometimes contains operon structures in bacterial genomes. Gluconeogenesis is a synthesis pathway of glucose from noncarbohydrate precursors. It is essentially a reversal of glycolysis with minor variations of alternative paths [MD:M00003].'
The resulting strings can be parsed further to extract the pathway's ID, name, description, and so on. Biopython also provides dedicated parsers for KEGG data, but they are incomplete: at present they cover only a few databases such as compound, map, and enzyme. Taking the enzyme database as an example, usage is as follows:
>>> from Bio.KEGG import REST, Enzyme
>>> request = REST.kegg_get("ec:5.4.2.2")
>>> with open("ec_5.4.2.2.txt", "w") as out:
...     _ = out.write(request.read())  # save the flat-file record locally
...
>>> with open("ec_5.4.2.2.txt") as handle:
...     record = list(Enzyme.parse(handle))[0]  # first (and only) record
...
>>> record
<Bio.KEGG.Enzyme.Record object at 0x02EE7D18>
>>> record.classname
['Isomerases;', 'Intramolecular transferases;', 'Phosphotransferases (phosphomutases)']
>>> record.entry
'5.4.2.2'
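For databases without a dedicated parser, such as pathway, the text returned by read can be split into fields by hand. The minimal sketch below assumes the usual KEGG flat-file layout: the field name occupies the first 12 columns, continuation lines leave those columns blank, and the entry ends with `///`. `parse_kegg_flat` is a hypothetical helper, and the sample text is abbreviated from the hsa00010 output shown earlier, with the column spacing reconstructed for illustration.

```python
def parse_kegg_flat(text):
    """Group a KEGG flat-file entry into {field name: [value lines]}.

    Assumes the field name sits in columns 1-12 and continuation
    lines leave those columns blank.
    """
    fields = {}
    current = None
    for line in text.rstrip().split("\n"):
        if line.startswith("///"):  # end-of-entry marker
            break
        name = line[:12].strip()
        if name:  # a new field starts on this line
            current = name
            fields.setdefault(current, [])
        if current is not None:
            value = line[12:].strip()
            if value:
                fields[current].append(value)
    return fields


# Abbreviated sample; spacing reconstructed to the assumed 12-column layout.
SAMPLE = (
    "ENTRY       hsa00010                    Pathway\n"
    "NAME        Glycolysis / Gluconeogenesis - Homo sapiens (human)\n"
    "DESCRIPTION Glycolysis is the process of converting glucose into pyruvate"
    " and generating small amounts of ATP (energy) and NADH (reducing power).\n"
    "///\n"
)

if __name__ == "__main__":
    record = parse_kegg_flat(SAMPLE)
    print(record["ENTRY"][0].split()[0])  # pathway ID
    print(record["NAME"][0])              # pathway name
    print(" ".join(record["DESCRIPTION"]))
```

Keeping every field as a list of lines means multi-line fields (for example GENE) are preserved line by line, and the caller decides how to split each value.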
Biopython not only lets us use the KEGG API from Python; more importantly, we can use Python's own logic to implement complex filtering. For example, to search for human genes related to DNA repair, the basic idea is as follows:
1. Use the list API to obtain the IDs of all human pathways;
2. Keep the pathways whose name (the description returned by list) contains the keyword repair;
3. For each selected pathway, use the get API to retrieve its entry and parse the text to obtain the genes in that pathway;
The complete code is as follows:
>>> from Bio.KEGG import REST
>>> human_pathways = REST.kegg_list("pathway", "hsa").read()
>>> repair_pathways = []
>>> for line in human_pathways.rstrip().split("\n"):
...     entry, description = line.split("\t")
...     if "repair" in description:
...         repair_pathways.append(entry)
...
>>> repair_pathways
['path:hsa03410', 'path:hsa03420', 'path:hsa03430']
>>> repair_genes = []
>>> for pathway in repair_pathways:
...     pathway_file = REST.kegg_get(pathway).read()  # flat-file text of one pathway
...     current_section = None
...     for line in pathway_file.rstrip().split("\n"):
...         section = line[:12].strip()  # field name occupies the first 12 columns
...         if section != "":
...             current_section = section
...         if current_section == "GENE":
...             gene_identifiers, gene_description = line[12:].split("; ", 1)
...             gene_id, gene_symbol = gene_identifiers.split()
...             if gene_symbol not in repair_genes:
...                 repair_genes.append(gene_symbol)
...
>>> repair_genes
['OGG1', 'NTHL1', 'NEIL1', 'NEIL2', 'NEIL3', 'UNG', 'SMUG1', 'MUTYH', 'MPG', 'MBD4', 'TDG', 'APEX1', 'APEX2', 'POLB', 'POLL', 'HMGB1', 'XRCC1', 'PCNA', 'POLD1', 'POLD2', 'POLD3', 'POLD4', 'POLE', 'POLE2', 'POLE3', 'POLE4', 'LIG1', 'LIG3', 'PARP1', 'PARP2', 'PARP3', 'PARP4', 'FEN1', 'RBX1', 'CUL4B', 'CUL4A', 'DDB1', 'DDB2', 'XPC', 'RAD23B', 'RAD23A', 'CETN2', 'ERCC8', 'ERCC6', 'CDK7', 'MNAT1', 'CCNH', 'ERCC3', 'ERCC2', 'GTF2H5', 'GTF2H1', 'GTF2H2', 'GTF2H2C_2', 'GTF2H2C', 'GTF2H3', 'GTF2H4', 'ERCC5', 'BIVM-ERCC5', 'XPA', 'RPA1', 'RPA2', 'RPA3', 'RPA4', 'ERCC4', 'ERCC1', 'RFC1', 'RFC4', 'RFC2', 'RFC5', 'RFC3', 'SSBP1', 'PMS2', 'MLH1', 'MSH6', 'MSH2', 'MSH3', 'MLH3', 'EXO1']
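To check the filtering and parsing logic without querying KEGG every time, the two text-processing steps of the session above can be pulled out into pure functions that take the downloaded text as input. In this minimal sketch, `filter_pathways` and `gene_symbols` are hypothetical helper names; they follow the same assumptions as the code above (tab-separated list output, a 12-column field name, and GENE lines of the form `ID  SYMBOL; description`). The list sample mirrors the entries shown above, while the GENE lines are placeholders, not real KEGG content.

```python
def filter_pathways(list_text, keyword):
    """Return pathway IDs whose name (from REST.kegg_list) contains keyword.

    Each line of list_text is assumed to be "ID<TAB>name".
    """
    selected = []
    for line in list_text.rstrip().split("\n"):
        entry, description = line.split("\t", 1)
        if keyword in description:
            selected.append(entry)
    return selected


def gene_symbols(pathway_text):
    """Collect gene symbols from the GENE field of a pathway flat file.

    Assumes GENE lines look like "<gene id>  <symbol>; <description>"
    after the 12-column field name.
    """
    symbols = []
    current_section = None
    for line in pathway_text.rstrip().split("\n"):
        section = line[:12].strip()
        if section:
            current_section = section
        if current_section == "GENE":
            identifiers = line[12:].split("; ", 1)[0]
            symbol = identifiers.split()[1]
            if symbol not in symbols:  # keep first-seen order, no duplicates
                symbols.append(symbol)
    return symbols


LIST_SAMPLE = (
    "path:hsa00010\tGlycolysis / Gluconeogenesis - Homo sapiens (human)\n"
    "path:hsa03410\tBase excision repair - Homo sapiens (human)\n"
    "path:hsa03430\tMismatch repair - Homo sapiens (human)\n"
)

# Placeholder pathway text for illustration only (not real KEGG content);
# GENEA is repeated to show de-duplication.
PATHWAY_SAMPLE = (
    "ENTRY       hsa99999                    Pathway\n"
    "GENE        1001  GENEA; placeholder description A\n"
    "            1002  GENEB; placeholder description B\n"
    "            1001  GENEA; placeholder description A\n"
    "COMPOUND    C00001  placeholder compound\n"
    "///\n"
)

if __name__ == "__main__":
    print(filter_pathways(LIST_SAMPLE, "repair"))
    print(gene_symbols(PATHWAY_SAMPLE))
```

With the functions separated from the network calls, the live loop reduces to calling `gene_symbols(REST.kegg_get(p).read())` for each ID returned by `filter_pathways`, and the text handling can be tested on saved files.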
With Biopython, the KEGG API can be used more efficiently: by combining the API's data-retrieval capability with Python's logic, we can meet our own customized analysis needs.