
Python learning notes: parsing XML (the ElementTree XML API)

Editor: Python

Python provides an interface for processing (parsing and creating) XML files: the xml.etree.ElementTree module (hereinafter ET).

> Note: the xml.etree.cElementTree module has been deprecated since version 3.3.

1. XML format

XML is a hierarchical data format, so it is usually represented as a tree. ET provides two classes for representing XML:

  1. ElementTree: represents the whole XML document as a tree (class ET.ElementTree);
  2. Element: represents a single node in that tree (class ET.Element).
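
As a minimal sketch of how the two classes relate (the short XML string and the variable names here are illustrative assumptions, not taken from the example file): an ElementTree wraps a root Element, and each Element can be iterated to reach its child Elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-node document, used only for illustration
root = ET.fromstring('<data><country name="Panama"/></data>')
tree = ET.ElementTree(root)   # wrap the root Element in a whole-document tree

print(isinstance(tree, ET.ElementTree))   # True
print(isinstance(root, ET.Element))       # True
print(tree.getroot() is root)             # True: the tree hands back its root node

# Iterating an Element yields its direct children
child_tags = [child.tag for child in root]
print(child_tags)                         # ['country']
```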

2. Parsing XML

The following sections use the file country_data.xml as an example:

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

2.1 Reading an XML file

2.1.1 Reading methods

(1) Method 1: from a file

import xml.etree.ElementTree as ET
# Method 1: from a file
tree = ET.parse('XXXX.xml')  # path to the file; returns the whole XML tree
root = tree.getroot()        # get the root node of the XML tree

(2) Method 2: from the file content (a string)

# Method 2: from a string
root = ET.fromstring('<entire content of XXXX.xml as a string>')

Note: ET.fromstring() parses the content of an XML document (given as a string) directly into an Element object. That Element is the root node of the parsed XML tree.
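
A small sketch of the difference between the two entry points. It assumes an in-memory file object (io.StringIO) stands in for a real file path so that the example runs anywhere, and xml_text is a made-up sample. ET.parse() returns an ElementTree whose root has to be fetched with getroot(), while ET.fromstring() returns the root Element directly.

```python
import io
import xml.etree.ElementTree as ET

# Made-up sample content; in practice this would come from the XML file
xml_text = '<data><country name="Singapore"><rank>4</rank></country></data>'

# ET.parse() accepts a path or a file object and returns an ElementTree
tree = ET.parse(io.StringIO(xml_text))
root_from_file = tree.getroot()

# ET.fromstring() returns the root Element itself
root_from_string = ET.fromstring(xml_text)

print(type(tree).__name__)      # ElementTree
print(root_from_file.tag)       # data
print(root_from_string.tag)     # data
```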

2.1.2 Code

import xml.etree.ElementTree as ET
# Raw string, so the backslashes in the Windows path are not treated as escapes
filePath = r'C:\codes\data\country_data.xml'
# Method 1: read from a file
tree = ET.parse(filePath)
root = tree.getroot()
print(root.tag)
# Method 2: parse from a string
root2 = ET.fromstring('''<?xml version="1.0"?>
<data>
<country name="Liechtenstein">
<rank>1</rank>
<year>2008</year>
<gdppc>141100</gdppc>
<neighbor name="Austria" direction="E"/>
<neighbor name="Switzerland" direction="W"/>
</country>
<country name="Singapore">
<rank>4</rank>
<year>2011</year>
<gdppc>59900</gdppc>
<neighbor name="Malaysia" direction="N"/>
</country>
<country name="Panama">
<rank>68</rank>
<year>2011</year>
<gdppc>13600</gdppc>
<neighbor name="Costa Rica" direction="W"/>
<neighbor name="Colombia" direction="E"/>
</country>
</data>''')
print(root2.tag)

Output:

data
data

2.2 Getting the attributes of an Element object

Each entry below lists the attribute, what it represents, its data type, and an example.

  1. Element.tag
     Description: the element name, i.e. the type of the element object.
     Data type: string
     Input: root.tag
     Output: data

  2. Element.attrib
     Description: the names and values of the element's attributes.
     Data type: dictionary
     Input: root[0].attrib
     Output: {'name': 'Liechtenstein'}

  3. Element.text
     Description: the text between the element's start tag and its first child or end tag, or None.
     Data type: usually a string
     Input: root[0][0].text
     Output: 1

  4. Element.tail
     Description: the text between the element's end tag and the next tag, or None.
     Data type: usually a string
     Input: root[0][0].tail
     Output: a whitespace-only string (the line break, plus any indentation, between </rank> and <year>); None is returned only when no text follows the end tag

  5. Element.keys()
     Description: gets the attribute names (keys) of the current node.
     Data type: list
     Input: root[0].keys()
     Output: ['name']

  6. Element.items()
     Description: gets the attribute key-value pairs of the current node.
     Data type: list of (key, value) tuples
     Input: root[0][3].items()
     Output: [('name', 'Austria'), ('direction', 'E')]
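
The attributes above can be tried on a small fragment. This is only a sketch on a made-up inline string (the sample text, the code attribute, and the variable names are assumptions). It also shows that tail holds whatever text follows an element's end tag, and is None when nothing follows it.

```python
import xml.etree.ElementTree as ET

# Made-up fragment: " after" is the text that follows </rank>
country = ET.fromstring(
    '<country name="Liechtenstein" code="LI">'
    '<rank>1</rank> after<year>2008</year>'
    '</country>'
)
rank = country[0]
year = country[1]

print(country.tag)            # country
print(country.attrib)         # {'name': 'Liechtenstein', 'code': 'LI'}
print(list(country.keys()))   # ['name', 'code']
print(list(country.items()))  # [('name', 'Liechtenstein'), ('code', 'LI')]
print(rank.text)              # 1
print(repr(rank.tail))        # ' after'
print(year.tail)              # None: nothing between </year> and </country>
```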

2.3 Functions for querying sub-Element objects

2.3.1 Search scope: the current Element object and all levels below it

Iterator lookup: Element.iter('tagname')

  • Finds the objects whose tag is tagname in the current Element object and all levels below it (depth-first search);
  • If tagname is None or '*', finds the current Element object and every object at all levels below it.

# Get the objects whose tag is 'neighbor' at all levels below the current element
for neighbor in root.iter('neighbor'):
    print(neighbor.attrib)

Output:

{'name': 'Austria', 'direction': 'E'}
{'name': 'Switzerland', 'direction': 'W'}
{'name': 'Malaysia', 'direction': 'N'}
{'name': 'Costa Rica', 'direction': 'W'}
{'name': 'Colombia', 'direction': 'E'}
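
A sketch of how iter() walks the tree, using a trimmed, made-up version of the sample data (the inline string and the list names are assumptions). Note that iter() with no argument also yields the starting element itself.

```python
import xml.etree.ElementTree as ET

# Trimmed, made-up version of the sample data
root = ET.fromstring(
    '<data>'
    '<country name="Singapore"><rank>4</rank><neighbor name="Malaysia"/></country>'
    '<country name="Panama"><rank>68</rank><neighbor name="Colombia"/></country>'
    '</data>'
)

# Only elements whose tag is 'neighbor', at any depth
neighbor_names = [n.get('name') for n in root.iter('neighbor')]
print(neighbor_names)   # ['Malaysia', 'Colombia']

# No tag given: every element in document order, starting with root itself
all_tags = [e.tag for e in root.iter()]
print(all_tags)
# ['data', 'country', 'rank', 'neighbor', 'country', 'rank', 'neighbor']
```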

2.3.2 Search scope: the direct children of the current Element object

When match is a plain tag name, the following functions look only at the level directly below the current Element object:

  • Element.findall(match): returns a list of the matching child objects.
  • Element.iterfind(match): returns an iterator over the matching child objects.
  • Element.find(match): returns the first matching child object.
  • Element.findtext(match, default=None): returns the text of the first matching child object (not very convenient: when the matched element itself contains child elements, its text is only whitespace such as '\n' followed by spaces).

# Find the matching objects one level below the current element
print('Using element.findall:')
ele1 = root.findall('country')
for every in ele1:
    print(every.attrib)
print('Using element.iterfind:')
for every in root.iterfind('country'):
    print(every.attrib)
print('Using element.itertext:')
for every in root.itertext():
    # Skip the whitespace-only strings between tags
    if not every.startswith('\n'):
        print(every)
# Find the first matching object one level below the current element
print('Using element.find:')
ele = root.find('country')
print(ele.attrib)
print('Using element.findtext:')
ranktext = ele.findtext('rank')
print(ranktext)

Output:

Using element.findall:
{'name': 'Liechtenstein'}
{'name': 'Singapore'}
{'name': 'Panama'}
Using element.iterfind:
{'name': 'Liechtenstein'}
{'name': 'Singapore'}
{'name': 'Panama'}
Using element.itertext:
1
2008
141100
4
2011
59900
68
2011
13600
Using element.find:
{'name': 'Liechtenstein'}
Using element.findtext:
1
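
To see the "direct children only" behaviour, here is a sketch on a trimmed, made-up version of the sample data (the inline string and the variable names are assumptions). A plain tag name matches children only; the './/' path prefix, part of ElementTree's limited XPath support, is one way to reach deeper levels. findtext() also accepts a default for the case where nothing matches.

```python
import xml.etree.ElementTree as ET

# Trimmed, made-up version of the sample data
root = ET.fromstring(
    '<data>'
    '<country name="Singapore"><rank>4</rank><neighbor name="Malaysia"/></country>'
    '<country name="Panama"><rank>68</rank><neighbor name="Colombia"/></country>'
    '</data>'
)

countries = [c.get('name') for c in root.findall('country')]
print(countries)                       # ['Singapore', 'Panama']

# 'neighbor' is two levels down, so a plain tag name finds nothing from root
shallow = root.findall('neighbor')
print(shallow)                         # []
deep = [n.get('name') for n in root.findall('.//neighbor')]
print(deep)                            # ['Malaysia', 'Colombia']

first = root.find('country')           # first matching child, or None
first_rank = first.findtext('rank')
missing = first.findtext('gdppc', default='n/a')
print(first_rank, missing)             # 4 n/a
```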

 

