Basic usage of Python pyquery

1. Installation

pip install pyquery

2. Import

from pyquery import PyQuery as pq

3. Brief introduction

pyquery is a jQuery-like HTML parsing library for Python. Its usage is similar to bs4.

4. Usage

4.1 Initialization

from pyquery import PyQuery as pq
doc = pq(html)  # parse an HTML string
doc = pq("http://news.baidu.com/")  # fetch and parse a web page (same as pq(url=...))
doc = pq(filename="./a.html")  # parse a local HTML file

4.2 Basic CSS selectors

from pyquery import PyQuery as pq
html = '''
<div id="wrap">
<ul class="s_from">
asdasd
<link href="http://asda.com">asdadasdad12312</link>
<link href="http://asda1.com">asdadasdad12312</link>
<link href="http://asda2.com">asdadasdad12312</link>
</ul>
</div>
'''
doc = pq(html)
print(doc("#wrap .s_from link"))

Running results:

<link href="http://asda.com">asdadasdad12312</link>
<link href="http://asda1.com">asdadasdad12312</link>
<link href="http://asda2.com">asdadasdad12312</link>

In the selector, # selects by id, . selects by class, and link selects link tags. A space between two selectors means the right-hand one is nested inside the left-hand one.

4.3 Find child elements

from pyquery import PyQuery as pq
html = '''
<div id="wrap">
<ul class="s_from">
asdasd
<link href="http://asda.com">asdadasdad12312</link>
<link href="http://asda1.com">asdadasdad12312</link>
<link href="http://asda2.com">asdadasdad12312</link>
</ul>
</div>
'''
# Find child elements
doc = pq(html)
items = doc("#wrap")
print(items)
print("The type is: %s" % type(items))
link = items.find('.s_from')
print(link)
link = items.children()
print(link)

Running results:

<div id="wrap">
<ul class="s_from">
asdasd
<link href="http://asda.com">asdadasdad12312</link>
<link href="http://asda1.com">asdadasdad12312</link>
<link href="http://asda2.com">asdadasdad12312</link>
</ul>
</div>
The type is: <class 'pyquery.pyquery.PyQuery'>
<ul class="s_from">
asdasd
<link href="http://asda.com">asdadasdad12312</link>
<link href="http://asda1.com">asdadasdad12312</link>
<link href="http://asda2.com">asdadasdad12312</link>
</ul>
<ul class="s_from">
asdasd
<link href="http://asda.com">asdadasdad12312</link>
<link href="http://asda1.com">asdadasdad12312</link>
<link href="http://asda2.com">asdadasdad12312</link>
</ul>

The running results show that the return type is PyQuery, and that both find and children can reach the nested tag. The difference is that find searches all descendants, while children only returns direct children.

4.4 Find the parent element

from pyquery import PyQuery as pq
html = '''
<div href="wrap">
hello nihao
<ul class="s_from">
asdasd
<link href="http://asda.com">asdadasdad12312</link>
<link href="http://asda1.com">asdadasdad12312</link>
<link href="http://asda2.com">asdadasdad12312</link>
</ul>
</div>
'''
doc = pq(html)
items=doc(".s_from")
print(items)
# Find the parent element
parent_href=items.parent()
print(parent_href)

Running results:

<ul class="s_from">
asdasd
<link href="http://asda.com">asdadasdad12312</link>
<link href="http://asda1.com">asdadasdad12312</link>
<link href="http://asda2.com">asdadasdad12312</link>
</ul>
<div href="wrap">
hello nihao
<ul class="s_from">
asdasd
<link href="http://asda.com">asdadasdad12312</link>
<link href="http://asda1.com">asdadasdad12312</link>
<link href="http://asda2.com">asdadasdad12312</link>
</ul>
</div>

parent returns the tag that directly encloses the selection. The similar parents method returns all ancestor nodes.

4.5 Find sibling elements

from pyquery import PyQuery as pq
html = '''
<div href="wrap">
hello nihao
<ul class="s_from">
asdasd
<link class='active1 a123' href="http://asda.com">asdadasdad12312</link>
<link class='active2' href="http://asda1.com">asdadasdad12312</link>
<link class='movie1' href="http://asda2.com">asdadasdad12312</link>
</ul>
</div>
'''
doc = pq(html)
items=doc("link.active1.a123")
print(items)
# Find sibling elements
siblings_href=items.siblings()
print(siblings_href)

Running results:

<link class="active1 a123" href="http://asda.com">asdadasdad12312</link>
<link class="active2" href="http://asda1.com">asdadasdad12312</link>
<link class="movie1" href="http://asda2.com">asdadasdad12312</link>

The running results show that siblings returns the other tags at the same level.

Conclusion: child, parent, and sibling lookups all return a PyQuery object, so you can run further selections on the result.

4.6 Traverse the search results

from pyquery import PyQuery as pq
html = '''
<div href="wrap">
hello nihao
<ul class="s_from">
asdasd
<link class='active1 a123' href="http://asda.com">asdadasdad12312</link>
<link class='active2' href="http://asda1.com">asdadasdad12312</link>
<link class='movie1' href="http://asda2.com">asdadasdad12312</link>
</ul>
</div>
'''
doc = pq(html)
its=doc("link").items()
for it in its:
    print(it)

Running results:

<link class="active1 a123" href="http://asda.com">asdadasdad12312</link>
<link class="active2" href="http://asda1.com">asdadasdad12312</link>
<link class="movie1" href="http://asda2.com">asdadasdad12312</link>

4.7 Get attribute information

from pyquery import PyQuery as pq
html = '''
<div href="wrap">
hello nihao
<ul class="s_from">
asdasd
<link class='active1 a123' href="http://asda.com">asdadasdad12312</link>
<link class='active2' href="http://asda1.com">asdadasdad12312</link>
<link class='movie1' href="http://asda2.com">asdadasdad12312</link>
</ul>
</div>
'''
doc = pq(html)
its=doc("link").items()
for it in its:
    print(it.attr('href'))
    print(it.attr.href)

Running results:

http://asda.com
http://asda.com
http://asda1.com
http://asda1.com
http://asda2.com
http://asda2.com

4.8 Get text

from pyquery import PyQuery as pq
html = '''
<div href="wrap">
hello nihao
<ul class="s_from">
asdasd
<link class='active1 a123' href="http://asda.com">asdadasdad12312</link>
<link class='active2' href="http://asda1.com">asdadasdad12312</link>
<link class='movie1' href="http://asda2.com">asdadasdad12312</link>
</ul>
</div>
'''
doc = pq(html)
its=doc("link").items()
for it in its:
    print(it.text())

Running results:

asdadasdad12312
asdadasdad12312
asdadasdad12312

4.9 Get HTML

from pyquery import PyQuery as pq
html = '''
<div href="wrap">
hello nihao
<ul class="s_from">
asdasd
<link class='active1 a123' href="http://asda.com"><a>asdadasdad12312</a></link>
<link class='active2' href="http://asda1.com">asdadasdad12312</link>
<link class='movie1' href="http://asda2.com">asdadasdad12312</link>
</ul>
</div>
'''
doc = pq(html)
its=doc("link").items()
for it in its:
    print(it.html())

Running results:

<a>asdadasdad12312</a>
asdadasdad12312
asdadasdad12312

5. Common DOM operations

5.1 addClass and removeClass

Add or remove a class on a tag.

from pyquery import PyQuery as pq
html = '''
<div href="wrap">
hello nihao
<ul class="s_from">
asdasd
<link class='active1 a123' href="http://asda.com"><a>asdadasdad12312</a></link>
<link class='active2' href="http://asda1.com">asdadasdad12312</link>
<link class='movie1' href="http://asda2.com">asdadasdad12312</link>
</ul>
</div>
'''
doc = pq(html)
its=doc("link").items()
for it in its:
    print("Added: %s" % it.addClass('active1'))
    print("Removed: %s" % it.removeClass('active1'))

Running results:

Added: <link class="active1 a123" href="http://asda.com"><a>asdadasdad12312</a></link>
Removed: <link class="a123" href="http://asda.com"><a>asdadasdad12312</a></link>
Added: <link class="active2 active1" href="http://asda1.com">asdadasdad12312</link>
Removed: <link class="active2" href="http://asda1.com">asdadasdad12312</link>
Added: <link class="movie1 active1" href="http://asda2.com">asdadasdad12312</link>
Removed: <link class="movie1" href="http://asda2.com">asdadasdad12312</link>

Note that a class the tag already has is not added a second time.

5.2 attr and css

attr gets or sets an attribute; css sets a style property, which is written to the style attribute.

from pyquery import PyQuery as pq
html = '''
<div href="wrap">
hello nihao
<ul class="s_from">
asdasd
<link class='active1 a123' href="http://asda.com"><a>asdadasdad12312</a></link>
<link class='active2' href="http://asda1.com">asdadasdad12312</link>
<link class='movie1' href="http://asda2.com">asdadasdad12312</link>
</ul>
</div>
'''
doc = pq(html)
its=doc("link").items()
for it in its:
    print("Modified: %s" % it.attr('class', 'active'))
    print("Added: %s" % it.css('font-size', '14px'))

Running results:

Modified: <link class="active" href="http://asda.com"><a>asdadasdad12312</a></link>
Added: <link class="active" href="http://asda.com" style="font-size: 14px"><a>asdadasdad12312</a></link>
Modified: <link class="active" href="http://asda1.com">asdadasdad12312</link>
Added: <link class="active" href="http://asda1.com" style="font-size: 14px">asdadasdad12312</link>
Modified: <link class="active" href="http://asda2.com">asdadasdad12312</link>
Added: <link class="active" href="http://asda2.com" style="font-size: 14px">asdadasdad12312</link>

Both attr and css modify the selected elements in place.

5.3 remove

remove deletes the matching tags.

from pyquery import PyQuery as pq
html = '''
<div href="wrap">
hello nihao
<ul class="s_from">
asdasd
<link class='active1 a123' href="http://asda.com"><a>asdadasdad12312</a></link>
<link class='active2' href="http://asda1.com">asdadasdad12312</link>
<link class='movie1' href="http://asda2.com">asdadasdad12312</link>
</ul>
</div>
'''
doc = pq(html)
its=doc("div")
print('Text before removal:\n%s' % its.text())
it = its.remove('ul')
print('Text after removal:\n%s' % it.text())

Running results:

Text before removal:
hello nihao
asdasd
asdadasdad12312
asdadasdad12312
asdadasdad12312
Text after removal:
hello nihao

For other DOM methods, see:

http://pyquery.readthedocs.io/en/latest/api.html

6. Pseudo-class selectors

from pyquery import PyQuery as pq
html = '''
<div href="wrap">
hello nihao
<ul class="s_from">
asdasd
<link class='active1 a123' href="http://asda.com"><a>helloasdadasdad12312</a></link>
<link class='active2' href="http://asda1.com">asdadasdad12312</link>
<link class='movie1' href="http://asda2.com">asdadasdad12312</link>
</ul>
</div>
'''
doc = pq(html)
its = doc("link:first-child")
print('First tag: %s' % its)
its = doc("link:last-child")
print('Last tag: %s' % its)
its = doc("link:nth-child(2)")
print('Second tag: %s' % its)
its = doc("link:gt(0)")  # indexes start at 0
print("Tags after index 0: %s" % its)
its = doc("link:nth-child(2n-1)")
print("Odd-numbered tags: %s" % its)
its = doc("link:contains('hello')")
print("Tags whose text contains hello: %s" % its)

Running results:

First tag: <link class="active1 a123" href="http://asda.com"><a>helloasdadasdad12312</a></link>
Last tag: <link class="movie1" href="http://asda2.com">asdadasdad12312</link>
Second tag: <link class="active2" href="http://asda1.com">asdadasdad12312</link>
Tags after index 0: <link class="active2" href="http://asda1.com">asdadasdad12312</link>
<link class="movie1" href="http://asda2.com">asdadasdad12312</link>
Odd-numbered tags: <link class="active1 a123" href="http://asda.com"><a>helloasdadasdad12312</a></link>
<link class="movie1" href="http://asda2.com">asdadasdad12312</link>
Tags whose text contains hello: <link class="active1 a123" href="http://asda.com"><a>helloasdadasdad12312</a></link>

