程序師世界是廣大編程愛好者互助、分享、學習的平台,程序師世界有你更精彩!
首頁
編程語言
C語言|JAVA編程
Python編程
網頁編程
ASP編程|PHP編程
JSP編程
數據庫知識
MYSQL數據庫|SqlServer數據庫
Oracle數據庫|DB2數據庫
您现在的位置: 程式師世界 >> 編程語言 >  >> 更多編程語言 >> Python

Python captured the relevant reports of the Wang Leehom incident, and I ate a bigger melon

編輯:Python

Hello, Hello everyone , I'm Chen Chen ~

today , Let me teach you how to use python Come and eat melons ~

I've been brushed by Wang Leehom's melon these days , Many female friends expressed great shock and anger

My general impression of Wang Leehom is only based on his high education 、 Fluent English and full of classics , Many of his beautiful songs are still popular in the streets , I didn't expect such behavior .

Today I use Python To capture the content of the comment area below these two parties , And draw a cloud of words , The main codes are as follows

@retry(stop=stop_after_attempt(7))
def do_requests(uid, pageNum):
headers = {
"cookie": "SCF=Anhuv5v0Lu8oFE06-PmKm-uqVmUQgSwrLYauTMNCvEmRH0iOd-jT0poB-pgkpX_aJsOYqZjgw_F8TAZ0SL_aE9Q.; _T_WM=32be9637e54d4f58408755d6f8100d5c; SUB=_2A25MueV4DeRhGeRN7lQY8ynEwziIHXVsRYswrDV6PUJbkdAKLRPSkW1NU7D9XCuoP6vJEUUVjb0HcSPigsLzxFaW; SSOLoginState=1639814440",
"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36"
}
url = "https://weibo.cn/repost/L6w2sfDXb?&uid={}&&page={}".format(uid, pageNum)
response = requests.get(url, headers = headers)
return response.text
def get_comment(html_data):
html_text = BeautifulSoup(html_data, 'lxml')
comment_list = html_text.select("span.ctt")
return comment_list
def jieba_():
stop_words = set([line.strip() for line in open("chineseStopWords.txt", encoding="GBK").readlines()])
for word in [" reply ", " Is there any "]:
stop_words.add(word)
comment_list = []
with open("comment_data.txt", "r", encoding="utf-8") as comment_data_list:
for comment in comment_data_list:
comment_list.append(comment)
text = ", ".join(comment_list)
word_num = jieba.lcut(text, cut_all=False)
rule = re.compile(r"^[\u4e00-\u9fa5]+$")
word_num_selected = [word for word in word_num if word not in stop_words and
re.search(rule, word) and len(word) >= 2]
return word_num_selected
def plot_word_cloud(text):
# Open the background image of the word cloud
cloud_mask = np.array(Image.open('gua_1.jpg'))
# Define some properties of word cloud
wc = WordCloud(
# The background image is divided into white
background_color='white',
# Background pattern
mask=cloud_mask,
# Show the maximum number of words
max_words=200,
# According to Chinese
font_path='KAITI.ttf',
# Maximum size
max_font_size=100
)
text_ = ", ".join(text)
# Word cloud function
x = wc.generate(text_)
# Generate word cloud image
image = x.to_image()
# Show word cloud pictures
image.show()
# Save word cloud pictures
wc.to_file('melon_1.png')

The word cloud generated in the comment area for male owners is as follows , It can be seen that it is all abuse and resentment against the male Lord , There are many people who want to block the male owner .

And the comment area under his ex-wife's post , The generated word cloud is as follows , Everyone is encouraging his ex-wife to be strong 、 Come on, face life , Out of the trough of life .

Is it right python A lot of keywords can be extracted at once , Understand people's views on this matter

Interested partners can also try

My sharing ends here , If you like, just praise and pay attention ~


  1. 上一篇文章:
  2. 下一篇文章:
Copyright © 程式師世界 All Rights Reserved