
Hands-on: Drawing Word Clouds Easily with Python (with Detailed Source Code)


Project Background

Although there are plenty of ready-made tools for making word clouds, they generally share a few problems. Problem 1: there are too many tools of uneven quality, which is dizzying and makes choosing hard. Problem 2: most word cloud tools are limited in one way or another, leaving little room for customization. Problem 3: some tools even charge money. Given all this, DeeDee felt it was worth writing an article on drawing word clouds with Python, because it really is that simple! Even a complete beginner with no programming background can do it, so why go hunting for tools?

OK, fine. Enough talk, let's get hands-on.

Project Implementation

1. Drawing a general word cloud

To make a word cloud, you first need words. Where do the words come from? DeeDee thought about it for a long time without finding an answer. Since I had no ideas, I grabbed the viral "Houlang" (后浪) promotional piece to play with. Opinions about "Houlang" vary, and DeeDee dares not comment.

First, save the full text of "Houlang" as HL.txt. An excerpt looks like this:

Next, install and import the libraries needed to make the word cloud. What each library does is noted in the comments.

import jieba  # Chinese word segmentation ("jieba" literally means "stutter")
from wordcloud import WordCloud  # word cloud rendering library
from PIL import Image  # image processing library
import numpy as np  # multidimensional arrays and matrix operations
import matplotlib.pyplot as plt  # plotting library

Then, read HL.txt into memory.

# Read the text
with open('HL.txt', 'r', encoding="UTF-8") as f:
    file = f.read()  # read the whole text as one string; readlines() would read it line by line

Next, we need to split the whole string into words. Time for jieba to take the field: nothing is left standing.

# Segment the text into words
data_cut = jieba.cut(file, cut_all=False)  # precise mode; jieba.cut returns a generator

After segmenting, I found that commas, enumeration commas (、), semicolons and full stops also come out as separate "words". That won't do, so we need a way to stop them. We build a stop-word list and remove the words we don't want. That's right, 我们 ("we") and 你们 ("you"), I'm looking at you.

stop_words = ["，", "。", "；", "、", "我们", "你们"]  # custom stop-word list: Chinese punctuation plus "we" and "you"

Of course, some readers will object: you can do this only because the text is short and a hand-made stop list is easy, but with thousands of texts such a list won't be enough. OK, let's search Baidu for a stop-word list, download any one of them, and save it as stopwords.txt. This stopwords.txt contains 1,893 common stop words and looks like this:

With the stop-word list in hand, we read it in with Python.

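stop_words = []  # create an empty list
with open("stopwords.txt", 'r', encoding='utf-8') as f:
    for line in f:
        word = line.strip()  # drop the trailing newline and surrounding spaces
        if word:  # skip blank lines (len(line) > 0 is always true, since line still holds "\n")
            stop_words.append(word)  # add the stop word to the stop_words list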

With the stop words ready, the next step is to remove them, leaving only the words we need.

data_result = [i for i in data_cut if i not in stop_words]  # keep only words that are not stop words
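A minimal, dependency-free sketch of this filtering step, assuming the text is already segmented (a hard-coded generator stands in for jieba's output). Two details worth knowing: keeping the stop words in a set makes each `not in` check constant-time, which helps with a list of 1,893 entries; and `jieba.cut` returns a generator, so `data_cut` can be consumed only once. The helper `remove_stopwords` is hypothetical, not part of jieba, and unlike the article's one-liner it also drops whitespace-only tokens such as "\n".

```python
from typing import Iterable, List, Set


def remove_stopwords(tokens: Iterable[str], stop_words: Set[str]) -> List[str]:
    """Hypothetical helper: keep tokens that are neither stop words nor pure whitespace."""
    return [t for t in tokens if t.strip() and t not in stop_words]


# Stand-in for jieba output: a generator, like jieba.cut(), so it is single-use.
tokens = (w for w in ["我们", "是", "后浪", "，", "你们", "\n", "未来"])
stop_words = {"，", "。", "；", "、", "我们", "你们", "是"}  # a set: fast membership tests

data_result = remove_stopwords(tokens, stop_words)
print(data_result)   # ['后浪', '未来']
print(list(tokens))  # [] (the generator is already exhausted)
```

If you need the segmented words more than once, `jieba.lcut` returns a list instead of a generator.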

Printing data_result gives something like this:

That's not what we want: we need all the words in a single string. So we use join to concatenate the words into a new string, separated by spaces. The replace call here replaces newline characters (\n) with an empty string.

text = " ".join(data_result).replace("\n", "")  # join into one space-separated string and drop newlines
print(text)
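To preview roughly which words `wc.generate(text)` will emphasize, you can split the joined string back into words and tally them with `collections.Counter`. This is only an approximation, assumed for illustration: WordCloud's own text processing also applies its built-in stop words, a tokenizing regular expression, plural normalization and (by default) two-word collocations, so its counts can differ. The word list below is toy input.

```python
from collections import Counter

data_result = ["后浪", "\n", "未来", "后浪", "青年", "后浪"]  # toy segmented words

# Join the words with spaces, then drop newline tokens, as in the article.
text = " ".join(data_result).replace("\n", "")

# Split on whitespace and count: a rough preview of the word frequencies.
freq = Counter(text.split())
print(freq.most_common(2))  # [('后浪', 3), ('未来', 1)]
```

A `Counter` is a dict subclass, so it can also be handed to `wc.generate_from_frequencies` if you prefer to control the counting yourself.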

Let's print text to see the result:

Now that we have the words, we can start designing the word cloud. All the words are Chinese, and WordCloud does not support Chinese out of the box (its default font has no Chinese glyphs). Ugh! So we also have to specify the path to a font file; otherwise the characters come out garbled. After all, DeeDee came from Europe, so I picked a small regular-script (xiaokai) font. Choose whatever font you like; there are plenty of free fonts online.

wc = WordCloud(
    # Set the font; without a Chinese font the text comes out garbled. This font file must be downloaded.
    font_path="演示悠然小楷.ttf",
    background_color="black",
    max_words=5000,
)

With the configuration done, let's generate the image and display it.

# Generate the word cloud
wc.generate(text)

# Save the word cloud
wc.to_file("IMJG.jpg")  # save the image

# Display it
plt.imshow(wc)  # render the word cloud as an image
plt.axis("off")  # hide the axes
plt.show()  # show the figure

The result looks like this:

At this point you might think DeeDee is about to write the conclusion. Sorry, it's not over yet. Our goal shouldn't stop here; there's still poetry and faraway places... oh no, I mean customizing our own word cloud. DeeDee is going to give the word cloud a custom base image so it looks more vivid. I thought for a long time but couldn't decide what picture would suit. So DeeDee opened Photoshop CC, untouched for ages, and drew a PNG that you could probably beat with a phone photo-editing app.

I named this image JG.png and opened it with PIL's Image.open.

# Open the image with Image.open and convert it to a NumPy array
images = np.array(Image.open("JG.png"))

Then pass images to the word cloud wc through the mask parameter.

wc = WordCloud(
    # Set the font; without a Chinese font the text comes out garbled. This font file must be downloaded.
    font_path="演示悠然小楷.ttf",
    background_color="black",
    max_words=5000,
    mask=images,  # words are drawn only inside the shape of the base image
)

Regenerate and save the word cloud; the result looks like this:

Haha, it's a bit ugly. If you're interested, make your own base image or download one to try. Keep the base image as clean as possible and make the shape stand out clearly, since WordCloud leaves pure-white areas empty and fills everything else.
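Why a clean base image matters: according to the wordcloud documentation, pure-white mask pixels (255 in each RGB channel) are "masked out" and every other pixel is free to draw on. As far as I know the mask is read by color, not by transparency, so a PNG whose transparent background is stored as black pixels would get words in the background too. A minimal NumPy sketch, assuming an RGBA array as input; the helpers `transparent_to_white` and `drawable` are hypothetical, and the 1x2 image is a toy example.

```python
import numpy as np


def transparent_to_white(rgba: np.ndarray, alpha_threshold: int = 128) -> np.ndarray:
    """Hypothetical helper: return an RGB uint8 mask where mostly transparent pixels become pure white."""
    rgb = rgba[:, :, :3].copy()
    transparent = rgba[:, :, 3] < alpha_threshold  # boolean (height, width) array
    rgb[transparent] = 255  # set all three channels of those pixels to white
    return rgb.astype(np.uint8)


def drawable(mask_rgb: np.ndarray) -> np.ndarray:
    """Pixels a word cloud may draw on under the documented rule: anything not pure white."""
    return ~np.all(mask_rgb == 255, axis=-1)


# 1x2 toy image: an opaque red pixel and a fully transparent black pixel.
rgba = np.array([[[255, 0, 0, 255], [0, 0, 0, 0]]], dtype=np.uint8)
mask = transparent_to_white(rgba)
print(drawable(rgba[:, :, :3]).tolist())  # [[True, True]]  background would get words
print(drawable(mask).tolist())            # [[True, False]] only the shape gets words
```

With a real file this would be used as `mask=transparent_to_white(np.array(Image.open("JG.png").convert("RGBA")))`.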

Some readers may ask why the word cloud at the beginning of my article is made of whole sentences. Here's the explanation: when I read HL.txt for that one, I used readlines~

2. Drawing a word cloud from word frequencies

The method above covers general word clouds, but real-world needs are often more complex, and drawing a word cloud from word frequencies is the more common case. Below is a practical case that I, Brother J, use often, with the source code shared.

The general idea: pull tens of thousands of transaction records from a MySQL database, use a SQL query to select the top 100 brands by transaction amount, and then build a word cloud weighted by each brand's transaction amount. The larger the text, the larger the amount.

# -*- coding: utf-8 -*-
# @Time   : 2020/5/23 10:30 AM
# @Author : Brother J
# @File   : my_wordcloud.py

# Build a word cloud from given word frequencies
from matplotlib import pyplot as plt  # plotting, data visualization
from wordcloud import WordCloud  # word cloud
from PIL import Image  # image processing
import numpy as np  # matrix operations
import pymysql  # MySQL database driver
import pandas as pd  # data processing


# Prepare the words for the word cloud
conn = pymysql.connect(host="localhost", user="your_user", password="your_password",
                       database="test", port=3306, charset="utf8")
# Top 100 brands by total transaction amount (jine), in units of 10,000
sql = ("select brand as name, round(sum(jine)/10000, 0) as value "
       "from Sc_month4 group by name order by value desc limit 100;")
df = pd.read_sql(sql, conn)
conn.close()
print(df)
dic = dict(zip(df["name"], df["value"]))  # word -> frequency, stored as a dictionary
# print(dic)

img = Image.open("tree.png")
img_arry = np.array(img)  # base image used as the mask
wc = WordCloud(
    background_color="white",
    mask=img_arry,
    max_words=1000,
    max_font_size=500,
    # font_path="有字库龙藏体.ttf",  # alternative font
    font_path="演示悠然小楷.ttf",  # a font with Chinese glyphs
)

wc.generate_from_frequencies(dic)  # generate the word cloud from word frequencies

# Draw the picture
fig = plt.figure(1)
plt.imshow(wc)
plt.axis("off")
# Save before show(): once the window is closed the figure may be cleared,
# so calling savefig() afterwards can write a blank image
plt.savefig("JGJG.jpg", dpi=400)
plt.show()
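To try the frequency approach without a MySQL server, the query's GROUP BY, SUM, divide-by-10,000, ORDER BY and LIMIT steps can be mirrored in plain Python. A sketch under assumptions: `top_brand_frequencies` is a hypothetical helper and the `(brand, jine)` rows are made-up toy values; the result has the same shape as `dic` above, ready for `wc.generate_from_frequencies`.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def top_brand_frequencies(rows: Iterable[Tuple[str, float]], limit: int = 100) -> Dict[str, int]:
    """Hypothetical helper mirroring:
    SELECT brand, ROUND(SUM(jine)/10000, 0) ... GROUP BY brand ORDER BY value DESC LIMIT limit
    """
    totals: Dict[str, float] = defaultdict(float)
    for brand, amount in rows:
        totals[brand] += amount  # SUM(jine) per brand
    # Scale to units of 10,000. Note: Python's round() rounds exact .5 ties to even,
    # which may differ from MySQL's ROUND().
    scaled = {brand: round(total / 10000) for brand, total in totals.items()}
    top = sorted(scaled.items(), key=lambda kv: kv[1], reverse=True)[:limit]
    return dict(top)  # dicts keep insertion order, so largest values come first


# Toy rows for illustration only.
rows = [("BrandA", 120000), ("BrandB", 30000), ("BrandA", 46000), ("BrandC", 80000)]
dic = top_brand_frequencies(rows, limit=2)
print(dic)  # {'BrandA': 17, 'BrandC': 8}
```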

The resulting word cloud looks like this:

Conclusion

Overall, making word clouds in Python is easy: the code is clear and short, which makes it well suited for beginners. Of course, a good-looking word cloud presupposes clean, tidy data, so data-cleaning skills are a must.


Copyright © 程式師世界 All Rights Reserved