
Drawing word clouds easily with Python (with detailed source code)

Editor: Python

Contents

Project background

Project walkthrough

1. Drawing a basic word cloud

2. Drawing a word cloud from word frequencies

Conclusion

Project background

Plenty of ready-made tools can produce word cloud images, but they tend to share a few problems:

Problem 1: There are too many tools of uneven quality, and the sheer choice is overwhelming.

Problem 2: Most word cloud tools are restricted in one way or another, leaving little room for customization.

Problem 3: Some tools even charge a fee.

Given these problems, I felt it was worth writing an article on drawing word clouds with Python, because it really is that simple. If a complete beginner with no programming background can do it, why go hunting for a tool?

OK, fine. Enough talk, let's get straight to practice.

Project walkthrough

1. Drawing a basic word cloud

A word cloud needs words first. Where should they come from? I thought about it for a long time without an answer, so I decided to play with the viral "Hou Lang" piece as sample text. Opinions on Hou Lang differ, and I won't weigh in here.

First, save the full text of Hou Lang as HL.txt. An excerpt looks like this:

Next, install and import the libraries needed to build the word cloud. The purpose of each library is noted in the comments.

import jieba                     # Chinese word segmentation
from wordcloud import WordCloud  # word cloud generation
from PIL import Image            # image processing
import numpy as np               # multidimensional arrays and matrix operations
import matplotlib.pyplot as plt  # plotting

Then read HL.txt.

# Read the text
with open('HL.txt', 'r', encoding="UTF-8") as f:
    file = f.read()  # read the whole file as one string; readlines() would return a list of lines
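
The difference between read() and readlines() is easy to see on a small file. The sketch below writes a two-line temporary file as a stand-in for HL.txt (the sample sentences and variable names are made up for illustration) and reads it both ways.

```python
import os
import tempfile

# Write a tiny stand-in for HL.txt so the example is self-contained.
sample = "第一句话。\n第二句话。\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", encoding="utf-8",
                                 delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

with open(path, "r", encoding="utf-8") as f:
    whole_text = f.read()       # one string holding the entire file

with open(path, "r", encoding="utf-8") as f:
    lines = f.readlines()       # list of strings, one per line, newlines kept

os.remove(path)

print(type(whole_text).__name__, len(whole_text))
print(type(lines).__name__, len(lines))
```

jieba.cut expects a single string, so read() is the natural fit here; with readlines() each line would have to be handled separately.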

Next, the string has to be split into words, and jieba makes short work of that.

# Segment the text into words
data_cut = jieba.cut(file, cut_all=False)  # precise mode

After segmentation, punctuation such as commas, semicolons, and full stops shows up as separate tokens. That won't do, so we need a way to filter them out: build a stop word list and remove the words we don't want. And yes, I also happen to dislike the "we" and "you" wording, so those go too.

stop_words = [",", "。", ";", "、", "我们", "你们"]  # custom stop word list: Chinese punctuation plus "we" and "you"

Of course, some readers will object that a hand-made stop list only works because the text is short, and that it won't be enough for thousands of documents. Fair enough. Search Baidu for a stop word list, download any one, and save it as stopwords.txt. The file used here contains 1,893 common stop words and looks like this:

With the stop list in hand, read it in with Python.

stop_words = []  # start with an empty list
with open("stopwords.txt", 'r', encoding='utf-8') as f:
    for line in f:
        word = line.strip()  # drop the trailing newline and surrounding spaces
        if word:             # skip blank lines
            stop_words.append(word)  # add the stop word to the stop_words list
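
As a variation on the loading step, the stop words can be collected into a set, which also removes duplicates. This is only a sketch: load_stop_words is a hypothetical helper name, and the temporary file stands in for stopwords.txt.

```python
import os
import tempfile


def load_stop_words(path):
    """Return the stop words in `path` (one per line) as a set."""
    with open(path, "r", encoding="utf-8") as f:
        # strip() removes the trailing newline; blank lines are skipped
        return {line.strip() for line in f if line.strip()}


# Demo with a throwaway file standing in for stopwords.txt
with tempfile.NamedTemporaryFile("w", suffix=".txt", encoding="utf-8",
                                 delete=False) as tmp:
    tmp.write("的\n了\n\n我们\n的\n")
    demo_path = tmp.name

demo_stop_words = load_stop_words(demo_path)
os.remove(demo_path)
print(sorted(demo_stop_words))
```

If the first word of a downloaded list never seems to match, the file may start with a byte-order mark; opening it with encoding="utf-8-sig" strips that mark.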

With the stop words ready, the next step is to remove them, which leaves the words we need.

data_result = [i for i in data_cut if i not in stop_words]  # keep only the words we need
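
The filtering step can be tried on its own with hand-written tokens. In this sketch, remove_stop_words is a hypothetical helper and demo_tokens stands in for the output of jieba.cut; it also drops whitespace-only tokens, which the list comprehension above does not do.

```python
def remove_stop_words(tokens, stop_words):
    """Keep tokens that are neither stop words nor pure whitespace."""
    stop_set = set(stop_words)  # set membership tests are fast on average
    return [t for t in tokens if t.strip() and t not in stop_set]


# Hand-written tokens standing in for the output of jieba.cut
demo_tokens = ["我们", "喜欢", ",", "词云", "\n", "你们", "呢"]
demo_result = remove_stop_words(demo_tokens, [",", "我们", "你们"])
print(demo_result)
```

Note that jieba.cut returns a generator, so its result can only be iterated once; jieba.lcut returns a list if the tokens are needed again later.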

Printing data_result gives this:

A list is not what we want; we need a single string of words. So use the join function to concatenate all the words into a new string separated by spaces. The replace call here removes newline characters (\n) by replacing them with an empty string.

text = " ".join(data_result).replace("\n", "")  # concatenate into one string
print(text)
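
One small side effect is worth seeing: when a newline or space survives as its own token, join plus replace leaves runs of several spaces behind. The sketch below (with made-up tokens) shows this and one optional way to tidy it up by splitting on whitespace and joining again.

```python
demo_words = ["后浪", "\n", "奔涌", " ", "青年"]

# Same approach as in the article: join with spaces, then delete newlines
joined = " ".join(demo_words).replace("\n", "")
print(repr(joined))       # runs of spaces remain where "\n" and " " were

# Optional tidy-up: split() with no arguments collapses any whitespace run
normalized = " ".join(joined.split())
print(repr(normalized))
```

The extra spaces mostly matter for readability when printing the text, so this step is a matter of taste.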

Here is what text looks like when printed:

With the words ready, we can start designing the word cloud. All the words are Chinese, and WordCloud's default font cannot render Chinese, so a font file path has to be specified; otherwise the characters come out garbled. I picked a small regular script (xiaokai) font. You can choose a different font to suit your taste, and plenty of free ones are available online.

wc = WordCloud(
    # Set the font; without it Chinese text is garbled. The font file must be downloaded separately.
    font_path="演示悠然小楷.ttf",
    background_color="black",
    max_words=5000,
)

Once configured, generate the image and display it.

# Generate the word cloud
wc.generate(text)
# Save the word cloud
wc.to_file("IMJG.jpg")  # save the image
# Display
plt.imshow(wc)   # draw the word cloud on the current axes
plt.axis("off")  # hide the axes
plt.show()       # show the image

The result looks like this:

At this point you might expect the conclusion. Sorry, not yet. Our sights should be set higher: on poetry and faraway places... no wait, on customizing your own word cloud. I'm going to add a custom base image to the word cloud to make it look more vivid. I couldn't decide what kind of image would suit, so I opened a long-unused copy of Photoshop CC and drew a PNG that you could probably beat with a basic photo editor.

I named the image JG.png and opened it with Image.open.

# Open the image with Image.open and convert it to an array
images = np.array(Image.open("JG.png"))

Pass images to the word cloud wc through the mask parameter.

wc = WordCloud(
    # Set the font; without it Chinese text is garbled. The font file must be downloaded separately.
    font_path="演示悠然小楷.ttf",
    background_color="black",
    max_words=5000,
    mask=images,
)
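
It helps to know what the mask array means. According to the wordcloud documentation, pure white entries (255) are treated as masked out and every other value is free to draw on. The sketch below builds a circular mask directly with NumPy instead of loading an image; circle_mask is a hypothetical helper, and such an array could be passed as mask= in place of images.

```python
import numpy as np


def circle_mask(size):
    """Return a size x size uint8 array: 0 inside a centered circle, 255 outside."""
    yy, xx = np.ogrid[:size, :size]          # row and column index grids
    center = (size - 1) / 2
    inside = (xx - center) ** 2 + (yy - center) ** 2 <= (size / 2) ** 2
    mask = np.full((size, size), 255, dtype=np.uint8)  # start fully white (masked out)
    mask[inside] = 0                                   # region where words may be drawn
    return mask


demo_mask = circle_mask(200)
print(demo_mask.shape, demo_mask[0, 0], demo_mask[100, 100])
```

If a word cloud ignores the shape of a base image and fills the whole rectangle, one thing to check is whether the image background is actually white rather than transparent.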

Regenerate and save the word cloud. The result looks like this:

Ha, a little ugly. If you're interested, make your own base image or download one online and try it. The base image should be as sharp as possible, with colors that stand out clearly.

Some readers may ask why the word cloud at the top of this article shows whole sentences. The explanation: when reading HL.txt for that one, I used readlines.

2. Drawing a word cloud from word frequencies

The method above covers ordinary word clouds, but real-world needs are often more complex, and drawing a word cloud from word frequencies is the more common case. Below is a practical case that I often use, with the full source code.

The general idea: tens of thousands of transaction records are stored in a MySQL database. A SQL statement selects the top 100 brands by transaction volume, and the word cloud is then drawn from each brand's volume, so the larger the volume, the larger the text.
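
The aggregation step can be tried without a MySQL server. The sketch below is a stand-in only: it uses Python's built-in sqlite3 module, a hypothetical sales table, and made-up rows, and it takes the top 2 brands instead of the top 100. It produces the same kind of name-to-value dictionary that generate_from_frequencies expects.

```python
import sqlite3

# Made-up (brand, amount) rows standing in for real transaction records
rows = [("A", 30000), ("B", 120000), ("A", 50000), ("C", 10000), ("B", 40000)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (brand TEXT, jine REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

# Sum the amounts per brand, scale to units of 10,000, keep the largest brands
query = """
    SELECT brand AS name, ROUND(SUM(jine) / 10000, 0) AS value
    FROM sales
    GROUP BY brand
    ORDER BY value DESC
    LIMIT 2
"""
freq = {name: value for name, value in conn.execute(query)}
conn.close()
print(freq)
```

A dictionary like freq maps each word to its weight, so a brand with a larger total is drawn in larger text.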

# -*- coding: utf-8 -*-
# @Time : 2020/5/23 10:30 AM
# @Author : I am Brother J
# @File : my_wordcloud.py
# Draw a word cloud from given word frequencies

from matplotlib import pyplot as plt  # plotting, data visualization
from wordcloud import WordCloud       # word cloud
from PIL import Image                 # image processing
import numpy as np                    # matrix operations
import pymysql                        # database
import pandas as pd                   # data processing

# Prepare the words for the word cloud (fill in your own user name and password)
conn = pymysql.connect(host="localhost", user="yours", passwd="yours", db="test", port=3306, charset="utf8")
cur = conn.cursor()
sql = "select brand as name, round(sum(jine)/10000, 0) as value from Sc_month4 group by name order by value desc limit 100;"
df = pd.read_sql(sql, conn)
print(df)
name = list(df.name)          # words
value = df.value              # word frequencies
dic = dict(zip(name, value))  # store the word frequencies as a dictionary
# print(dic)
cur.close()
conn.close()

img = Image.open("tree.png")
img_arry = np.array(img)
wc = WordCloud(
    background_color="white",
    mask=img_arry,
    max_words=1000,
    max_font_size=500,
    font_path="演示悠然小楷.ttf",  # any other Chinese .ttf font can be substituted here
)
wc.generate_from_frequencies(dic)  # build the word cloud from the word frequencies

# Draw the image
fig = plt.figure(1)
plt.imshow(wc)
plt.axis("off")
# Write the word cloud to a file before plt.show(); saving afterwards can produce a blank image
plt.savefig("JGJG.jpg", dpi=400)
plt.show()

The generated word cloud looks like this:

Conclusion

On the whole, making a word cloud with Python is easy: the code is clear and short, which makes it a good project for beginners. That said, a good-looking word cloud depends on clean, tidy data, so data cleaning skills are a must.
