
Recognizing provinces and cities in text with Python and plotting them on a map


Contents

1. Get ready

2. Basic use

3. Advanced use

In NLP (natural language processing) tasks, we often need to identify and extract provinces, cities, and administrative districts from text. We could do this by looking up each entry of a keyword table, but that means first collecting a keyword list of provinces and cities, which is rather tedious.

Today I will introduce a module that does this for you: pass it a string and it returns the province, city, and district keywords found in that string, and it can also plot them on a map. It is the cpca module.
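
For comparison, the manual keyword-table approach described above might look like the minimal sketch below. The tiny PROVINCE_KEYWORDS and CITY_KEYWORDS tables and the extract_by_keywords helper are hypothetical names made up for illustration; a real table would need every province, city, and district, which is exactly the tedious part.

```python
# Hypothetical, deliberately tiny keyword tables; a real list would have to
# cover every province, city, and district in the country.
PROVINCE_KEYWORDS = ["广东省", "四川省", "上海市"]
CITY_KEYWORDS = ["深圳市", "广州市", "广汉市"]


def extract_by_keywords(text):
    """Return the first province and city keyword found in text (or None)."""
    province = next((k for k in PROVINCE_KEYWORDS if k in text), None)
    city = next((k for k in CITY_KEYWORDS if k in text), None)
    return province, city


print(extract_by_keywords("广东省深圳市福田区"))
```

Note that a lookup like this can only return keywords that literally appear in the text and are already in the table.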

1. Get ready

Before starting, make sure that Python and pip are installed on your computer. If they are not, see the article "Super-detailed Python installation guide" to install them.

(Optional 1) If you use Python for data analysis, you can install Anaconda directly; it comes with Python and pip built in.

(Optional 2) The VSCode editor is also recommended; it has many advantages.

Choose one of the following ways to open a terminal, then enter the command to install the dependency:

1. Windows: open Cmd (Start - Run - CMD).

2. macOS: open Terminal (command + space, then type Terminal).

3. If you are using the VSCode editor or PyCharm, you can use its Terminal directly.

pip install cpca

Note that the cpca module currently supports only Python 3 and above.

On Windows, the following error may occur:

Building wheel for pyahocorasick (setup.py) ... error

To fix it, first download and install Microsoft Visual C++ Build Tools (the download link is in the original article), then run pip install cpca again. This should resolve the problem.

2. Basic use

Two lines of code are enough for basic province and city extraction:

# Official account: Python Practical Treasure
# 2022/06/23

import cpca

# Note: cpca matches Chinese place names. The strings below are English
# translations of the article's original Chinese inputs; pass the Chinese
# text when running this.
location_str = [
    "Shennan Middle Road, Bating Street, Futian District, Shenzhen City, Guangdong Province, No. 1025, New Town Building, 1st floor",
    "Tesla Shanghai Super Factory is Tesla's first super factory outside the United States, located in Shanghai, the People's Republic of China.",
    "The Sanxingdui site is located on the bank of the Yazi River in Sanxingdui Town, west of Guanghan City, Sichuan Province, China. It is a Bronze Age cultural site",
]
df = cpca.transform(location_str)
print(df)

The output is as follows:

   province            city      district         address                                                                                               adcode
0  Guangdong Province  Shenzhen  Futian District  Shennan Middle Road, Bating Street, No. 1025, New Town Building, 1st floor                            440304
1  Shanghai            None      None             .                                                                                                     310000
2  Sichuan Province    Deyang    Guanghan City    on the bank of the Yazi River in Sanxingdui Town, west of the city. It is a Bronze Age cultural site  510681

Note the third entry: cpca not only recognizes the county-level city of Guanghan in the sentence, it also automatically matches Deyang, the prefecture-level city that administers it, which is very powerful.

If you also want to know where in the string each province, city, or district name was found, add the pos_sensitive=True parameter:
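
One way to see how a county-level unit relates to its parents is the six-digit adcode itself. Under the national administrative division code layout, the first two digits identify the province, the first four the prefecture-level city, and all six the county-level unit. The sketch below derives the parent codes from an adcode. The helper name parent_adcodes is hypothetical and not part of cpca, and whether cpca uses this rule internally is an assumption.

```python
def parent_adcodes(adcode):
    """Split a 6-digit adcode into (province, prefecture, county) codes."""
    code = str(adcode)
    if len(code) != 6 or not code.isdigit():
        raise ValueError(f"expected a 6-digit adcode, got {adcode!r}")
    province = code[:2] + "0000"   # first two digits: province level
    prefecture = code[:4] + "00"   # first four digits: prefecture level
    return province, prefecture, code


# 510681 is the adcode returned for Guanghan City in the output above.
print(parent_adcodes("510681"))  # ('510000', '510600', '510681')
```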

# Official account: Python Practical Treasure
# 2022/06/23

import cpca

# Note: cpca matches Chinese place names. The strings below are English
# translations of the article's original Chinese inputs; pass the Chinese
# text when running this.
location_str = [
    "Shennan Middle Road, Bating Street, Futian District, Shenzhen City, Guangdong Province, No. 1025, New Town Building, 1st floor",
    "Tesla Shanghai Super Factory is Tesla's first super factory outside the United States, located in Shanghai, the People's Republic of China.",
    "The Sanxingdui site is located on the bank of the Yazi River in Sanxingdui Town, west of Guanghan City, Sichuan Province, China. It is a Bronze Age cultural site",
]
df = cpca.transform(location_str, pos_sensitive=True)
print(df)

The output is as follows:

(base) G:\push\20220623>python 1.py
   province            city      district         address                                                                                               adcode  province_pos  city_pos  district_pos
0  Guangdong Province  Shenzhen  Futian District  Shennan Middle Road, Bating Street, No. 1025, New Town Building, 1st floor                            440304  0             3         6
1  Shanghai            None      None             .                                                                                                     310000  38            -1        -1
2  Sichuan Province    Deyang    Guanghan City    on the bank of the Yazi River in Sanxingdui Town, west of the city. It is a Bronze Age cultural site  510681  9             -1        12

The extra columns give the index in the input string at which the province, city, and district keywords were found (the offsets refer to the original Chinese input strings). A value that was inferred rather than found in the text, such as Deyang here, is marked as -1.
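
The position columns follow the same convention as Python's own str.find: a character offset when the keyword occurs in the text, and -1 when it does not. The sketch below reproduces the 0, 3, 6 offsets of the first row with plain string searches. It assumes the first input string originally began with the Chinese text 广东省深圳市福田区 (the article's strings are shown in translation), and it only illustrates the convention; it is not how cpca is implemented.

```python
# Assumed opening of the first input string in its original Chinese form.
address = "广东省深圳市福田区"

# str.find returns the character offset of the first match, or -1 if absent.
names = ("广东省", "深圳市", "福田区", "德阳市")
positions = {name: address.find(name) for name in names}
print(positions)  # {'广东省': 0, '深圳市': 3, '福田区': 6, '德阳市': -1}
```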

3. Advanced use

cpca can also recognize multiple regions in a large block of text in one call:

# Official account: Python Practical Treasure
# 2022/06/23

import cpca

# Note: cpca matches Chinese place names. The text below is an English
# translation of the article's original Chinese input.
long_text = (
    "The evaluation of a city always includes personal feelings. If you like a city, you probably like who you were at that time and place. "
    "I have studied and worked in Guangzhou and Hong Kong, bought a house and lived briefly in Shenzhen, and been on several business trips to Beijing. "
    "I would like to focus on Guangzhou, Shenzhen and Hong Kong, and mention Beijing in passing. In general, Guangzhou feels comfortable, "
    "Hong Kong exquisite, Shenzhen young with a good atmosphere, and Beijing grand but rough. The author has chosen Guangzhou for now."
)
df = cpca.transform_text_with_addrs(long_text, pos_sensitive=True)
print(df)

The output is as follows:

(base) G:\push\20220623>python 1.py
    province                                 city       district  address  adcode  province_pos  city_pos  district_pos
0   Guangdong Province                       Guangzhou  None               440100  -1            44        -1
1   Hong Kong Special Administrative Region  None       None               810000  47            -1        -1
2   Guangdong Province                       Shenzhen   None               440300  -1            58        -1
3   Beijing                                  None       None               110000  71            -1        -1
4   Guangdong Province                       Guangzhou  None               440100  -1            86        -1
5   Guangdong Province                       Shenzhen   None               440300  -1            89        -1
6   Hong Kong Special Administrative Region  None       None               810000  92            -1        -1
7   Beijing                                  None       None               110000  100           -1        -1
8   Guangdong Province                       Guangzhou  None               440100  -1            110       -1
9   Hong Kong Special Administrative Region  None       None               810000  115           -1        -1
10  Guangdong Province                       Shenzhen   None               440300  -1            120       -1
11  Beijing                                  None       None               110000  128           -1        -1
12  Guangdong Province                       Guangzhou  None               440100  -1            143       -1
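
Since each row is one mention, counting rows per adcode gives how often each place appears in the text. The sketch below does this with the standard library only, using the adcode values copied from the output above. With a real DataFrame, df["adcode"].value_counts() would presumably give the same counts, assuming the column is named adcode as shown.

```python
from collections import Counter

# adcode column copied from the 13 rows of output above.
adcodes = [
    "440100", "810000", "440300", "110000",
    "440100", "440300", "810000", "110000",
    "440100", "810000", "440300", "110000",
    "440100",
]

# Count how many times each adcode was mentioned, most frequent first.
mention_counts = Counter(adcodes)
for adcode, count in mention_counts.most_common():
    print(adcode, count)
```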

Beyond that, the module also ships with some simple drawing tools that can plot the output above on a map as a heat map:

# Official account: Python Practical Treasure
# 2022/06/23

import cpca
from cpca import drawer

# Note: cpca matches Chinese place names. The text below is an English
# translation of the article's original Chinese input.
long_text = (
    "The evaluation of a city always includes personal feelings. If you like a city, you probably like who you were at that time and place. "
    "I have studied and worked in Guangzhou and Hong Kong, bought a house and lived briefly in Shenzhen, and been on several business trips to Beijing. "
    "I would like to focus on Guangzhou, Shenzhen and Hong Kong, and mention Beijing in passing. In general, Guangzhou feels comfortable, "
    "Hong Kong exquisite, Shenzhen young with a good atmosphere, and Beijing grand but rough. The author has chosen Guangzhou for now."
)
df = cpca.transform_text_with_addrs(long_text, pos_sensitive=True)
drawer.draw_locations(df[cpca._ADCODE], "df.html")

You may get this error when running it:

(base) G:\push\20220623>python 1.py
Traceback (most recent call last):
  File "1.py", line 12, in <module>
    drawer.draw_locations(df[cpca._ADCODE], "df.html")
  File "G:\Anaconda3\lib\site-packages\cpca\drawer.py", line 41, in draw_locations
    import folium
ModuleNotFoundError: No module named 'folium'

The missing module can be installed with pip:

pip install folium
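
To avoid hitting this error halfway through a script, a check like the following could run before drawing. This is a minimal sketch using the standard library's importlib.util.find_spec; the helper name missing_modules is hypothetical.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Check the two modules this article relies on before calling the drawer.
missing = missing_modules(["cpca", "folium"])
if missing:
    print("Install first: pip install " + " ".join(missing))
```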

Then rerun the code. It generates df.html in the current directory; double-click the file to open it and view the result.

How about that, doesn't it feel convenient? From now on, this module should cover most location recognition needs.

For more details, see the project's GitHub home page. The project README is written entirely in Chinese and is very easy to read.
