A Super Simple Python Tool for Converting Chinese to Pinyin, Well Worth a Try

Editor: Python

Converting Chinese characters to pinyin is useful for batch phonetic annotation of Chinese text, sorting words, looking words up by pronunciation, and other common tasks.

There are plenty of pinyin conversion tools online, and many open-source Python modules as well. Today I will introduce the one with the most features and its own distinctive strengths: pypinyin. It supports the following:

  • 1. Intelligently picks the most correct pinyin based on the phrase a character appears in.
  • 2. Supports polyphonic characters (heteronyms).
  • 3. Basic support for Traditional Chinese, plus Zhuyin (Bopomofo) support.
  • 4. Supports many different pinyin and Zhuyin styles.
  • 5. Includes a command-line tool for one-step conversion.

1. Preparation

Before you start, make sure Python and pip are installed on your computer. If they are not, install them first.

(Option 1) If you use Python for data analysis, you can install Anaconda directly: it comes with Python and pip built in.

(Option 2) In addition, the VSCode editor is recommended; it has many advantages.

Open a terminal in one of the following ways, then enter the command below to install the dependency:

  • 1. Windows: open Cmd (Start > Run > CMD).
  • 2. macOS: open Terminal (Command+Space, then type Terminal).
  • 3. If you use the VSCode editor or PyCharm, you can use its built-in Terminal directly.

pip install pypinyin

2. Basic usage

The most common way to convert text to pinyin is as follows:

# Python Practical treasure
from pypinyin import pinyin, lazy_pinyin, Style
pinyin('中心')
# [['zhōng'], ['xīn']]

Recognize polyphonic characters:

from pypinyin import pinyin, lazy_pinyin, Style
pinyin('中心', heteronym=True)  # enable heteronym (polyphonic) mode
# [['zhōng', 'zhòng'], ['xīn']]

Set the output style so that only the first letter is returned:

from pypinyin import pinyin, lazy_pinyin, Style
pinyin('中心', style=Style.FIRST_LETTER)  # set the pinyin style
# [['z'], ['x']]

Change where the tone is shown: either right after the vowel that carries it, or at the end of each syllable:

from pypinyin import pinyin, lazy_pinyin, Style
# TONE2: the tone number follows the vowel that carries it
pinyin('中心', style=Style.TONE2, heteronym=True)
# [['zho1ng', 'zho4ng'], ['xi1n']]
# TONE3: the tone number comes at the end of each syllable
pinyin('中心', style=Style.TONE3, heteronym=True)
# [['zhong1', 'zhong4'], ['xin1']]
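
The relationship between the tone-mark style and the two numbered styles can be illustrated without the library. The following is only a minimal sketch built on the standard library's unicodedata module; to_tone2 and to_tone3 are hypothetical helper names, not part of pypinyin, and the sketch handles nothing beyond the four tone diacritics.

```python
import unicodedata

# Combining diacritics for the four pinyin tones, mapped to tone numbers.
TONE_MARKS = {
    "\u0304": "1",  # macron, first tone
    "\u0301": "2",  # acute, second tone
    "\u030c": "3",  # caron, third tone
    "\u0300": "4",  # grave, fourth tone
}


def to_tone2(syllable):
    """Put the tone number right after the vowel that carries the mark."""
    decomposed = unicodedata.normalize("NFD", syllable)
    replaced = "".join(TONE_MARKS.get(ch, ch) for ch in decomposed)
    # Recompose so that u + diaeresis becomes a single ü again.
    return unicodedata.normalize("NFC", replaced)


def to_tone3(syllable):
    """Move the tone number to the end of the syllable."""
    decomposed = unicodedata.normalize("NFD", syllable)
    tone = "".join(TONE_MARKS[ch] for ch in decomposed if ch in TONE_MARKS)
    plain = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", plain) + tone


print(to_tone2("zhōng"))  # zho1ng
print(to_tone3("zhōng"))  # zhong1
```

A syllable without a tone mark passes through both helpers unchanged.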

Ignore polyphonic characters:

from pypinyin import pinyin, lazy_pinyin, Style
lazy_pinyin('中心')  # polyphonic readings are not considered
# ['zhong', 'xin']

Output ü instead of v:

from pypinyin import pinyin, lazy_pinyin, Style
lazy_pinyin('战略', v_to_u=True)  # use ü rather than v
# ['zhan', 'lüe']

Mark the neutral tone:

from pypinyin import pinyin, lazy_pinyin, Style
# Use 5 to mark the neutral tone
lazy_pinyin('衣裳', style=Style.TONE3, neutral_tone_with_five=True)
# ['yi1', 'shang5']

Convert to pinyin in one step from the command line:

python -m pypinyin 音乐
# yīn yuè

3. Advanced usage

Customizing the pinyin output style

The register() decorator lets you define your own pinyin style:

from pypinyin import lazy_pinyin
from pypinyin.style import register

@register('kiss')
def kiss(pinyin, **kwargs):
    # Prefix every syllable with an emoji
    return '😘 {0}'.format(pinyin)

lazy_pinyin('么么', style='kiss')
# ['😘 me', '😘 me']

As you can see, defining a kiss function and applying the register decorator creates a new style, which can then be passed directly as the style argument of the conversion functions. Very convenient.

In addition, the styles built into the module and their effects are as follows:

from enum import IntEnum, unique

@unique
class Style(IntEnum):
    """Pinyin styles"""
    #: Normal style, without tones. E.g. 中国 -> ``zhong guo``
    NORMAL = 0
    #: Standard tone style, with the tone mark written over the vowel (the default style). E.g. 中国 -> ``zhōng guó``
    TONE = 1
    #: Tone style 2: the tone is written as a digit [1-4] after the vowel that carries it. E.g. 中国 -> ``zho1ng guo2``
    TONE2 = 2
    #: Tone style 3: the tone is written as a digit [1-4] at the end of each syllable. E.g. 中国 -> ``zhong1 guo2``
    TONE3 = 8
    #: Initials style: returns only the initial consonant of each syllable (note: some syllables have no initial, see `#27`_). E.g. 中国 -> ``zh g``
    INITIALS = 3
    #: First-letter style: returns only the first letter of each syllable. E.g. 中国 -> ``z g``
    FIRST_LETTER = 4
    #: Finals style: returns only the final of each syllable, without tones. E.g. 中国 -> ``ong uo``
    FINALS = 5
    #: Standard finals style, with the tone mark written over the vowel. E.g. 中国 -> ``ōng uó``
    FINALS_TONE = 6
    #: Finals style 2: the tone is written as a digit [1-4] after the vowel that carries it. E.g. 中国 -> ``o1ng uo2``
    FINALS_TONE2 = 7
    #: Finals style 3: the tone is written as a digit [1-4] at the end of each final. E.g. 中国 -> ``ong1 uo2``
    FINALS_TONE3 = 9
    #: Zhuyin (Bopomofo) style, with tones; the first tone (high level tone) is not marked. E.g. 中国 -> ``ㄓㄨㄥ ㄍㄨㄛˊ``
    BOPOMOFO = 10
    #: Zhuyin (Bopomofo) style, first letter only. E.g. 中国 -> ``ㄓ ㄍ``
    BOPOMOFO_FIRST = 11
    #: Pinyin-to-Cyrillic transcription style: the tone is written as a digit [1-4] at the end of each syllable. E.g. 中国 -> ``чжун1 го2``
    CYRILLIC = 12
    #: Pinyin-to-Cyrillic transcription style, first letter only. E.g. 中国 -> ``ч г``
    CYRILLIC_FIRST = 13

Handling special characters

By default, special characters in the text are not processed and are returned as is:

from pypinyin import pinyin
pinyin('你好☆☆')
# [['nǐ'], ['hǎo'], ['☆☆']]

However, you can also handle these special characters if you want, through the errors parameter. For example:

ignore: skip the characters

pinyin('你好☆☆', errors='ignore')
# [['nǐ'], ['hǎo']]

replace: replace each character with its hexadecimal Unicode code point, without the \u prefix:

pinyin('你好☆☆', errors='replace')
# [['nǐ'], ['hǎo'], ['26062606']]
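
The value 26062606 is simply the code point of ☆ (U+2606) written twice. A minimal standard-library sketch that reproduces the same string; replace_with_code_points is a hypothetical helper name, and the sketch mirrors the output shown rather than claiming to be the library's own code:

```python
def replace_with_code_points(chars):
    """Join the hexadecimal code point of every character, with no prefix."""
    return ''.join(format(ord(ch), 'x') for ch in chars)


print(replace_with_code_points('☆☆'))  # 26062606
```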

callable object: provide a callback function that takes the characters without pinyin (as a string) as its argument. Supported return types are str (unicode), list, or None:

pinyin('你好☆☆', errors=lambda x: 'star')
# [['nǐ'], ['hǎo'], ['star']]
pinyin('你好☆☆', errors=lambda x: None)
# [['nǐ'], ['hǎo']]

When the return value is a list, the list is expanded automatically:

pinyin('你好☆☆', errors=lambda x: ['star' for _ in x])
# [['nǐ'], ['hǎo'], ['star'], ['star']]
# Return several readings per character, as for a polyphonic character
pinyin('你好☆☆', heteronym=True, errors=lambda x: [['star', '☆'] for _ in x])
# [['nǐ'], ['hǎo'], ['star', '☆'], ['star', '☆']]
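
The three return types can be summarized as a small dispatch. The following standard-library sketch only mimics the outputs shown above, under the assumption that they describe the full behavior; apply_errors_callback is a hypothetical name and this is not the library's code:

```python
def apply_errors_callback(chars, callback):
    """Normalize a callback's return value into a list of per-item lists."""
    value = callback(chars)
    if value is None:
        return []          # nothing is emitted for these characters
    if isinstance(value, str):
        return [[value]]   # one item for the whole run of characters
    # A list is expanded: every element becomes its own item.
    return [item if isinstance(item, list) else [item] for item in value]


print(apply_errors_callback('☆☆', lambda x: 'star'))
# [['star']]
print(apply_errors_callback('☆☆', lambda x: ['star' for _ in x]))
# [['star'], ['star']]
```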

Custom pinyin dictionary

If you are not satisfied with the module's output, or want to handle something in a special way, you can customize the pinyin dictionary with load_single_dict() or load_phrases_dict() to change the results:

from pypinyin import lazy_pinyin, load_phrases_dict, Style, load_single_dict
hans = '桔子'
lazy_pinyin(hans, style=Style.TONE2)
# ['jie2', 'zi3']
load_phrases_dict({'桔子': [['jú'], ['zǐ']]})  # add the phrase "桔子"
lazy_pinyin(hans, style=Style.TONE2)
# ['ju2', 'zi3']
hans = '还没'
lazy_pinyin(hans, style=Style.TONE2)
# ['hua2n', 'me2i']
load_single_dict({ord('还'): 'hái,huán'})  # change the order of the readings of "还"
lazy_pinyin('还没', style=Style.TONE2)
# ['ha2i', 'me2i']
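
The second example works because the order of the comma-separated readings matters: the first one listed is the one used when polyphonic readings are not requested. A minimal standard-library sketch of that idea, with a hypothetical table and a hypothetical first_reading helper standing in for the library's internals:

```python
# Hypothetical single-character table in the same shape as the argument to
# load_single_dict(): code point -> comma-separated readings.
single_dict = {ord('还'): 'huán,hái'}


def first_reading(char, table):
    """Return the first listed reading for the character."""
    return table[ord(char)].split(',')[0]


before = first_reading('还', single_dict)

# Overriding the entry with a different order changes the default reading.
single_dict.update({ord('还'): 'hái,huán'})
after = first_reading('还', single_dict)

print(before, after)  # huán hái
```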
