
A Lexical Analyzer for High-level Languages Based on Python

Editor: Python

1. Requirements Analysis

Requirement: Describe the functions that the lexical analysis system must provide.

Design and implement a lexical analyzer for high-level languages. The basic functions are as follows:

  • The following types of words are recognized:

    • Identifier (consisting of uppercase and lowercase letters, numbers, and underscores, but must start with a letter or underscore)

    • Keywords (① type keywords: integer, floating point, boolean, record; ② if and else in branch structures; ③ do and while in loop structures; ④ procedure declaration and call keywords)

    • Operators (① arithmetic operators; ② relational operators; ③ logical operators)

    • Delimiters (① the delimiter used in assignment statements, such as "="; ② the delimiter used at the end of a statement, such as ";"; ③ the delimiters used in array notation, such as "[" and "]"; ④ the delimiter "." used in floating point numbers)

    • Constants (unsigned integers (including octal and hexadecimal numbers), floating point numbers (including scientific notation), string constants, etc.)

    • Comments (// form)

  • Able to perform simple error handling, i.e. identify illegal characters in test cases. When the program outputs an error message, it must give the specific error type (i.e. lexical error), the location of the error (source program line number), and a description, in the following format:

Lexical error at Line [line number]: [description text].

There are no specific requirements for the content of the description text (for example: illegal characters), but the error type and the line number must be correct, because these are the only criteria for judging whether the output error message is correct.

  • The input form of the system: test cases must be imported from files. Test cases should cover the types of words listed in "Experimental Content".

  • The output form of the system: print out the token sequence corresponding to the test case.
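
The error-message format and the file-based input described above could be sketched as follows. This is only a minimal illustration, not the article's implementation: the function names (format_lexical_error, find_illegal_characters, check_file) and the LEGAL_CHARS set are hypothetical, and comment handling is deliberately simplified.

```python
import string

# Assumption: this set of legal characters is illustrative only;
# the article does not list the exact alphabet of the language.
LEGAL_CHARS = set(string.ascii_letters + string.digits + string.whitespace
                  + "_+-*/%=<>!&|;,.[](){}\"'#:")


def format_lexical_error(line_number, description):
    """Build a message of the form 'Lexical error at Line N: text.'"""
    return f"Lexical error at Line {line_number}: {description}."


def find_illegal_characters(source):
    """Return one error message per illegal character, with 1-based line numbers."""
    errors = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        # Simplification: drop everything after '//' (this would also
        # cut a string constant that happens to contain '//').
        code = line.split("//", 1)[0]
        for ch in code:
            if ch not in LEGAL_CHARS:
                errors.append(
                    format_lexical_error(line_number, f"illegal character '{ch}'"))
    return errors


def check_file(path):
    """Read a test case from a file and return its lexical error messages."""
    with open(path, encoding="utf-8") as f:
        return find_illegal_characters(f.read())
```

Calling check_file with the path of a test case returns a list of messages, which can be printed one per line alongside the token sequence.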

2. Grammar Design

Requirements: Describe the following in detail.

  • Give a description of the lexical rules (regular grammar or regular expressions) for each type of word

Identifier:

[A-Za-z_]\w*
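
The identifier rule stated in the requirements (a letter or underscore, followed by any number of letters, digits, or underscores) can be checked with Python's re module. The sketch below is an illustration; the helper name is_identifier is hypothetical, and the re.ASCII flag is an added assumption so that \w does not accept non-ASCII letters.

```python
import re

# A letter or underscore, then any number of letters, digits, or underscores.
# re.ASCII restricts \w to [A-Za-z0-9_]; without it, Python's \w also
# matches Unicode letters and digits.
IDENTIFIER = re.compile(r'[A-Za-z_]\w*', re.ASCII)


def is_identifier(text):
    """Return True if the whole string is a valid identifier."""
    return IDENTIFIER.fullmatch(text) is not None
```

Using fullmatch rather than match ensures that a word such as a-b is rejected instead of being accepted on the strength of its first character.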

Keywords:

r'((auto){1}|(double){1}|(int){1}|(if){1}|' \
r'(#include){1}|(return){1}|(char){1}|(stdio\.h){1}|(const){1})'

Operator:

r'(\+\+|\+=|\+|--|-=|-|\*=|/=|/|%=|%)'

Delimiter:

r'([,:\{}:)(<>])'

Constant:

r'(\d+(?:\.\d+)?)'
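
The requirements also call for octal, hexadecimal, and scientific-notation constants, which a single decimal pattern does not cover. One possible extension is sketched below. It is an assumption rather than the article's grammar: it uses C-style prefixes (0x for hexadecimal, a leading 0 for octal), and the names CONSTANT and classify_constant are hypothetical.

```python
import re

# More specific alternatives come first so that, for example,
# "0x1F" is tried as HEX before the plain integer rule.
CONSTANT = re.compile(r'''
    (?P<HEX>0[xX][0-9a-fA-F]+)
  | (?P<FLOAT>\d+\.\d+(?:[eE][+-]?\d+)?|\d+[eE][+-]?\d+)
  | (?P<OCT>0[0-7]+)
  | (?P<INT>\d+)
''', re.VERBOSE)


def classify_constant(text):
    """Return 'HEX', 'FLOAT', 'OCT', or 'INT' if text is a constant, else None."""
    m = CONSTANT.fullmatch(text)
    return m.lastgroup if m else None
```

Named groups let the caller read the constant's category from lastgroup instead of testing each pattern separately.
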
  • Transition diagrams for each type of word

Only the transition diagram for constants is given; the diagrams for the other word types are simpler.

Constant:
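
A transition diagram maps directly onto a state table. The sketch below is an illustration of that idea for unsigned integers and simple floating point numbers only. The state names, the TRANSITIONS table, and the function scan_number are assumptions made for this example and are not taken from the article's diagram.

```python
DIGITS = "0123456789"


def char_class(ch):
    """Map a character to the input class used by the transition table."""
    if ch in DIGITS:
        return "digit"
    if ch == ".":
        return "dot"
    return "other"


# (current state, input class) -> next state
TRANSITIONS = {
    ("start", "digit"): "int",
    ("int", "digit"): "int",
    ("int", "dot"): "dot",
    ("dot", "digit"): "frac",
    ("frac", "digit"): "frac",
}

# Accepting states and the token kind each one produces.
ACCEPTING = {"int": "INT", "frac": "FLOAT"}


def scan_number(text, pos=0):
    """Return (kind, lexeme) for the longest number starting at pos, or None."""
    state = "start"
    last_accept = None
    i = pos
    while i < len(text):
        next_state = TRANSITIONS.get((state, char_class(text[i])))
        if next_state is None:
            break
        state = next_state
        i += 1
        if state in ACCEPTING:
            # Remember the longest accepted prefix seen so far.
            last_accept = (ACCEPTING[state], text[pos:i])
    return last_accept
```

Remembering the last accepting state means that an input such as 7. yields the integer 7 and leaves the dot for the next token.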

3. System Design

Requirements: This section is divided into the system outline design and the detailed system design.

  • System outline design: Provide the necessary macro-level design diagrams of the system, such as system frame diagrams, data flow diagrams, function module diagrams, etc., as well as corresponding text descriptions.

Function modules:

  • Detailed system design: describe the following in detail

Design of core data structure

The core data structure is the built-in Python list.

Description of the main functions

def is_blank(self, index): determines whether the character at index is whitespace
def skip_blank(self, index): skips whitespace starting at index
def is_keyword(self, value): determines whether value is a keyword
def main(self): the main routine of the lexical analysis
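
The four methods listed above might fit together as in the following sketch. Only the method names and signatures come from the article; the class name Lexer, the method bodies, the token labels, and the reduced keyword set are assumptions made for illustration.

```python
class Lexer:
    # Assumption: a subset of the keywords from the grammar section.
    KEYWORDS = {"auto", "double", "int", "if", "return", "char", "const"}

    def __init__(self, source):
        self.source = source

    def is_blank(self, index):
        """Return True if the character at index is whitespace."""
        return index < len(self.source) and self.source[index] in " \t\r\n"

    def skip_blank(self, index):
        """Return the index of the first non-whitespace character at or after index."""
        while self.is_blank(index):
            index += 1
        return index

    def is_keyword(self, value):
        """Return True if value is a keyword."""
        return value in self.KEYWORDS

    def main(self):
        """Return a list of (kind, lexeme) pairs; only words are classified here."""
        tokens = []
        index = self.skip_blank(0)
        while index < len(self.source):
            ch = self.source[index]
            if ch.isalpha() or ch == "_":
                start = index
                while index < len(self.source) and (
                        self.source[index].isalnum() or self.source[index] == "_"):
                    index += 1
                word = self.source[start:index]
                kind = "KEYWORD" if self.is_keyword(word) else "IDENTIFIER"
                tokens.append((kind, word))
            else:
                # Placeholder: a full lexer would handle operators,
                # delimiters, constants, and comments here.
                tokens.append(("OTHER", ch))
                index += 1
            index = self.skip_blank(index)
        return tokens
```

Scanning a whole word first and then looking it up with is_keyword avoids treating the int inside an identifier such as integer_count as a keyword.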

Flow chart of the core part of the program

4. System Implementation and Result Analysis

Requirements: Describe the following in detail.

  • Problems encountered during system implementation;

The system did not initially recognize hexadecimal numbers.

The solution: when scanning a constant, check whether its first character is 0 and whether the next character is X. If so, check whether the characters that follow are a sequence of 0-9 or A-F; if they are, the word is treated as a constant.
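
The hexadecimal check described above could look like the following sketch. The function name match_hex and its return convention are hypothetical, and accepting a lowercase x and lowercase a-f is an added assumption that the article does not state.

```python
# Assumption: lowercase hex digits are accepted in addition to A-F.
HEX_DIGITS = set("0123456789abcdefABCDEF")


def match_hex(text, index):
    """Return the end index of a hex constant starting at index, or None."""
    # Step 1: the first character must be 0.
    if text[index:index + 1] != "0":
        return None
    # Step 2: the next character must be X (or x, by assumption).
    if text[index + 1:index + 2] not in ("x", "X"):
        return None
    # Step 3: consume a run of hex digits; at least one is required.
    end = index + 2
    while end < len(text) and text[end] in HEX_DIGITS:
        end += 1
    return end if end > index + 2 else None
```

The slice text[index:end] is then the lexeme of the constant, and scanning resumes at end.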

  • Output its lexical analysis results for a test program;

The test sample is as follows:

  • Analyze the experimental results.

The results of the lexical analysis are generally correct, except that && is not recognized as a single operator; it is reported as two separate & tokens.
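
A common cause of this symptom in regex-based lexers is that a shorter operator is tried before a longer one that starts with the same character. Whether that is the cause here is an assumption, since the article does not show the relevant code. One possible remedy, sketched with a hypothetical operator list, is to order the alternatives from longest to shortest:

```python
import re

# Assumption: an illustrative list of logical operators.
OPERATORS = ["&", "&&", "|", "||", "!"]

# Sort by length, longest first, so "&&" is tried before "&".
# re.escape protects characters such as "|" that are special in a regex.
OPERATOR_RE = re.compile(
    "|".join(re.escape(op) for op in sorted(OPERATORS, key=len, reverse=True)))


def find_operators(text):
    """Return the operators found in text, preferring the longest match."""
    return OPERATOR_RE.findall(text)
```

With this ordering, a&&b produces one && token rather than two & tokens, while a single & is still recognized on its own.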

