32.7. `tokenize` — Tokenizer for Python source

Source code: Lib/tokenize.py

The tokenize module provides a lexical scanner for Python source code, implemented in Python. The scanner in this module returns comments as tokens as well, making it useful for implementing “pretty-printers,” including colorizers for on-screen displays.

To simplify token stream handling, all operator and delimiter tokens are returned using the generic token.OP token type. The exact type can be determined by checking the second field of the tuple returned from tokenize.generate_tokens(): it contains the actual token string matched, that is, the character sequence that identifies a specific operator token.
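As a minimal sketch of that check, the loop below collects the token string of every OP token. The sample source line and the use of io.StringIO to supply readline are assumptions made for illustration only.

```python
from io import StringIO
from tokenize import generate_tokens, OP

source = u"total += (price * 2) // 3\n"  # hypothetical sample line

# Every operator and delimiter arrives with the generic OP type;
# the token string (second field) is what tells them apart.
operators = [tokval
             for toknum, tokval, _, _, _ in generate_tokens(StringIO(source).readline)
             if toknum == OP]

print(operators)  # ['+=', '(', '*', ')', '//']
```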

The primary entry point is a generator:

tokenize.generate_tokens(readline)
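generate_tokens() expects a readline callable that returns one line of input per call, and it yields 5-tuples: the token type, the token string, the (row, column) start, the (row, column) end, and the physical line. The sketch below prints those fields for a one-line source; the sample text and the io.StringIO wrapper are assumptions of this example.

```python
from io import StringIO
from tokenize import generate_tokens, tok_name

source = u"x = 1\n"  # hypothetical sample source

# Materialize the generator so the tokens can be inspected more than once.
tokens = list(generate_tokens(StringIO(source).readline))

for toktype, tokstr, start, end, line in tokens:
    # tok_name maps the numeric token type to a readable name.
    print("%-10s %-6r %s-%s" % (tok_name[toktype], tokstr, start, end))
```

Rows are counted from 1 and columns from 0, so the first token of the sample starts at (1, 0).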

An older entry point is retained for backward compatibility:

tokenize.tokenize(readline[, tokeneater])

All constants from the token module are also exported from tokenize, as are two additional token type values that might be passed to the tokeneater function by tokenize():

tokenize.COMMENT

tokenize.NL
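COMMENT marks a comment token, and NL marks a newline that does not end a logical line (for example after a comment-only line, or inside an open bracket), in contrast to NEWLINE. A minimal sketch, assuming a small hypothetical source and using generate_tokens() rather than the older tokeneater interface:

```python
from io import StringIO
from tokenize import generate_tokens, COMMENT, NL, NEWLINE

# Hypothetical source: a comment line, then one statement split over two lines.
source = u"# setup\nvalues = [1,\n          2]\n"

tokens = list(generate_tokens(StringIO(source).readline))
kinds = [toknum for toknum, _, _, _, _ in tokens]
comments = [tokval for toknum, tokval, _, _, _ in tokens if toknum == COMMENT]

print(comments)              # the comment text, including the leading '#'
print(kinds.count(NL))       # newlines that do not end a logical line
print(kinds.count(NEWLINE))  # newlines that end a logical line
```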

Another function is provided to reverse the tokenization process. This is useful for creating tools that tokenize a script, modify the token stream, and write back the modified script.

tokenize.untokenize(iterable)

exception tokenize.TokenError
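untokenize() accepts a sequence of tuples with at least two elements, the token type and the token string. The following sketch illustrates the round trip under that two-element form; the helper name pairs and the sample source are hypothetical. Spacing in the rebuilt text may differ from the original, so the comparison is made on token types and strings rather than on the raw text.

```python
from io import StringIO
from tokenize import generate_tokens, untokenize

def pairs(text):
    """Return the (type, string) pair of every token in text."""
    return [(toknum, tokval)
            for toknum, tokval, _, _, _ in generate_tokens(StringIO(text).readline)]

source = u"x = (1 + 2) * 3\n"        # hypothetical sample source
rebuilt = untokenize(pairs(source))  # spacing may differ from source

# The rebuilt text tokenizes back to the same types and strings.
print(pairs(rebuilt) == pairs(source))
```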

Note that unclosed single-quoted strings do not cause an error to be raised. They are tokenized as ERRORTOKEN, followed by the tokenization of their contents.
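By contrast, a construct that may span several lines and is never completed, such as an open parenthesis at the end of the input, is reported through TokenError. A minimal sketch, assuming a hypothetical one-line input with an unclosed call:

```python
from io import StringIO
from tokenize import generate_tokens, TokenError

source = u"result = compute(1,\n"  # hypothetical input: the parenthesis is never closed

try:
    # The generator is lazy, so it must be consumed for the error to surface.
    list(generate_tokens(StringIO(source).readline))
    raised = False
except TokenError as exc:
    raised = True
    print("incomplete input: %s" % (exc,))
```

Tools that accept partial input, such as interactive prompts, can catch the exception and ask for more lines instead of failing.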

Example of a script re-writer that transforms float literals into Decimal objects:

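from StringIO import StringIO  # Python 2; on Python 3 use: from io import StringIO
from tokenize import generate_tokens, untokenize, NAME, NUMBER, OP, STRING

def decistmt(s):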
    """Substitute Decimals for floats in a string of statements.
    >>> from decimal import Decimal
    >>> s = 'print +21.3e-5*-.1234/81.7'
    >>> decistmt(s)
    "print +Decimal ('21.3e-5')*-Decimal ('.1234')/Decimal ('81.7')"
    >>> exec(s)
    -3.21716034272e-007
    >>> exec(decistmt(s))
    -3.217160342717258261933904529E-7
    """
    result = []
    g = generate_tokens(StringIO(s).readline)   # tokenize the string
    for toknum, tokval, _, _, _ in g:
        if toknum == NUMBER and '.' in tokval:  # replace NUMBER tokens
            result.extend([
                (NAME, 'Decimal'),
                (OP, '('),
                (STRING, repr(tokval)),
                (OP, ')')
            ])
        else:
            result.append((toknum, tokval))
    return untokenize(result)
