C Compiler Study Notes (2): Lexical Analysis

lex.c

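At the very start, the main function calls SetupLexer(). This function fills the Scanners array with function pointers: each entry, indexed by a character, is the scanner for tokens that begin with that character.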

void SetupLexer(void)
{
    int i;

    for (i = 0; i < END_OF_FILE + 1; i++)
    {
        if (IsLetter(i))    // [a-z A-Z _ ]
        {
            Scanners[i] = ScanIdentifier;
        }
        else if (IsDigit(i))
        {
            Scanners[i] = ScanNumericLiteral;
        }
        else
        {
            Scanners[i] = ScanBadChar;
        }
    }

    Scanners[END_OF_FILE] = ScanEOF;
    Scanners['\''] = ScanCharLiteral;   // wide chars/strings are parsed in ScanIdentifier(void)
    Scanners['"']  = ScanStringLiteral;
    Scanners['+']  = ScanPlus;
    Scanners['-']  = ScanMinus;
    Scanners['*']  = ScanStar;
    Scanners['/']  = ScanSlash;
    Scanners['%']  = ScanPercent;
    Scanners['<']  = ScanLess;
    Scanners['>']  = ScanGreat;
    Scanners['!']  = ScanExclamation;
    Scanners['=']  = ScanEqual;
    Scanners['|']  = ScanBar;
    Scanners['&']  = ScanAmpersand;
    Scanners['^']  = ScanCaret;
    Scanners['.']  = ScanDot;
    // see Macro SINGLE_CHAR_SCANNER(t)
    Scanners['{']  = ScanLBRACE;
    Scanners['}']  = ScanRBRACE;
    Scanners['[']  = ScanLBRACKET;
    Scanners[']']  = ScanRBRACKET;
    Scanners['(']  = ScanLPAREN;
    Scanners[')']  = ScanRPAREN;
    Scanners[',']  = ScanCOMMA;
    Scanners[';']  = ScanSEMICOLON;
    Scanners['~']  = ScanCOMP;
    Scanners['?']  = ScanQUESTION;
    Scanners[':']  = ScanCOLON;

}

ast.h defines the following macro:

#define NEXT_TOKEN  CurrentToken = GetNextToken();

So every NEXT_TOKEN calls GetNextToken(). That function looks at the current character and calls the matching scanner through the table:

tok = (*Scanners[*CURSOR])();
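To see the dispatch-table idea in isolation, here is a minimal sketch that mirrors the shape of SetupLexer() and GetNextToken() above. It is not the compiler's actual code: the token names (TK_ID and so on), the three stub scanners, and the use of the NUL terminator as the end-of-input marker instead of END_OF_FILE are assumptions made to keep it self-contained.

```c
#include <ctype.h>

/* Hypothetical token kinds for this sketch only. */
enum { TK_ID = 1, TK_NUM, TK_PLUS, TK_BAD, TK_END };

typedef int (*Scanner)(void);

static const unsigned char *CURSOR;   /* current position in the source text */
static Scanner Scanners[256];         /* one entry per possible byte value */

static int ScanIdentifier(void) {
    while (isalnum(*CURSOR) || *CURSOR == '_') CURSOR++;
    return TK_ID;
}

static int ScanNumericLiteral(void) {
    while (isdigit(*CURSOR)) CURSOR++;
    return TK_NUM;
}

static int ScanPlus(void)    { CURSOR++; return TK_PLUS; }
static int ScanBadChar(void) { CURSOR++; return TK_BAD; }
static int ScanEOF(void)     { return TK_END; }   /* never advance past the end */

static void SetupLexer(void) {
    for (int i = 0; i < 256; i++) {
        if (isalpha(i) || i == '_')  Scanners[i] = ScanIdentifier;
        else if (isdigit(i))         Scanners[i] = ScanNumericLiteral;
        else                         Scanners[i] = ScanBadChar;
    }
    Scanners['\0'] = ScanEOF;   /* this sketch uses NUL as the end marker */
    Scanners['+']  = ScanPlus;
}

static int GetNextToken(void) {
    while (*CURSOR == ' ' || *CURSOR == '\t' || *CURSOR == '\n') CURSOR++;
    return (*Scanners[*CURSOR])();   /* dispatch on the first character */
}
```

With the table in place, GetNextToken() needs no large switch: a single array index picks the scanner, and supporting a new operator only means registering one more function.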

How each kind of token is scanned

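A token starting with a-z, A-Z or _ goes to ScanIdentifier. It takes the following run of letters (including _) and digits as a single word and checks whether that word is a keyword. A keyword returns its own token; any other word returns the single fixed identifier token, and its value is stored in TokenValue.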
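A minimal sketch of this identifier/keyword step, assuming a small linear keyword table and a fixed-size TokenText buffer standing in for TokenValue. Both are hypothetical; the real lexer's keyword tables and TokenValue layout may differ.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical token kinds and keyword table for this sketch. */
enum { TK_ID = 1, TK_INT, TK_RETURN, TK_WHILE };

static const struct { const char *name; int tok; } Keywords[] = {
    { "int", TK_INT }, { "return", TK_RETURN }, { "while", TK_WHILE },
};

static const char *CURSOR;     /* current position in the source text */
static char TokenText[64];     /* hypothetical stand-in for TokenValue */

static int ScanIdentifier(void) {
    const char *start = CURSOR;
    /* The whole run of letters, digits and '_' forms one word. */
    while (isalnum((unsigned char)*CURSOR) || *CURSOR == '_') CURSOR++;
    size_t len = (size_t)(CURSOR - start);

    /* Keyword? Return that keyword's own token. */
    for (size_t i = 0; i < sizeof Keywords / sizeof Keywords[0]; i++) {
        if (strlen(Keywords[i].name) == len &&
            strncmp(Keywords[i].name, start, len) == 0)
            return Keywords[i].tok;
    }

    /* Otherwise: the fixed identifier token, with the name saved. */
    if (len >= sizeof TokenText) len = sizeof TokenText - 1;  /* truncate in this sketch */
    memcpy(TokenText, start, len);
    TokenText[len] = '\0';
    return TK_ID;
}
```

Note that the length check matters: "whilex" must not match "while", which is why the sketch compares the full word rather than a prefix.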
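A token starting with 0-9 goes to ScanNumericLiteral. After dealing with the radix (hexadecimal, octal or decimal), it calls ScanIntLiteral or ScanFloatLiteral to handle integers and floating-point numbers respectively.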
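The radix handling can be sketched as follows. This is an assumed simplification, not the actual ScanNumericLiteral: it treats a 0x/0X prefix as hex and a leading 0 as octal, hands any non-hex number followed by '.', 'e' or 'E' to a float scanner, and uses the standard strtoll/strtod for the conversions. Integer suffixes (u, l), hex floats and invalid octal digits such as 09 are not handled, and the TokenValue union and token names are hypothetical.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical token kinds and value slot for this sketch. */
enum { TK_INTCONST = 1, TK_DOUBLECONST };

static const char *CURSOR;
static union { long long i; double d; } TokenValue;

static int ScanFloatLiteral(const char *start) {
    char *end;
    TokenValue.d = strtod(start, &end);   /* parses digits, '.', exponent */
    CURSOR = end;
    return TK_DOUBLECONST;
}

static int ScanNumericLiteral(void) {
    const char *start = CURSOR;
    int base = 10;

    if (CURSOR[0] == '0' && (CURSOR[1] == 'x' || CURSOR[1] == 'X')) {
        base = 16;                         /* 0x1F */
        CURSOR += 2;
        while (isxdigit((unsigned char)*CURSOR)) CURSOR++;
    } else {
        if (*CURSOR == '0') base = 8;      /* 017 */
        while (isdigit((unsigned char)*CURSOR)) CURSOR++;
        if (*CURSOR == '.' || *CURSOR == 'e' || *CURSOR == 'E')
            return ScanFloatLiteral(start);   /* 3.14, 1e10, 0.5 */
    }
    /* strtoll accepts the optional 0x prefix when base is 16. */
    TokenValue.i = strtoll(start, NULL, base);
    return TK_INTCONST;
}
```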

A token starting with ' goes to ScanCharLiteral, which stores the character in TokenValue.i.
A token starting with " goes to ScanStringLiteral, which stores the string in TokenValue.p.
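Here is a hedged sketch of the two literal scanners that handles only a few escape sequences (\n, \t, \0, and \\, \', \" which yield the character itself). The union fields i and p echo TokenValue.i and TokenValue.p, but the union, the token names and the heap-allocated string buffer are assumptions of this sketch. Storing a character in an integer field is natural because a character constant such as 'a' has type int in C.

```c
#include <stdlib.h>

/* Hypothetical token kinds and value slot for this sketch. */
enum { TK_INTCONST = 1, TK_STRING, TK_BAD };

static const char *CURSOR;
static union { int i; char *p; } TokenValue;

/* Reads one possibly escaped character and advances CURSOR. */
static int ScanEscapeChar(void) {
    char c = *CURSOR++;
    if (c != '\\' || *CURSOR == '\0')
        return (unsigned char)c;       /* plain char, or a lone trailing backslash */
    switch (*CURSOR++) {
    case 'n': return '\n';
    case 't': return '\t';
    case '0': return '\0';
    default:  return (unsigned char)CURSOR[-1];   /* \\ \' \" and unknown escapes */
    }
}

static int ScanCharLiteral(void) {
    CURSOR++;                                       /* skip opening ' */
    if (*CURSOR == '\0' || *CURSOR == '\'') return TK_BAD;  /* unterminated or empty */
    TokenValue.i = ScanEscapeChar();
    if (*CURSOR != '\'') return TK_BAD;             /* multi-char constants not handled */
    CURSOR++;                                       /* skip closing ' */
    return TK_INTCONST;
}

static int ScanStringLiteral(void) {
    size_t cap = 16, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL) return TK_BAD;
    CURSOR++;                                       /* skip opening " */
    while (*CURSOR != '"') {
        if (*CURSOR == '\0' || *CURSOR == '\n') {   /* unterminated string */
            free(buf);
            return TK_BAD;
        }
        if (len + 1 == cap) {                       /* keep room for the final '\0' */
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) { free(buf); return TK_BAD; }
            buf = bigger;
            cap *= 2;
        }
        buf[len++] = (char)ScanEscapeChar();
    }
    CURSOR++;                                       /* skip closing " */
    buf[len] = '\0';
    TokenValue.p = buf;                             /* caller must free() it */
    return TK_STRING;
}
```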

A token starting with + goes to ScanPlus. If the next character is also +, the token is ++; if it is =, the token is +=; otherwise it is just +.

Many of the other operators are scanned the same way.
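This "look at the next character" pattern is maximal munch: the scanner always takes the longest operator it can. Below is a minimal sketch of ScanPlus plus a ScanMinus written the same way, which also has to recognize ->. The token names are hypothetical.

```c
/* Hypothetical token kinds for this sketch. */
enum { TK_ADD = 1, TK_INC, TK_ADD_ASSIGN, TK_SUB, TK_DEC, TK_SUB_ASSIGN, TK_POINTER };

static const char *CURSOR;   /* points at the first character of the operator */

static int ScanPlus(void) {
    CURSOR++;                                        /* consume '+' */
    if (*CURSOR == '+') { CURSOR++; return TK_INC; }        /* ++ */
    if (*CURSOR == '=') { CURSOR++; return TK_ADD_ASSIGN; } /* += */
    return TK_ADD;                                          /* +  */
}

static int ScanMinus(void) {
    CURSOR++;                                        /* consume '-' */
    if (*CURSOR == '-') { CURSOR++; return TK_DEC; }        /* -- */
    if (*CURSOR == '=') { CURSOR++; return TK_SUB_ASSIGN; } /* -= */
    if (*CURSOR == '>') { CURSOR++; return TK_POINTER; }    /* -> */
    return TK_SUB;                                          /* -  */
}
```

Because of maximal munch, a+++b is lexed as a ++ + b, never as a + ++b.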

str.c

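lex.c uses two functions from str.c, InternName() and AppendSTR(). str.c declares a name pool, NameBuckets, which holds all the identifier names. When two names are identical, InternName() returns the same entry from the pool for both.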
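A sketch of how such a name pool can work: a hash table of chained buckets where the lookup runs first and the name is copied only when it is new. The bucket count, hash function, NameEntry struct and the (const char *, size_t) signature of InternName() are assumptions for illustration, not str.c's actual code.

```c
#include <stdlib.h>
#include <string.h>

#define BUCKET_COUNT 1024   /* hypothetical table size */

typedef struct NameEntry {
    struct NameEntry *next;   /* next entry in the same bucket */
    size_t len;
    char *name;               /* the canonical, NUL-terminated copy */
} NameEntry;

static NameEntry *NameBuckets[BUCKET_COUNT];

static unsigned Hash(const char *s, size_t len) {
    unsigned h = 0;
    for (size_t i = 0; i < len; i++) h = h * 31 + (unsigned char)s[i];
    return h % BUCKET_COUNT;
}

/* Returns the pooled copy of s[0..len); equal names get the same pointer. */
static char *InternName(const char *s, size_t len) {
    unsigned h = Hash(s, len);
    for (NameEntry *e = NameBuckets[h]; e != NULL; e = e->next)
        if (e->len == len && memcmp(e->name, s, len) == 0)
            return e->name;                 /* already in the pool */

    NameEntry *e = malloc(sizeof *e);
    char *copy = malloc(len + 1);
    if (e == NULL || copy == NULL) { free(e); free(copy); return NULL; }
    memcpy(copy, s, len);
    copy[len] = '\0';
    e->name = copy;
    e->len = len;
    e->next = NameBuckets[h];               /* push onto the bucket's chain */
    NameBuckets[h] = e;
    return copy;
}
```

Since equal names always come back as the same pointer, later phases can compare identifiers with == instead of strcmp().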

AppendSTR() is used to append a terminating \0 to the end of a string.

posted @ 2022/01/03 15:43:20