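At the very start, the main function calls SetupLexer(). This function fills in the Scanners array, a table of function pointers indexed by character, with one scanner for each possible leading character.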
void SetupLexer(void)
{
    int i;

    for (i = 0; i < END_OF_FILE + 1; i++)
    {
        if (IsLetter(i)) // [a-z A-Z _ ]
        {
            Scanners[i] = ScanIdentifier;
        }
        else if (IsDigit(i))
        {
            Scanners[i] = ScanNumericLiteral;
        }
        else
        {
            Scanners[i] = ScanBadChar;
        }
    }
    Scanners[END_OF_FILE] = ScanEOF;
    Scanners['\''] = ScanCharLiteral; // wide chars/strings are parsed in ScanIdentifier(void)
    Scanners['"'] = ScanStringLiteral;
    Scanners['+'] = ScanPlus;
    Scanners['-'] = ScanMinus;
    Scanners['*'] = ScanStar;
    Scanners['/'] = ScanSlash;
    Scanners['%'] = ScanPercent;
    Scanners['<'] = ScanLess;
    Scanners['>'] = ScanGreat;
    Scanners['!'] = ScanExclamation;
    Scanners['='] = ScanEqual;
    Scanners['|'] = ScanBar;
    Scanners['&'] = ScanAmpersand;
    Scanners['^'] = ScanCaret;
    Scanners['.'] = ScanDot;
    // see Macro SINGLE_CHAR_SCANNER(t)
    Scanners['{'] = ScanLBRACE;
    Scanners['}'] = ScanRBRACE;
    Scanners['['] = ScanLBRACKET;
    Scanners[']'] = ScanRBRACKET;
    Scanners['('] = ScanLPAREN;
    Scanners[')'] = ScanRPAREN;
    Scanners[','] = ScanCOMMA;
    Scanners[';'] = ScanSEMICOLON;
    Scanners['~'] = ScanCOMP;
    Scanners['?'] = ScanQUESTION;
    Scanners[':'] = ScanCOLON;
}
ast.h defines this macro:
#define NEXT_TOKEN CurrentToken = GetNextToken();
So every NEXT_TOKEN calls GetNextToken(). That function dispatches to a different scanner depending on the current character:
tok = (*Scanners[*CURSOR])();
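The table-driven dispatch described above can be tried in isolation. The following is only a minimal sketch, not the compiler's actual code: the token codes (TK_ID and friends), the StartLexer() helper, the whitespace skipping, and the value 255 for END_OF_FILE are all assumptions made for illustration, and the scanners are reduced to stubs.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token codes, for this sketch only. */
enum { TK_ID = 1, TK_NUM, TK_BAD, TK_END };

#define END_OF_FILE 255                    /* assumed sentinel byte that ends the input */

typedef int (*Scanner)(void);

static const unsigned char *CURSOR;        /* current read position */
static Scanner Scanners[END_OF_FILE + 1];  /* one scanner per possible leading byte */

/* Stub scanners: each consumes its token and returns a token code. */
int ScanIdentifier(void)
{
    while (isalnum(*CURSOR) || *CURSOR == '_') CURSOR++;
    return TK_ID;
}
int ScanNumericLiteral(void) { while (isdigit(*CURSOR)) CURSOR++; return TK_NUM; }
int ScanBadChar(void)        { CURSOR++; return TK_BAD; }
int ScanEOF(void)            { return TK_END; }  /* does not move past the sentinel */

void SetupLexer(void)
{
    int i;

    for (i = 0; i < END_OF_FILE + 1; i++) {
        if (isalpha(i) || i == '_') Scanners[i] = ScanIdentifier;
        else if (isdigit(i))        Scanners[i] = ScanNumericLiteral;
        else                        Scanners[i] = ScanBadChar;
    }
    Scanners[END_OF_FILE] = ScanEOF;
}

/* Hypothetical helper: builds the table and points CURSOR at the input. */
void StartLexer(const unsigned char *input)
{
    SetupLexer();
    CURSOR = input;
}

int GetNextToken(void)
{
    /* Skip blanks, then let the first byte of the token pick the scanner. */
    while (*CURSOR == ' ' || *CURSOR == '\t' || *CURSOR == '\n') CURSOR++;
    assert(Scanners[*CURSOR] != NULL);
    return (*Scanners[*CURSOR])();
}
```

With the input bytes `ab 12 $` followed by the END_OF_FILE sentinel, successive GetNextToken() calls return TK_ID, TK_NUM, TK_BAD and then TK_END. The point of the table is that choosing a scanner costs one array lookup instead of a long if/else chain.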
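Starting with a-z, A-Z or _: ScanIdentifier is called. It takes the following run of consecutive letters and digits as a single string and checks whether it is a keyword. A keyword returns that keyword's own tok; anything else returns the one fixed tok shared by all non-keyword identifiers, and the value is stored in TokenValue.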
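The identifier-or-keyword decision can be sketched as follows. This is an illustration under assumptions, not the real ScanIdentifier: the function name ScanIdentifierSketch, the token codes, the three-entry keyword table, and the caller-supplied buffer are all hypothetical, and a linear search stands in for whatever lookup the compiler actually uses.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes and keyword table, for this sketch only. */
enum { TK_ID = 1, TK_INT, TK_RETURN, TK_WHILE };

struct keyword { const char *name; int tok; };

static const struct keyword Keywords[] = {
    { "int", TK_INT }, { "return", TK_RETURN }, { "while", TK_WHILE }
};

/*
 * Scans the identifier at *cursor, advances *cursor past it, copies the
 * spelling into buf (truncated if buf is too small) and returns the
 * keyword's token, or TK_ID for an ordinary identifier.
 */
int ScanIdentifierSketch(const char **cursor, char *buf, size_t bufsize)
{
    const char *start = *cursor;
    const char *p = start;
    size_t len, copy, i;
    int tok = TK_ID;

    assert(bufsize > 0);
    while (isalnum((unsigned char)*p) || *p == '_') p++;
    len = (size_t)(p - start);
    *cursor = p;

    /* Compare the whole run against each keyword. */
    for (i = 0; i < sizeof Keywords / sizeof Keywords[0]; i++) {
        if (strlen(Keywords[i].name) == len &&
            memcmp(start, Keywords[i].name, len) == 0) {
            tok = Keywords[i].tok;
            break;
        }
    }

    copy = len < bufsize ? len : bufsize - 1;
    memcpy(buf, start, copy);
    buf[copy] = '\0';
    return tok;
}
```

Note that `while_1` is scanned as one run and therefore comes back as TK_ID, not as the keyword `while` followed by something else: the keyword check happens only after the whole run has been collected.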
Starting with 0-9: ScanNumericLiteral is called. After working out the radix, it calls ScanIntLiteral and ScanFloatLiteral to handle integers and floating-point numbers respectively.
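A rough sketch of the radix step and the integer/float split is given below. It is not the compiler's implementation: ScanNumericLiteralSketch, the token codes, and the value union are invented for illustration, the standard strtoul() and strtod() stand in for ScanIntLiteral and ScanFloatLiteral, and suffixes (u, l, f), overflow checks and diagnostics for malformed literals are left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical token codes and value holder, for this sketch only. */
enum { TK_INTCONST = 1, TK_DOUBLECONST };

union value { unsigned long i; double d; };

/* Scans the number at *cursor, stores its value in *out, advances *cursor. */
int ScanNumericLiteralSketch(const char **cursor, union value *out)
{
    const char *start = *cursor;
    const char *p = start;
    int base = 10;

    assert(isdigit((unsigned char)*p));

    /* Work out the radix from the prefix, then skip the digits. */
    if (p[0] == '0' && (p[1] == 'x' || p[1] == 'X')) {
        base = 16;
        p += 2;
        while (isxdigit((unsigned char)*p)) p++;
    } else if (p[0] == '0') {
        base = 8;
        p++;
        while (*p >= '0' && *p <= '7') p++;
    } else {
        while (isdigit((unsigned char)*p)) p++;
    }

    /* A '.' or exponent after the digits means a floating-point literal. */
    if (base != 16 && (*p == '.' || *p == 'e' || *p == 'E')) {
        char *end;
        out->d = strtod(start, &end);
        *cursor = end;
        return TK_DOUBLECONST;
    }

    out->i = strtoul(start, NULL, base);
    *cursor = p;
    return TK_INTCONST;
}
```

For example, `0x1F` and `017` are both integer constants (31 and 15), while `2.5` takes the floating-point path.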
Starting with ': ScanCharLiteral is called, which puts the character into TokenValue.i.
Starting with ": ScanStringLiteral is called, which puts the characters into TokenValue.p.
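To make the difference between the .i and .p fields concrete, here is a small sketch of both literal scanners. Everything in it is an assumption for illustration: the union is only modeled on the TokenValue.i / TokenValue.p fields mentioned above, the function names carry a Sketch suffix to mark them as hypothetical, only a few escapes are decoded, and the input is assumed to be well formed (no backslash right before the end of the text).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Modeled on the TokenValue.i / TokenValue.p fields; layout is assumed. */
union value { int i; void *p; };

/* Decodes one (possibly escaped) character and advances *cursor past it. */
static int ReadChar(const char **cursor)
{
    const char *p = *cursor;
    int ch = (unsigned char)*p++;

    if (ch == '\\') {
        ch = (unsigned char)*p++;
        switch (ch) {
        case 'n': ch = '\n'; break;
        case 't': ch = '\t'; break;
        case '0': ch = '\0'; break;
        default:  break;      /* \\ \' \" and unknown escapes map to themselves */
        }
    }
    *cursor = p;
    return ch;
}

/* 'x' : the character code goes into out->i. Returns 1 on success, 0 on error. */
int ScanCharLiteralSketch(const char **cursor, union value *out)
{
    assert(**cursor == '\'');
    (*cursor)++;
    out->i = ReadChar(cursor);
    if (**cursor != '\'') return 0;
    (*cursor)++;
    return 1;
}

/* "..." : a heap copy of the decoded text goes into out->p (caller frees it). */
int ScanStringLiteralSketch(const char **cursor, union value *out)
{
    const char *p;
    char *buf;
    size_t n = 0;

    assert(**cursor == '"');
    p = *cursor + 1;
    buf = malloc(strlen(p) + 1);   /* decoded text is never longer than the source */
    if (buf == NULL) return 0;

    while (*p != '"' && *p != '\0')
        buf[n++] = (char)ReadChar(&p);
    buf[n] = '\0';

    if (*p != '"') { free(buf); return 0; }   /* unterminated string */
    *cursor = p + 1;
    out->p = buf;
    return 1;
}
```

A character constant fits in an integer, so it can live directly in the value; a string needs storage of its own, so the value only holds a pointer to it.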
Starting with +: ScanPlus is called. If the next character is +, the token is ++. If the next character is =, the token is +=. Otherwise it is just +.
Many other symbols are handled the same way.
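The one-character lookahead used for + can be sketched like this. The function name ScanPlusSketch and the token codes are hypothetical, and the cursor is passed as a parameter here only to keep the example self-contained.

```c
#include <assert.h>

/* Hypothetical token codes, for this sketch only. */
enum { TK_ADD = 1, TK_INC, TK_ADD_ASSIGN };

/* Called with *cursor on a '+'; consumes one or two characters. */
int ScanPlusSketch(const char **cursor)
{
    const char *p = *cursor;

    assert(*p == '+');
    p++;                        /* consume the first '+' */
    if (*p == '+') {            /* "++" */
        *cursor = p + 1;
        return TK_INC;
    }
    if (*p == '=') {            /* "+=" */
        *cursor = p + 1;
        return TK_ADD_ASSIGN;
    }
    *cursor = p;                /* plain "+" */
    return TK_ADD;
}
```

Scanning `+++=+x` with repeated calls gives `++`, then `+=`, then `+`, and leaves the cursor on `x`: the scanner always takes the longest operator it can form from the next character.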
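lex.c uses two functions from str.c. str.c declares a name pool, NameBuckets, and every identifier name is put into it. If two names are identical, InternName() returns the same entry from the pool for both.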
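The idea behind the name pool is string interning, which can be sketched with a small chained hash table. This is not the code in str.c: the entry layout, the bucket count, the hash function and the name InternNameSketch are all assumptions chosen to keep the example short.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BUCKET_COUNT 1024   /* assumed table size */

/* Hypothetical pool entry: one interned name per node, chained per bucket. */
struct name_entry {
    char *name;
    size_t len;
    struct name_entry *next;
};

static struct name_entry *NameBuckets[BUCKET_COUNT];

static unsigned HashName(const char *s, size_t len)
{
    unsigned h = 0;
    size_t i;

    for (i = 0; i < len; i++)
        h = h * 31u + (unsigned char)s[i];
    return h % BUCKET_COUNT;
}

/*
 * Returns the pooled copy of the first len bytes of s. Equal names always
 * yield the same pointer. Returns NULL only if allocation fails.
 */
const char *InternNameSketch(const char *s, size_t len)
{
    unsigned h;
    struct name_entry *e;

    assert(s != NULL);
    h = HashName(s, len);

    /* Already in the pool? Hand back the existing copy. */
    for (e = NameBuckets[h]; e != NULL; e = e->next)
        if (e->len == len && memcmp(e->name, s, len) == 0)
            return e->name;

    /* Otherwise add a new NUL-terminated copy at the head of the bucket. */
    e = malloc(sizeof *e);
    if (e == NULL) return NULL;
    e->name = malloc(len + 1);
    if (e->name == NULL) { free(e); return NULL; }
    memcpy(e->name, s, len);
    e->name[len] = '\0';
    e->len = len;
    e->next = NameBuckets[h];
    NameBuckets[h] = e;
    return e->name;
}
```

Because identical names share one pooled copy, later stages can compare two identifiers with a plain pointer comparison instead of strcmp().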
AppendSTR() is used to append a \0 to the end of a string.