MCC(My Compiler Collection)是我自己写的C语言编译器。MCC并不是按照C标准文法实现的,而是进行了修改。本文介绍MCC的文法。
translation-unit:
external-declaration
translation-unit external-declaration
translation-unit是整个C语言文件,它由多个外部声明external-declaration组成。变量的定义、函数的声明、函数的定义统统当作外部声明处理。例如下面就是4个外部声明:
static int a;
char arr[20];
int hello(int a,char b);
typedef int * AstTypeName;
typedef int (* ARR)[3];
int main(int argc, char const *argv[])
{
}
external-declaration:
common-header compound-statement
common-header ;
common-header:
declaration-specifiersopt init-declarator-listopt
external-declaration的第一行是函数定义,compound-statement代表函数体,这部分暂且不管。第二行表示变量或函数的声明。由于这两部分前面都一样,所以抽出公共部分构成common-header。所以common-header所代表的部分是这些:
static int a
char arr[20]
int hello(int a,char b)
typedef int * AstTypeName
typedef int (* ARR)[3]
int main(int argc, char const *argv[])
下面来看一下common-header的第一部分declaration-specifiers:
declaration-specifiers:
storage-class-specifier declaration-specifiersopt
type-specifier declaration-specifiersopt
type-qualifier declaration-specifiersopt
storage-class-specifier: one of
auto register static extern typedef
type-specifier: one of
void char short int long float double signed unsigned struct-OR-union-specifier typedef-name
type-qualifier: one of
const volatile
typedef-name:
identifier
不难看出declaration-specifiers就是一个或多个void、char、 short、 int、 long、 float 等关键字。其中struct-OR-union-specifier相对复杂一些:
struct-OR-union-specifier:
struct-OR-union identifieropt { struct-declaration-list }
struct-OR-union identifier
struct-OR-union: one of
struct union
struct-declaration-list:
struct-declaration
struct-declaration-list struct-declaration
struct-declaration:
specifier-qualifier-list struct-declarator-list ;
specifier-qualifier-list:
type-specifier specifier-qualijfier-listopt
type-qualifier specifier-qualifier-listopt
struct-declarator-list:
declarator
struct-declarator-list, declarator
当这些关键字解析完后,common-header的剩余部分为:
a
arr[20]
hello(int a,char b)
* AstTypeName
(* ARR)[3]
main(int argc, char const *argv[])
以上就是common-header的第二部分init-declarator-list的内容,文法为:
init-declarator-list:
init-declarator
init-declarator-list , init-declarator
init-declarator:
declarator
declarator = initializer
init-declarator-list表达的意思是多个声明可以连在一起写,用逗号隔开。init-declarator表达的意思是声明是由带等号的或是不带等号的组成。这里的文法可以发现一个漏洞,就是函数声明/定义后面接等号也被允许,这个问题放到语意检查中解决。我们的例子中没有用逗号连写的声明,也没有=号,所以declarator所代表的代码为:
a
arr[20]
hello(int a,char b)
* AstTypeName
(* ARR)[3]
main(int argc, char const *argv[])
declarator:
* declarator
postfix-declarator
postfix-declarator:
direct-declarator
postfix-declarator [ conditional-expressionopt ]
postfix-declarator ( parameter-type-list )
direct-declarator:
identifier
( declarator )
declarator的作用是在direct-declarator前面加一个或多个*,去掉*号后,我们的例子符合direct-declarator的这几种形式:
a //第一种
arr[20] //第三种
hello(int a,char b) //第四种
AstTypeName //第一种
(* ARR)[3] //第三种和第二种
main(int argc, char const *argv[]) //第四种
参数列表和声明很相似,但也有些不同,其一是声明支持初值,参数列表不支持初值。其二是参数列表支持可变参数。
parameter-type-list:
parameter-list
parameter-list, ...
parameter-list:
parameter-declaration
parameter-list , parameter-declaration
parameter-declaration:
declaration-specifiers declarator
赋初值是由表达式expression完成的,这部分后面会介绍。
initializer:
expression
{ initializer-list }
{ initializer-list , }
initializer-list:
initializer
initializer-list , initializer
compound-statement:
{ declaration-listopt statement-listopt }
declaration-list:
declaration
declaration-list declaration
declaration:
common-header ;
混合语句compound-statement中包含声明和语句。声明前面已经介绍过了,下面我们来看一下语句。
statement-list:
statement
statement-list statement
statement:
labeled-statement
expression-statement
compound-statement
selection-statement
iteration-statement
jump-statement
语句由标签语句、表达式语句、选择语句、迭代语句、跳转语句构成。
labeled-statement:
identifier : statement
case conditional-expression : statement
default : statement
expression-statement:
expressionopt ;
selection-statement:
if ( expression ) statement
if (expression ) statement else statement
switch ( expression ) statement
iteration-statement:
while ( expression ) statement
do statement while ( expression ) ;
for ( expressionopt ; expressionopt ; expressionopt ) statement
jump-statement:
goto identifier ;
continue ;
break ;
return expressionopt ;
这些语句无非就是关键字+表达式构成的,相对于声明来说非常容易理解。
expression:
conditional-expression
unary-expression assignment-operator expression
assignment-operator: one of
= *= /= %= += -= <<= >>= &= ^= |=
expression可以带赋值符号,也可以不带赋值符号。
下面是二元运算的文法,为了表达二元运算的优先级,设计了“套路相同但又有层次结构”的产生式。
conditional-expression:
logical-OR-expression
logical-OR-expression ? expression : conditional-expression
logical-OR-expression:
logical-AND-expression
logical-OR-expression || logical-AND-expression
logical-AND-expression:
inclusive-OR-expression
logical-AND-expression && inclusive-OR-expression
inclusive-OR-expression:
exclusive-OR-expression
inclusive-OR-expression | exclusive-OR-expression
exclusive-OR-expression:
AND-expression
exclusive-OR-expression ^ AND-expression
AND-expression:
equality-expression
AND-expression & equality-expression
equality-expression:
relational-expression
equality-expression == relational-expression
equality-expression != relational-expression
relational-expression:
shift-expression
relational-expression < shift-expression
relational-expression > shift-expression
relational-expression <= shift-expression
relational-expression >= shifit-expression
shift-expression:
additive-expression
shift-expression << additive-expression
shift-expression >> additive-expression
additive-expression:
multiplicative-expression
additive-expression + multiplicative-expression
additive-expression - multiplicative-expression
multiplicative-expression:
cast-expression
multiplicative-expression * cast-expression
multiplicative-expression / cast-expression
multiplicative-expression % cast-expression
强制类型转换表达式:
cast-expression:
unary-expression
( type-name ) cast-expression
接下来是一元表达式:
unary-expression:
postfix-expression
++ unary-expression
--unary-expression
unary-operator cast-expression
sizeof unary-expression
sizeof ( type-name )
unary-operator: one of
& * + - ~ !
postfix-expression:
primary-expression
postfix-expression [ expression ]
postfix-expression ( argument-expression-listopt )
postfix-expression . identifier
postfix-expression -> identifier
postfix-expression ++
postfix-expression --
primary-expression:
identifier
constant
string
( expression )
constant:
integer-constant
character-constant
floating-constant
enumeration-constant
argument-expression-list:
expression
argument-expression-list , expression
下面的type-name是给类型转换和sizeof用的
type-name:
specifier-qualifier-list abstract-declaratoropt
abstract-declarator:
pointer
pointeropt direct-abstract-declarator
direct-abstract-declarator:
( abstract-declarator )
direct-abstract-declaratoropt [ constant-expressionopt ]
direct-abstract-declaratoropt ( parameter-type-listopt )
语法分析的代码完全是按照文法写的,所以只要看懂文法,代码应该不难理解。因为语法分析的代码太长,本文就不具体分析代码了。
当语法树建立完成后可以用graphviz打印出来,便于调试。