C编译器学习笔记(四)MCC文法

前言

MCC(My Compiler Collection)是我自己写的C语言编译器。MCC并不是按照C标准文法实现的,而是进行了修改。本文介绍MCC的文法。

外部声明

translation-unit:
    external-declaration
    translation-unit external-declaration

translation-unit是整个C语言文件,它由多个外部声明external-declaration组成。变量的定义、函数的声明、函数的定义统统当作外部声明处理。例如下面就是4个外部声明:

static int a;
char arr[20];
int hello(int a,char b);
typedef int * AstTypeName;
typedef int (* ARR)[3];

int main(int argc, char const *argv[])
{
}
external-declaration:
    common-header compound-statement
    common-header ;

common-header:
    declaration-specifiersopt init-declarator-listopt

external-declaration的第一行是函数定义,compound-statement代表函数体,这部分暂且不管。第二行表示变量或函数的声明。由于这两部分前面都一样,所以抽出公共部分构成common-header。所以common-header所代表的部分是这些:

static int a
char arr[20]
int hello(int a,char b)
typedef int * AstTypeName
typedef int (* ARR)[3]

int main(int argc, char const *argv[])

下面来看一下common-header的第一部分declaration-specifiers:

declaration-specifiers:
    storage-class-specifier declaration-specifiersopt
    type-specifier declaration-specifiersopt
    type-qualifier declaration-specifiersopt

storage-class-specifier: one of
    auto register static extern typedef

type-specifier: one of
    void char short int long float double signed unsigned struct-OR-union-specifier typedef-name

type-qualifier:  one of
    const volatile

typedef-name:
    identifier

不难看出declaration-specifiers就是一个或多个void、char、 short、 int、 long、 float 等关键字。其中struct-OR-union-specifier相对复杂一些:

struct-OR-union-specifier:
    struct-OR-union identifieropt { struct-declaration-list }
    struct-OR-union identifier

struct-OR-union: one of
    struct union

struct-declaration-list:
    struct-declaration
    struct-declaration-list struct-declaration

struct-declaration:
    specifier-qualifier-list struct-declarator-list ; 

specifier-qualifier-list:
    type-specifier specifier-qualijfier-listopt 
    type-qualifier specifier-qualifier-listopt 

struct-declarator-list:
    declarator
    struct-declarator-list, declarator

当这些关键字解析完后,common-header的剩余部分为:

a
arr[20]
hello(int a,char b)
* AstTypeName
(* ARR)[3]

main(int argc, char const *argv[])

以上就是common-header的第二部分init-declarator-list的内容,文法为:

init-declarator-list:
    init-declarator
    init-declarator-list , init-declarator

init-declarator:
    declarator
    declarator = initializer

init-declarator-list表达的意思是多个声明可以连在一起写,用逗号隔开。init-declarator表达的意思是声明是由带等号的或是不带等号的组成。这里的文法可以发现一个漏洞,就是函数声明/定义后面接等号也被允许,这个问题放到语意检查中解决。我们的例子中没有用逗号连写的声明,也没有=号,所以declarator所代表的代码为:

a
arr[20]
hello(int a,char b)
* AstTypeName
(* ARR)[3]

main(int argc, char const *argv[])
declarator:
    * declarator
    postfix-declarator

postfix-declarator:
     direct-declarator
     postfix-declarator [ conditional-expressionopt ]
     postfix-declarator ( parameter-type-list )

direct-declarator:
    identifier
    ( declarator )

declarator的作用是在direct-declarator前面加一个或多个*,去掉*号后,我们的例子符合direct-declarator的这几种形式:

a                                   //第一种
arr[20]                             //第三种
hello(int a,char b)                 //第四种
AstTypeName                         //第一种
(* ARR)[3]                          //第三种和第二种

main(int argc, char const *argv[])  //第四种

参数列表和声明很相似,但也有些不同,其一是声明支持初值,参数列表不支持初值。其二是参数列表支持可变参数。

parameter-type-list:
    parameter-list
    parameter-list, ...

parameter-list:
    parameter-declaration
    parameter-list , parameter-declaration

parameter-declaration:
    declaration-specifiers declarator

赋初值是由表达式expression完成的,这部分后面会介绍。

initializer:
    expression
    { initializer-list }
    { initializer-list , }

initializer-list:
    initializer
    initializer-list , initializer

语句

compound-statement:
    { declaration-listopt statement-listopt }

declaration-list:
    declaration
    declaration-list declaration

declaration: 
    common-header ;

混合语句compound-statement中包含声明和语句。声明前面已经介绍过了,下面我们来看一下语句。

statement-list:
    statement
    statement-list statement

statement:
    labeled-statement
    expression-statement
    compound-statement
    selection-statement
    iteration-statement
    jump-statement

语句由标签语句、表达式语句、选择语句、迭代语句、跳转语句构成。

labeled-statement:
    identifier : statement
    case conditional-expression : statement 
    default : statement

expression-statement:
    expressionopt ;

selection-statement:
    if ( expression ) statement
    if (expression ) statement else statement 
    switch ( expression ) statement

iteration-statement:
    while ( expression ) statement
    do statement while ( expression ) ;
    for ( expressionopt ; expressionopt ; expressionopt ) statement 

jump-statement:
    goto identifier ;
    continue ;
    break ;
    return expressionopt ;

这些语句无非就是关键字+表达式构成的,相对于声明来说非常容易理解。

表达式

expression:
    conditional-expression
    unary-expression assignment-operator expression 

assignment-operator: one of
    = *= /= %= += -= <<= >>= &= ^= |=

expression可以带赋值符号,也可以不带赋值符号。

下面是二元运算的文法,为了表达二元运算的优先级,设计了“套路相同但又有层次结构”的产生式。

conditional-expression:
    logical-OR-expression
    logical-OR-expression ? expression : conditional-expression

logical-OR-expression:
    logical-AND-expression
    logical-OR-expression || logical-AND-expression

logical-AND-expression:
    inclusive-OR-expression
    logical-AND-expression && inclusive-OR-expression

inclusive-OR-expression:
    exclusive-OR-expression
    inclusive-OR-expression | exclusive-OR-expression

exclusive-OR-expression:
    AND-expression
    exclusive-OR-expression ^ AND-expression

AND-expression:
    equality-expression
    AND-expression & equality-expression

equality-expression:
    relational-expression
    equality-expression == relational-expression
    equality-expression != relational-expression

relational-expression:
    shift-expression
    relational-expression < shift-expression
    relational-expression > shift-expression
    relational-expression <= shift-expression
    relational-expression >= shifit-expression

shift-expression:
    additive-expression
    shift-expression << additive-expression 
    shift-expression >> additive-expression 

additive-expression:
    multiplicative-expression
    additive-expression + multiplicative-expression 
    additive-expression - multiplicative-expression 

multiplicative-expression:
    cast-expression
    multiplicative-expression * cast-expression 
    multiplicative-expression / cast-expression 
    multiplicative-expression % cast-expression 

强制类型转换表达式:

cast-expression:
    unary-expression
    ( type-name ) cast-expression

接下来是一元表达式:

unary-expression:
    postfix-expression
    ++ unary-expression
    --unary-expression
    unary-operator cast-expression
    sizeof unary-expression
    sizeof ( type-name )

unary-operator: one of
    & * + - ~ !

postfix-expression:
    primary-expression
    postfix-expression [ expression ]
    postfix-expression ( argument-expression-listopt )
    postfix-expression . identifier
    postfix-expression -> identifier
    postfix-expression ++
    postfix-expression --

primary-expression:
    identifier
    constant
    string
    ( expression )

constant:
    integer-constant
    character-constant
    floating-constant
    enumeration-constant

argument-expression-list:
    expression
    argument-expression-list , expression 

下面的type-name是给类型转换和sizeof用的

type-name: 
    specifier-qualifier-list abstract-declaratoropt 

abstract-declarator:
    pointer
    pointeropt direct-abstract-declarator

direct-abstract-declarator:
    ( abstract-declarator )
    direct-abstract-declaratoropt [ constant-expressionopt ]
    direct-abstract-declaratoropt ( parameter-type-listopt )

打印语法树

语法分析的代码完全是按照文法写的,所以只要看懂文法,代码应该不难理解。因为语法分析的代码太长,本文就不具体分析代码了。

当语法树建立完成后可以用graphviz打印出来,便于调试。

image


posted @ 2022/02/07 21:33:22