C编译器学习笔记(三)文法

文法

下面是一个描述句子组成成分的方法:

句子 -> 主语 谓语 宾语 	     // 句子由主语和谓语和宾语构成
主语 -> 名词 | 代词 	     // 主语由名词或代词构成
谓语 -> 动词 		     // 谓语由动词构成
宾语 -> 形容词 | 名词 | 代词    // 宾语由形容词或名词或代词构成
代词 -> "I"                   //
动词 -> "love"                // 能继续推导的叫非终结符, 如动词
名词 -> "cake"                // 不能继续推导的叫终结符, 如"cake"

上面的例子太过简单,如果把这种描述方法运用的复杂一点,不知道你能不能看懂:

// 程序是由混合结构构成的,混合结构是由多个语句构成的
Program -> CompoundStatement
CompoundStatement -> { StatementList_opt }
StatementList -> Statement | StatementList Statement

// 语句由if、while、混合结构、ExpressionStatement构成
Statement -> IfStatement | WhileStatement | CompoundStatement | ExpressionStatement

// if 和 while语句
IfStatement -> if ( expression) Statement 
IfStatement -> if ( expression) Statement else Statement 
WhileStatement -> while(expression) Statement 

// ExpressionStatement由赋值语句和声明构成
ExpressionStatement -> id = Expression ; 
ExpressionStatement -> Declaration ;

// 表达式
Expression -> AdditiveExpression
AdditiveExpression -> MultiplicativeExpression 
AdditiveExpression -> AdditiveExpression + MultiplicativeExpression 
AdditiveExpression -> AdditiveExpression - MultiplicativeExpression 
MultiplicativeExpression -> PrimaryExpression 
MultiplicativeExpression -> MultiplicativeExpression * PrimaryExpression 
MultiplicativeExpression -> MultiplicativeExpression / PrimaryExpression 
PrimaryExpression -> id | num | (Expression) 

// 变量和函数的声明
Declaration -> int DeclaratorList
DeclaratorList -> Declarator | DeclaratorList , Declarator
Declarator -> * Declarator | PostfixDeclarator 
PostfixDeclarator -> DirectDeclarator | PostfixDeclarator[num] | PostfixDeclarator(ParameterList) 
ParameterList -> int | ParameterList, int
DirectDeclarator -> id | (Declarator)

其实它描述的是一个简化版的C语言。

文法描述的是一颗以Program为根节点,终结符为叶节点的树。编译器只需要按照文法,一点一点的分析代码,就可以将这颗树结构建立起来。这颗树被称为语法树

根据文法可以看懂许多复杂的代码,例如:

int(*f(int,int,int))[4];

一般人看不懂这东西,不过根据文法来分析就简单许多了。Declaration对应int(*f(int,int,int))[4],接着往下推导,可以知道PostfixDeclarator为(*f(int,int,int))[4],接下来是关键,因为我们的表达式中带着[],所以只能是PostfixDeclarator[num],所以编译器是先识别数组,再识别括号中的内容。

所以总的识别过程为:
INT ---> ARRAY[4] ---> POINTER ---> FUNCTION

根据识别过程,从前向后捋就可知定义的是什么东西:

  1. int
  2. int数组
  3. int数组的指针
  4. 返回int数组的指针的函数
int(*f(int,int,int))(int,int,int);

识别过程:
INT ---> FUNCTION ---> POINTER ---> FUNCTION

  1. int
  2. 返回int的函数
  3. 返回int的函数的指针
  4. 返回返回int的函数的指针的函数

实际代码中常用一个数组装很多函数指针,然后根据不同情况调用不同的函数,这种方法可以防止switch中的case过多,这个数组的定义如下:

AstExpression (* ExprCheckers[])(AstExpression);

上文的简单文法并不支持这种定义,不过我们还是可以猜测出它的识别过程:
AstExpression ---> FUNCTION ---> POINTER ---> ARRAY

  1. AstExpression
  2. 返回AstExpression的函数
  3. 返回AstExpression的函数的指针
  4. 返回AstExpression的函数的指针的数组

C语言文法

C11标准每章的开头,有对应章节相关的文法。《The C Programming language》书中附录“A.13 Grammar”有完整的文法。

C标准文法的语法与前文的语法有些不同,不过原理都是相同的,很快可以猜出它的语法。C标准文法主要分为外部声明、语句和表达式三部分。看懂文法就相当于看懂了C语言解析器的结构,再看代码事半功倍。其中带opt下标的是可选项。


/********************以下是关于外部声明的文法*******************************
*
* C标准文法将变量的声明、变量的定义、函数的声明、函数的定义统统当作外部声明处理。
*/

translation-unit:
    external-declaration
    translation-unit external-declaration

external-declaration:
    function-definition
    declaration

// declaration-listopt是为了兼容老式风格函数
function-definition:
    declaration-specifiersopt declarator declaration-listopt compound-statement

// 声明的格式:多个描述符 初始化列表;
declaration: 
    declaration-specifiers init-declarator-listopt ;

declaration-list:
    declaration
    declaration-list declaration

// 多次从storage-class-specifier、type-specifier、type-qualifier中选择描述符
declaration-specifiers:
    storage-class-specifier declaration-specifiersopt
    type-specifier declaration-specifiersopt
    type-qualifier declaration-specifiersopt

storage-class-specifier: one of
    auto register static extern typedef

type-specifier: one of
    void char short int long float double signed unsigned struct-OR-union-specifier enum-specifier typedef-name

type-qualifier:  one of
    const volatile

struct-OR-union-specifier:
    struct-OR-union identifieropt { struct-declaration-list }
    struct-OR-union identifier

struct-OR-union: one of
    struct union

struct-declaration-list:
    struct-declaration
    struct-declaration-list struct-declaration

init-declarator-list:
    init-declarator
    init-declarator-list , init-declarator

init-declarator:
    declarator
    declarator = initializer

struct-declaration:
    specifier-qualifier-list struct-declarator-list ; 

specifier-qualifier-list:
    type-specifier specifier-qualijfier-listopt 
    type-qualifier specifier-qualifier-listopt 

struct-declarator-list:
    struct-declarator
    struct-declarator-list, struct-declarator 

struct-declarator:
    declarator
    declarator_opl : constant-expression

enum-specifier:
    enum identifieropt { enumerator-list } 
    enum identifier

enumerator-list:
    enumerator
    enumerator-list , enumerator

enumerator:
    identifier
    identifier = constant-expression

declarator:
    pointeropt direct-declarator

direct-declarator:
    identifier
    ( declarator )
    direct-declarator [ constant-expressionopt ] 
    direct-declarator ( parameter-type-list ) 
    direct-declarator ( identifier-listopt ) 

pointer:
    * type-qualifier-listopt
    * type-qualifier-listopt pointer

type-qualifier-list:
    type-qualifier
    type-qualifier-list type-qualifier

parameter-type-list:
    parameter-list
    parameter-list, ...

parameter-list:
    parameter-declaration
    parameter-list , parameter-declaration

parameter-declaration:
    declaration-specifiers declarator
    declaration-specifiers abstract-declaratoropt

identifier-list:
    identifier
    identifier-list , identifier

initializer:
    assignment-expression
    { initializer-list }
    { initializer-list , }

initializer-list:
    initializer
    initializer-list , initializer

type-name: 
    specifier-qualifier-list abstract-declaratoropt 

abstract-declarator:
    pointer
    pointeropt direct-abstract-declarator

direct-abstract-declarator:
    ( abstract-declarator )
    direct-abstract-declaratoropt [ constant-expressionopt ]
    direct-abstract-declaratoropt ( parameter-type-listopt )

typedef-name:
    identifier

/*************************以下是语句的文法********************************/

compound-statement:
    { declaration-listopt statement-listopt }

statement-list:
    statement
    statement-list statement


statement:
    labeled-statement
    expression-statement
    compound-statement
    selection-statement
    iteration-statement
    jump-statement

labeled-statement:
    identifier : statement
    case constant-expression : statement 
    default : statement

expression-statement:
    expressionopt ;

selection-statement:
    if ( expression ) statement
    if (expression ) statement else statement 
    switch ( expression ) statement

iteration-statement:
    while ( expression ) statement
    do statement while ( expression ) ;
    for ( expressionopt ; expressionopt ; expressionopt ) statement 

jump-statement:
    goto identifier ;
    continue ;
    break ;
    return expressionopt ;

/*************************以下是表达式的文法*******************************
*
* C标准文法中为了表达不同二元运算符的优先级,设计了“套路相同但又有层次结构”的产生式。
* 从这些文法中可以清晰的看出各运算符的优先级。
* 
*/


expression:
    assignment-expression
    expression , assignment-expression

assignment-expression:
    conditional-expression
    unary-expression assignment-operator assignment-expression 

assignment-operator: one of
    = *= /= %= += -= <<= >>= &= ^= |=

conditional-expression:
    logical-OR-expression
    logical-OR-expression ? expression : conditional-expression

constant-expression:
    conditional-expression

logical-OR-expression:
    logical-AND-expression
    logical-OR-expression || logical-AND-expression

logical-AND-expression:
    inclusive-OR-expression
    logical-AND-expression && inclusive-OR-expression

inclusive-OR-expression:
    exclusive-OR-expression
    inclusive-OR-expression | exclusive-OR-expression

exclusive-OR-expression:
    AND-expression
    exclusive-OR-expression ^ AND-expression

AND-expression:
    equality-expression
    AND-expression & equality-expression

equality-expression:
    relational-expression
    equality-expression == relational-expression
    equality-expression != relational-expression

relational-expression:
    shift-expression
    relational-expression < shift-expression
    relational-expression > shift-expression
    relational-expression <= shift-expression
    relational-expression >= shifit-expression

shift-expression:
    additive-expression
    shift-expression << additive-expression 
    shift-expression >> additive-expression 

additive-expression:
    multiplicative-expression
    additive-expression + multiplicative-expression 
    additive-expression - multiplicative-expression 

multiplicative-expression:
    cast-expression
    multiplicative-expression * cast-expression 
    multiplicative-expression / cast-expression 
    multiplicative-expression % cast-expression 

cast-expression:
    unary-expression
    ( type-name ) cast-expression

unary-expression:
    postfix-expression
    ++ unary-expression
    --unary-expression
    unary-operator cast-expression
    sizeof unary-expression
    sizeof ( type-name )

unary-operator: one of
    & * + - ~ !

postfix-expression:
    primary-expression
    postfix-expression [ expression ]
    postfix-expression ( argument-expression-listopt )
    postfix-expression . identifier
    postfix-expression -> identifier
    postfix-expression ++
    postfix-expression --

primary-expression:
    identifier
    constant
    string
    ( expression )

argument-expression-list:
    assignment-expression
    argument-expression-list , assignment-expression 

constant:
    integer-constant
    character-constant
    floating-constant
    enumeration-constant
posted @ 2022/01/04 12:19:43