Formal AviSynth grammar

From Avisynth wiki

Latest revision as of 22:33, 9 May 2013

Introduction

This page presents the formal grammar of the AviSynth script language: a dense representation of all of its syntactic rules. While it is probably of more interest to developers than to average users, a formal grammar is nevertheless an essential piece of documentation for any programming language, so it is provided here for those more inclined to abstract mathematical reasoning. Have fun!

Before presenting the grammar, some introductory material is necessary for those who don't eat bytes for breakfast. If you are already familiar with BNF/EBNF syntax, you can skip the following section.

Background Information

Formal grammars of programming and scripting languages are typically written in Backus-Naur Form (BNF) or Extended Backus-Naur Form (EBNF). We have chosen EBNF because it is easier for humans to read, which makes it slightly better suited for documentation. The syntax used here follows the ISO/IEC 14977 standard, "Extended BNF". The table below summarizes the notation used (an infix operator is written between its operands; a postfix operator is written after its operand).

EBNF operator    Meaning               Comment
unquoted words   Non-terminal symbol   A symbol that groups lower-level symbols (i.e. not a fundamental one).
"..."            Terminal symbol       A fundamental (i.e. not further divisible) symbol of the language.
'...'            Terminal symbol       Same as above.
(...)            Brackets              Parentheses group the symbols inside them into a single (non-terminal) symbol.
[...]            Optional symbols      The symbols inside square brackets are optional (i.e. present either 0 or 1 times).
{...}            Repetition            The symbols inside curly braces are repeated zero or more (i.e. >= 0) times.
{...}-           Repetition            The symbols are repeated one or more (i.e. >= 1) times. Note that the - immediately follows the closing curly brace.
= (infix)        Defining symbol       The "assignment" operator of EBNF: the non-terminal on the left is defined as the grouping of symbols on the right.
; (postfix)      Rule terminator       Signals the end of a rule (just as ; ends a statement in C).
| (infix)        Alternative           Exactly one of the alternative terminal or non-terminal symbols is matched.
, (infix)        Concatenation         The symbols on both sides of the , (comma) are joined sequentially to form a single (non-terminal) symbol.
- (infix)        Exception             Matches the left operand except for anything the right operand matches. This page also uses a prefix form: -"a" stands for ? all characters not equal to a ?.
* (infix)        Occurrences of        Repeats the symbol to its right n times, where n is the number to its left. For example, to state that a (Fortran) label has exactly 5 characters, one can write: label = 5 * character;.
(*...*)          Comment               Arbitrary text documenting something (the comment facility of EBNF).
?...?            Special sequence      Arbitrary text whose interpretation is beyond the scope of the EBNF standard.

Note: To define character sequences as terminal symbols, one can either use the "string" or 'string' facilities of EBNF or use the concatenation operator: character-a , character-b , ..., character-z. However, for repetitive tasks such as enumerating all letters of the alphabet or all numeric digits, it is common to use a range notation of the form start...end as an extension to the standard. We also use it here.
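
For the non-recursive, character-level rules of this page, the EBNF operators in the table map directly onto regular-expression constructs. The following Python sketch illustrates that correspondence. The helper names (terminal, char_range, alt, concat, optional, repeat, repeat1, times, matches) are our own invention, not part of any EBNF tool, and only a few lexical rules of the AviSynth grammar are transcribed. Recursive rules, such as the expression hierarchy, cannot be expressed this way and need a real parser.

```python
import re

# Hypothetical helpers: each returns the Python regex fragment that
# corresponds to one EBNF construct from the notation table.

def terminal(text):            # "..." or '...': a literal terminal symbol
    return re.escape(text)

def char_range(start, end):    # start...end: range extension, e.g. 'a'...'f'
    return f"[{re.escape(start)}-{re.escape(end)}]"

def alt(*parts):               # a | b | c: exactly one alternative matches
    return "(?:" + "|".join(parts) + ")"

def concat(*parts):            # a , b , c: symbols joined sequentially
    return "".join(parts)

def optional(part):            # [ a ]: present 0 or 1 times
    return f"(?:{part})?"

def repeat(part):              # { a }: repeated zero or more times
    return f"(?:{part})*"

def repeat1(part):             # { a }-: repeated one or more times
    return f"(?:{part})+"

def times(n, part):            # n * a: exactly n occurrences
    return f"(?:{part}){{{n}}}"

def matches(rule, text):       # True if the whole text derives from the rule
    return re.fullmatch(rule, text) is not None

# A few lexical rules of the AviSynth grammar, transcribed operator by operator.
digit = char_range("0", "9")
nzero_digit = char_range("1", "9")
letter = alt(char_range("a", "z"), char_range("A", "Z"))
hexdigit = alt(digit, char_range("a", "f"), char_range("A", "F"))
sign = alt(terminal("-"), terminal("+"))
identifier = concat(alt(letter, terminal("_")), repeat(alt(letter, digit, terminal("_"))))
hexinteger = concat(terminal("$"), repeat1(hexdigit))
decimalinteger = concat(optional(sign), alt(concat(nzero_digit, repeat(digit)), terminal("0")))
```

For instance, matches(hexinteger, "$FF00aa") is true, while "$" alone is rejected because { hexdigit }- requires at least one digit. Likewise, decimalinteger as written rejects a leading zero such as 007.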

The AviSynth Grammar in EBNF Notation

In the formulation of the AviSynth grammar below, certain items are not considered part of the grammar; processing and stripping them off is the responsibility of the tokenizer. These are:

  • Whitespace.
  • Comments (both single-line and multi-line).
  • Line continuations.
  • The end-of-file condition.
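
The tokenizer duties listed above can be sketched in Python as a small pre-pass. This is a minimal, hypothetical illustration, not AviSynth's actual tokenizer. It assumes that # starts a comment running to the end of the line, that /* ... */ delimits a block comment, and that a backslash at the end of a line or at the start of the next line continues a statement. Other comment forms accepted by some AviSynth versions, and a backslash at a line end inside a string literal, are not handled. The function names and the token regex are our own.

```python
import re

# Minimal pre-pass sketch; the comment and continuation rules below are
# assumptions about AviSynth, not taken from its source code.

def strip_comments(src):
    """Drop '#' line comments and '/* ... */' block comments outside strings."""
    out, i, n = [], 0, len(src)
    while i < n:
        if src.startswith('"""', i):                 # triple-quoted string
            close = src.find('"""', i + 3)
            end = n if close < 0 else close + 3
            out.append(src[i:end])
            i = end
        elif src[i] == '"':                          # ordinary string
            close = src.find('"', i + 1)
            end = n if close < 0 else close + 1
            out.append(src[i:end])
            i = end
        elif src[i] == '#':                          # comment up to end of line
            close = src.find('\n', i)
            i = n if close < 0 else close            # the newline itself is kept
        elif src.startswith('/*', i):                # block comment
            close = src.find('*/', i + 2)
            end = n if close < 0 else close + 2
            # keep the comment's newlines (or one space) so line structure survives
            out.append('\n' * src.count('\n', i, end) or ' ')
            i = end
        else:
            out.append(src[i])
            i += 1
    return ''.join(out)

def join_continuations(text):
    """Merge physical lines into logical ones (assumed backslash rule);
    blank lines carry nothing but whitespace and are dropped."""
    logical, continue_next = [], False
    for raw in text.split('\n'):
        line = raw.strip()
        if not line:
            continue
        joins_previous = line.startswith('\\')
        if joins_previous:
            line = line[1:].strip()
        ends_with_backslash = line.endswith('\\')
        if ends_with_backslash:
            line = line[:-1].strip()
        if (joins_previous or continue_next) and logical:
            logical[-1] = (logical[-1] + ' ' + line).strip()
        else:
            logical.append(line)
        continue_next = ends_with_backslash
    return [line for line in logical if line]

TOKEN = re.compile(
    r'"""(?:(?!""")[\s\S])*"""|"[^"]*"'             # string literals
    r'|\$[0-9A-Fa-f]+|[0-9]*\.[0-9]+|[0-9]+\.?'     # hex, float, integer
    r'|[A-Za-z_][A-Za-z0-9_]*'                      # identifiers and keywords
    r'|==|!=|<>|<=|>=|&&|\|\||\+\+|[-+*/%!<>=?:.,(){}]')
WHITESPACE = re.compile(r'\s+')

def tokenize(line):
    tokens, pos = [], 0
    while pos < len(line):
        ws = WHITESPACE.match(line, pos)
        if ws:                                       # whitespace is never a token
            pos = ws.end()
            continue
        m = TOKEN.match(line, pos)
        if not m:
            raise SyntaxError(f"unexpected character {line[pos]!r}")
        tokens.append(m.group())
        pos = m.end()
    return tokens

def scan(src):
    """Return one token list per logical line."""
    return [tokenize(line) for line in join_continuations(strip_comments(src))]
```

Note that the # inside a string literal survives, because strings are copied before comments are looked for; this is why the pre-pass walks the text character by character instead of using a single regex substitution.
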
script =
            { declaration }-
            ;
declaration =
            statement
            | function_definition 
            ;
function_definition =
            kw_function , identifier , '(' , [ parameters_list ] , ')' , 
            compound_statement
            ;
(* optional arguments must come after all positional arguments *)
parameters_list =
            arguments , ',' , optional_arguments     (* either both types in order *)
            | arguments                              (* or only one (any) of them  *)
            | optional_arguments
            ;
arguments =
            argument , { ',' , argument }
            ;
(* Note: If type_spec is missing, it is implicitly assumed to be: t_val *)
argument =
            [ type_spec ] , identifier
            ;
optional_arguments =
            optional_argument , { ',' , optional_argument }
            ;
(* Despite the formulation, the tokenizer does not handle whitespace correctly. *)
(* Example: an optional argument declared simply "int f" parses without error.  *)
optional_argument =
            [ type_spec ] , quote , identifier , quote
            ;
type_spec =
            t_val | t_string | t_bool | t_int | t_float | t_clip
            ;
(* This is probably a parser bug (inherited from C) because AviSynth has only one *)
(* global function table; the rule should be  '{' , { statement } , '}'  instead, *)
(* and the statement rule below would include function_definition. Then the       *)
(* declaration rule would be unneeded, and also: script = { statement }- ;        *)
compound_statement = 
            '{' , { declaration } , '}'
            ;
statement =                  (* free-standing compound statements are not allowed *)
            expression
            | try_statement
            | jump_statement
            ;
try_statement =
            kw_try , compound_statement , 
            kw_catch , '(' , [ identifier ] , ')' , compound_statement
            ;
jump_statement = 
            kw_return , [ expression ]
            ;
(* Although expression has only one subtype, it is kept as a separate rule for *)
(* documentation purposes and to make future extensions of the grammar easier. *)
expression =
            assignment_exp
            ;
assignment_exp =
            conditional_exp
            | [ kw_global ] , identifier , '=' , assignment_exp
            ;
conditional_exp =
            logical_or_exp
            | logical_or_exp , '?' , expression , ':' , conditional_exp
            ;
logical_or_exp =
            logical_and_exp
            | logical_or_exp , '||' , logical_and_exp
            ;
logical_and_exp =
            equality_exp
            | logical_and_exp , '&&' , equality_exp
            ;
equality_exp =
            relational_exp
            | equality_exp , equ_binary_operator , relational_exp
            ;
equ_binary_operator = 
            '==' | '!=' | '<>'
            ;
relational_exp =
            additive_exp
            | relational_exp , rel_binary_operator , additive_exp
            ;
rel_binary_operator = 
            '<' | '>' | '<=' | '>='
            ;
additive_exp =
            multiplicative_exp
            | additive_exp , add_binary_operator , multiplicative_exp
            ;
add_binary_operator = 
            '+' | '-' | '++'                               (* ++ is for clips *)
            ;
multiplicative_exp = 
            unary_exp
            | multiplicative_exp , mul_binary_operator , unary_exp
            ;
mul_binary_operator = 
            '*' |  '/' |  '%'
            ;
unary_exp = 
            [ unary_operator ] , postfix_exp
            ;
unary_operator = 
            sign | '!'
            ;
(* Because OOP notation simply moves the 1st argument of a function in front of the call, *)
(* it can be chained to every alternative of primary_exp; therefore it belongs here.       *)
postfix_exp = 
            primary_exp
            | function_call
            | primary_exp , { '.' , function_call }-       (* the OOP notation *)
            ;
function_call =
            identifier , [ '(' , [ argument_exp_list ] , ')' ]
            ;
(* Assignment is allowed only to optional arguments, *)
(* which must come after all positional arguments    *)
argument_exp_list = 
            positional_arg_list , ',' , optional_arg_list  (* either both types in order *)
            | positional_arg_list                          (* or only one (any) of them  *)
            | optional_arg_list
            ;
positional_arg_list = 
            expression
            | positional_arg_list , ',' , expression
            ;
optional_arg_list = 
            identifier , '=' , expression
            | optional_arg_list , ',' , identifier , '=' , expression
            ;
primary_exp =
            constant
            | identifier
            | '(' , expression , ')'
            ;
identifier = 
            ( letter | "_" )  , { letter | digit | "_" }
            ;
constant = 
            integer_constant | float_constant | boolean_constant | stringliteral
            ;
stringliteral = 
            quote , { -quote } , quote
            | triplequote , { -triplequote } , triplequote
            ;
boolean_constant =
            true | false | yes | no
            ;
integer_constant = 
            decimalinteger | hexinteger
            ;
float_constant = 
            [ sign ] , ( [ intpart ] , fraction | intpart , '.' )
            ;
decimalinteger = 
            [ sign ] , ( nzero_digit , { digit } | '0' )
            ;
hexinteger = 
            "$" , { hexdigit }-
            ;
fraction = 
            '.' , intpart
            ;
intpart = 
            { digit }-
            ;
hexdigit = 
            digit | 'a'...'f' | 'A'...'F' 
            ;
letter = 
            'a'...'z' | 'A'...'Z' 
            ;
digit = 
            '0' | nzero_digit
            ;
nzero_digit =
            '1'...'9'
            ;
sign =
            '-' | '+'
            ;

quote       = '"'   ;
triplequote = '"""' ;

true        = i_t , i_r , i_u , i_e ;
false       = i_f , i_a , i_l , i_s , i_e ;
yes         = i_y , i_e , i_s ;
no          = i_n , i_o ;

t_val       = i_v , i_a , i_l ;
t_string    = i_s , i_t , i_r , i_i , i_n , i_g ;
t_bool      = i_b , i_o , i_o , i_l ;
t_int       = i_i , i_n , i_t ;
t_float     = i_f , i_l , i_o , i_a , i_t ;
t_clip      = i_c , i_l , i_i , i_p ;

kw_function = i_f , i_u , i_n , i_c , i_t , i_i , i_o , i_n ;
kw_try      = i_t , i_r , i_y ;
kw_catch    = i_c , i_a , i_t , i_c , i_h ;
kw_global   = i_g , i_l , i_o , i_b , i_a , i_l ;
kw_return   = i_r , i_e , i_t , i_u , i_r , i_n ;

i_a = ( 'a' | 'A' ) ;
i_b = ( 'b' | 'B' ) ;
i_c = ( 'c' | 'C' ) ;
i_e = ( 'e' | 'E' ) ;
i_f = ( 'f' | 'F' ) ;
i_g = ( 'g' | 'G' ) ;
i_h = ( 'h' | 'H' ) ;
i_i = ( 'i' | 'I' ) ;
i_l = ( 'l' | 'L' ) ;
i_n = ( 'n' | 'N' ) ;
i_o = ( 'o' | 'O' ) ;
i_p = ( 'p' | 'P' ) ;
i_r = ( 'r' | 'R' ) ;
i_s = ( 's' | 'S' ) ;
i_t = ( 't' | 'T' ) ;
i_u = ( 'u' | 'U' ) ;
i_v = ( 'v' | 'V' ) ;
i_y = ( 'y' | 'Y' ) ;
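
The layered expression rules above encode operator precedence (each rule refers only to the next, tighter-binding one) and, through their left recursion, left associativity. A recursive-descent parser cannot follow a left-recursive rule literally, so it turns each one into a loop. The Python sketch below does this for the rules from conditional_exp down to primary_exp and returns nested tuples of the form (operator, operands...) instead of evaluating anything. It is a simplified illustration: assignment, function calls, the OOP notation and string constants are left out, and since sign appears both in unary_operator and in the numeric constants, a leading - or + is always read as a unary operator here. The names parse, Parser and LEVELS are hypothetical.

```python
import re

# Tokens needed by the expression rules only (no strings, no function calls).
_TOKEN = re.compile(r"\s*([0-9]+\.[0-9]*|\.[0-9]+|[0-9]+|[A-Za-z_][A-Za-z0-9_]*"
                    r"|\|\||&&|==|!=|<>|<=|>=|\+\+|[-+*/%!<>?:()])")

def tokenize(src):
    tokens, pos, src = [], 0, src.rstrip()
    while pos < len(src):
        m = _TOKEN.match(src, pos)
        if not m:
            raise SyntaxError(f"unexpected input at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

# Binary rules from loosest to tightest binding, with the operators of each.
LEVELS = [
    ("logical_or_exp", {"||"}),
    ("logical_and_exp", {"&&"}),
    ("equality_exp", {"==", "!=", "<>"}),
    ("relational_exp", {"<", ">", "<=", ">="}),
    ("additive_exp", {"+", "-", "++"}),
    ("multiplicative_exp", {"*", "/", "%"}),
]

class Parser:
    def __init__(self, src):
        self.tokens, self.i = tokenize(src), 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if expected is not None and tok != expected:
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.i += 1
        return tok

    def conditional(self):
        # logical_or_exp | logical_or_exp , '?' , expression , ':' , conditional_exp
        cond = self.binary(0)
        if self.peek() != "?":
            return cond
        self.take("?")
        then = self.conditional()                      # 'expression' minus assignment
        self.take(":")
        return ("?:", cond, then, self.conditional())  # right recursion

    def binary(self, level):
        # X = Y | X , op , Y  is left-recursive; a loop builds the same
        # left-associative tree without infinite recursion.
        if level == len(LEVELS):
            return self.unary()
        ops = LEVELS[level][1]
        left = self.binary(level + 1)
        while self.peek() in ops:
            op = self.take()
            left = (op, left, self.binary(level + 1))
        return left

    def unary(self):
        # [ unary_operator ] , postfix_exp : at most one prefix operator
        if self.peek() in ("-", "+", "!"):
            return (self.take(), self.primary())
        return self.primary()

    def primary(self):
        # constant | identifier | '(' , expression , ')'   (numeric constants only)
        tok = self.take()
        if tok == "(":
            inner = self.conditional()
            self.take(")")
            return inner
        if tok[0].isdigit() or tok[0] == ".":
            return float(tok) if "." in tok else int(tok)
        if tok[0].isalpha() or tok[0] == "_":
            return tok
        raise SyntaxError(f"unexpected token {tok!r}")

def parse(src):
    parser = Parser(src)
    tree = parser.conditional()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree
```

For example, parse("1 + 2 * 3") gives ('+', 1, ('*', 2, 3)) and parse("a - b - c") gives ('-', ('-', 'a', 'b'), 'c'), showing precedence and left associativity respectively. Because unary_exp allows at most one operator, "--1" is rejected.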

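The lexical rules at the end of the grammar (constants, identifiers, and the keywords spelled with the i_x = ( 'x' | 'X' ) rules, which makes them case-insensitive) are regular, so each can be transcribed as one regular expression. The sketch below classifies a single token by trying the rules in a fixed order. The order, the table LEXICAL_RULES and the function classify are assumptions of this illustration; booleans are tried before identifiers because the identifier rule as written does not exclude true, false, yes or no.

```python
import re
from typing import Optional

# Tried in this order; the first rule that derives the whole token wins.
LEXICAL_RULES = [
    # true | false | yes | no, built from i_x letters, so any letter case is allowed
    ("boolean_constant", re.compile(r"true|false|yes|no", re.IGNORECASE)),
    # [ sign ] , ( [ intpart ] , fraction | intpart , '.' )
    ("float_constant", re.compile(r"[-+]?(?:[0-9]*\.[0-9]+|[0-9]+\.)")),
    # decimalinteger | hexinteger
    ("integer_constant", re.compile(r"[-+]?(?:[1-9][0-9]*|0)|\$[0-9a-fA-F]+")),
    # quote , { -quote } , quote | triplequote , { -triplequote } , triplequote
    ("stringliteral", re.compile(r'"[^"]*"|"""(?:(?!""")[\s\S])*"""')),
    # ( letter | "_" ) , { letter | digit | "_" }
    ("identifier", re.compile(r"[A-Za-z_][A-Za-z0-9_]*")),
]

def classify(token: str) -> Optional[str]:
    """Name of the first lexical rule that derives the whole token, else None."""
    for name, pattern in LEXICAL_RULES:
        if pattern.fullmatch(token):
            return name
    return None
```

So classify("TRUE") and classify("No") both return "boolean_constant", while "truely" falls through to "identifier". The negative lookahead in the triple-quoted alternative mirrors { -triplequote }: the body may contain single or double quotes, but not a closing """.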

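Two of the rules above constrain function calls. argument_exp_list only allows name = value pairs after all positional arguments (parameters_list imposes the same order on definitions), and the OOP notation primary_exp , { '.' , function_call }- is shorthand for passing the value on the left as the first argument of the call. The following Python sketch models both on already-parsed calls. The Call class, the functions check_order, make_call and desugar, and the example functions f and g with the named argument mode are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Call:
    # function_call = identifier , [ '(' , [ argument_exp_list ] , ')' ]
    name: str
    positional: list = field(default_factory=list)
    named: dict = field(default_factory=dict)

def check_order(args):
    """args: (name or None, value) pairs in source order. True if every
    'name = value' pair comes after all positional arguments."""
    seen_named = False
    for name, _ in args:
        if name is None and seen_named:
            return False
        if name is not None:
            seen_named = True
    return True

def make_call(name, args=()):
    """Build a Call, rejecting argument lists that argument_exp_list forbids."""
    args = list(args)
    if not check_order(args):
        raise SyntaxError(f"{name}: positional argument after a named one")
    positional = [value for key, value in args if key is None]
    named = {key: value for key, value in args if key is not None}
    return Call(name, positional, named)

def desugar(receiver, chain):
    """primary_exp , { '.' , function_call }-  as nested ordinary calls: each call
    in the chain gets the value to its left as its first positional argument."""
    result = receiver
    for call in chain:
        result = Call(call.name, [result, *call.positional], dict(call.named))
    return result
```

With these definitions, c.f(1, mode=2).g is modelled as desugar("c", [make_call("f", [(None, 1), ("mode", 2)]), make_call("g")]), which yields the same tree as the ordinary call g(f(c, 1, mode=2)). The parenthesis-free .g is allowed because the parentheses in function_call are optional.
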
