Expr
AVS+ 

This feature is specific to AviSynth plus. It is not supported in other AviSynth versions. 
Applies a mathematical function, defined by an expression string, on the pixels of the source clip(s). A different expression may be set for each color channel. Users of MaskTools may be familiar with this concept.
Syntax and Parameters
Expr( [clip clip[, ...], string exp[, ...], string format,
bool optAvx2, bool optSingleMode, bool optSSE2, string scale_inputs, bool clamp_float] )
 clip clip =
 One or more source clips. Up to 26 input clips can be specified.
 The first three clips are referenced by lowercase letter x, y and z; use 'a', 'b' ... 'w' for the rest.
 Clips may be YUV(A), RGB(A), or greyscale; 8-16 bit integer or 32 bit float.
 Width, height and color subsampling should be the same; bit depths can be different.
 string exp =
 One or more RPN expressions
 A different expression may be set for each color channel (or plane). Plane order is YUVA or RGBA.
 (note: due to a bug, versions prior to r2724 used GBRA ordering)
 When an expression string is not given, the previous one is used.
 The empty string ("") is a valid expression; it causes the plane to be copied (see Expressions below).
 string format = ""
 Set color format of the returned clip.
 Use pixel format strings like "YV12", "YUV420P8", "YUV444P16", "RGBP10".
 By default, the output format is the same as the first clip.
 bool optAvx2 = (auto)
 If false, disable AVX2.
 Enables/disables AVX2 code generation if available. Does nothing if AVX2 is not supported in AviSynth.
 bool optSingleMode = false
 If true, generate assembly code using only one XMM/YMM register set instead of two; default false.
 Expr generates assembly code that normally uses two 128 (SSE2) or 256 bit (AVX2) registers ("lanes"), thus processing 8 (SSE2)/16 (AVX2) pixels per internal cycle.
 Experimental parameter: optSingleMode=true makes the internal compiler generate instructions for only one register (4/8 pixels for SSE2/AVX2). The parameter was introduced to test the speed of x86 code using one working register. Very complex expressions would use too many XMM/YMM registers, which are then "swapped" to memory slots, and that could be slow. Using optSingleMode = true may result in using fewer registers with no need for swapping them to memory slots.
 bool optSSE2 = (auto)
 If false, disable SSE2.
 Enables/disables SSE2 code generation when in non-AVX2 mode. Setting optSSE2=false and optAVX2=false forces expression processing in a slow interpreted way (C language).
 string scale_inputs = "none"
 Autoscale any input bit depths to 8-16 bit for internal expression use; the conversion method is either full range (stretch) or limited YUV range (like bit shift). The feature is similar to the one in masktools2 v2.2.15.
 The primary purpose of this feature is the easy reuse of existing expressions that were written for 8 bits.
 "int" : scales limited range videos, only integer formats (8-16 bits) to 8 (or bit depth specified by 'i8'..'i16')
 "intf": scales full range videos, only integer formats (8-16 bits) to 8 (or bit depth specified by 'i8'..'i16')
 "float" or "floatf" : only scales 32 bit float format to 8 bit range (or bit depth specified by 'i8'..'i16')
 "all": scales videos to 8 (or bit depth specified by 'i8'..'i16'); conversion uses limited_range logic (mul/div by two's power)
 "allf": scales videos to 8 (or bit depth specified by 'i8'..'i16'); conversion uses full scale logic (stretch)
 "none": no autoscaling
 bool clamp_float = false
 If true: clamps 32 bit float to valid ranges, which is 0..1 for luma or for RGB color space and -0.5..0.5 for YUV chroma UV channels
 Default false: as usual, 32 bit float pixels are not clamped
 Ignored when scale_inputs scales 32-bit float type pixels
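The effect of clamp_float on a single 32-bit float sample can be pictured with the following Python sketch. It is only a model of the ranges stated above, not AviSynth code; the helper name `clamp_float_pixel` and its `is_chroma` flag are hypothetical.

```python
def clamp_float_pixel(value: float, is_chroma: bool) -> float:
    """Clamp one 32-bit float sample to its nominal range (hypothetical helper).

    Luma and RGB samples use 0..1, YUV chroma (U, V) samples use -0.5..0.5.
    """
    lo, hi = (-0.5, 0.5) if is_chroma else (0.0, 1.0)
    return min(max(value, lo), hi)


print(clamp_float_pixel(1.2, is_chroma=False))   # luma above range
print(clamp_float_pixel(-0.7, is_chroma=True))   # chroma below range
```

With clamp_float=false the value would simply be passed through unchanged.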
Expressions
Expr accepts 1 to 26 source clips, up to four expression strings (one per color plane), an optional output format string, and some debug options. Output video format is inherited from the first clip, when there is no format override. All clips have to match in their width, height and chroma subsampling.
Expressions are evaluated on each plane, Y, U, V (and A) or R, G, B (,A). When an expression string is not specified, the previous expression is used for that plane – except for plane A (alpha) which is copied by default. When an expression is an empty string ("") then the relevant plane will be copied (if the output clip bit depth is similar). When an expression is a single clip reference letter ("x") and the source/target bit depth is similar, then the relevant plane will be copied. When an expression is constant (after constant folding), then the relevant plane will be filled with an optimized memory fill method.
 Example:
Expr(clip, "255", "128", "128")
fills all three planes.
 Example:
Expr(clip, "x", "range_half", "range_half")
copies luma, fills U and V with 128/512/... (bit depth dependent)
Other optimizations: do not call GetFrame for input clips that are not referenced or plane-copied
Expressions are written in RPN.
Expressions use 32 bit float precision internally.
For 8..16 bit formats output is rounded and clamped from the internal 32 bit float representation to valid 8, 10, ... 16 bits range. 32 bit float output is not clamped at all.
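The final conversion from the internal float value to an output sample can be modeled roughly as below. This is a Python sketch, not the filter's actual code: the function name `to_output` is hypothetical, and round-half-up is an assumption, since the text only states that the result is "rounded and clamped".

```python
def to_output(value: float, bits: int):
    """Convert an internal float result to an output sample (hypothetical helper).

    bits is 8, 10, 12, 14 or 16 for integer formats, 32 for float output.
    """
    if bits == 32:
        # 32-bit float output is returned as-is, without clamping.
        return value
    max_value = (1 << bits) - 1                  # 255, 1023, ..., 65535
    clamped = min(max(value, 0.0), float(max_value))
    return int(clamped + 0.5)                    # assumed rounding rule: half up


print(to_output(300.7, 8), to_output(-5.0, 8), to_output(1.5, 32))
```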
Expr language/RPN elements
 Clips: letters x, y, z, a..w. x is the first clip parameter, y is the second one, etc.
 Math: * / + -
 Modulo: % (like fmod). Example: result = x - trunc(x/d)*d. Note: the internal 32-bit float can hold only a 24 bit integer number (approximately)
 Math constant: pi
 Functions: min, max, sqrt, abs, neg, exp, log, pow ^ (synonyms: pow and ^)
 Function: clip, a three operand function for clipping. Example: x 16 240 clip means min(max(x,16),240)
 Functions: sin cos tan asin acos atan (no SSE2/AVX2 optimization when they appear in Expr)
 Logical: > < = >= <= and or xor not == & | != (synonyms: == and =, & and and, | and or)
 Ternary operator: ? Example: x 128 < x y ?
 Duplicate stack elements: dup, dupn (dup1, dup2, ...)
 Swap stack elements: swap, swapn (swap1, swap2, ...)
 Scale by bit shift: scaleb (operand is treated as being a number in 8 bit range unless i8..i16 or f32 is specified)
 Scale by full scale stretch: scalef (operand is treated as being a number in 8 bit range unless i8..i16 or f32 is specified)
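For readers new to RPN, the evaluation order can be illustrated with a small stack machine. The Python sketch below handles only a subset of the elements listed above and evaluates one pixel at a time. It is not how Expr is implemented (Expr compiles the expression to machine code), and several details are assumptions: the name `eval_rpn`, the use of 1.0 for logical true, treating any value greater than 0 as true in the ternary operator, and Python's double precision instead of 32-bit float.

```python
import math


def eval_rpn(expr: str, **clips: float) -> float:
    """Evaluate a subset of Expr-style RPN for one pixel (illustrative only)."""
    binary = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "/": lambda a, b: a / b,
        "%": math.fmod,
        "min": min,
        "max": max,
        "pow": lambda a, b: a ** b,
        "^": lambda a, b: a ** b,
        "<": lambda a, b: 1.0 if a < b else 0.0,   # assumed: true is 1.0
        ">": lambda a, b: 1.0 if a > b else 0.0,
    }
    unary = {"sqrt": math.sqrt, "abs": abs, "neg": lambda a: -a}
    stack = []
    for tok in expr.split():
        if tok in binary:
            b = stack.pop()
            a = stack.pop()
            stack.append(binary[tok](a, b))
        elif tok in unary:
            stack.append(unary[tok](stack.pop()))
        elif tok == "clip":                        # value low high clip
            hi = stack.pop()
            lo = stack.pop()
            stack.append(min(max(stack.pop(), lo), hi))
        elif tok == "?":                           # cond if_true if_false ?
            if_false = stack.pop()
            if_true = stack.pop()
            stack.append(if_true if stack.pop() > 0 else if_false)
        elif tok == "dup":
            stack.append(stack[-1])
        elif tok == "swap":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif tok == "pi":
            stack.append(math.pi)
        elif tok in clips:                         # clip letters x, y, z, ...
            stack.append(float(clips[tok]))
        else:
            stack.append(float(tok))               # numeric constant
    if len(stack) != 1:
        raise ValueError("expression must leave exactly one value on the stack")
    return stack[0]


print(eval_rpn("x 16 240 clip", x=250))            # clipping example
print(eval_rpn("x 128 < x y ?", x=100, y=200))     # ternary example
```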
Bitdepth aware constants

ymin, ymax (ymin_a .. ymin_z for individual clips) - the usual luma limits (16..235 or scaled equivalents)
cmin, cmax (cmin_a .. cmin_z) - chroma limits (16..240 or scaled equivalents)
range_half (range_half_a .. range_half_z) - half of the range (128 or scaled equivalents)
range_size, range_half, range_max (range_size_a .. range_size_z, etc.)

Keywords for modifying base bit depth

i8, i10, i12, i14, i16, f32 (used with scaleb and scalef)
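The difference between the two scaling keywords is easiest to see numerically. The Python sketch below models them for integer bit depths only (32-bit float targets are not covered); the function signatures, including the `source_bits` argument standing in for the i8..i16 keywords, are illustrative assumptions rather than the filter's API.

```python
def scaleb(value: float, target_bits: int, source_bits: int = 8) -> float:
    """Scale like a bit shift: multiply or divide by a power of two."""
    return value * 2.0 ** (target_bits - source_bits)


def scalef(value: float, target_bits: int, source_bits: int = 8) -> float:
    """Full scale stretch: map the source maximum onto the target maximum."""
    return value * ((1 << target_bits) - 1) / ((1 << source_bits) - 1)


# An 8-bit constant used on a 10-bit clip:
print(scaleb(235, 10))   # limited range white, shifted by two bits
print(scalef(255, 10))   # full range white, stretched to the 10-bit maximum
```

Bit-shift scaling keeps limited-range levels such as 16 and 235 aligned across bit depths, while the stretch maps the full 0..255 range onto the full target range.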

Spatial input variables in expr syntax

sx, sy (absolute x and y coordinates, 0 to width-1 and 0 to height-1)
sxr, syr (relative x and y coordinates, from 0 to 1.0)

Internal variables
 Uppercase A to Z for storing and loading intermediate results within the expression
 Store: A@ .. Z@
 Store and pop from stack: A^ .. Z^
 Use: A..Z
 Example: "x y - A^ x y 0.5 + + B^ A B / C@ x +"

frameno: use current frame number in expression. 0 <= frameno < clip_frame_count
time: calculation: time = frameno/clip_frame_count. Use relative time position in expression. 0 <= time < 1
width, height: clip width and clip height
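The store (`@`), store-and-pop (`^`) and load tokens can be traced with a minimal Python model. It supports only the four basic arithmetic operators and the A..Z variables; the function name `eval_with_vars` is hypothetical and the sketch makes no claim about how Expr handles variables internally.

```python
def eval_with_vars(expr: str, clips: dict):
    """Evaluate + - * / and A..Z variables; returns (result, variables)."""
    ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "/": lambda a, b: a / b,
    }
    stack, variables = [], {}
    for tok in expr.split():
        if tok in ops:
            b = stack.pop()
            a = stack.pop()
            stack.append(ops[tok](a, b))
        elif len(tok) == 2 and tok[0].isupper() and tok[1] == "@":
            variables[tok[0]] = stack[-1]      # store, value stays on the stack
        elif len(tok) == 2 and tok[0].isupper() and tok[1] == "^":
            variables[tok[0]] = stack.pop()    # store and pop
        elif len(tok) == 1 and tok.isupper():
            stack.append(variables[tok])       # load a stored value
        elif tok in clips:
            stack.append(float(clips[tok]))
        else:
            stack.append(float(tok))
    if len(stack) != 1:
        raise ValueError("expression must leave exactly one value on the stack")
    return stack[0], variables


result, variables = eval_with_vars(
    "x y - A^ x y 0.5 + + B^ A B / C@ x +", {"x": 10, "y": 4})
print(result, variables)
```

In this trace A holds x-y, B holds x+y+0.5, and C holds A/B, which also stays on the stack so that x can be added to it.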
Pixel addressing
 Indexed, addressable source clip pixels by relative x,y positions.
 Syntax: x[a,b] where
 'x': source clip letter a..z
 'a': horizontal shift. -width < a < width
 'b': vertical shift. -height < b < height
 'a' and 'b' should be constant. e.g.: "x[-1,-1] x[-1,0] x[-1,1] y[0,-10] + + + 4 /"
 When a pixel would come from off-screen, the pixels are cloned from the edge.
 The optimized version of indexed pixels requires SSSE3, and no AVX2 version is available. Non-SSSE3 falls back to C for the whole expression
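The edge handling of relative pixel addressing can be sketched in Python as a coordinate clamp. The helper name `sample` and the list-of-rows plane layout are illustrative assumptions; the sketch shows only the addressing rule, not the SSSE3 implementation.

```python
def sample(plane, x, y, dx, dy):
    """Read the pixel at relative offset (dx, dy) from position (x, y).

    plane is a list of rows. Reads that would fall off-screen return the
    nearest edge pixel instead (edge cloning).
    """
    height, width = len(plane), len(plane[0])
    sx = min(max(x + dx, 0), width - 1)
    sy = min(max(y + dy, 0), height - 1)
    return plane[sy][sx]


plane = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
print(sample(plane, 1, 1, -1, 0))    # left neighbour of the centre pixel
print(sample(plane, 0, 0, -1, -1))   # off-screen read, cloned from the corner
```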
Autoscale inputs with "scale_inputs"
 Autoscale works by converting any input bit depths to a common 8-16 bit format for internal expression use; the conversion method is either full range or limited YUV range. The feature is similar to the one in masktools2 v2.2.15.
 The primary purpose of this feature is the easy reuse of existing expressions that were written for 8 bits.
 Possible values for scale_inputs
 "int" : scales limited range videos, only integer formats (8-16 bits) to 8 (or bit depth specified by 'i8'..'i16')
 "intf": scales full range videos, only integer formats (8-16 bits) to 8 (or bit depth specified by 'i8'..'i16')
 "float" or "floatf" : only scales 32 bit float format to 8 bit range (or bit depth specified by 'i8'..'i16')
 "all": scales videos to 8 (or bit depth specified by 'i8'..'i16'); conversion uses limited_range logic (mul/div by two's power)
 "allf": scales videos to 8 (or bit depth specified by 'i8'..'i16'); conversion uses full scale logic (stretch)
 "none": no autoscaling
 Usually limited range is for normal YUV videos, full scale is for RGB or known-to-be-full-scale YUV
 By default the internal conversion target is 8 bits, so old expressions written for 8 bit videos will probably work.
 This internal working bitdepth can be overwritten by the i8, i10, i12, i14, i16 specifiers.
 When using autoscale mode, scaleb and scalef keywords are meaningless, because there is nothing to scale.
 Different conversion methods cannot be set for converting before and after the expression. Neither can you specify different methods for distinct input clips (e.g. x is full, y is limited is not supported).
 How it works:
 8-32 bit inputs are all scaled to a common bit depth, which is 8 by default and can be set to 10, 12, 14 or 16 bits with the 'i10'..'i16' keywords.
 For example: scale_inputs="all" converts any inputs to 8 bit range. No truncation occurs however (no precision loss), because even 16 bit data is converted to 8 bit in floating point precision, using division by 256.0 (2^16/2^8).
 So the conversion is _not_ a simple shift-right-8 in the integer domain, which would lose precision.
 The expression is calculated (like masktools2 mt_lut, mt_lutxy, mt_lutxyz and mt_lutxyza do).
 The internal result is scaled back to the original video bit depth. Clamping (clipping to valid range) and converting to an integer output (if applicable) occurs here.
 The predefined constants such as 'range_max', etc. will behave according to the internal working bit depth
 Important note!
 This feature was created for easy porting of earlier 8-bit-video-only lut expressions. You have to understand how it works internally.
 Let's see a 16-bit input in "all" and "allf" mode (target is the default 8 bits)
 Limited range 16->8 bits conversion has a factor of 1/256.0 (instead of shift right 8 in the integer domain, float division is used, or else it would lose precision)
 Full range 16->8 bits conversion has a factor of 255.0/65535
 Using bit shifts (really it's division and multiplication by 2^8=256.0):
result = calculate_lut_value(input / 256.0) * 256.0
 Full scale 16-8-16 bit mode ('intf', 'allf'):
result = calculate_lut_value(input / 65535.0 * 255.0) / 255.0 * 65535.0
 Use scale_inputs = "all" ("int", "float") for YUV videos with 'limited' range (e.g. in 8 bits: Y=16..235, UV=16..240).
 Use scale_inputs = "allf" ("intf", "floatf") for RGB or YUV videos with 'full' range (e.g. in 8 bits: channels 0..255).
 When input is 32-bit float, the 0..1.0 (luma) and -0.5..0.5 (chroma) channel is scaled to 0..255 (8 bits), 0..1023 (i10 mode), 0..4095 (i12 mode), 0..16383 (i14 mode), 0..65535 (i16 mode) then back.
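The two round trips can be written out as a short Python sketch for integer inputs. The function names are hypothetical, `lut` stands in for the user's expression (the `calculate_lut_value` of the formulas above), and the final rounding and clamping to the output format is left out.

```python
def autoscale_limited(pixel, lut, input_bits=16, work_bits=8):
    """'int' / 'all' style: scale by a power of two, evaluate, scale back."""
    factor = 2.0 ** (input_bits - work_bits)        # 256.0 for 16 -> 8 bits
    return lut(pixel / factor) * factor


def autoscale_full(pixel, lut, input_bits=16, work_bits=8):
    """'intf' / 'allf' style: stretch by the ratio of maxima, evaluate, stretch back."""
    in_max = (1 << input_bits) - 1                  # 65535 for 16 bits
    work_max = (1 << work_bits) - 1                 # 255 for 8 bits
    return lut(pixel / in_max * work_max) / work_max * in_max


def invert(v):
    """An expression written for 8-bit video, equivalent to '255 x -'."""
    return 255 - v


print(autoscale_limited(4097, lambda v: v))   # identity keeps the low bits
print(autoscale_full(0, invert))              # full range black becomes white
```

Because the division is done in floating point, the input 4097 becomes 16.00390625 internally rather than 16, which is why no precision is lost on the way back.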
Compared to MaskTools
Compared to MaskTools2 version 2.2.15, Expr has functionality similar to mt_lut, mt_lutxy, mt_lutxyz, mt_lutxyza and mt_lutspa.
MaskTools2 is very slow for 10+ bit clips, when a LUT (lookup table) cannot be used for memory size reasons, thus the expression is evaluated/interpreted at runtime for each pixel. MaskTools2 (from v2.2.15) however is able to pass the expressions to this Avisynth+ 'Expr' filter with its 'use_expr' parameter, by passing the expression strings, and clamp_float and scale_inputs parameter.
The JIT compiler in Expr (adapted from VapourSynth) turns the expression calculation into realtime assembly code which is much faster and basically bit depth independent.
 In Expr:
 Up to 26 clips are allowed (x,y,z,a,b,...w). Masktools handles only up to 4 clips with its mt_lut, mt_lutxy, mt_lutxyz, mt_lutxyza
 Clips with different bit depths are allowed
 Works with 32 bit floats instead of 64 bit double internally
 Fewer functions (e.g. no bit shifts)
 Logical 'false' is 0 instead of -1
 The ymin, ymax, etc builtin constants can have a _X suffix, where X is the corresponding clip designator letter. E.g. cmax_z, range_half_x
 mt_lutspa-like functionality is available through "sx", "sy", "sxr", "syr" internal predefined variables
 No y= u= v= parameters with negative values for filling plane with constant value, constant expressions are changed into optimized "fill" mode
Examples
Average three clips:
c = Expr(clip1, clip2, clip3, "x y + z + 3 /")
Input clips may have more planes than an explicitly specified output format:
Expr(aYV12Clip, "x 255.0 /", format="Y32") # target is Y only which needs only Y plane from YV12
Y-plane-only clip(s) can be used as source planes when a non-subsampled (rgb or 444) output format is specified:
Expr(Y, "x", "x 2.0 /", "x 3.0 /", format="RGBPS") # r, g and b expression uses Y plane
Expr(Grey_r, Grey_g, Grey_b, "x", "y 2.0 /", "z 3.0 /", format="RGBPS") # r, g and b expressions each use the Y plane of a different clip
Using spatial feature:
c = Expr(clip1, clip2, clip3, "sxr syr 1 sxr - 1 syr - * * * 4096 scaleb *", "", "")
Mandelbrot zoomer (original code and idea from here: https://forum.doom9.org/showthread.php?p=1738391#post1738391 )
a="X dup * Y dup * - A + T^ X Y 2 * * B + 2 min Y^ T 2 min X^ "
b=a+a
c=b+b
blankclip(width=960,height=640,length=1600,pixel_type="YUV420P8")
Expr("sxr 3 * 2 - -1.2947627 - 1.01 frameno ^ / -1.2947627 + A@ X^ syr 2 * 1 - 0.4399695 - " \
 + " 1.01 frameno ^ / 0.4399695 + B@ Y^ "+c+c+c+c+c+b+a+"X dup * Y dup * + 4 < 0 255 ?", \
 "128", "128")
For other ideas of spatial variables, see MaskTools2:mt_lutspa
Changes
r2724 (20180702) - new three operand function: clip; new parameter "clamp_float"
r2574 (20171219) - new: indexable source clip pixels by relative x,y positions like x[1,1]; new functions: sin cos tan asin acos atan
r2544 (20171115) - optimization; fix scalef
r2542 (20171114) - first added