Glossary

From Avisynth wiki

Aliasing

Aliasing, also known as "jaggies", is a stair-step type artefact that occurs on curves or diagonal lines that should be smooth.

Aliasing in video can occur for many reasons, the most common being as a side effect of deinterlacing or resizing.


AVI

AVI (Audio/Video Interleaved) is a multimedia container, introduced by Microsoft in 1992 and still in widespread use.



AVS

.AVS is the file extension for Avisynth scripts. Any program compatible with Avisynth can read these script files as if they were audio and/or video files.


AVSI

An autoloading script. Any script with the .AVSI extension in the AviSynth Plugins folder is automatically imported. This is useful for making script functions available to any new script you create without having to copy and paste. However, any code not wrapped in a function will be executed when the script starts up. Code in .AVSI files should be restricted to user-defined functions, global variable definitions, loading plugins and importing other scripts.


DirectShow

This is what Microsoft says about DirectShow:

"Microsoft® DirectShow® is an architecture for streaming media on the Microsoft Windows® platform. DirectShow provides for high-quality capture and playback of multimedia streams. It supports a wide variety of formats, including Advanced Systems Format (ASF), Motion Picture Experts Group (MPEG), Audio-Video Interleaved (AVI), MPEG Audio Layer-3 (MP3), and WAV sound files. It supports capture from digital and analog devices based on the Windows Driver Model (WDM) or Video for Windows. DirectShow is integrated with other DirectX technologies. It automatically detects and uses video and audio acceleration hardware when available, but also supports systems without acceleration hardware." (copied from msdn.)

Microsoft® DirectShow® and Video for Windows® are registered trademarks and ActiveMovie® is trademark of Microsoft Corporation in the U.S. and/or other countries.

DivX

DivX is a proprietary video codec that implements the MPEG-4 standard.

More Information

DivX homepage


DoomNineForum

While largely dedicated to DVD backup and other kinds of "ripping", the AviSynth Usage forum at Doom9 has a very high signal-to-noise ratio for technical people who dig desktop video. It's no coincidence that many AviSynth developers hang out there (and in the newer AviSynth Development forum); in fact, for anything beyond structured bug reporting and feature requests, it's probably an even better place to interact with the development team than SourceForge.


ffdshow

ffdshow is a DirectShow and VfW codec for decoding and encoding many video and audio formats, including DivX and XviD movies, using libavcodec, xvid and other open-source libraries, with a rich set of postprocessing filters.


Field misalignment

Field misalignment happens when video is resized as separate fields without taking into account that, when a video is stored as fields, those fields each have a unique spatial offset relative to the other. It can be fixed, within the limits of linear interpolation, if you know the exact way in which the misalignment was introduced; obviously, such information is not usually available. Barring an exact solution, you are left with various flavors of blur, which all essentially function by removing the aliased high-frequency vertical information that differentiates the fields from each other. (source: *.mp4 guy)

The lines “less oblique than the actual lines” are generally caused by fields individually resized without taking their vertical positions into account. Fortunately the process is more or less non-destructive in the upscale case and can be reverted by resizing the fields to their original size. Finding the exact size is a matter of trial and error. When you get close, it shows some beating patterns that become larger and larger until they disappear completely when you hit the exact size. (source: cretindesalpes)



Filter

A filter is a computer program or subroutine to process a stream, producing another stream.

While a single filter can be used on its own, filters are frequently concatenated together to form a pipeline or filter graph.


Float

This page is about the Float audio format.

For the variable type, see Script_variables.
For the Float() function, go to Internal_functions#Float.
For the Deep Color format, go to Float_(color_format)


Float (in full: IEEE floating point with single precision) is one of the AviSynth audio sample formats.

The samples of this type have values between -1.000000 (= 0xBF800000) and 1.000000 (= 0x3F800000).

The value of such an IEEE-754 number is computed as sign * 2^exponent * 1.mantissa (with the exponent already un-biased, i.e. the stored 8-bit value minus 127), using the following scheme:

[ 1 Sign Bit | 8 Bit Exponent | 23 Bit Mantissa ]

  • The sign bit is 1 (negative) or 0 (positive).
  • The exponent runs from -127 (00000000) to 0 (01111111) to 128 (11111111).
  • The mantissa m1m2m3 ... means m1/2 + m2/4 + m3/8 + ... in decimal. Sometimes people denote "1.m1m2m3..." as the mantissa (like is done in the converter below).

For an on-line converter between decimal numbers and IEEE 754 floating point, see here.

Examples:

0x3F800000 = 00111111100000000000000000000000 (binary)

Sign Bit      Exp       1.Mantissa
0          01111111  00000000000000000000000

127 - 127 = 0     1.0 bitshift 0 places

Sign bit is positive, exponent = 0, 1.mantissa = 1, so the number is +1.0

and

0xBF800000 = 10111111100000000000000000000000

Sign Bit      Exp       1.Mantissa
1          01111111  00000000000000000000000

127 - 127 = 0     1.0 bitshift 0 places

Sign bit is negative, exponent = 0, 1.mantissa = 1, so the number is -1.0

and

0x4B000000 = 01001011000000000000000000000000

Sign Bit      Exp       1.Mantissa
0          10010110  00000000000000000000000

150 - 127 = 23     1.0 bitshift 23 places

Sign bit is positive, exponent = 23, 1.mantissa = 1, so the number is +2^23 = 8388608 (= 0x800000)

source: opferman.net
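The decoding scheme above can be sketched in plain Python (a hypothetical `decode_f32` helper, valid for normal numbers only; NaN, infinities and denormals would need extra cases):

```python
import struct

def decode_f32(bits: int) -> float:
    """Interpret a 32-bit integer as an IEEE-754 single-precision float."""
    sign = (bits >> 31) & 0x1          # 1 sign bit
    exponent = (bits >> 23) & 0xFF     # 8-bit biased exponent
    mantissa = bits & 0x7FFFFF         # 23-bit mantissa
    # Normal numbers: value = (-1)^sign * 2^(exponent-127) * 1.mantissa
    value = (-1.0) ** sign * 2.0 ** (exponent - 127) * (1.0 + mantissa / 2 ** 23)
    # Cross-check against the C-style byte reinterpretation.
    assert value == struct.unpack('>f', bits.to_bytes(4, 'big'))[0]
    return value

print(decode_f32(0x3F800000))  # 1.0
print(decode_f32(0xBF800000))  # -1.0
print(decode_f32(0x4B000000))  # 8388608.0  (= 2^23 = 0x800000)
```

These three calls reproduce the worked examples above.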


FourCC

A FourCC (literally, four-character code) is a sequence of four bytes used to uniquely identify data formats.

Technical details

The byte sequence is usually restricted to ASCII printable characters, with space characters reserved for padding shorter sequences. Case sensitivity is preserved.

Four-byte identifiers are useful because they can be made up of four human-readable characters with mnemonic qualities. Thus, the codes can be used efficiently in program code as integers, as well as giving cues in binary data streams when inspected.

AVS+ Some FourCCs, such as those for certain high bit depth raw video formats, contain non-printable characters and are not human-readable without special formatting for display; for example, 10bit planar YUV422, known in AviSynth+ as YUV422P10, can have a FourCC of ('Y', '3', 10, 10)(1) which ffmpeg displays as rawvideo (Y3[10][10] / 0xA0A3359), yuv422p10le.
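How a FourCC maps to an integer can be sketched in plain Python (a hypothetical `fourcc` helper; AVI/VfW pack the first character into the least significant byte):

```python
def fourcc(code) -> int:
    """Pack a FourCC into a little-endian 32-bit integer, as used in AVI/VfW.
    Accepts a 4-character string like 'YV12', or a 4-item sequence that may
    mix characters and raw byte values (for non-printable FourCCs)."""
    vals = [b if isinstance(b, int) else ord(b) for b in code]
    assert len(vals) == 4
    # First character lands in the least significant byte.
    return vals[0] | vals[1] << 8 | vals[2] << 16 | vals[3] << 24

print(hex(fourcc('YV12')))              # 0x32315659
print(hex(fourcc(('Y', '3', 10, 10))))  # 0xa0a3359
```

The second call reproduces the 0xA0A3359 value that ffmpeg reports for the YUV422P10 FourCC mentioned above.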

History

In 1985, Electronic Arts introduced the Interchange File Format (IFF) meta-format (family of file formats), originally devised for use on the Amiga. These files consisted of a sequence of "chunks", which could contain arbitrary data, each chunk prefixed by a four-byte ID. The IFF specification explicitly mentions that the origins of the FourCC idea lie with Apple.(2)

IFF was adopted by a number of developers, including Apple for AIFF files and Microsoft for RIFF files (which were used as the basis for the AVI and WAV file formats). Microsoft and Windows developers refer to their four-byte identifiers as FourCCs.

References
  1. RAW.C (ffmpeg.org)
  2. "EA IFF 85" Standard for Interchange Format Files (martinreddy.net)



This page was adapted from Wikipedia: FourCC; version 11 March 2017.

Wikipedia text is available under the Creative Commons Attribution-ShareAlike License


Frameserver

A frameserver is an application that feeds video directly to another application. AviSynth and most other frameservers accomplish this by creating a fake file that other programs can read as if it were a very large (usually uncompressed) AVI file. For more information, see FAQ frameserving.


GraphEdit

GraphEdit is a DirectShow filter graph tool created by Microsoft, part of the Windows Platform SDK. It is mostly useful to the AviSynth user as a way to select codecs manually for DirectShowSource. It also lets you perform administrative tasks, such as permanently changing filter priority (called "merit" in DirectShow).

[Figure: screenshot of the GraphStudioNext derivative, with the mouse creating a connection between filters]


Huffyuv

Huffyuv is a lossless video codec created by BenRG (the original developer of AviSynth), patterned after JPEG-LS. It supports RGB, UYVY, and YUY2.

Its homepage used to be at http://math.berkeley.edu/~benrg/huffyuv.html but has disappeared. Version 2.1.1 (binaries and source) is downloadable at Donald Graft's mirror.

Latest Huffyuv can be found here: Huffyuv v2.1.1 CCE SP-Patch v0.2.5, released Dec 22, 2003. Get the file huffyuv_ccesp-patch_025.zip.

Another implementation exists within libavcodec (MPlayer, FFDShow, etc), which extends the format to support YV12.


I420

I420 is the exact same thing as YV12, but with the chroma plane order swapped (YV12 stores the planes in the order Y, Cr, Cb (or Y, V, U), while I420 stores them as Y, Cb, Cr (or Y, U, V)). In many (most?) practical applications, I420 is what people actually mean when they say YV12.

Avisynth treats both formats exactly the same, to the point that Info will report both of them as being YV12, IsYV12 will return true for both of them, and calling ConvertToYV12 on a clip of either format is a no-op regardless of the chroma plane order. For end users it is for most intents and purposes impossible to tell the difference (not that you should care anyway, it's none of your business). Most source filters will return I420 if you ask for YV12. However, the VfW interface will always output YV12, and swap the plane order if necessary.

If you somehow do manage to get the ordering wrong, it will most likely be immediately obvious to you (reddish colors will appear as bluish and vice versa) unless you are color blind or the clip is monochrome. You can trivially fix such problems (wrong chroma plane order, not color blindness) with a call to SwapUV.

For plugin writers: In all places except env->NewVideoFrame(vi), CS_YV12 and CS_I420 are considered identical. This distinction is to allow source filters to import YV12 and I420 video data directly into a PVideoFrame without needing to blit the individual planes.

A historical anecdote

For a very, very long time MeGUI had a funny bug where it would complain about the script's output not being YV12 and offer to add ConvertToYV12() to the end of it for you. If you accepted its offer, it'd add ConvertToYV12() and then immediately pop up the same warning again. This would continue ad infinitum until you told it to stop. It'd then work just fine. This bug was eventually "fixed" by adding a "don't ask again" checkbox, and was left in that state for about two years until someone pointed out the difference (or perhaps the similarity) between YV12 and I420 to the MeGUI authors. Since the script output was I420, and MeGUI didn't know what to do with that, it tried to add ConvertToYV12(), but that's a no-op on I420 and so the output format remained the same.

The moral of this story is that the definition of insanity is doing the same thing over and over and expecting different results.
(Attributed to Albert Einstein as well as to Benjamin Franklin and Mark Twain).


Interleaved

Interleaved image format is a format for storing images where all color components needed to represent a pixel are placed at the same place in memory. This is in contrast with how planar images are stored in memory.

  • Supported interleaved formats in AviSynth 2.5/2.6: RGB24, RGB32, YUY2



ISSE

Integer SSE is a set of instructions found in most modern processors. It extends MMX and is the subset of the SSE instruction set that operates on integers; it is mostly used for video processing. These instructions are also present in all versions of the AMD Athlon and AMD Duron.

Integer SSE instructions:

MASKMOVQ mmreg1, mmreg2
MOVNTQ mem64, mmreg
PAVGB mmreg1, mmreg2
PAVGB mmreg, mem64
PAVGW mmreg1, mmreg2
PAVGW mmreg, mem64
PEXTRW reg32, mmreg, imm8
PINSRW mmreg, reg32, imm8
PINSRW mmreg, mem16, imm8
PMAXSW mmreg1, mmreg2
PMAXSW mmreg, mem64
PMAXUB mmreg1, mmreg2
PMAXUB mmreg, mem64
PMINSW mmreg1, mmreg2
PMINSW mmreg, mem64
PMINUB mmreg1, mmreg2
PMINUB mmreg, mem64
PMOVMSKB reg32, mmreg
PMULHUW mmreg1, mmreg2
PMULHUW mmreg, mem64
PSADBW mmreg1, mmreg2
PSADBW mmreg, mem64
PSHUFW mmreg1, mmreg2, imm8
PSHUFW mmreg, mem64, imm8
PREFETCHNTA mem8
PREFETCHT0 mem8
PREFETCHT1 mem8
PREFETCHT2 mem8
SFENCE



MakeAVIS

MakeAVIS is an AVI wrapper which is included in ffdshow (discussion). Note that this program was also included in the installation of AviSynth v2.52.

Get updated versions of MakeAvis from ffdshow-tryout project


MJPEG

MJPEG (Motion JPEG) is a video compression format in which each video frame or interlaced field is compressed separately as a JPEG image. As it is a popular video capture format, there are many proprietary MJPEG codecs - not all of which agree on such things as color matrix and/or luma range (if the codec outputs RGB).



MMX

MMX is short for MultiMedia Extensions and was developed by Intel for the Pentium MMX.

It contains a set of instructions that allows programmers to operate on 8 bytes (64 bits) at a time in 8 registers. AviSynth uses this technology to speed up many of the internal filters.

See also Integer SSE.


Modulo

"mod" redirects here. For the modulo operator, see here. For a modded script, plugin, etc., search for that item by name.


A mathematical operation, defined as the remainder after an integer division.

Thus a mod b is the smallest non-negative remainder of a after subtracting b as many times as possible.

Examples:

15 mod 4 = 3 since 15-3*4 = 3
42 mod 9 = 6 since 42-4*9 = 6
15 mod 5 = 0 since 15-3*5 = 0, etc ...

In image processing, to say that a resolution is mod 16 means that both the width and height are a multiple of 16, i.e. width mod 16 = 0 and height mod 16 = 0.

Sometimes written as 'mod8', 'mod16' etc.
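The definition and the modN convention, in plain Python (`%` is Python's mod operator; `is_mod` is a hypothetical helper):

```python
def is_mod(width: int, height: int, n: int) -> bool:
    """True if both dimensions are a multiple of n (i.e. the resolution is 'modN')."""
    return width % n == 0 and height % n == 0

print(15 % 4)                  # 3
print(42 % 9)                  # 6
print(is_mod(1920, 1080, 16))  # False: 1080 mod 16 = 8
print(is_mod(1920, 1088, 16))  # True:  1088 = 68 * 16
```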


MPEG-4

TODO


NLE

Non-Linear (Video) Editing system: in contrast to the older Linear video editing system, which operated by writing video segments to tape, starting at the beginning of the program and working to the end. If a program was to be lengthened or shortened, everything from the cut point until the end would have to be re-edited - or else "dubbed", losing a generation of quality. Needless to say, this was very restrictive.

All modern digital video editing systems are non-linear[citation needed], allowing the (human) editor to insert, delete, trim and move segments at will. Avisynth is a type of NLE (the world's smallest by a factor of 500), as changing clips around, lengthening or trimming them, is very simple and straightforward.

Example

Suppose we start with the following clips, and we want to assemble a program:

clip_intro = AviSource("intro.avi")
clip_main = AviSource("main.avi")
clip_outro = AviSource("outro.avi")
B = BlankClip(clip_main)

Here is our first edit:

edit_1 = clip_intro
\     ++ B.Trim(0, -15) 
\     ++ clip_main 
\     ++ clip_outro
return edit_1

We find it runs a little too long, but it's easy to trim it down a bit here and there:

edit_2 = clip_intro
\         .Trim(0, clip_intro.FrameCount-5)
\     ++ B.Trim(0, -15) 
\     ++ clip_main
\         .Trim(0, 12345)
\     ++ clip_main
\         .Trim(12390, 0)
\         .FadeOut(15) 
\     ++ clip_outro
\         .FadeIn(15)
return edit_2

More Information

Wikipedia: Non-linear editing system


NTSC

The NTSC specification describes the analog television system for most of the Americas (and certain other countries - see map). Although not used for modern digital television, it influences still-current standards such as DVD. Its two most important characteristics (for our purposes) were its frame rate and color format.

  • NTSC-type video consists of approximately 29.97 (30000/1001 to be exact) interlaced frames of video per second. See the Wikipedia link below for the historical reasons for this seemingly-odd frame rate. In the original analog standard, each frame consisted of 525 scan lines, 483 of which were visible. The remainder was used for vertical synchronization and other purposes. Each frame was composed of two fields; each field therefore consisted of 262.5 scan lines. See interlaced fieldbased for more about interlaced video.
  • NTSC's color is governed by Rec601 (also known as 'Rec.601,' 'BT.601' and 'SMPTE 170M'), which is an international standard that describes color conversion between the RGB color format (as it is taken at the camera and displayed on the screen) and a particular YUV format, which carries the internal signal. See Colorimetry for more on this and related standards.

External links

wikipedia:NTSC (from which this summary was adapted)
Video Basics (doom9.net) An illustrated guide for beginners.


OpenDML

OpenDML is an extension to the original Video for Windows (VfW) file format. It was drawn up to overcome the original 2 GB file size restriction of AVI files.

See John McGowan's AVI Overview: OpenDML AVI File Format Extensions


Ordered dithering

todo - rewrite later

It is ordered input dithering with a 02/31 recursive Bayer pattern (in contrast to the normal 13/42 pattern), modified for equal sums in both rows and columns. Avery Lee described the recursive generation in his blog a while ago, on Dithering. I modified the resultant pattern for equal summing to eliminate an obvious pattern visible in 16x16 cells.

The dithering is added as an extra lower 8 bits (4 bits for chroma) on the input pixels, making 16-bit data. This is then used as an index into a 65K LUT to get the output 8-bit pixel. The dither pattern effectively replaces the 0.5 rounding term in generating the LUT.

without dithering:

i = 0..255
mapR[i] = int(min(max(i * r/255.0, 0.0), 1.0) * 255.0 + 0.5);
example: r=2, i=16 => mapR[16] = int(32/255.0 * 255.0 + 0.5) = 32

     for (int y=0; y<vi.height; ++y) {
       for (int x=0; x<vi.width; ++x) {
         p[x] = map[p[x]];
       }
       p += pitch;
     }

with dithering:

i = 0..256*256-1
mapR[i] = int(min(max((i * r - 127.5)/(255.0*256), 0.0), 1.0) * 255.0 + 0.5);
example: r=2, i=16*256 => mapR[16*256] = int((32*256 - 127.5)/(255.0*256) * 255.0 + 0.5) = 32
bias = -127.5 ??

     for (int y=0; y<vi.height; ++y) {
       const int _y = (y << 4) & 0xf0;
       for (int x=0; x<vi.width; ++x) {
         p[x] = map[ p[x]<<8 | ditherMap[(x&0x0f)|_y] ];
       }
       p += pitch;
     }
// 16x16 dither table:
// (y << 4) & 0xf0 keeps the low four bits of y shifted into bits 4..7,
// i.e. _y = (y & 0x0f) * 16, which selects the row of the 16x16 table.

y=18 => _y = (18 << 4) & 0xf0 = 1 0010 0000 & 1111 0000 = 0010 0000 = 32
y=15 => _y = (15 << 4) & 0xf0 = 1111 0000 & 1111 0000 = 1111 0000 = 240
y=16 => _y = (16 << 4) & 0xf0 = 1 0000 0000 & 1111 0000 = 0
ditherMap[(x & 0x0f) | _y] then selects the column from the low four bits of x,
* so the dither pattern repeats every 16 rows and every 16 columns.

// 4x4 dither table:
const int _y = (y << 2) & 0xC;
ditherMap4[(x&0x3)|_y];

http://web.archive.org/web/20130512190753/http://white.stanford.edu/~brian/psy221/reader/Bayer.1973.pdf
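The recursive generation mentioned above can be sketched in plain Python (a hypothetical `bayer` helper; this builds only the standard 02/31 Bayer matrix - the equal-sum modification described above is not reproduced):

```python
def bayer(n):
    """Recursively build an n x n Bayer ordered-dither matrix (n a power of 2),
    starting from the 02/31 base pattern. Entries run 0 .. n*n-1."""
    if n == 1:
        return [[0]]
    half = bayer(n // 2)
    m = [[0] * n for _ in range(n)]
    for y in range(n // 2):
        for x in range(n // 2):
            v = 4 * half[y][x]
            m[y][x] = v                        # top-left:     4v + 0
            m[y][x + n // 2] = v + 2           # top-right:    4v + 2
            m[y + n // 2][x] = v + 3           # bottom-left:  4v + 3
            m[y + n // 2][x + n // 2] = v + 1  # bottom-right: 4v + 1
    return m

for row in bayer(4):
    print(row)
# [0, 8, 2, 10]
# [12, 4, 14, 6]
# [3, 11, 1, 9]
# [15, 7, 13, 5]
```

Note how the 2x2 case is exactly the 02/31 base pattern, and each recursion step interleaves four scaled copies of the previous matrix.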


PAL

The Phase Alternating Line (PAL) specification describes the analog television system for Europe (and other countries - see map). Although not used for modern digital television, it influences still-current standards such as DVD. Its two most important characteristics (for our purposes) were its frame rate and color format.

  • PAL-type video consists of 25 interlaced frames of video per second. In the original analog standard, each frame consisted of 625 scan lines, 576 of which were visible. The remainder was used for vertical synchronization and other purposes. Each frame was composed of two fields; each field therefore consisted of 312.5 scan lines. See interlaced fieldbased for more about interlaced video.
  • PAL's color is governed by Rec601 (also known as 'Rec.601,' 'BT.601' and 'SMPTE 170M'), which is an international standard that describes color conversion between the RGB color format (as it is taken at the camera and displayed on the screen) and a particular YUV format, which carries the internal signal. See Colorimetry for more on this and related standards.

External links

wikipedia:PAL (from which this summary was adapted)
Video Basics (doom9.net) An illustrated guide for beginners.


PCM

PCM (Pulse Code Modulation) is, in the multimedia context, a type of audio encoding. It is uncompressed.



Pitch

TODO


Planar

Planar image format is a format for storing images where each color component needed to represent a pixel is placed at a separate place (block) in memory. This is in contrast with how interleaved images are stored in memory.

  • Supported planar formats in AviSynth 2.5: YV12
  • Supported planar formats in AviSynth 2.6: Y8, YV12, YV16, YV24, YV411

Some examples of other planar formats: I420 (same as YV12, but the chroma plane order is swapped).
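For planar formats, plane sizes follow directly from the chroma subsampling; as a sketch in plain Python (hypothetical `yv12_plane_sizes` helper, assuming 8-bit samples and no row padding):

```python
def yv12_plane_sizes(width: int, height: int):
    """Byte sizes of the Y, V, U planes for an 8-bit YV12 frame (no padding).
    YV12 subsamples chroma by 2 both horizontally and vertically, and
    stores the planes in the order Y, V, U."""
    assert width % 2 == 0 and height % 2 == 0
    y = width * height
    chroma = (width // 2) * (height // 2)
    return {'Y': y, 'V': chroma, 'U': chroma}

print(yv12_plane_sizes(640, 480))  # {'Y': 307200, 'V': 76800, 'U': 76800}
```

Each chroma plane is a quarter the size of the luma plane, so a YV12 frame takes 1.5 bytes per pixel in total.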




PSNR

PSNR stands for Peak Signal-to-Noise Ratio. It is used as a measure of video quality. It is expressed in decibels. It's defined as <math>PSNR(I,K) = 20 \cdot \log_{10}{(\frac{255}{\sqrt{MSE(I,K)}})}</math>

where I is the reference image, K is the image under test, and MSE is the Mean Squared Error between the two:

<math>MSE(I,K) = \frac{1}{M} \cdot \sum_{j=0}^{width-1} \sum_{k=0}^{height-1} | I(j,k) - K(j,k) |^{2}</math>

Where M is the number of pixels in a frame (width · height)

The double-Σ term states that (j,k) runs over all the pixels, summing the square of the difference between reference image I and test image K.
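The two formulas can be sketched in plain Python (a hypothetical `psnr` helper working on lists of pixel rows):

```python
import math

def psnr(ref, test):
    """PSNR in dB between two equal-sized 8-bit images given as lists of rows."""
    # MSE: mean of the squared per-pixel differences.
    mse = sum((i - k) ** 2
              for ref_row, test_row in zip(ref, test)
              for i, k in zip(ref_row, test_row))
    mse /= sum(len(row) for row in ref)
    if mse == 0:
        return float('inf')  # identical images
    return 20 * math.log10(255 / math.sqrt(mse))

a = [[100, 100], [100, 100]]
b = [[101, 100], [100, 100]]
print(round(psnr(a, b), 2))  # 54.15 (MSE = 0.25)
```

A single off-by-one pixel in a 2x2 image gives an MSE of 0.25 and hence a PSNR of about 54 dB; identical images have an MSE of 0 and an infinite PSNR.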

External Links

  • For more details see wikipedia:PSNR.
  • See also, wikipedia:SSIM (Structural SIMilarity), another widely-used quality measure.
  • See also, wikipedia:Video Quality for a discussion of the difficulties involved in trying to make an "objective" quality measurement.


RGB

RGB (from Red, Green, Blue) is a color model - or color space - that consists of three primary colors (Red, Green and Blue, of course) which are added together in different proportions to create any other color. Thus RGB is an additive color model.

The number of bits used to provide R,G,B information (ie the color depth) determines the maximum number of color variations that can be represented by a specific RGB model. The typical value for modern video is 8 bits (one byte) for each of the R,G,B primaries. Typical specific RGB formats for video are RGB32 and RGB24.




RGB24

RGB24 is an RGB video format where each pixel of the image contains one byte for each of the R (red), G (green) and B (blue) components in successive places of memory. Since one byte occupies 8 bits, the total number of bits consumed by one pixel is 3*8 = 24 and thus the 24 at the end of the format's name.

As is apparent, the layout of bytes in memory for an RGB24 video frame follows (assuming a little-endian, least-significant-byte-first memory layout) the pattern below:

low memory address    ---->      high memory address
|pixel|pixel|pixel|pixel|pixel|pixel|pixel|pixel|...
|-----|-----|-----|-----|-----|-----|-----|-----|...
|B|G|R|B|G|R|B|G|R|B|G|R|B|G|R|B|G|R|B|G|R|B|G|R|...
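The B,G,R byte order above can be demonstrated in plain Python (a hypothetical `rgb24_pixel` helper; for simplicity it assumes unpadded, top-down rows, whereas real AVI/DIB frames pad each row to a multiple of 4 bytes and store rows bottom-up):

```python
def rgb24_pixel(frame: bytes, width: int, x: int, y: int):
    """Read pixel (x, y) from a packed RGB24 buffer.
    Each pixel is 3 bytes stored in the order B, G, R."""
    offset = (y * width + x) * 3
    b, g, r = frame[offset:offset + 3]
    return r, g, b

# A 2x1 image: one red pixel, one green pixel, as B,G,R byte triplets.
frame = bytes([0, 0, 255,   0, 255, 0])
print(rgb24_pixel(frame, 2, 0, 0))  # (255, 0, 0)
print(rgb24_pixel(frame, 2, 1, 0))  # (0, 255, 0)
```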


RGB32

RGB32 is an RGB video format where each pixel of the image contains one byte for each of the R (red), G (green) and B (blue) components plus an additional byte for transparency mask (A component) in successive places of memory. Since one byte occupies 8 bits, the total number of bits consumed by one pixel is 4*8 = 32 and thus the 32 at the end of the format's name.

As is apparent, the layout of bytes in memory for an RGB32 video frame follows (assuming a little-endian, least-significant-byte-first memory layout) the pattern below:

low memory address    ---->      high memory address
| pixel | pixel | pixel | pixel | pixel | pixel |...
|-------|-------|-------|-------|-------|-------|...
|B|G|R|A|B|G|R|A|B|G|R|A|B|G|R|A|B|G|R|A|B|G|R|A|...

On modern processors, the RGB32 video format provides faster access to video data because each pixel is aligned to the machine's word boundaries. For this reason many applications use it instead of RGB24 even when there is no transparency mask information in the fourth (A) byte, since in general the improved processing speed outweighs the memory overhead introduced by the unused A byte per pixel.
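The alignment advantage can be sketched in plain Python (a hypothetical `rgb32_pixel` helper): each pixel is one 32-bit word, so a whole pixel can be read or written in a single aligned load, with B in the least significant byte and A in the most significant one:

```python
import struct

def rgb32_pixel(frame: bytes, width: int, x: int, y: int):
    """Read pixel (x, y) from a packed RGB32 buffer. Each pixel is one
    32-bit little-endian word holding the bytes B, G, R, A."""
    (word,) = struct.unpack_from('<I', frame, (y * width + x) * 4)
    b = word & 0xFF
    g = (word >> 8) & 0xFF
    r = (word >> 16) & 0xFF
    a = (word >> 24) & 0xFF
    return r, g, b, a

frame = bytes([0, 0, 255, 255])     # one opaque red pixel: B=0, G=0, R=255, A=255
print(rgb32_pixel(frame, 1, 0, 0))  # (255, 0, 0, 255)
```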


SourceForge

SourceForge is the world's largest OpenSource development website, with the largest repository of OpenSource code and applications available on the Internet. Our SourceForge project page provides us with version control, bug and issue tracking, project management, backups and archives, and communication and collaboration resources.

Perhaps most importantly, SourceForge is where you'll get a copy of our software for yourself.


SSE

SSE is an instruction set (a set of processor commands) present in most modern processors, such as the Pentium III, Pentium 4, newer Celerons, and the Athlon XP/MP. It is an extension of the MMX instruction set. Its subset that operates on integers, Integer SSE (or ISSE for short), is of special interest to AviSynth plugin developers because it is mostly used for video processing.
