## Description

BM3D denoising filter for AviSynth+, implemented in CUDA. A CPU version implemented with AVX and AVX2 intrinsics is also included and serves as a reference implementation. However, bitwise-identical output is not guaranteed between the CPU and CUDA implementations.

## Requirements

- CPU with AVX support (AVX2 required for `BM3D_CPU`).
- CUDA-enabled GPU(s) of compute capability 5.0 or higher (Maxwell+).
- GPU driver 450 or newer.

## Syntax and Parameters

BM3D_CPU (clip, clip "ref", float[] "sigma", int[] "block_step", int[] "bm_range", int "radius", int[] "ps_num", int[] "ps_range", bool "chroma")

BM3D_CUDA (clip, clip "ref", float[] "sigma", int[] "block_step", int[] "bm_range", int "radius", int[] "ps_num", int[] "ps_range", bool "chroma", int "device_id", bool "fast", int "extractor_exp")

*clip*=

- The input clip. Must be planar 32 bit float format. Each plane is denoised separately if `chroma` is set to False.

*clip*ref =

- The reference clip. Must be of the same format, width, height, number of frames as clip.

*float[]*sigma =*[3.0,3.0,3.0]*

- The strength of denoising.
- The strength is similar (but not strictly equal) to that of VapourSynth-BM3D due to differences in implementation (coefficient normalization is not implemented, for example).

*int[]*block_step =*[8,8,8]*

- Step between successive reference blocks, valid range [1, 8].
- The total number of reference blocks to be processed is approximately (width / block_step) * (height / block_step).
- A smaller step processes more reference blocks and is slower.
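As a rough illustration of the estimate above, the following Python sketch evaluates (width / block_step) * (height / block_step) for a few step sizes. The function name and the 1920x1080 plane size are illustrative assumptions and are not part of the plugin.

```python
def approx_reference_blocks(width, height, block_step):
    """Approximate per-plane reference-block count, using the formula above."""
    if not 1 <= block_step <= 8:
        raise ValueError("block_step must be in [1, 8]")
    return (width / block_step) * (height / block_step)


if __name__ == "__main__":
    # Hypothetical 1920x1080 plane: halving the step roughly quadruples the count.
    for step in (8, 4, 2, 1):
        print(step, approx_reference_blocks(1920, 1080, step))
```

Under this estimate, going from a step of 8 to a step of 4 quadruples the number of reference blocks, which is why smaller steps are noticeably slower.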

*int[]*bm_range =*[9,9,9]*

- Length of the side of the search neighborhood for block-matching, valid range [1, +inf).
- The size of search window is (bm_range * 2 + 1) x (bm_range * 2 + 1).
- Larger is slower, with more chances to find similar patches.
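To make the window size concrete, here is a small Python sketch of the (bm_range * 2 + 1) x (bm_range * 2 + 1) formula. The helper name is hypothetical, and the position count is simply the window area; how many of those positions the plugin actually evaluates is not stated here.

```python
def search_window(bm_range):
    """Return (side length, number of positions) of the search window."""
    if bm_range < 1:
        raise ValueError("bm_range must be at least 1")
    side = bm_range * 2 + 1
    return side, side * side


if __name__ == "__main__":
    # With the default bm_range of 9 the window is 19 x 19.
    print(search_window(9))
```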

*int*radius =*0*

- The temporal radius for denoising, valid range [1, 16]. The default value 0 disables temporal filtering (spatial denoising only).
- For each processed frame, (radius * 2 + 1) frames will be requested, and the filtering result will be returned to these frames by BM3D_VAggregate.
- Increasing radius adds only a small computational cost in block-matching and aggregation and does not affect collaborative filtering, but memory consumption can grow quadratically.
- Thus, feel free to use a large radius as long as your RAM is large enough :D
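The (radius * 2 + 1) frame count can be pictured with the following Python sketch, which lists the frame offsets around the current frame. The helper name is hypothetical, and the handling of frames near the start or end of the clip is deliberately left out because it is not described here.

```python
def temporal_offsets(radius):
    """Offsets of the frames requested around the current frame."""
    if not 0 <= radius <= 16:
        raise ValueError("radius must be in [0, 16]")
    return list(range(-radius, radius + 1))


if __name__ == "__main__":
    offsets = temporal_offsets(2)
    print(offsets, len(offsets))  # five frames in total for radius=2
```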

*int[]*ps_num =*[2,2,2]*

- The number of matched locations used for predictive search, valid range [1, 8].
- A larger value increases the chance of matching more similar blocks, at a small increase in computational cost. However, the original MATLAB implementation of V-BM3D fixes it to 2 for all profiles except "lc", which suggests that a larger value is not always better for quality.

*int[]*ps_range =*[4,4,4]*

- Length of the side of the search neighborhood for predictive-search block-matching, valid range [1, +inf).

**Note:** parameters `sigma`, `block_step`, `bm_range`, `ps_num`, and `ps_range` are arrays. If `chroma` is set to `True`, only the first value is in effect. Otherwise an array of values may be specified for each plane (except `radius`).
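One way to read the note above is sketched below in Python. It assumes that "only the first value is in effect" means the first value governs all three planes, and it only handles arrays of exactly three values; both points, as well as the helper name, are assumptions rather than documented plugin behavior.

```python
def per_plane_values(values, chroma):
    """Resolve an array parameter into one effective value per plane."""
    if len(values) != 3:
        raise ValueError("expected one value per plane (3 values)")
    if chroma:
        # Assumption: with chroma=True the first value applies to every plane.
        return [values[0]] * 3
    return list(values)


if __name__ == "__main__":
    sigma = [3.0, 1.5, 1.5]
    print(per_plane_values(sigma, chroma=False))
    print(per_plane_values(sigma, chroma=True))
```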

*bool*chroma =*false*

- Enables the CBM3D algorithm. The input clip must be in YUV444PS format.
- Y channel is used in block-matching of chroma channels.

*int*device_id =*0*

- Index of the GPU to be used.

*bool*fast =*true*

- Multi-threaded copy between CPU and GPU at the expense of 4x memory consumption.

*int*extractor_exp =*0*

- Used for deterministic (bitwise) output. This parameter is not present in the CPU version, since that implementation always produces deterministic output.
- Pre-rounding is employed for associative floating-point summation.
- The value should be a positive integer not less than 3, and may need to be higher depending on the source video and filter parameters.
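The idea behind pre-rounding can be illustrated with a generic Python sketch. Floating-point addition is not associative, so summing the same values in a different order can change the last bits of the result. If each addend is first rounded onto a common grid, the partial sums stay exact and the order no longer matters. The constant used below and its relation to the exponent are hypothetical, chosen only for this double-precision demonstration; how the plugin derives its own constant from `extractor_exp` is not described here.

```python
from itertools import permutations


def pre_round(x, exponent):
    """Round x onto a fixed grid by adding and subtracting a large constant."""
    extractor = 1.5 * 2.0 ** exponent  # hypothetical constant, not the plugin's
    # For |x| much smaller than the constant, the result is x rounded to a
    # multiple of 2**(exponent - 52) in double precision.
    return (x + extractor) - extractor


def left_to_right_sum(values):
    """Plain sequential summation (avoids any compensated built-in sum)."""
    total = 0.0
    for v in values:
        total += v
    return total


values = [0.1, 0.2, 0.3]

# Distinct results over all summation orders, without and with pre-rounding.
plain = {left_to_right_sum(p) for p in permutations(values)}
rounded = {
    left_to_right_sum([pre_round(v, 10) for v in p])
    for p in permutations(values)
}

if __name__ == "__main__":
    print(len(plain), len(rounded))
```

The trade-off is a small loss of precision in each addend in exchange for a result that does not depend on summation order. A grid that is too coarse discards more of each value, which fits the note that the setting may need adjusting for the source and filter parameters.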

### BM3D_VAggregate

BM3D_VAggregate should be called after temporal filtering.

BM3D_VAggregate (clip, int "radius")

*clip*=

- The input clip. Must be of 32 bit float format.

*int*radius =*0*

- Same as `radius` in BM3D_CPU / BM3D_CUDA.

## Examples

    DGSource("sample.dgi")
    ConvertBits(bits=32)
    BM3D_CUDA(sigma=0.5, radius=2)
    BM3D_VAggregate(radius=2)
    ConvertBits(bits=16)

## Changelog

| Version | Date | Changes |
|---|---|---|
| test3 | 2021/08/06 | Separates VAggregate and compiles for AVX |
| test2 | 2021/08/01 | CPU version |
| test1 | 2021/07/25 | Initial release |

## External Links

- GitHub - Source code repository.
