# BM3DCUDA

## Revision as of 10:37, 12 August 2021

Abstract | |
---|---|
Author | WolframRhodium |
Version | test3 |
Download | BM3DCUDA_AVS-test3.zip |
Category | Denoisers |
License | GPLv2 |
Discussion | Doom9 Forum |

## Description

BM3D denoising filter for AviSynth+, implemented in CUDA. It also includes a CPU version, implemented with AVX and AVX2 intrinsics, that serves as a reference implementation. However, bitwise identical outputs are not guaranteed across the CPU and CUDA implementations.

## Requirements

- CPU with AVX support (AVX2 required for `BM3D_CPU`).
- CUDA-enabled GPU(s) of compute capability 5.0 or higher (Maxwell+).
- GPU driver 450 or newer.

## Syntax and Parameters

BM3D_CPU (clip, clip "ref", float[] "sigma", int[] "block_step", int[] "bm_range", int "radius", int[] "ps_num", int[] "ps_range", bool "chroma")

BM3D_CUDA (clip, clip "ref", float[] "sigma", int[] "block_step", int[] "bm_range", int "radius", int[] "ps_num", int[] "ps_range", bool "chroma", int "device_id", bool "fast", int "extractor_exp")

*clip* =

- The input clip. Must be in a planar 32-bit float format. Each plane is denoised separately if `chroma` is set to False.

*clip* ref =

- The reference clip. Must have the same format, width, height, and number of frames as clip.

*float[]* sigma = *[3.0,3.0,3.0]*

- The strength of denoising.
- The strength is similar (but not strictly equal) to that of VapourSynth-BM3D due to differences in implementation (for example, coefficient normalization is not implemented).

*int[]* block_step = *[8,8,8]*

- Sliding step to process every next reference block, valid range [1, 8].
- The total number of reference blocks to be processed is approximately (width / block_step) * (height / block_step).
- A smaller step processes more reference blocks and is slower.
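As a rough illustration of the estimate above, the following Python sketch evaluates the approximation for a given frame size. It is not part of the plugin: the function name `approx_reference_blocks` is made up for this example, and the use of integer division (ignoring how frame borders are handled) is an assumption.

```python
def approx_reference_blocks(width: int, height: int, block_step: int) -> int:
    """Approximate reference block count: (width / block_step) * (height / block_step)."""
    if not 1 <= block_step <= 8:
        raise ValueError("block_step must be in [1, 8]")
    # Integer division is an assumption of this sketch; border handling is ignored.
    return (width // block_step) * (height // block_step)


if __name__ == "__main__":
    # Compare the estimate for a 1920x1080 frame at several step sizes.
    for step in (8, 4, 2, 1):
        print(step, approx_reference_blocks(1920, 1080, step))
```

Halving the step roughly quadruples the number of reference blocks, which is why smaller steps are noticeably slower.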

*int[]* bm_range = *[9,9,9]*

- Length of the side of the search neighborhood for block-matching, valid range [1, +inf).
- The size of the search window is (bm_range * 2 + 1) x (bm_range * 2 + 1).
- Larger values are slower but give more chances to find similar patches.
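The window-size formula can be worked through with a small Python sketch. The helper names below are hypothetical and only restate the arithmetic from the parameter description; they say nothing about how the plugin traverses the window.

```python
def search_window_side(bm_range: int) -> int:
    """Side length of the block-matching search window: bm_range * 2 + 1."""
    if bm_range < 1:
        raise ValueError("bm_range must be at least 1")
    return bm_range * 2 + 1


def search_window_positions(bm_range: int) -> int:
    """Number of positions in the square search window."""
    side = search_window_side(bm_range)
    return side * side


if __name__ == "__main__":
    # The default bm_range of 9 gives a 19 x 19 window.
    print(search_window_side(9), search_window_positions(9))
```

Because the number of positions grows with the square of `bm_range`, doubling it makes the window roughly four times larger.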

*int* radius = *0*

- The temporal radius for denoising, valid range [1, 16]. The default of 0 disables temporal filtering.
- For each processed frame, (radius * 2 + 1) frames will be requested, and the filtering result will be returned to these frames by BM3D_VAggregate.
- Increasing radius adds only a small computational cost in block-matching and aggregation and does not affect collaborative filtering, but memory consumption can grow quadratically.
- Thus, feel free to use a large radius as long as you have enough RAM :D
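To make the frame-request arithmetic concrete, here is a minimal Python sketch. The function names are hypothetical, and the sketch does not model what the plugin does at the start or end of a clip (no clamping is applied here).

```python
from typing import List


def frames_requested(radius: int) -> int:
    """Number of frames requested per processed frame: radius * 2 + 1."""
    if not 0 <= radius <= 16:
        raise ValueError("radius must be in [0, 16]")
    return radius * 2 + 1


def temporal_window(n: int, radius: int) -> List[int]:
    """Frame indices n - radius .. n + radius, without clamping at clip ends."""
    return list(range(n - radius, n + radius + 1))


if __name__ == "__main__":
    # With radius=2, processing frame 10 touches five frames centred on it.
    print(frames_requested(2), temporal_window(10, 2))
```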

*int[]* ps_num = *[2,2,2]*

- The number of matched locations used for predictive search, valid range [1, 8].
- A larger value increases the chance of matching more similar blocks, with a tiny increase in computational cost. However, the original MATLAB implementation of V-BM3D fixes it to 2 for all profiles except "lc", so a larger value may not always be better for quality.

*int[]* ps_range = *[4,4,4]*

- Length of the side of the search neighborhood for predictive-search block-matching, valid range [1, +inf).

**Note:** parameters `sigma`, `block_step`, `bm_range`, `ps_num`, and `ps_range` are arrays. If `chroma` is set to `True`, only the first value is in effect. Otherwise, a separate value may be specified for each plane (this does not apply to `radius`, which is a single value).
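The rule in the note can be sketched in Python as follows. This only illustrates the described behaviour: the function `effective_per_plane` is hypothetical, it assumes a full three-element array is always supplied, and representing the `chroma=True` case as the first value repeated for every plane is an interpretation, not the plugin's actual code.

```python
from typing import List, Sequence


def effective_per_plane(values: Sequence[float], chroma: bool, planes: int = 3) -> List[float]:
    """Return the value that is in effect for each plane."""
    if len(values) != planes:
        raise ValueError("this sketch expects one value per plane")
    if chroma:
        # With chroma=True only the first value is in effect.
        return [values[0]] * planes
    # Otherwise each plane uses its own value.
    return list(values)


if __name__ == "__main__":
    sigma = [3.0, 1.0, 1.0]
    print(effective_per_plane(sigma, chroma=False))
    print(effective_per_plane(sigma, chroma=True))
```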

*bool* chroma = *false*

- Enables the CBM3D algorithm. The input clip must be in YUV444PS format.
- The Y channel is used for block-matching of the chroma channels.

*int* device_id = *0*

- Sets the GPU to be used.

*bool* fast = *true*

- Uses multi-threaded copies between CPU and GPU, at the expense of 4x memory consumption.

*int* extractor_exp = *0*

- Used for deterministic (bitwise) output. This parameter is not present in the CPU version since that implementation always produces deterministic output.
- Pre-rounding is employed for associative floating-point summation.
- The value should be a positive integer not less than 3, and may need to be higher depending on the source video and filter parameters.
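The idea behind pre-rounding can be illustrated with a short Python sketch. This is a generic demonstration in 64-bit floats, not the plugin's code: the plugin works on 32-bit floats on the GPU, and how `extractor_exp` maps to the extractor constant used here (`1.5 * 2**exp`) is an assumption made for this example. Adding and then subtracting a large "extractor" rounds every value onto a common grid; as long as the running sum stays small enough, sums of grid values are exact, so the order of summation no longer matters.

```python
import random
from typing import Iterable, List


def preround(values: Iterable[float], exp: int) -> List[float]:
    """Round each value onto a common grid using an extractor of 1.5 * 2**exp."""
    extractor = 1.5 * 2.0 ** exp
    # Adding then subtracting the extractor drops the low-order bits of x,
    # leaving a multiple of 2**(exp - 52) in float64.
    return [(extractor + x) - extractor for x in values]


def ordered_sum(values: Iterable[float]) -> float:
    """Plain left-to-right floating-point summation."""
    total = 0.0
    for x in values:
        total += x
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
    # exp=10 leaves enough headroom: 1000 values of magnitude <= 1 sum to < 2**11.
    rounded = preround(data, exp=10)
    print(ordered_sum(rounded) == ordered_sum(reversed(rounded)))
```

The trade-off is that a larger exponent gives more headroom for the sum but a coarser grid, which is consistent with the note that the value may need to be raised for some sources and parameters.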

### BM3D_VAggregate

BM3D_VAggregate should be called after temporal filtering.

BM3D_VAggregate (clip, int "radius")

*clip* =

- The input clip. Must be in a 32-bit float format.

*int* radius = *0*

- Same as `radius` in BM3D_CPU and BM3D_CUDA.

## Examples

    DGSource("sample.dgi")
    ConvertBits(bits=32)
    BM3D_CUDA(sigma=0.5, radius=2)
    BM3D_VAggregate(radius=2)
    ConvertBits(bits=16)

## Changelog

Version | Date | Changes |
---|---|---|
test3 | 2021/08/06 | Separates VAggregate and compiles for AVX |
test2 | 2021/08/01 | CPU version |
test1 | 2021/07/25 | Initial release |

## External Links

- GitHub - Source code repository.
