BM3DCUDA
Revision as of 09:37, 12 August 2021
Abstract | |
---|---|
Author | WolframRhodium |
Version | test3 |
Download | BM3DCUDA_AVS-test3.zip |
Category | Denoisers |
License | GPLv2 |
Discussion | Doom9 Forum |
Description
BM3D denoising filter for AviSynth+, implemented in CUDA. It also includes a CPU version, implemented with AVX and AVX2 intrinsics, that serves as a reference implementation. However, bitwise-identical output is not guaranteed across the CPU and CUDA implementations.
Requirements
- CPU with AVX support (AVX2 required for BM3D_CPU).
- CUDA-enabled GPU(s) of compute capability 5.0 or higher (Maxwell+).
- GPU driver 450 or newer.
Syntax and Parameters
BM3D_CPU (clip, clip "ref", float[] "sigma", int[] "block_step", int[] "bm_range", int "radius", int[] "ps_num", int[] "ps_range", bool "chroma")
BM3D_CUDA (clip, clip "ref", float[] "sigma", int[] "block_step", int[] "bm_range", int "radius", int[] "ps_num", int[] "ps_range", bool "chroma", int "device_id", bool "fast", int "extractor_exp")
- clip =
  - The input clip. Must be in planar 32-bit float format. Each plane is denoised separately if chroma is set to False.
- clip ref =
  - The reference clip. Must have the same format, width, height, and number of frames as clip.
- float[] sigma = [3.0,3.0,3.0]
  - The strength of denoising.
  - The strength is similar to (but not strictly equal to) that of VapourSynth-BM3D due to differences in implementation (coefficient normalization is not implemented, for example).
- int[] block_step = [8,8,8]
  - Sliding step to process every next reference block, valid range [1, 8].
  - The total number of reference blocks to be processed is approximately (width / block_step) * (height / block_step).
  - A smaller step processes more reference blocks and is slower.
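The block-count estimate above can be sketched in Python to get a feel for how block_step scales the workload. The function name `approx_reference_blocks`, the use of integer division, and the 1920x1080 frame size are illustrative assumptions, not part of the plugin; the exact count inside the filter may differ at frame borders.

```python
def approx_reference_blocks(width: int, height: int, block_step: int) -> int:
    """Approximate reference-block count: (width / block_step) * (height / block_step)."""
    if not 1 <= block_step <= 8:
        raise ValueError("block_step must be in [1, 8]")
    # Integer division is an assumption made for this sketch.
    return (width // block_step) * (height // block_step)


if __name__ == "__main__":
    # Hypothetical 1920x1080 plane; halving the step roughly quadruples the work.
    for step in (8, 4, 1):
        print(step, approx_reference_blocks(1920, 1080, step))
```

Going from the default step of 8 to a step of 4 quadruples the estimated number of reference blocks, which is why smaller steps are noticeably slower.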
- int[] bm_range = [9,9,9]
  - Length of the side of the search neighborhood for block-matching, valid range [1, +inf).
  - The size of the search window is (bm_range * 2 + 1) x (bm_range * 2 + 1).
  - Larger values are slower but give more chances to find similar patches.
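As a quick illustration of the window formula above, the following Python sketch computes the side length and the number of candidate positions for a given bm_range. The helper name `search_window` is hypothetical, and how the plugin treats windows that extend past the frame edge is not covered here.

```python
from typing import Tuple


def search_window(bm_range: int) -> Tuple[int, int]:
    """Return (side length, candidate positions) of the block-matching window."""
    if bm_range < 1:
        raise ValueError("bm_range must be at least 1")
    side = bm_range * 2 + 1
    return side, side * side


if __name__ == "__main__":
    side, positions = search_window(9)  # documented default
    print(side, positions)
```

With the default bm_range of 9 the window is 19 x 19, so the number of candidate positions grows quadratically with bm_range.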
- int radius = 0
  - The temporal radius for denoising, valid range [1, 16].
  - For each processed frame, (radius * 2 + 1) frames will be requested, and the filtering result will be returned to these frames by BM3D_VAggregate.
  - Increasing radius adds only a small computational cost to block-matching and aggregation and does not affect collaborative filtering, but memory consumption can grow quadratically.
  - Thus, feel free to use a large radius as long as your RAM is large enough :D
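The temporal window described above can be sketched as a list of frame offsets relative to the current frame. The helper name `temporal_offsets` is hypothetical, and the sketch deliberately ignores how the plugin handles the first and last frames of a clip, since that is not documented here.

```python
from typing import List


def temporal_offsets(radius: int) -> List[int]:
    """Offsets of the (radius * 2 + 1) frames requested around the current frame."""
    if not 1 <= radius <= 16:
        raise ValueError("radius must be in [1, 16] for temporal filtering")
    return list(range(-radius, radius + 1))


if __name__ == "__main__":
    offsets = temporal_offsets(2)
    print(len(offsets), offsets)
```

For radius = 2 this yields five offsets, from two frames before to two frames after the current frame.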
- int[] ps_num = [2,2,2]
  - The number of matched locations used for predictive search, valid range [1, 8].
  - A larger value increases the chance of matching more similar blocks, at a small increase in computational cost. However, the original MATLAB implementation of V-BM3D fixes it to 2 for all profiles except "lc", so a larger value may not always be better for quality.
- int[] ps_range = [4,4,4]
  - Length of the side of the search neighborhood for predictive-search block-matching, valid range [1, +inf).
- Note: parameters sigma, block_step, bm_range, ps_num, and ps_range are arrays. If chroma is set to True, only the first value is in effect. Otherwise an array of values may be specified for each plane (except radius).
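The rule in the note can be sketched as a small Python helper that maps an array parameter to the value in effect for each plane. The name `effective_values` is hypothetical, and the sketch assumes exactly one value per plane is supplied; what the plugin does with shorter arrays is not stated here.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def effective_values(values: Sequence[T], chroma: bool, num_planes: int = 3) -> List[T]:
    """Value in effect for each plane, following the note's rule."""
    if not values:
        raise ValueError("at least one value is required")
    if chroma:
        # With chroma=True only the first value is in effect.
        return [values[0]] * num_planes
    if len(values) != num_planes:
        # Assumption of this sketch: one value per plane must be given.
        raise ValueError("expected one value per plane")
    return list(values)


if __name__ == "__main__":
    print(effective_values([3.0, 1.5, 1.5], chroma=False))
    print(effective_values([3.0, 1.5, 1.5], chroma=True))
```

With chroma disabled, each plane keeps its own sigma; with chroma enabled, the first value is used and the remaining entries have no effect.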
- bool chroma = false
  - CBM3D algorithm. Input clip must be of YUV444PS format.
  - The Y channel is used in block-matching of the chroma channels.
- int device_id = 0
  - Sets the GPU to be used.
- bool fast = true
  - Multi-threaded copy between CPU and GPU at the expense of 4x memory consumption.
- int extractor_exp = 0
  - Used for deterministic (bitwise-identical) output. This parameter is not present in the CPU version, since that implementation always produces deterministic output.
  - Pre-rounding is employed to make floating-point summation associative.
  - The value should be a positive integer not less than 3, and may need to be higher depending on the source video and filter parameters.
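To illustrate why pre-rounding helps, here is a minimal Python sketch of the general idea: floating-point addition depends on the order of operations, but if every term is first rounded to a common power-of-two grid, the sums are exact and therefore order-independent. The helper `pre_round` and its `grid_exp` argument are hypothetical and are not claimed to match how extractor_exp is used inside the plugin.

```python
def pre_round(x: float, grid_exp: int) -> float:
    """Round x to the nearest multiple of 2**-grid_exp (an exact binary grid)."""
    scale = 2.0 ** grid_exp
    return round(x * scale) / scale


values = [0.1, 0.2, 0.3]

# Naive summation: the result depends on the order of the additions.
naive_forward = (values[0] + values[1]) + values[2]
naive_reverse = (values[2] + values[1]) + values[0]

# Pre-rounded summation: every term is a multiple of 2**-20, so the
# partial sums are exact and any order gives the same bits.
rounded = [pre_round(v, 20) for v in values]
fixed_forward = (rounded[0] + rounded[1]) + rounded[2]
fixed_reverse = (rounded[2] + rounded[1]) + rounded[0]

if __name__ == "__main__":
    print("naive sums equal:", naive_forward == naive_reverse)
    print("pre-rounded sums equal:", fixed_forward == fixed_reverse)
```

The trade-off is a small loss of precision in each term in exchange for a reproducible result, which matters when the order of accumulation varies between runs, as it can with parallel GPU threads.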
BM3D_VAggregate
BM3D_VAggregate should be called after temporal filtering.
BM3D_VAggregate (clip, int "radius")
- clip =
  - The input clip. Must be in 32-bit float format.
- int radius = 0
  - Same as radius in BM3D_CPU / BM3D_CUDA.
Examples
    DGSource("sample.dgi")
    ConvertBits(bits=32)
    BM3D_CUDA(sigma=0.5, radius=2)
    BM3D_VAggregate(radius=2)
    ConvertBits(bits=16)
Changelog
Version | Date | Changes |
---|---|---|
test3 | 2021/08/06 | Separates VAggregate and compiles for AVX |
test2 | 2021/08/01 | CPU version |
test1 | 2021/07/25 | Initial release |
External Links
- GitHub - Source code repository.