BM3DCUDA

From Avisynth wiki
(Difference between revisions)
(BM3DCUDA)
 
(test10)
 
(2 intermediate revisions by one user not shown)
Line 2: Line 2:
 
{{Filter3
 
{{Filter3
 
|1=[https://github.com/WolframRhodium WolframRhodium]
 
|1=[https://github.com/WolframRhodium WolframRhodium]
|2=test3
+
|2=test10 (58dbc0a)
|3=[https://github.com/WolframRhodium/VapourSynth-BM3DCUDA/issues/7 BM3DCUDA_AVS-test3.zip]
+
|3=[https://github.com/WolframRhodium/VapourSynth-BM3DCUDA/issues/7 BM3DCUDA_AVS-test10.zip]
 
|4=Denoisers  
 
|4=Denoisers  
 
|5=[https://www.gnu.org/licenses/gpl-2.0.txt GPLv2]
 
|5=[https://www.gnu.org/licenses/gpl-2.0.txt GPLv2]
Line 14: Line 14:
 
<br>
 
<br>
 
== Requirements ==
 
== Requirements ==
* [x86] / [x64]: [[AviSynth+]]
+
* [x64]: [[AviSynth+]]
 
* Supported color formats: 32-bit Y/YUV/RGB [[planar]]
 
* Supported color formats: 32-bit Y/YUV/RGB [[planar]]
<br>
+
 
 
*CPU with [[AVX]] support ([[AVX2]] required for <code>BM3D_CPU</code>).
 
*CPU with [[AVX]] support ([[AVX2]] required for <code>BM3D_CPU</code>).
 
*CUDA-enabled GPU(s) of [https://developer.nvidia.com/cuda-gpus compute capability] 5.0 or higher (Maxwell+).
 
*CUDA-enabled GPU(s) of [https://developer.nvidia.com/cuda-gpus compute capability] 5.0 or higher (Maxwell+).
Line 23: Line 23:
  
 
== [[Script variables|Syntax and Parameters]] ==
 
== [[Script variables|Syntax and Parameters]] ==
{{Template:FuncDef|BM3D_CPU (clip, clip "ref", float "sigma", int "block_step", int "bm_range", int "radius", int "ps_num", int "ps_range", bool "chroma")}}
+
{{Template:FuncDef|BM3D_CPU (clip, clip "ref", float[] "sigma", int[] "block_step", int[] "bm_range", int "radius", int[] "ps_num", int[] "ps_range", bool "chroma")}}
  
{{Template:FuncDef|BM3D_CUDA (clip, clip "ref", float "sigma", int "block_step", int "bm_range", int "radius", int "ps_num", int "ps_range", bool "chroma", int "device_id", bool "fast", int "extractor_exp")}}
+
{{Template:FuncDef|BM3D_CUDA (clip, clip "ref", float[] "sigma", int[] "block_step", int[] "bm_range", int "radius", int[] "ps_num", int[] "ps_range", bool "chroma", int "device_id", bool "fast", int "extractor_exp")}}
  
 
<br>
 
<br>
Line 34: Line 34:
 
:::The reference clip. Must be of the same format, width, height, number of frames as clip.
 
:::The reference clip. Must be of the same format, width, height, number of frames as clip.
 
<br>
 
<br>
::{{Par2|sigma|float|3.0}}
+
::{{Par2|sigma|float[]|[3.0,3.0,3.0]}}
 
:::The strength of denoising.
 
:::The strength of denoising.
 
:::The strength is similar (but not strictly equal) as VapourSynth-BM3D due to differences in implementation. (coefficient normalization is not implemented, for example).
 
:::The strength is similar (but not strictly equal) as VapourSynth-BM3D due to differences in implementation. (coefficient normalization is not implemented, for example).
 
<br>
 
<br>
::{{Par2|block_step|int|8.0}}
+
::{{Par2|block_step|int[]|[8,8,8]}}
 
:::Sliding step to process every next reference block, valid range [1-8].
 
:::Sliding step to process every next reference block, valid range [1-8].
 
:::Total number of reference blocks to be processed can be calculated approximately by (width / block_step) * (height / block_step).
 
:::Total number of reference blocks to be processed can be calculated approximately by (width / block_step) * (height / block_step).
 
:::Smaller step results in processing more reference blocks, and is slower.
 
:::Smaller step results in processing more reference blocks, and is slower.
 
<br>
 
<br>
::{{Par2|bm_range|int|9}}
+
::{{Par2|bm_range|int[]|[9,9,9]}}
:::Length of the side of the search neighborhood for block-matching, valid range [1-8].
+
:::Length of the side of the search neighborhood for block-matching.
 
:::The size of search window is (bm_range * 2 + 1) x (bm_range * 2 + 1).
 
:::The size of search window is (bm_range * 2 + 1) x (bm_range * 2 + 1).
 
:::Larger is slower, with more chances to find similar patches.
 
:::Larger is slower, with more chances to find similar patches.
Line 54: Line 54:
 
:::Thus, feel free to use large radius as long as your RAM is large enough :D
 
:::Thus, feel free to use large radius as long as your RAM is large enough :D
 
<br>
 
<br>
::{{Par2|ps_num|int|2}}
+
::{{Par2|ps_num|int[]|[2,2,2]}}
 
:::The number of matched locations used for predictive search, valid range [1-8].
 
:::The number of matched locations used for predictive search, valid range [1-8].
 
:::Larger value increases the possibility to match more similar blocks, with tiny increasing in computational cost. But in the original MATLAB implementation of V-BM3D, it's fixed to 2 for all profiles except "lc", perhaps larger value is not always good for quality?
 
:::Larger value increases the possibility to match more similar blocks, with tiny increasing in computational cost. But in the original MATLAB implementation of V-BM3D, it's fixed to 2 for all profiles except "lc", perhaps larger value is not always good for quality?
 
<br>
 
<br>
::{{Par2|ps_range|int|4}}
+
::{{Par2|ps_range|int[]|[4,4,4]}}
 
::: Length of the side of the search neighborhood for predictive-search block-matching, valid range [1, +inf).
 
::: Length of the side of the search neighborhood for predictive-search block-matching, valid range [1, +inf).
 +
<br>
 +
::<span style="color:red">'''Note:'''</span> parameters {{Template:FuncDef2|sigma}}, {{Template:FuncDef2|block_step}},  {{Template:FuncDef2|bm_range}}, {{Template:FuncDef2|ps_num}}, and {{Template:FuncDef2|ps_range}} are arrays. If {{Template:FuncDef3|chroma}} is set to <code>True</code>, only the first value is in effect. Otherwise an array of values may be specified for each plane (except {{Template:FuncDef2|radius}}).
 
<br>
 
<br>
 
::{{Par2|chroma|bool|false}}
 
::{{Par2|chroma|bool|false}}
Line 97: Line 99:
 
== Changelog ==
 
== Changelog ==
 
  Version      Date            Changes<br>
 
  Version      Date            Changes<br>
 +
test10      2023/01/25      - update to cuda 12
 +
                              - bug fixes
 +
                              - add support for Ada Lovelace
 +
                              - remove support for Kepler and x86
 +
test9        2022/07/15      - fix temporal padding (x86 build is deprecating)
 +
test8        2022/02/15      - remove avx
 +
test7        2022/02/14      - fix performance regression introduced in test6
 +
                              - restore avx on win64 and msvc rt dynamic linking)
 +
test6        2022/02/14      - add support for cc 3.5, links to the static msvc rt, remove avx)
 +
                              - (don't use; accidentally build with debug config)
 +
test5        2021/10/16      - bm_range now defaults to 9 instead of 8, fixes parameter check
 +
test4        2021/09/08      - fix array parameter
 
  test3        2021/08/06      - Separates VAggregate and compiles for AVX
 
  test3        2021/08/06      - Separates VAggregate and compiles for AVX
 
  test2        2021/08/01      - CPU version
 
  test2        2021/08/01      - CPU version

Latest revision as of 23:20, 30 January 2023

Abstract
Author WolframRhodium
Version test10 (58dbc0a)
Download BM3DCUDA_AVS-test10.zip
Category Denoisers
License GPLv2
Discussion Doom9 Forum


Description

BM3D denoising filter for AviSynth+, implemented in CUDA. It also includes a CPU version, implemented with AVX and AVX2 intrinsics, that serves as a reference implementation. However, bitwise-identical output is not guaranteed across the CPU and CUDA implementations.

Requirements

  • [x64]: AviSynth+
  • Supported color formats: 32-bit Y/YUV/RGB planar
  • CPU with AVX support (AVX2 required for BM3D_CPU).
  • CUDA-enabled GPU(s) of compute capability 5.0 or higher (Maxwell+).
  • GPU driver 450 or newer.


Syntax and Parameters

BM3D_CPU (clip, clip "ref", float[] "sigma", int[] "block_step", int[] "bm_range", int "radius", int[] "ps_num", int[] "ps_range", bool "chroma")

BM3D_CUDA (clip, clip "ref", float[] "sigma", int[] "block_step", int[] "bm_range", int "radius", int[] "ps_num", int[] "ps_range", bool "chroma", int "device_id", bool "fast", int "extractor_exp")


clip   =
The input clip. Must be in a planar 32-bit float format. Each plane is denoised separately if chroma is set to False.


clip  ref =
The reference clip. Must have the same format, width, height, and number of frames as clip.


float[]  sigma = [3.0,3.0,3.0]
The strength of denoising.
The strength is similar (but not strictly equal) to that of VapourSynth-BM3D due to differences in implementation (coefficient normalization is not implemented, for example).


int[]  block_step = [8,8,8]
Sliding step between successive reference blocks, valid range [1, 8].
The total number of reference blocks to be processed is approximately (width / block_step) * (height / block_step).
A smaller step processes more reference blocks and is slower.
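The approximation above can be checked with a few lines of arithmetic. The following is a minimal Python sketch, not part of the plugin: the function name is hypothetical, and the use of integer division is an assumption (the formula is only approximate in any case).

```python
def approx_reference_blocks(width: int, height: int, block_step: int) -> int:
    """Approximate reference-block count for one plane (hypothetical helper)."""
    if not 1 <= block_step <= 8:
        raise ValueError("block_step must be in [1, 8]")
    # Assumption: integer division; the documented formula is approximate.
    return (width // block_step) * (height // block_step)


# Compare a few step sizes for a 1920x1080 plane.
for step in (8, 4, 1):
    print(step, approx_reference_blocks(1920, 1080, step))
# 8 32400
# 4 129600
# 1 2073600
```

Halving block_step roughly quadruples the number of reference blocks, which is why smaller steps are noticeably slower.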


int[]  bm_range = [9,9,9]
Length of the side of the search neighborhood for block-matching.
The size of the search window is (bm_range * 2 + 1) x (bm_range * 2 + 1).
Larger values are slower but give more chances to find similar patches.
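To get a feel for how quickly the search window grows, the size formula above can be evaluated directly. This is a small Python sketch with a hypothetical helper name; it only restates the documented formula and says nothing about how the plugin actually walks the window.

```python
def search_window_positions(bm_range: int) -> int:
    """Number of positions in the (2 * bm_range + 1)-sided search window."""
    if bm_range < 1:
        raise ValueError("bm_range must be at least 1")
    side = bm_range * 2 + 1
    return side * side


print(search_window_positions(9))  # 361 (current default)
print(search_window_positions(8))  # 289 (default before test5)
```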


int  radius = 0
The temporal radius for denoising, valid range [1, 16]. The default of 0 selects spatial-only denoising (no temporal filtering).
For each processed frame, (radius * 2 + 1) frames will be requested, and the filtering results are returned to these frames by BM3D_VAggregate.
Increasing radius adds only a small computational cost in block-matching and aggregation and does not affect collaborative filtering, but memory consumption can grow quadratically.
Thus, feel free to use a large radius as long as you have enough RAM :D
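As a quick illustration of the (radius * 2 + 1) rule, the sketch below lists the frame offsets requested around the current frame. It is plain Python with a hypothetical helper name, and it deliberately does not model what happens at the start or end of the clip.

```python
def temporal_window(radius: int) -> list[int]:
    """Frame offsets relative to the current frame (edge handling not modelled)."""
    if not 0 <= radius <= 16:
        raise ValueError("radius must be 0 (default) or in [1, 16]")
    return list(range(-radius, radius + 1))


offsets = temporal_window(2)
print(len(offsets), offsets)  # 5 [-2, -1, 0, 1, 2]
```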


int[]  ps_num = [2,2,2]
The number of matched locations used for predictive search, valid range [1, 8].
A larger value increases the chance of matching more similar blocks, with a tiny increase in computational cost. However, in the original MATLAB implementation of V-BM3D it is fixed to 2 for all profiles except "lc"; perhaps a larger value is not always good for quality?


int[]  ps_range = [4,4,4]
Length of the side of the search neighborhood for predictive-search block-matching, valid range [1, +inf).


Note: parameters sigma, block_step, bm_range, ps_num, and ps_range are arrays. If chroma is set to True, only the first value is in effect. Otherwise, a separate value may be specified for each plane. radius is not an array and takes a single value.


bool  chroma = false
Enables the CBM3D algorithm. The input clip must be in YUV444PS format.
The Y channel is used in block-matching of the chroma channels.


int  device_id = 0
Sets the GPU to be used.


bool  fast = true
Multi-threaded copy between CPU and GPU at the expense of 4x memory consumption.


int  extractor_exp = 0
Used for deterministic (bitwise-identical) output. This parameter is not present in the CPU version, since that implementation always produces deterministic output.
Pre-rounding is employed for associative floating-point summation.
The value should be a positive integer not less than 3, and may need to be higher depending on the source video and filter parameters.
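The idea behind pre-rounding can be shown without any GPU code. The following is a generic Python sketch, not the plugin's implementation: the helper pre_round and its grid_exp argument are hypothetical, and how extractor_exp maps onto the rounding grid inside BM3D_CUDA is not described here. It shows that plain floating-point sums depend on the order of addition, while summands rounded to a common power-of-two grid add up identically in every order.

```python
import itertools
import math


def pre_round(x: float, grid_exp: int) -> float:
    """Round x to the nearest multiple of 2**-grid_exp (hypothetical helper)."""
    scale = math.ldexp(1.0, grid_exp)  # 2**grid_exp
    return round(x * scale) / scale


values = [0.1, 0.2, 0.3]

# Plain floating-point addition is not associative: the order changes the result.
plain_sums = {sum(p) for p in itertools.permutations(values)}

# After pre-rounding, every summand is a small multiple of 2**-20, so each
# partial sum is exactly representable and the order no longer matters.
rounded = [pre_round(v, 20) for v in values]
rounded_sums = {sum(p) for p in itertools.permutations(rounded)}

print(len(plain_sums) > 1)     # True: several distinct results
print(len(rounded_sums) == 1)  # True: one result for every order
```

The price of this determinism is a small loss of precision, which is presumably why the required value depends on the source video and filter parameters.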


BM3D_VAggregate

BM3D_VAggregate should be called after temporal filtering.

BM3D_VAggregate (clip, int "radius")

clip   =
The input clip. Must be in 32-bit float format.


int  radius = 0
Same as radius in BM3D_CPU / BM3D_CUDA.


Examples

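# Load the source clip.
DGSource("sample.dgi")
# The BM3D filters require 32-bit float input.
ConvertBits(bits=32)
# Temporal denoising: radius=2 requests 2 * 2 + 1 = 5 frames per processed frame.
BM3D_CUDA(sigma=0.5, radius=2)
# Aggregate the temporal results, using the same radius as above.
BM3D_VAggregate(radius=2)
# Convert back to 16-bit for output.
ConvertBits(bits=16)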


Changelog

Version      Date            Changes
test10       2023/01/25      - update to CUDA 12
                             - bug fixes
                             - add support for Ada Lovelace
                             - remove support for Kepler and x86
test9        2022/07/15      - fix temporal padding (x86 build is being deprecated)
test8        2022/02/15      - remove avx
test7        2022/02/14      - fix performance regression introduced in test6
                             - restore avx on win64 and msvc rt dynamic linking
test6        2022/02/14      - add support for cc 3.5, link to the static msvc rt, remove avx
                             - (don't use; accidentally built with debug config)
test5        2021/10/16      - bm_range now defaults to 9 instead of 8, fixes parameter check
test4        2021/09/08      - fix array parameter
test3        2021/08/06      - Separates VAggregate and compiles for AVX
test2        2021/08/01      - CPU version
test1        2021/07/25      - Initial release


External Links

  • GitHub - Source code repository.



