BM3DCUDA
Latest revision as of 23:20, 30 January 2023
Abstract | |
---|---|
Author | WolframRhodium |
Version | test10 (58dbc0a) |
Download | BM3DCUDA_AVS-test10.zip |
Category | Denoisers |
License | GPLv2 |
Discussion | Doom9 Forum |
Description
BM3D denoising filter for AviSynth+, implemented in CUDA. It also includes a CPU version, implemented with AVX and AVX2 intrinsics, that serves as a reference implementation. However, bitwise-identical output is not guaranteed between the CPU and CUDA implementations.
Requirements
- [x64]: AviSynth+
- Supported color formats: 32-bit Y/YUV/RGB planar
- CPU with AVX support (AVX2 required for BM3D_CPU).
- CUDA-enabled GPU(s) of compute capability 5.0 or higher (Maxwell+).
- GPU driver 450 or newer.
Syntax and Parameters
BM3D_CPU (clip, clip "ref", float[] "sigma", int[] "block_step", int[] "bm_range", int "radius", int[] "ps_num", int[] "ps_range", bool "chroma")
BM3D_CUDA (clip, clip "ref", float[] "sigma", int[] "block_step", int[] "bm_range", int "radius", int[] "ps_num", int[] "ps_range", bool "chroma", int "device_id", bool "fast", int "extractor_exp")
- clip
  - The input clip. Must be in planar 32-bit float format. Each plane is denoised separately if chroma is set to False.
- clip ref
  - The reference clip. Must have the same format, width, height, and number of frames as clip.
- float[] sigma = [3.0,3.0,3.0]
  - The strength of denoising.
  - The strength is similar (but not strictly equal) to that of VapourSynth-BM3D due to differences in implementation (for example, coefficient normalization is not implemented).
- int[] block_step = [8,8,8]
  - Sliding step to process every next reference block, valid range [1, 8].
  - The total number of reference blocks to be processed is approximately (width / block_step) * (height / block_step).
  - A smaller step processes more reference blocks and is slower.
- int[] bm_range = [9,9,9]
  - Length of the side of the search neighborhood for block-matching.
  - The size of the search window is (bm_range * 2 + 1) x (bm_range * 2 + 1).
  - Larger is slower, with more chances to find similar patches.
- int radius = 0
  - The temporal radius for denoising, valid range [1, 16].
  - For each processed frame, (radius * 2 + 1) frames are requested, and the filtering result is returned to these frames by BM3D_VAggregate.
  - Increasing radius adds only a small computational cost in block-matching and aggregation and does not affect collaborative filtering, but memory consumption can grow quadratically.
  - Thus, feel free to use a large radius as long as your RAM is large enough :D
- int[] ps_num = [2,2,2]
  - The number of matched locations used for predictive search, valid range [1, 8].
  - A larger value increases the chance of matching more similar blocks, at a slightly higher computational cost. However, the original MATLAB implementation of V-BM3D fixes it to 2 for all profiles except "lc", so a larger value may not always be better for quality.
- int[] ps_range = [4,4,4]
  - Length of the side of the search neighborhood for predictive-search block-matching, valid range [1, +inf).
- Note: parameters sigma, block_step, bm_range, ps_num, and ps_range are arrays. If chroma is set to True, only the first value is in effect. Otherwise an array of values may be specified for each plane (except radius).
- bool chroma = false
  - CBM3D algorithm. Input clip must be of YUV444PS format.
  - The Y channel is used in block-matching of the chroma channels.
- int device_id = 0
  - Sets the GPU to be used.
- bool fast = true
  - Multi-threaded copy between CPU and GPU at the expense of 4x memory consumption.
- int extractor_exp = 0
  - Used for deterministic (bitwise) output. This parameter is not present in the CPU version, since that implementation always produces deterministic output.
  - Pre-rounding is employed for associative floating-point summation.
  - The value should be a positive integer not less than 3, and may need to be higher depending on the source video and filter parameters.
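The formulas given for block_step, bm_range, and radius can be checked with a few lines of arithmetic. The Python sketch below is for illustration only and is not part of the plugin: the function name bm3d_workload and its return keys are hypothetical, and integer division is an assumption, since the page only states that the block count is approximate.

```python
def bm3d_workload(width, height, block_step=8, bm_range=9, radius=0):
    """Rough per-plane figures derived from the formulas in the parameter list.

    Hypothetical helper for illustration; it does not call the plugin.
    """
    # Approximate number of reference blocks:
    # (width / block_step) * (height / block_step).
    # Integer division is an assumption; the page only gives an estimate.
    ref_blocks = (width // block_step) * (height // block_step)

    # The search window is (bm_range * 2 + 1) x (bm_range * 2 + 1).
    window_side = bm_range * 2 + 1

    # Each processed frame requests (radius * 2 + 1) frames.
    frames_requested = radius * 2 + 1

    return {
        "ref_blocks": ref_blocks,
        "window_side": window_side,
        "window_positions": window_side * window_side,
        "frames_requested": frames_requested,
    }


if __name__ == "__main__":
    # Compare the default step with a finer one on a 1920x1080 plane.
    for step in (8, 4):
        print(step, bm3d_workload(1920, 1080, block_step=step, radius=2))
```

Halving block_step from 8 to 4 roughly quadruples the number of reference blocks, which is why smaller steps are noticeably slower.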
BM3D_VAggregate
BM3D_VAggregate should be called after temporal filtering.
BM3D_VAggregate (clip, int "radius")
- clip
  - The input clip. Must be in 32-bit float format.
- int radius = 0
  - Same as radius in BM3D_CPU and BM3D_CUDA.
Examples
DGSource("sample.dgi")
ConvertBits(bits=32)
BM3D_CUDA(sigma=0.5, radius=2)
BM3D_VAggregate(radius=2)
ConvertBits(bits=16)
Changelog
Version   Date         Changes
test10    2023/01/25   - update to CUDA 12
                       - bug fixes
                       - add support for Ada Lovelace
                       - remove support for Kepler and x86
test9     2022/07/15   - fix temporal padding (x86 build is being deprecated)
test8     2022/02/15   - remove AVX
test7     2022/02/14   - fix performance regression introduced in test6
                       - restore AVX on win64 and MSVC runtime dynamic linking
test6     2022/02/14   - add support for compute capability 3.5, link to the static MSVC runtime, remove AVX
                       - (don't use; accidentally built with debug config)
test5     2021/10/16   - bm_range now defaults to 9 instead of 8, fixes parameter check
test4     2021/09/08   - fix array parameter
test3     2021/08/06   - separate VAggregate and compile for AVX
test2     2021/08/01   - CPU version
test1     2021/07/25   - initial release
External Links
- GitHub - Source code repository.