Avisynthplus/Developers
AviSynth Plugin Writing Tips
#1: Exceptions
- Source: Doom9 Forum
Exceptions thrown from a module should only be caught in the same module. Otherwise you can experience weird and hard-to-debug errors in the plugin. Not adhering to this advice will result in code that can sporadically fail, or work on your computer consistently but fail on other machines.
Unfortunately, avisynth.h contains the AviSynthError class, giving plugin authors the false impression that it is safe to throw and catch these exception objects. It is not. The problem is not in the definition of this class, but in the implicit encouragement to throw C++ exceptions across DLL boundaries. Here are some tips to avoid getting caught in the deepest pits of hell:
- When throwing exceptions on your own, it is best not to use AviSynthError. Not using it will stop you thinking that AviSynthError has some special meaning, or that it can be used to throw to (or to catch from) avisynth.dll.
- Exceptions thrown by you should always be caught inside your plugin. You should not let exceptions propagate outside of your DLL to AviSynth (unless they are thrown using ThrowError).
- Errors thrown by AviSynth should not be caught by you. Specifically, don't wrap calls to AviSynth in try-catch blocks, because you cannot rely on that working correctly in every situation. If you need to detect errors, validate user parameters in your plugin, or use other API facilities provided by AviSynth, like IScriptEnvironment->FunctionExists().
- If you want to throw an exception to the user and/or to AviSynth, then only use IScriptEnvironment->ThrowError(). You should not call C++'s "throw" yourself for this purpose (see the second point), and you should not catch the error thrown by ThrowError() yourself (see the third point).
- If you want to catch an exception, want to do something based on that and finally raise an exception to AviSynth, don't rethrow. Catch your own exception (unless thrown by ThrowError), then call ThrowError separately.
Ignoring the above tips can still result in a fully working binary, but that is only guaranteed under very specific circumstances: when you've compiled your plugin with the *exact* same compiler version that avisynth.dll was compiled with, AND when both link to the C runtime (CRT) dynamically. Given that plugin authors can use whatever compiler they want, and that an AviSynth binary can be supplied by any community member, it is unwise to rely on such details.
These tips apply to all AviSynth versions (e.g. to 2.5 and to 2.6, to "classic" AviSynth and to AviSynth+, etc).
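The last two tips can be sketched roughly as follows. This is only an illustration, not AviSynth code: FakeEnv and HostError are hypothetical stand-ins so the example compiles without avisynth.h, and ParseRadius and CreateFilter are made-up names. In a real plugin the final call would be IScriptEnvironment->ThrowError(), which takes a printf-style format string.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for the host side (NOT the real AviSynth types).
struct HostError { std::string msg; };
struct FakeEnv {
    [[noreturn]] void ThrowError(const std::string& msg) { throw HostError{msg}; }
};

// Plugin-internal helper: throws a plugin-private exception type,
// which must never leave the plugin.
int ParseRadius(int radius) {
    if (radius < 1 || radius > 16)
        throw std::out_of_range("radius must be in 1..16");
    return radius;
}

// Hypothetical filter factory called by the host.
int CreateFilter(int radius, FakeEnv* env) {
    assert(env != nullptr);
    std::string error;  // filled in only if our own code throws
    int checked = 0;
    try {
        checked = ParseRadius(radius);  // only plugin code inside the try block
    } catch (const std::exception& e) {
        error = e.what();               // remember the message, do not rethrow
    }
    // ThrowError is called outside the try block, so the plugin never
    // catches or rethrows the error raised by the host.
    if (!error.empty())
        env->ThrowError("MyFilter: " + error);
    return checked;
}
```

The point of the layout is that the try block contains only the plugin's own code, and the only thing crossing the DLL boundary is the call to ThrowError.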
#2: Parallel execution
- Source: Doom9 forum
AviSynth-MT and VapourSynth both support multithreading, and it is being implemented in AviSynth+ too. All of them require the same things from your plugin. Here is a list of what you as a plugin author can do to support execution on multiple threads. If you are the author of any AviSynth plugin, please update your filter according to these rules if needed. Doing so will make sure your plugin can execute seamlessly when multithreaded. Furthermore, following these rules will not only guarantee correct execution in multithreaded environments, it will also provide optimal multithreaded performance.
Short list for those on the run:
- Unless you have the slowest filter in the world, don't start threads in your plugin.
- Never use global or static variables. In addition, your filter class should only have read-only members which are initialized during construction.
- Don't reuse the IScriptEnvironment pointer between method executions.
And again, the same points with a bit more explanation:
- In general, do not slice up your frame and start multiple threads on your own. Threading has its own performance overhead, and it is only worth doing it manually if your filter takes a lot of time to execute. And even if your filter is extremely slow (like fft3dfilter), you should try to optimize its single-threaded performance (by using SIMD instructions or choosing a more efficient algorithm) rather than manually threading it. Optimize for single-threaded performance, and as long as you follow the other rules below, you will get automatic and correct multithreading from AviSynth.
- Do not cache frames yourself. You might think it is efficient because you won't have to request/compute them in the next frame, but you are wrong, and there are several reasons why. First, if you write your own cache, you will have to introduce a global state to your filter, which means you will have to take care of synchronization between multiple threads too, which is not easy to do efficiently with caches. Second, keeping copies of past frames also means there will always be multiple references to them, thus AviSynth cannot pass them as write pointers to other filters, and will have to do an extra copy of it more often. And last but not least, AviSynth has a very extensive caching mechanism, and if you request the same frame multiple times (even when you need it for different frame requests), chances are you will get it for free anyway, so your own caching is just pure overhead.
- As a general extension to the previous rule, try not to keep any state between frames. In the optimal case your filter class should only have read-only members which are initialized during construction. Surely this is not always possible with every algorithm, but most times it is, and this is what you should strive for.
- As stated before, for best multithreading always try to implement algorithms which require no state between frames. Whenever this is violated, be sure to group reads and writes to the state (do not spread them), and guard them in critical sections (as few and as short as possible). For example, it is a good practice to copy all your writable class variables (in a critical section) at the start of each frame into local stack variables, compute the whole frame outside of the critical section (updating the local variables that captured the global state as needed), then write them back together at the end of your frame in another critical section. Do not request an automatic lock around your whole filter from AviSynth, because it will serialize your filter's execution.
- If you have class variables that must be writable in every frame, you will also have to keep in mind that AviSynth does not guarantee that frames will be processed in their natural order. Just a reminder.
- Only store variables in classes and in method stacks. Per-frame heap allocations should be avoided, because they can act as implicit synchronization points between threads. And most importantly, never store anything in static variables or in the (global) namespace scope. Read the previous sentence a few more times.
- Do not store the IScriptEnvironment pointer anywhere yourself (except locally on the stack), and never reuse those pointers outside of the methods where they were supplied to you. Not even between different executions of the same method! There is a reason why you get that pointer separately for each method, which is that it may be different every time, especially in multithreaded scenarios. If you reuse it, the consequences will be different between every implementation, but you can get anything from race conditions to program crashes.
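For the unavoidable case where a filter must keep state between frames, the advice about short, grouped critical sections might look something like the sketch below. The names StatsFilter, RunningStats and ProcessFrame are hypothetical, the frame is reduced to a plain vector of pixels, and std::mutex is used as a portable stand-in for a critical section. The expensive per-frame work touches only locals; the shared state is read and updated together in one short lock.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical state that has to persist between frames.
struct RunningStats {
    std::int64_t frames = 0;
    std::int64_t luma_sum = 0;  // sum of per-frame average luma
};

class StatsFilter {
public:
    // Stand-in for GetFrame: returns the average luma over all frames
    // processed so far, in whatever order they arrived.
    std::int64_t ProcessFrame(const std::vector<std::uint8_t>& pixels) {
        assert(!pixels.empty());

        // Expensive work uses only stack variables, so it needs no lock
        // and can run on many threads at once.
        std::int64_t frame_sum = 0;
        for (std::uint8_t p : pixels) frame_sum += p;
        const std::int64_t frame_avg =
            frame_sum / static_cast<std::int64_t>(pixels.size());

        // One short critical section groups every read and write of
        // the shared state.
        std::lock_guard<std::mutex> lock(mutex_);
        stats_.frames += 1;
        stats_.luma_sum += frame_avg;
        return stats_.luma_sum / stats_.frames;
    }

private:
    std::mutex mutex_;
    RunningStats stats_;
};
```

Because the update is an accumulation applied inside the lock, the result does not depend on the order in which frames are requested, which matters since AviSynth does not guarantee natural frame order.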
Choosing your AviSynth header
- Source: Doom9 Forum
So you are writing your own AviSynth plugin (cool!), and obviously one of the first things you have to do in your code is to include the AviSynth header. But which one? With all the different header variants lying around it is easy to get lost if you haven't been following AviSynth's development for a long time. Should you copy the header from another plugin? Should you copy it from the AviSynth64 project to be 64-bits compatible? Should you take the 2.5 header as it is the latest release that is officially stable? Do you need SEt's AviSynth-MT header if you want multithreading compatibility? Do you need separate headers for 32- and 64-bits like most plugins ship it? Should you just take the latest header from the AviSynth 2.6 project? And what about AviSynth+'s header?
Fortunately, no matter how you answer the above questions, there is one (and just one) solution that is easy to implement and fits all needs: Use AviSynth+'s header. And if you'd like to know why, read on.
So let's tackle the above questions.
Should you copy the header from another plugin?
No. Most plugins are older than the current AviSynth project releases, and so they ship with outdated (and sometimes buggy) headers. Also, some plugins have separate 32- and 64-bit sources, so you still wouldn't know which one to take. And if you are really unlucky, you might stumble on a plugin that was written for AviSynth 2.5, and using that header would be the worst of all your header-related options.
Should you take the 2.5 header as it is the latest release that is officially stable?
No. 2.5 is no more. Most plugins that have originally been written for 2.5 have been already recompiled for 2.6. Don't try to be smart and support both versions, because they are not compatible. 2.6 has been around for many years now, and the existing plugin ecosystem builds extensively around this version. Technically speaking, it is stable. Nobody uses 2.5 any more.
Should you copy it from the AviSynth64 project to be 64-bits compatible?
No. While AviSynth64's header will work perfectly if you want your plugin to *only* run in 64-bit mode, that is most likely not the case. That project isn't maintained any more, and thanks to that the 32-bit part is out of date.
Do you need separate headers for 32- and 64-bits like most plugins ship it?
No. You will see plugins around that have both avisynth.h and avisynth64.h. The same goes for many applications hosting avisynth.dll. This is because the original AviSynth project never supported 64-bit processing (not even today), so these other projects took the 32-bit header from the latest AviSynth version that was available when they were created, and they took the 64-bit header from the AviSynth64 project. This resulted in an ecosystem where the 64-bit versions didn't see any improvements over the years. On the upside, avisynth64.h stayed stable. On the downside, the 32-bit and 64-bit headers started drifting apart. Nevertheless, a merge of the avisynth.h and avisynth64.h headers is easily possible, which is exactly what AviSynth+ has done. There is no need for two separate headers; having them only results in additional code, complexity, and maintenance burden.
Do you need AviSynth-MT's header if you want multithreading compatibility?
No. While properly supporting multithreaded versions does require special coding considerations from plugin writers (see parallel execution), none of those considerations affect the choice of header. There is no API or ABI difference between multi-threaded and single-threaded AviSynth versions. You can perfectly support MT-capable AviSynth versions even if using the header from an AviSynth variant that has no MT-support.
Should you just take the latest header from the AviSynth 2.6 project?
No. This project (sometimes referred to, somewhat incorrectly, as the "original" or "official" AviSynth) always has the latest version, but it will do you no good if you want to support 64-bit processing. You cannot compile your plugin using its header in 64-bit mode, which is why people started using avisynth64.h in the first place. Even if it decided to support 64-bit in the future, it wouldn't be compatible with the existing (and pretty large) 64-bit ecosystem anymore, throwing away all the 64-bit plugin and application development that has been done in the past 6 years or so. And as already said, using two separate headers is completely unnecessary and only leads to additional complications down the road.
What about AviSynth+'s header?
The headers of AviSynth+ are up to date in every aspect and provide the greatest possible compatibility. By using AviSynth+'s headers, applications and plugins can cleanly compile and run in both 32-bit and 64-bit mode. They are 100% compatible with the latest 32-bit development on the original AviSynth 2.6 project, while supporting all 64-bit binaries. And of course, you can use them regardless of whether you support multithreading. Furthermore, and importantly, they are fully compatible with installations of AviSynth 2.6, AviSynth-MT, AviSynth64, and of course AviSynth+, so your plugin/application will be able to run on any user's machine.
AviSynth+'s header defines all the new high bit-depth and alpha-aware colorspaces introduced in the second half of 2016. There are also supporting functions for high bit depth, such as BitsPerComponent, Is444, and IsY. These functions do not exist in 8-bit-only AviSynth versions, but instead of raising an exception on older systems, they silently fall back to an 8-bit function. For example, IsY automatically calls IsY8, and Is444 is mapped to IsYV24. Functions that have no counterpart on 8-bit systems return a default value (BitsPerComponent, for example, returns 8). This way plugin authors can use the new VideoInfo helper functions without concern for the targeted AviSynth version: the code works both with high bit-depth AviSynth installations and with classic 8-bit versions.
avisynth.h is not the only header that is useful to plugin writers. cpuid.h defines new processor type constants for AVX, AVX2, FMA3, FMA4, F16C, and even the AVX512 variants. The default alignment in AviSynth+ is 32 bytes (this could be changed later to 64 bytes), so plugin writers can be sure that their AVX or AVX2 code will not fail for lack of proper alignment.
Writing better AviSynth plugins
By tp7
Lately I’ve been doing a lot of AviSynth-related development, mostly improving older plugins and making them available on x64. I’ll try to give some tips to fellow AviSynth devs, hopefully helping them improve the quality of their plugins and the maintainability of their codebases. Without further ado, let’s begin.
Stop YUY2
Currently there are five types of YUY2 support in AviSynth world:
- Convert YUY2 frame to planar, process it and convert back. This is one of the most common solutions and it was pretty much the only option in AviSynth 2.5. Example: deblock.
- Use a specific YUY2 path but leave it completely unoptimized and possibly broken because well, “no one uses YUY2”. Example: msharpen.
- Convert both YUY2 and planar to some intermediate format and use the same set of routines to process both. Possible example: ttempsmooths (I’m not done reading its code so I might be wrong here, this way still might be used somewhere).
- Support YUY2 only, optimized. Example: layer (AviSynth core).
- Have separate code paths for both YUY2 and planar, both optimized. Examples: some filters in the AviSynth core I don’t remember.
Most YUY2 filters fall into one of the first two categories. And actually, if you think about it, none of these options are good. (1) and (3) waste memory and time on conversion between planar and interleaved formats, (2) requires you to maintain two code paths on the assumption that no one will be using the second one (why bother with it at all then?), (4) doesn’t support planar, and (5) takes a lot of effort.
So what should you do? The answer is simple: let it go. In AviSynth 2.6 (and AviSynth+) there’s a new colorspace called YV16, which is the same format as YUY2 except planar. So you can process it with the same planar routines you’re using for YV12, Y8 and YV24. Zero effort on your side. And users? They’ll be calling ConvertToYV16().a_lot_of_filters().ConvertToYUY2() if they have a YUY2 source and want to keep it. You don’t waste a lot of memory and time on converting in each filter in between these convert calls, and AviSynth’s built-in YUY2<->YV16 conversion is very fast (optimized up to SSSE3 in AviSynth+ I think). Not supporting YUY2 is, in most cases, better for them too.
This somewhat applies to RGB too, since now there are planar RGB/RGBA formats. RGB24, RGB32, RGB48 and RGB64 can be losslessly converted to their planar RGB equivalents. While old packed RGB formats only support 8 or 16 bits, planar RGB equivalents support 8, 10, 12, 14, 16 bit and 32 bit float formats.
Stop C++
Okay, this might sound a bit strange, considering I’m one of the people who don’t understand why people write C when there’s C++. No, I’m not suggesting you write plain C, but rather that you restrict the C++ features you’re using. There are tons of guides on this question, just look around for some. The most important issue I have with it: stop overusing member functions. They’re terrible: they allow you to use any variable in the same class, dramatically increasing the scope size.
Imagine you see prepare_buffer(src_frame, buffer) inside GetFrame, where src_frame is PVideoFrame and buffer is a raw uint8_t pointer. If prepare_buffer is a free function outside of the class, you can assume that it just takes the frame and writes it to the pre-allocated buffer in some way. If this is a member function, you can’t assume anything. Does it modify any class variables? Does it depend on these class variables having some value? You have no way of knowing this and need to go and inspect the function code. In large codebases it instantly makes the code a lot harder to understand.
Another, although less important, issue is taking pointers to these functions. I usually template the same function for different instruction sets and store a pointer to it as a class variable, doing dynamic dispatch in the constructor and calling the function through the pointer in GetFrame; doing this with member functions is a lot harder. Also, having free functions simplifies porting to other codebases and frameservers with only a C API, e.g. VapourSynth. Of course it’s possible to migrate the whole class, but why?
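As a rough sketch of the pattern described above (a free function templated per instruction set, selected once in the constructor, called through a pointer in GetFrame): the names Isa, invert_row and InvertFilter are hypothetical, the has_sse2 flag stands in for whatever CPU detection the plugin uses, and both variants share scalar code here so the example stays portable. A real SSE2 variant would use intrinsics.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class Isa { C, SSE2 };  // hypothetical instruction-set tag

// Free function: everything it reads or writes is in the parameter list.
template <Isa isa>
void invert_row(std::uint8_t* dst, const std::uint8_t* src, std::size_t n) {
    // Both instantiations run the same scalar loop in this sketch.
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = static_cast<std::uint8_t>(255 - src[i]);
}

class InvertFilter {
public:
    using RowFn = void (*)(std::uint8_t*, const std::uint8_t*, std::size_t);

    // Dispatch happens once, at construction time.
    explicit InvertFilter(bool has_sse2) : process_(Select(has_sse2)) {}

    // Simplified stand-in for GetFrame: no per-frame branching on CPU features.
    void GetFrame(std::uint8_t* dst, const std::uint8_t* src, std::size_t n) const {
        assert(dst != nullptr && src != nullptr);
        process_(dst, src, n);
    }

private:
    static RowFn Select(bool has_sse2) {
        if (has_sse2) return &invert_row<Isa::SSE2>;
        return &invert_row<Isa::C>;
    }

    RowFn process_;  // read-only after construction
};
```

Since process_ is set once and never modified, the class also stays within the "read-only members initialized during construction" rule from the parallel execution section.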
Stop implementing memcpy
This is probably not so relevant for newer plugins but you can see this a lot in older ones. A single routine called memcpy_amd being copypasted across many plugins. The purpose of this was to copy frames faster compared to memcpy and built-in BitBlt methods. Yeah it probably wasn’t such a bad idea some years ago. Does it make sense now? Not at all.
Current memcpy in the MS runtime is optimized for SSE2. This is still somewhat slower than the old memcpy_amd routine in some cases, but it’s fast enough. Unless copying frames is all your filter is doing, you aren’t gonna notice the difference. And if you do, there’s a better approach: env->BitBlt. This function uses memcpy_amd internally if certain conditions are met, and falls back to the default memcpy if they aren’t. Of course this is an implementation detail and you should not depend on it, but it’s reasonable to assume BitBlt won’t get any slower in the future.
Unfortunately, BitBlt is not able to use the most efficient memcpy_amd routine every time. For it to be used, passed parameters should meet a simple condition: dst_pitch, src_pitch and width should be equal. Obviously you can just ensure this condition is met on your side if you always want to use the most efficient copying method. Ultim has this crazy idea to define BitBlt as a function that always uses memcpy_amd tier routine, but this idea is quite bad and might not get implemented in AviSynth+ at all.
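To make the pitch condition concrete, here is a minimal sketch of a plane copy with the same parameter order as BitBlt. It is a hypothetical helper (copy_plane), not the AviSynth implementation: when both pitches equal the row size the rows are contiguous, so the whole plane can be copied in one call; otherwise it copies row by row. In a plugin you would simply call env->BitBlt instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical illustration of a BitBlt-style plane copy.
// Pitches and row_size are in bytes.
void copy_plane(std::uint8_t* dst, int dst_pitch,
                const std::uint8_t* src, int src_pitch,
                int row_size, int height) {
    assert(row_size >= 0 && height >= 0);
    assert(row_size <= dst_pitch && row_size <= src_pitch);

    if (dst_pitch == src_pitch && src_pitch == row_size) {
        // No padding between rows: a single large copy covers the plane.
        std::memcpy(dst, src, static_cast<std::size_t>(row_size) * height);
        return;
    }
    // Padded rows: copy only the visible bytes of each row.
    for (int y = 0; y < height; ++y) {
        std::memcpy(dst, src, static_cast<std::size_t>(row_size));
        dst += dst_pitch;
        src += src_pitch;
    }
}
```

The single-copy branch is where a specialized large-block routine pays off, which is why matching pitches matter.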
Stop copypasting code for different planes
DegrainMedian seems to be the most severe example of this: here’s a part of its GetFrame method, defined in degrainmedian.cpp. You can see that the very same code with some changes is copypasted for three planes. The same kind of code with minimal changes is copypasted for progressive routines. Makes you wonder what kind of programmers write the plugins you use daily.
What’s a better way to do this? First of all, you can process all planes in a loop. Generic version looks somewhat like this:
    static const int planes[] = { PLANAR_Y, PLANAR_U, PLANAR_V };
    // Y8 has a single plane; the other 8-bit planar formats have three.
    for (int pid = 0; pid < (vi.IsY8() ? 1 : 3); pid++) {
        int plane = planes[pid];
        int width = dst->GetRowSize(plane);
        //more code
    }
or in high bit depth Avisynth+:
    const int planesYUV[4] = { PLANAR_Y, PLANAR_U, PLANAR_V, PLANAR_A };
    const int planesRGB[4] = { PLANAR_G, PLANAR_B, PLANAR_R, PLANAR_A };
    const int *planes = vi.IsYUV() || vi.IsYUVA() ? planesYUV : planesRGB;
    for (int pid = 0; pid < vi.NumComponents(); pid++) {
        int plane = planes[pid];
        int rowsize = dst->GetRowSize(plane);      // row size in bytes
        int width = rowsize / vi.ComponentSize();  // row size in pixels
        //more code
    }
This handles all existing planar colorspaces, which can have either one, three or four (with alpha) planes, and works for planar RGB/RGBA. Inside the loop body, variable plane will have the value of the current plane, e.g. PLANAR_Y, so you can use it in calls to AviSynth API as usual.
One more thing about that DegrainMedian code: there’s no point in doing dispatching at every frame in a huge if-else block. Since parameters don’t change during processing, you can either do the same dispatch once in the constructor, or use a simple lookup table of processors, automatically selecting one based on the provided parameters (which can be used as array indexes). This improves readability a lot, making your code more declarative. You can check Fog’s C++ guide for some additional info on dispatching (and a lot of other very useful things).
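A lookup table of processors could look roughly like this. It is a sketch under assumptions: the per-pixel functions, the PixelFn signature and the TemporalFilter class are invented for illustration and have nothing to do with DegrainMedian's actual code. The mode parameter is used directly as the array index, and the lookup happens once in the constructor. With more parameters the table would simply gain more dimensions.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical per-pixel processors sharing one signature.
using PixelFn = int (*)(int prev, int cur, int next);

int keep_current(int, int cur, int) { return cur; }

int temporal_median(int prev, int cur, int next) {
    // Median of three values.
    return std::max(std::min(prev, next),
                    std::min(std::max(prev, next), cur));
}

int temporal_average(int prev, int cur, int next) {
    return (prev + cur + next) / 3;
}

// The table is the whole dispatch logic: index = mode.
const PixelFn kProcessors[] = { keep_current, temporal_median, temporal_average };
const int kModeCount = static_cast<int>(sizeof(kProcessors) / sizeof(kProcessors[0]));

class TemporalFilter {
public:
    // Selected once; a real plugin would report an invalid mode to the
    // user instead of asserting.
    explicit TemporalFilter(int mode) : process_(Lookup(mode)) {}

    int Apply(int prev, int cur, int next) const {
        return process_(prev, cur, next);
    }

private:
    static PixelFn Lookup(int mode) {
        assert(mode >= 0 && mode < kModeCount);
        return kProcessors[mode];
    }

    PixelFn process_;
};
```

Adding a new mode then means adding one function and one table entry, with no change to the per-frame code.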
Don’t be afraid of alloca
Yes, I know this is a bad programming practice, but well, most of video processing is one huge bad programming practice. The point of alloca is to do practically free memory allocations on the stack, in cases where you’d usually use a static array but you don’t know the size in advance and you don’t want to waste memory by preallocating a static array with the maximum allowed size. Example usage: the Average plugin. In this case you could replace it with a preallocated buffer, but there are cases when it’s harder to do. For example, SangNom2 uses it to store a line buffer (one full line of video, which is smaller than the stack frame size for any reasonable resolution). Using stack allocation you can avoid costly memory allocation with new/delete on every frame while keeping your GetFrame re-entrant.
But there are some pitfalls with alloca. First, you should never use it after the function it was created in returns as the pointer won’t be valid anymore. You also have to call destructors yourself because there’s no delete[] call. And you have to zero allocated memory if you ever store PVideoFrame in it because when you write something like
memory[i] = child->GetFrame(n, env);
the destructor of the frame in memory[i] will be called. Inside this destructor, PVideoFrame tries to call VideoFrame’s Release method if the pointer to it is not null, which will fail because said pointer points to garbage instead of a real VideoFrame. Calling memset on this memory prevents this. Also, you should never make any assumptions about alloca alignment, so it’s probably better to avoid aligned loads in SIMD code when working with it (vc110 seems to return at least 16-byte aligned memory, though).
In general you should prefer static arrays over alloca just because it’s a bit simpler and also handles construction/destruction problems automatically. I’m probably overusing it a bit. Still, I consider using it a better practice than doing a heap allocation on every frame.
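The pitfalls above can be illustrated with a small sketch. Frame and FramePtr are hypothetical stand-ins for VideoFrame and PVideoFrame (a ref-counted handle whose assignment releases the previous pointee), and sum_frames is a made-up function. Note one deliberate difference: where the text zeroes the memory with memset, this sketch reaches the same null state with placement new, and it calls the destructors by hand before returning. It assumes the memory returned by alloca is suitably aligned for a pointer-sized object.

```cpp
#include <cassert>
#include <new>
#if defined(_MSC_VER)
#include <malloc.h>
#define alloca _alloca
#else
#include <alloca.h>
#endif

// Hypothetical stand-ins for VideoFrame / PVideoFrame.
struct Frame { int refs = 0; int value = 0; };

class FramePtr {
public:
    FramePtr() : p_(nullptr) {}
    FramePtr(Frame* f) : p_(f) { if (p_) ++p_->refs; }
    FramePtr(const FramePtr& o) : p_(o.p_) { if (p_) ++p_->refs; }
    FramePtr& operator=(const FramePtr& o) {
        if (o.p_) ++o.p_->refs;
        if (p_) --p_->refs;  // dereferences garbage if p_ was never initialized
        p_ = o.p_;
        return *this;
    }
    ~FramePtr() { if (p_) --p_->refs; }
    Frame* get() const { return p_; }
private:
    Frame* p_;
};

int sum_frames(Frame* frames, int count) {
    assert(frames != nullptr && count > 0 && count <= 64);
    // Raw, uninitialized stack memory; only valid until this function returns.
    FramePtr* slots = static_cast<FramePtr*>(alloca(sizeof(FramePtr) * count));

    // Put every handle into a known null state before assigning to it.
    for (int i = 0; i < count; ++i) new (&slots[i]) FramePtr();

    int total = 0;
    for (int i = 0; i < count; ++i) {
        slots[i] = FramePtr(&frames[i]);  // safe: the old value is null
        total += slots[i].get()->value;
    }

    // There is no delete[] for alloca memory: run the destructors by hand.
    for (int i = 0; i < count; ++i) slots[i].~FramePtr();
    return total;
}
```

If the reference counts are not back to zero after the call, a destructor was skipped, which in a real filter would mean leaked frames.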
This is it. There are more suggestions left like “stop using VC6”, “stop static linking” and “drop MMX” but those are obvious anyway.
Useful links
What is the "v8 interface"?
- Source: Doom9 Forum
- Date: 14th May 2020
Changes That Affect Future Plugins
- Source: Doom9 Forum
- Date: 21st May 2020
- Also see this discussion on Doom9 (read surrounding post).