SetFilterMTMode

From Avisynth wiki

Revision as of 16:04, 10 February 2021

By default, AviSynth+ is single-threaded. This page documents how to enable multi-threading.


Syntax and Parameters

SetFilterMTMode

SetFilterMTMode (string filter, int mode, bool "force")


string  filter =
Name of the filter you want to set an MT mode for. You cannot set the MT mode on script function calls, only on binary filters.
The special name "DEFAULT_MT_MODE" sets the default MT mode for all filters that do not have an MT mode explicitly set. It does not affect source filters or filters that self-register their own MT mode.


int  mode =
Sets the MT mode. There are three basic MT modes (1, 2, 3) and an experimental workaround mode (4). Instead of the numbers, you can also use symbolic names for MT modes:
  • 1 : MT_NICE_FILTER
  • 2 : MT_MULTI_INSTANCE
  • 3 : MT_SERIALIZED
  • 4 : MT_SPECIAL_MT


bool  force = false
Force MT mode. Default is false
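
As a minimal sketch of the three parameters, the lines below set a default mode, assign a mode to one filter by number, and pass the optional force argument by name. "SomeFilter" and "OtherFilter" are hypothetical placeholders, not real plugins.

```avisynth
# Default for every filter that has no explicit or self-registered MT mode.
SetFilterMTMode("DEFAULT_MT_MODE", MT_MULTI_INSTANCE)

# Per-filter mode given as a number ("SomeFilter" is a placeholder name).
SetFilterMTMode("SomeFilter", 1)

# Symbolic mode name plus the optional force argument ("OtherFilter" is a placeholder name).
SetFilterMTMode("OtherFilter", MT_SERIALIZED, force=true)
```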


Prefetch

Prefetch (clip, int "threads", int "frames")


clip   =
Input clip.


int  threads =
Number of threads to use. If it is 0, the clip is passed through without any prefetching.
default: (number of logical cores in the system) +1


int  frames =
Number of frames to prefetch. If it is 0, the clip is likewise passed through without any prefetching.
default: threads * 2


In the original Avisynth+ (before v3.6), you could use only one Prefetch, but starting with the Neo fork you can use as many as you like. A "frames" argument was also added to specify the number of frames to prefetch. Neo's multithreading enhancements were backported to classic Avisynth+ and have been available since v3.6.

Example: Pipeline parallelization

 Filtering A
 Prefetch(1,4)
 Filtering B
 Prefetch(1,4)
 Filtering C
 Prefetch(1,4)

Prefetch(1,4) starts one thread that reads four frames ahead. In the above example, the filtering processes A, B, and C are executed in parallel as a pipeline. The number of threads of each Prefetch is arbitrary, so if, for example, filtering B is heavy and you want more parallelism for it, you can increase its number of threads as follows:

 Filtering A
 Prefetch(1,4)
 Filtering B
 Prefetch(4)
 Filtering C
 Prefetch(1,4)

OnCPU

OnCPU/OnCUDA (collectively called OnDevice)

Since v3.6, AvisynthNeo features have been backported to Avisynth+. These include support for "devices" such as CPU and CUDA, and the data transfer between them. CUDA support needs a special build and is still experimental after v3.7.

If all stages are enabled, the chain is as follows. This is the flow of frame data, which is the reverse of the GetFrame call direction:

Upstream → Upstream cache → Thread → Transfer → Downstream cache → Downstream

Number of prefetch frames

  • 0: Synchronous call without any cache.
  • 1: Synchronous call, but the transfer is read ahead and executed asynchronously. The downstream cache is enabled.
  • 2 or more: Upstream processing is read ahead using threads. Both upstream and downstream caches are enabled.

When prefetch is 2 or more, upstream processing is fixed at 1 thread and the upstream prefetch count is fixed at 2. The downstream look-ahead is set to the specified number of prefetch frames.


OnCPU(clip, int "num_prefetch")


clip   =
This clip is processed by the CPU. In other words, the processing that comes before this call runs on the CPU.


int  num_prefetch =
Specifies the number of frames to prefetch. A value of about 2 usually gives sufficient performance. Unlike Prefetch, it uses only one thread, because this prefetch exists to run GPU and CPU processing in parallel.
default: 0


If 0 is specified, it will be a synchronous call without using threads.

OnCUDA

OnCUDA(clip, int "num_prefetch", int "device_index")


clip   =
This clip is processed by CUDA. In other words, the processing that comes before this call runs on CUDA. A filter that does not support CUDA processing will result in an error. Currently, few internal filters support CUDA, so in practice you can only use external filters written specifically for it.


int  num_prefetch =
Same as the OnCPU prefetch. Specifies the number of frames to prefetch. A value of about 2 usually gives sufficient performance. Unlike Prefetch, it uses only one thread, because this prefetch exists to run GPU and CPU processing in parallel.
default: 0


int  device_index =
Specifies the GPU to run on. If you have only one GPU, you can only use 0. If you have two GPUs, you can specify 0 or 1. There is no limit on the number of GPUs.
default: 0


OnCUDA is valid only on an Avisynth+ build compiled with the CUDA option, and it works only if the system has a suitable device and driver combination.
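
The following is a hedged sketch of how OnCPU and OnCUDA might be combined, assuming an Avisynth+ build with CUDA enabled and a working device. SomeCudaFilter is a hypothetical name standing in for an external filter written for CUDA, and ColorBars is only a stand-in source.

```avisynth
ColorBars(1920, 1080, "YV12")   # stand-in source, processed on the CPU
OnCPU(2)                        # the processing before this call runs on the CPU, 2 frames prefetched

SomeCudaFilter()                # hypothetical CUDA-capable external filter
OnCUDA(2, 0)                    # the processing before this call runs on CUDA, on GPU 0
```

With num_prefetch set to 0 in either call, the same chain would run as a synchronous call without threads.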

Usage

So, how to use MT in AviSynth+?
By default, your script will run in single-threaded mode, just like with SEt's build. Also, just like in SEt's build, you'll have to make sure that filters use the correct MT mode, or else they might wreak havoc. There are three basic MT modes (1,2,3) and an experimental workaround mode (4) since r2440, and modes 1-3 are the same modes as in (yeah you guessed correctly) SEt's build. Which means you can use the same modes that you have used with AviSynth-MT. There are some things though that are different and/or new in AviSynth+. The first difference is *how* you set the MT mode. In AviSynth-MT, you had to use SetMTMode(X), which caused all filters following that line to use mode X (until the next call to SetMTMode()). This meant if you needed to use multiple MT modes, you had to insert all those calls in the middle of your script, littered over many places.


Setting MT Modes

AviSynth+ does it differently. In AviSynth+, you specify the MT mode for specific filters only, and those filters will then automatically use their own mode, even if other MT modes were set in between. This means you can specify all the MT modes at the beginning without polluting your script. You can even make a SetMTMode.avsi if you wish and let it autoload for all of your scripts, or Import() it at their top. This is much cleaner, and it allows you to maintain all your MT modes centrally in a single place. To make the distinction from AviSynth-MT clear, the function is called SetFilterMTMode() in AviSynth+ instead of SetMTMode().
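
A minimal sketch of this layout follows, assuming a file named SetMTMode.avsi as suggested above. The filter names FilterA and FilterB and the relative file path are hypothetical.

```avisynth
# --- SetMTMode.avsi: all MT modes kept in one place ---
SetFilterMTMode("DEFAULT_MT_MODE", MT_MULTI_INSTANCE)
SetFilterMTMode("FilterA", MT_NICE_FILTER)   # hypothetical filter name
SetFilterMTMode("FilterB", MT_SERIALIZED)    # hypothetical filter name

# --- top of the main script ---
# Not needed if SetMTMode.avsi sits in the autoload folder.
Import("SetMTMode.avsi")
```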


Enabling MT

The other difference is how you actually enable multithreading. Calling SetFilterMTMode() is not enough: it sets the MT mode, but the MT mode only has an effect if MT is enabled at all. Note this means you can safely include/import/autoload your SetFilterMTMode() calls even in single-threaded scripts, and they will not be messed up. On to the point: you enable MT by placing a single call to Prefetch(X) at the end of your script, where X is the number of threads to use. If there is a return statement in your script, it must be placed after Prefetch().
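
As a small sketch of the ordering rule, the script below ends with Prefetch() followed by an explicit return. ColorBars and Blur are only stand-ins for a real source and filter chain.

```avisynth
SetFilterMTMode("DEFAULT_MT_MODE", MT_MULTI_INSTANCE)   # has no effect unless Prefetch is called

ColorBars(640, 480, "YV12")   # stand-in source
Blur(1.0)                     # stand-in for the real filter chain

Prefetch(4)    # enables MT with 4 threads
return last    # an explicit return, if used, comes after Prefetch
```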


Example

# This line makes all filters that don't have an MT mode explicitly set use mode 2 by default.
# Mode 2 is a relatively safe choice as long as you don't know whether most of your calls are mode 1 or 3.
# Compared with mode 1, mode 2 trades memory for MT-safety, but only a select few filters will work with mode 1.
SetFilterMTMode("DEFAULT_MT_MODE", 2)
# or, equivalently, using the symbolic name:
# SetFilterMTMode("DEFAULT_MT_MODE", MT_MULTI_INSTANCE)

# FFVideoSource(), like most source filters, needs MT mode 3.
# Note: starting with AviSynth+ r2069, source filters are recognized automatically.
# If a source filter has no MT mode specified at all, it automatically uses
# mode 3 instead of the default MT mode.
SetFilterMTMode("FFVideoSource", 3)
# or, equivalently, using the symbolic name:
# SetFilterMTMode("FFVideoSource", MT_SERIALIZED)

# Now comes your script as usual
FFVideoSource(...)
Trim(...)
QTGMC(...)
...

# Enable MT!
Prefetch(4)


Help filling MT modes

The following script contains MT modes for various plugins. Save it as mtmodes.avsi and place it in your auto-load folder. The script is a work in progress, and there are still lots of plugins that need to be tested and validated. When the script is finalized, the only thing users will have to write in their script is the Prefetch call; all SetFilterMTMode calls will be hidden in a single .avsi script.

Note for filter writers: filters can report their mt modes in their SetCacheHint, in this case the manual setting is not necessary.


Choosing the correct MT mode

Please do check that the actual output is correct. Fast but corrupted output is useless. An easy way of checking is to use something like ColorBars(1920, 1080, "YV12").AddGrainC(10000, 10000, seed=1) as a source filter. It doesn't always work right but will do for most stuff.[1]

Source: http://forum.doom9.org/showthread.php?p=1667439#post1667439
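
One way to apply this check, sketched under assumptions: render the script below once with Prefetch and once without it, then compare the two outputs. FilterUnderTest is a hypothetical placeholder for the filter being validated, and AddGrainC is an external plugin that must be installed.

```avisynth
SetFilterMTMode("FilterUnderTest", MT_NICE_FILTER)   # the MT mode being validated

# Noisy source with a fixed seed, as suggested above.
ColorBars(1920, 1080, "YV12").AddGrainC(10000, 10000, seed=1)

FilterUnderTest()   # hypothetical: replace with the filter you want to check

# Render once with this line and once with it commented out, then compare the outputs.
Prefetch(4)
```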

  • MT_NICE_FILTER: Some filters (like nnedi3) use some buffers to do their dirty work, and with mode 1 you get multiple threads writing data from different frames to the same buffer. This causes corruption when someone later reads from this buffer and does not get what was expected. Most of the "more complicated" filters use some kind of temporary storage and thus won't work well with this mode. Simple filters might.
  • MT_MULTI_INSTANCE: Mode 2 doesn't have this issue, because each thread gets its own buffers and no data is shared. Hence mode 2 is the "default" mode which should work with most filters, but it wastes memory like crazy. Take SangNom2 for example: for a 1080p YV12 frame, the temporary buffers are about 10 MB, so with 4 threads you get 40 MB for a single filter invocation. Now add some usual supersampling to this and multiple invocations in most aa scripts and... you get the idea.
  • MT_SERIALIZED: If the filter requires sequential access or uses some global storage, then mode 3 is the only way to go. Source filters (filters without a clip parameter) are autodetected. They do not need an explicit MT mode setting and will automatically use MT_SERIALIZED.
  • MT_SPECIAL_MT: Experimental. Currently used only for MP_Pipeline. That filter is like a source filter (no input clip parameter), is internally multithreaded, and suffers heavy performance degradation under any of the three regular MT modes. Really, this is a workaround. Available from AviSynth+ version r2440. Avisynth+ 3.6 has serious MT fixes, so this mode may no longer be needed.
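
Purely as an illustration of the descriptions above, the filters named in the list could be assigned modes as follows. These lines are not validated settings; they assume the plugins are installed and do not already register an MT mode themselves.

```avisynth
SetFilterMTMode("nnedi3", MT_MULTI_INSTANCE)     # uses internal buffers, so mode 1 is unsafe
SetFilterMTMode("SangNom2", MT_MULTI_INSTANCE)   # each thread gets its own temporary buffers
SetFilterMTMode("MP_Pipeline", MT_SPECIAL_MT)    # experimental workaround mode
```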


Closing notes (don't skip!)

  • Remember that MT is only stable as long as you have specified a correct MT mode for all filters.
  • Instead of the numbers 1-2-3-4, you can also use symbolic names for MT modes: MT_NICE_FILTER (1), MT_MULTI_INSTANCE (2), MT_SERIALIZED (3), MT_SPECIAL_MT (4)
  • Mode 3 (MT_SERIALIZED) is evil. It is necessary for some filters, and it is usually no problem for source filters, but it can literally completely negate all advantages of MT, if such a filter is placed near the end of your script. Let us know if you meet a non-source mode 3 filter, we might be able to do something about it, but in general, avoid such calls if you want performance. (And of course, insert what you have found into here.)
  • The new caches will save you a lot of memory in single-threaded scripts, but due to the way they work, they will also use more memory than before with MT enabled. The memory usage will scale much closer with the number of threads you have. Just something to keep in mind.
  • MT-enabled AviSynth+ triggers a latent bug in AvsPmod v2.5.1. Use AvsPmod v2.6.x.
  • Using too many threads can easily hurt performance a lot, because the CPU is not the only bottleneck in your PC. For example, if you have a quad-core machine with 8 logical cores, fewer than 8 threads will often work much better than 8 or more.


Changes
