Nnedi3 resize16

Abstract
Author	mawen1250
Version	v3.3
Download	nnedi3_resize16_v3.3.avsi
Category	Resizers
License
Discussion	NMM-HD Thread - [Chinese]

Description

nnedi3_resize16 is an advanced script for image resizing and colorspace conversion.

Requirements

[x86]: AviSynth+ or AviSynth 2.6.0
[x64]: AviSynth+
Progressive input only
Supported color formats: Y8, YV12, YV16, YV24

Required Plugins

The latest versions of the following plugins are recommended unless stated otherwise.

Dither
MaskTools2
nnedi3
RgTools
SmoothAdjust
FTurn - not necessarily required but will improve speed; AviSynth+ already includes these optimizations so there's no need for FTurn.

Optional script:

ContraSharpen_mod.avsi - script is only required when sharp>0

Syntax and Parameters

nnedi3_resize16_v3.3.avsi

nnedi3_resize16	(clip input, int "target_width", int "target_height", float "src_left", float "src_top", float "src_width", float "src_height",
\	string "kernel_d", string "kernel_u", float "f_d", float "f_u", int "taps",
\	float "a1", float "a2", float "a3", bool "invks_d", bool "invks_u", int "invkstaps", bool "noring",
\	int "nsize", int "nns", int "qual", int "etype", int "pscrn", int "threads",
\	float "ratiothr", bool "mixed", float "thr", float "elast", float "sharp",
\	string "output", bool "tv_range", string "cplace", string "matrix", string "curve", float "gcor",
\	int "Y", int "U", int "V", bool "lsb_in", bool "lsb", int "dither")

Input clip and resizing parameters

input

clip input =

Input clip to be processed.

target_width

int target_width =

Target width; default value is the width of the input clip.

target_height

int target_height =

Target height; default value is the height of the input clip.

src_left

float src_left = 0.0

src_top

float src_top = 0.0

Coordinates of the top-left corner of the picture sub-area used as the source for resizing. They can be fractional. If negative, the picture is extended by replicating the edge pixels (for src_left, the left pixel column).

src_width

float src_width = width(input)

src_height

float src_height = height(input)

Size in pixels of the sub-area to resize. They can be fractional. If 0, the area has the same size as the source clip. If negative, they define coordinates relative to the bottom-right corner, in a Crop-like manner.

Default value is the width and height of the input clip.

Just like AviSynth's resizers, you can use an expanded syntax that crops before resizing. The same operations are performed as if you had cropped just before resizing, though there can be a slight speed benefit.
Note that the edge semantics are slightly different: Crop gives a hard absolute boundary, whereas the resizer's filter lobes can extend into the cropped region, but not beyond the physical edge of the image.
Use Crop to remove hard borders or other unwanted noise; resizer cropping may propagate that noise into adjacent output pixels.
Use resizer cropping to maintain accurate edge rendering when cropping part of a complete image.
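
As a rough illustration of the two approaches described above, the sketch below removes 8 pixels from the left and right edges, once with Crop and once through the resizer's own src_* parameters. The file name, the 8-pixel margins and the 1280x720 target are made-up values, and the source is assumed to be progressive YV12.

```avisynth
# Hypothetical source; assumed to be progressive YV12.
src = AviSource("Blah.avi")

# Hard crop first: the resizer never sees the removed columns.
a = src.Crop(8, 0, -8, 0).nnedi3_resize16(target_width=1280, target_height=720)

# Resizer cropping: same sub-area, but the filter lobes may still read
# the columns just outside it. Negative src_width works Crop-style,
# src_height=0 keeps the full source height.
b = src.nnedi3_resize16(target_width=1280, target_height=720, \
                        src_left=8, src_top=0, src_width=-8, src_height=0)

return a   # or b
```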

Scaling Ratio Calculation

ratiothr

float ratiothr = 1.125

When the scale ratio is larger than ratiothr, the nnedi3+Dither_resize16 upscale method is used instead of pure Dither_resize16.
When the horizontal/vertical scale ratio > "ratiothr", it is treated as upscaling.
When the horizontal/vertical scale ratio <= "ratiothr", it is treated as downscaling.
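
A small worked case may help here. It assumes the scale ratio is the target size divided by the source size (the text above does not spell this out) and uses a hypothetical 1280x720 source with the default ratiothr of 1.125.

```avisynth
src = AviSource("Blah.avi")   # hypothetical 1280x720 clip

# 1920/1280 = 1.5 > 1.125: treated as upscaling,
# so the nnedi3+Dither_resize16 method is used.
up = src.nnedi3_resize16(target_width=1920, target_height=1080)

# 1366/1280 is about 1.067 <= 1.125: treated as downscaling,
# so pure Dither_resize16 is used even though the picture gets larger.
small = src.nnedi3_resize16(target_width=1366, target_height=768)

return up   # or small
```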

Parameters for merging edge and flat upscaled clip

mixed

bool mixed = true

nnedi3_resize16 uses nnedi3+Dither_resize16 to upscale edges. This parameter defines whether upscaling combines nnedi3+Dither_resize16 (edge areas) with Dither_resize16 (flat areas), which gives a more precise result, mainly in flat areas.

thr

float thr = 1.0

Same as "thr" in Dither_limit_dif16; the valid range is (0, 10.0].

Threshold between reference data and filtered data.

elast

float elast = 1.5

Same as "elast" in Dither_limit_dif16; the valid range is [1, 10.0].

To avoid artifacts, the threshold has some kind of elasticity. Value differences falling over this threshold are gradually attenuated, up to thr * elast.

PDiff: pixel value diff between flat clip and edge clip (edge clip as reference)
ODiff: pixel value diff between merged clip and edge clip (edge clip as reference)
PDiff, thr and elast are used to calculate ODiff:
ODiff = PDiff when [PDiff <= thr]
ODiff gradually smooths from thr to 0 when [thr <= PDiff <= thr * elast]
for elast>2.0, ODiff reaches maximum when [PDiff == thr * elast / 2]
ODiff = 0 when [PDiff >= thr * elast]

Larger "thr" will result in more pixels being taken from the flat area upscaled clip (Dither_resize16).
Larger "thr" will result in fewer pixels being taken from the edge area upscaled clip (nnedi3+Dither_resize16).
Larger "elast" will result in more pixels being blended from the edge and flat area upscaled clips, for smoother merging.
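
The rules for ODiff above do not give the exact shape of the transition between thr and thr * elast. One curve that satisfies all of them, offered here as an assumption rather than as the formula Dither_limit_dif16 actually uses, is a parabola through (thr, thr) and (thr * elast, 0), taking PDiff as a magnitude:

```latex
\mathrm{ODiff} =
\begin{cases}
\mathrm{PDiff}, & \mathrm{PDiff} \le \mathit{thr} \\[4pt]
\mathrm{PDiff}\cdot\dfrac{\mathit{thr}\cdot\mathit{elast} - \mathrm{PDiff}}{\mathit{thr}\,(\mathit{elast}-1)}, & \mathit{thr} < \mathrm{PDiff} < \mathit{thr}\cdot\mathit{elast} \\[4pt]
0, & \mathrm{PDiff} \ge \mathit{thr}\cdot\mathit{elast}
\end{cases}
```

With the defaults thr = 1.0 and elast = 1.5, PDiff = 1.25 gives ODiff = 1.25 * 0.25 / 0.5 = 0.625 under this assumed curve. Its vertex sits at PDiff = thr * elast / 2, which falls inside the transition band only when elast > 2, in line with the rule stated for elast>2.0.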

Parameters for nnedi3

nsize

int nsize = 0

Sets the size of the local neighborhood around each pixel that is used by the predictor neural network.

Possible settings (x_diameter x y_diameter):

0 - 8x6
1 - 16x6
2 - 32x6
3 - 48x6
4 - 8x4
5 - 16x4
6 - 32x4

For image enlargement it is recommended to use 0 or 4. Larger y_diameter settings will result in sharper output.

For deinterlacing larger x_diameter settings will allow connecting lines of smaller slope. However, what setting to use really depends on the amount of aliasing (lost information) in the source.

If the source was heavily low-pass filtered before interlacing then aliasing will be low and a large x_diameter setting won't be needed, and vice versa.

nns

int nns = 3

Sets the number of neurons in the predictor neural network. Possible settings are 0, 1, 2, 3, and 4. 0 is fastest. 4 is slowest, but should give the best quality.

This is a quality vs speed option; however, differences are usually small. The difference in speed will become larger as 'qual' is increased.

0 - 16
1 - 32
2 - 64
3 - 128
4 - 256

qual

int qual = 1

Controls the number of different neural network predictions that are blended together to compute the final output value.

Each neural network was trained on a different set of training data. Blending the results of these different networks improves generalization to unseen data.

Possible values are 1 or 2. Essentially this is a quality vs speed option. Larger values will result in more processing time, but should give better results.

However, the difference is usually pretty small. I would recommend using qual>1 for things like single image enlargement.

etype

int etype = 0

Controls which set of weights to use in the predictor nn. Possible settings:

0 - weights trained to minimize absolute error
1 - weights trained to minimize squared error

pscrn

int pscrn = 2

Controls whether or not the prescreener neural network is used to decide which pixels should be processed by the predictor neural network and which can be handled by simple cubic interpolation.

The prescreener is trained to know whether cubic interpolation will be sufficient for a pixel or whether it should be predicted by the predictor nn. The computational complexity of the prescreener nn is much less than that of the predictor nn.

Since most pixels can be handled by cubic interpolation, using the prescreener generally results in much faster processing. The prescreener is pretty accurate, so the difference between using it and not using it is almost always unnoticeable.

Version 0.9.3 adds a new, faster prescreener with three selectable 'levels', which trade off the number of pixels detected as only requiring cubic interpolation versus incurred error.

Therefore, pscrn is now an integer with possible values of 0, 1, 2, 3, and 4.

0 - no prescreening (same as false in prior versions)
1 - original prescreener (same as true in prior versions)
2 - new prescreener level 0
3 - new prescreener level 1
4 - new prescreener level 2

Higher levels for the new prescreener result in cubic interpolation being used on fewer pixels (so are slower, but incur less error). However, the difference is pretty much unnoticeable.

Level 2 is closest to the original prescreener in terms of incurred error, but is much faster.

threads

int threads = 0

Controls how many threads will be used for processing. If set to 0, threads will be set equal to the number of detected processors.
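
The nnedi3 parameters above are passed as ordinary named arguments. As a sketch, a quality-oriented call for a 2x enlargement, following the recommendations in this section (nsize 0 or 4 for enlargement, qual>1 for single image enlargement), might look like this. The source file and the 2x target are placeholders.

```avisynth
AviSource("Blah.avi")   # hypothetical clip in a supported format (e.g. YV12)

# nsize=4 : 8x4 neighborhood, one of the two settings recommended for enlargement
# nns=4   : 256 neurons, the slowest setting
# qual=2  : blend two network predictions
# pscrn=2 : new prescreener, level 0 (the default)
nnedi3_resize16(target_width=Width(last)*2, target_height=Height(last)*2, \
                nsize=4, nns=4, qual=2, pscrn=2)
```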

Parameters for Dither_resize16

kernel_d

string kernel_d = "Spline36"

kernel_u

string kernel_u = "Spline64"

"kernelh","kernelv" of Dither_resize16; kernel_d is used in downscaling and kernel_u is used in upscaling.

Kernel used by the resizer. Possible values are:

`"point"`	Nearest neighbor interpolation. Same as PointResize().
`"rect"` or `"box"`	Box filter.
`"linear"` or `"bilinear"`	Bilinear interpolation. Same as BilinearResize().
`"cubic"` or `"bicubic"`	Bicubic interpolation. Same as BicubicResize(). The b and c variables are mapped on a1 and a2 and are both set to 1/3 by default.
`"lanczos"`	Sinc function windowed by the central lobe of a sinc. Use taps to specify its impulse length. Same as LanczosResize().
`"blackman"`	Blackman-Harris windowed sinc. Use taps to control its length. Same as BlackmanResize().
`"blackmanminlobe"`	Another kind of Blackman windowed sinc, with a bit less ringing. Use taps to control its length.
`"spline16"`	Cubic spline based kernel, 4 sample points. Same as Spline16Resize().
`"spline36"`	Spline, 6 sample points. Same as Spline36Resize().
`"spline64"`	Spline, 8 sample points. Same as Spline64Resize().
`"spline"`	Generic splines, number of sample points is twice the taps parameter, so you can use taps = 6 to get a Spline144Resize() equivalent.
`"gauss"` or `"gaussian"`	Gaussian kernel. The p parameter is mapped on a1 and controls the curve width. The higher p, the sharper. It is set to 30 by default. This resizer is the same as GaussResize(), but taps offers a control on the filter impulse length. For low p values (soft and blurry), it’s better to increase the number of taps to avoid truncating the Gaussian curve too early and creating artifacts.
`"sinc"`	Truncated sinc function. Use taps to control its length. Same as SincResize().
`"impulse"`	Offers the possibility to create your own kernel (useful for convolutions). Add your coefficients in the string after “impulse”, separated with spaces (ex: "impulse 1 2 1"). The number of coefficients must be odd. The curve is linearly interpolated between the provided points. You can oversample the impulse by setting kovrspl to a value > 1.

f_d

float f_d = 1.0

f_u

float f_u = 1.0

"fh","fv" of Dither_resize16; f_d is for downscaling, f_u is for upscaling.

Horizontal and vertical frequency factors, also known as inverse kernel support. They are multipliers on the theoretical kernel cutoff frequency in both directions.

Values below 1.0 spatially expand the kernel and blur the picture. Values over 1.0 shrink the kernel and let higher frequencies pass. The result will look sharper but more aliased.

The multiplier is applied after the kernel scaling in case of downsizing. Negative values force the processing, even if the horizontal size doesn’t change. The filter will use the absolute parameter value.

taps

int taps = 4

"taps" of Dither_resize16.

Some kernels have a variable number of sample points, given by this parameter. Strictly speaking, it counts half the number of lobes (or equivalent); in case of downscaling, the actual number of sample points may be greater than the specified value. Range: 1–128

a1

float a1 =

a2

float a2 =

a3

float a3 =

Specific parameters, depending on the selected kernel.
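
For instance, the kernel list above says that the bicubic b and c variables map to a1 and a2. A sketch that requests a Catmull-Rom style bicubic (b=0, c=0.5) for a downscale, with a placeholder source and target size:

```avisynth
AviSource("Blah.avi")   # hypothetical 1920x1080 clip

# This is a downscale, so kernel_d is the kernel that applies.
# a1 -> b = 0.0, a2 -> c = 0.5
nnedi3_resize16(target_width=1280, target_height=720, \
                kernel_d="bicubic", a1=0.0, a2=0.5)
```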

invks_d

bool invks_d = false

invks_u

bool invks_u = false

"invksh","invksv" of Dither_resize16; invks_d is used in downscaling, invks_u is used in upscaling.

Activates the kernel inversion mode for the specified direction (use invks for both). Inverting the kernel makes it possible to “undo” a previous upsizing by compensating for the loss in high frequencies, giving a sharper and more accurate output than classic kernels, closer to the original. This is particularly useful for clips upscaled with a bilinear kernel. All the kernel-related parameters specify the kernel to undo. The target resolution must be as close as possible to the initial resolution. Kernel inversion is mainly intended for downsizing an upscaled picture. Using it for upsizing will not restore details but will give a slightly sharper look, at the cost of a bit of aliasing and ringing. This mode is somewhat equivalent to the debilinear plug-in but works on a different principle.

invkstaps

int invkstaps = 5

In kernel inversion mode (invks=true), this parameter sets the number of taps for the inverted kernel. Use it as a trade-off between softness and ringing. Range: 1–128
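
A minimal sketch of the intended use of kernel inversion, assuming a hypothetical 1920x1080 clip that is known to have been upscaled from 1280x720 with a bilinear kernel:

```avisynth
AviSource("Blah.avi")   # hypothetical 1920x1080 clip, assumed to be a
                        # bilinear upscale of a 1280x720 original

# kernel_d names the kernel to undo; invks_d switches on inversion
# for the downscaling path. invkstaps is left at its default of 5.
nnedi3_resize16(target_width=1280, target_height=720, \
                kernel_d="bilinear", invks_d=true, invkstaps=5)
```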

noring

bool noring = false

If true, the non-ringing algorithm of Dither_resize16 is used for flat-area scaling.

It doesn't make much sense for nnedi3_resize16 (which uses nnedi3 for upscaling), and it may produce blurring and aliasing when downscaling. It is best left at false unless you know what you are doing.

Post-Process

sharp

float sharp = 0

Strength of Contra-Sharpen mod, for sharper edge. 0 means no sharpening, common value is about 100.

Sharpening only takes effect when the horizontal or vertical scale ratio is greater than ratiothr (that is, when nnedi3 is used for upscaling).

Input / Output

Y

int Y = 3

U

int U = 3

V

int V = 3

Choose what planes to process; works just like MaskTools2.

2 : copy from input clip
3 : process

lsb_in

bool lsb_in = false

Whether the input clip is 16-bit stacked.

lsb

bool lsb = false

Whether the output clip is 16-bit stacked and processing is done at 16-bit precision.
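
As a sketch of a 16-bit workflow, the script below takes an ordinary 8-bit clip, asks for stacked 16-bit output, and dithers back to 8 bits at the end with DitherPost from the Dither package. The source and target size are placeholders. For a clip that is already stacked 16-bit, lsb_in=true would be added to the call.

```avisynth
AviSource("Blah.avi")   # hypothetical 8-bit YV12 clip

# 8-bit in, stacked 16-bit out.
nnedi3_resize16(target_width=1920, target_height=1080, lsb=true)

# ... further stack16 filtering could go here ...

# Back to 8 bits (mode 6 = serpentine Floyd-Steinberg error diffusion + noise).
DitherPost(mode=6)
```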

tv_range

bool tv_range = true

Whether the input clip is TV-range (16-235, true) or PC-range (0-255, false).

dither

int dither =

Dither mode for 16-bit to 8-bit conversion. If tv_range=true, it defaults to 6, if false it defaults to 50.

Dithering method:

−1	no dither, round to the closest value
0	8-bit ordered dither + noise.
6	Serpentine Floyd-Steinberg error diffusion + noise. Well-balanced algorithm.
7	Stucki error diffusion + noise. Looks “sharp” and preserves light edges and details well.
8	Atkinson error diffusion + noise. Generates distinct patterns but keeps flat areas clean.

Modes 1 to 5 have no real interest over mode 0 and can be considered deprecated.

output

string output =

Output format. Possible values are:

"Y8"	Regular Y8 colorspace. Parameter "lsb" works on this output mode.
"YV12"	Regular YV12 colorspace. Parameter "lsb" works on this output mode.
"YV16"	Regular YV16 colorspace. Parameter "lsb" works on this output mode.
"YV24"	Regular YV24 colorspace. Parameter "lsb" works on this output mode.
"RGB24"	Regular RGB24 colorspace.
"RGB32"	Regular RGB32 colorspace.
"RGB48YV12"	48-bit RGB conveyed on YV12. Use it for raw video export only. Not suitable for display or further processing (it will look like garbage).
"RGB48Y"	48-bit RGB. The components R, G and B are conveyed on three YV12 or Y8 (if supported) stack16 clips interleaved on a frame basis.

If output is not defined it will default to the colorspace of the input clip.

cplace

string cplace = "MPEG2"

Placement of the chroma subsamples. Can be one of these strings:

"MPEG1"	4:2:0 subsampling used in MPEG-1. Chroma samples are located on the center of each group of 4 pixels.
"MPEG2"	Subsampling used in MPEG-2 4:2:x. Chroma samples are located on the left pixel column of the group.

matrix

string matrix =

The matrix used to convert the YUV pixels to computer RGB. Possible values are:

"601"	ITU-R BT.601 / ITU-R BT.470-2 / SMPTE 170M. For Standard Definition content.
"709"	ITU-R BT.709. For High Definition content.
"240"	SMPTE 240M
"FCC"	FCC
"YCgCo"	YCgCo

When the parameter is not defined, "601" or "709" is selected automatically depending on the clip definition: if the width is greater than 1024 or the height is greater than 576, matrix defaults to "709"; otherwise it defaults to "601".
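
The text above does not say whether the automatic choice looks at the source or the target size, so when the two fall on different sides of the 1024x576 limit it may be safer to set the matrix explicitly. A sketch with a hypothetical BT.601 standard-definition source:

```avisynth
AviSource("Blah.avi")   # hypothetical 720x576 SD clip using BT.601

# Upscale to 1440x1152 and convert to RGB. The matrix is given explicitly
# so that a size-based default cannot select "709" for this BT.601 source.
nnedi3_resize16(target_width=1440, target_height=1152, \
                output="RGB32", matrix="601")
```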

curve

string curve = "linear"

Type of gamma mapping (transfer characteristic) for gamma-aware resizing (only takes effect in the parts processed by Dither_resize16).

"709"	ITU-R BT.709 transfer curve for digital video
"601"	ITU-R BT.601 transfer curve, same as "709"
"170"	SMPTE 170M, same as "709"
"240"	SMPTE 240M (1987)
"srgb"	sRGB curve
"2020"	ITU-R BT.2020 transfer curve, for 12-bit content. For sources of lower bitdepth, use the "709" curve.
"linear"	linear curve without gamma-aware processing

gcor

float gcor = 1.0

Gamma correction, applied on the linear part.
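
As an illustration, a gamma-aware downscale can be requested by naming a transfer curve instead of leaving curve at "linear". The source, target size and choice of the "709" curve are assumptions made for the example.

```avisynth
AviSource("Blah.avi")   # hypothetical 1920x1080 clip

# The Dither_resize16 parts of the chain use the BT.709 transfer curve
# for gamma-aware resizing; gcor stays at its default of 1.0.
nnedi3_resize16(target_width=1280, target_height=720, curve="709")
```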

Examples

nnedi3_resize16 with default values:

AviSource("Blah.avi")
nnedi3_resize16()

Convert a 4:2:0 JPEG to RGB. Remember that most JPEGs use full range levels and the BT.601 color matrix. MPEG1 chroma placement is very common among 4:2:0 JPEGs while 4:2:2 JPEGs use the MPEG2 chroma placement.

JpegSource("420.jpg")
nnedi3_resize16(output="RGB32", tv_range=false, cplace="MPEG1", matrix="601")

External Links

NMM-HD Forum - nnedi3_resize16 discussion [Chinese].
