Lobes vs. Taps. The short version.
Any signal-processing filter has "taps" which are the discrete input values that are used to calculate each output value. In audio processing it's pretty straightforward - a 20-tap FIR resampling filter takes 20 audio samples, multiplies each by one value in a 20-element filter "kernel," adds all the results together, and spits out a single output sample. Lather, rinse, repeat. In video it's a little different, but that's the rough idea.
When a filter is based on a windowed sinc function (sinc is (sin(x)/x)), it's convenient to talk about the number of lobes of the sinc function that are being used. X=0 to X=pi is one lobe. X=pi to X=2pi is another lobe. X=2pi to X=3pi is another lobe and so forth. Because all windowed FIR sinc filters are symmetrical around 0, we only talk about the number of lobes on one side of 0. There are an equal number of lobes on the the other side.
So when people talk about Lanczos2, they mean a 2-lobe Lanczos-windowed sinc function. There are actually 4 lobes -- 2 on each side. The function, graphed in the range -2pi to 2pi looks kinda like a sombrero. I've attached a graph of Lanczos2. Catmull-Rom looks almost identical, but is not a windowed sinc filter, or in fact any kind of sinc filter. The fact that two resizing filters created using completely different mathematical principles look almost identical is not a coincidence, but understanding why it turned out that way was certainly a big ah-ha! moment for me. And way too complicated to explain here, sadly.
For upsampling (making the image larger), the filter is sized such that the entire equation falls across 4 input samples, making it a 4-tap filter. It doesn't matter how big the output image is going to be - it's still just 4 taps. For downsampling (making the image smaller), the equation is sized so it will fall across 4 *destination* samples, which obviously are spaced at wider intervals than the source samples. So for downsampling by a factor of 2 (making the image half as big), the filter covers 8 input samples, and thus 8 taps. For 3X downsampling, you need 12 taps, and so forth.
The total number of taps you need for downsampling is the downsampling ratio times the number of lobes, times 2. And practically, one needs to round that up to the next even integer. For upsampling, it's always 4 taps.
So a 2-lobe (which is really a 4-lobe) Lanczos filter could require any number of taps. To downsample a 1080p image down to 108 pixels high would require 40 taps.
All of these numbers are, of course, just in one dimension. To get the full number of taps required for an output pixel, you would theoretically calculate the number of taps required in each dimension (height & width) and then multiply them together. So upsampling would require 4x4=16 taps. By convention, though, we still call that a 4-tap filter (unless we're in marketing, and then we call it a 48-tap filter: 16 taps times 3 components R, G, and B ).
Practically speaking, for performance you would always scale the image in two passes, which is perfectly kosher (or at least it is with these filters, which are *separable*, meaning that we can apply them separately in X and Y and get the same result as applying the combined filter). Conceptually you scale just in the width dimension into a new buffer that is as wide as the target image and as high as the original. Then you scale that intermediate image just in the height dimension into a final target buffer. Thus in the upsampling case you are applying two 4-tap filters instead of a single 16-tap filter.
Again, there are a huge number of practical considerations. You end up pipelining things very differently than the "conceptual" algorithm, mainly to maximize cache coherency. In order to do any kind of arbitrary scaling, you need to calculate a large number of filter kernels, as the phase between the input and output samples changes as you work your way across the image.
Man, if that's the short version I hope I never have to write the long version. Again, the thing to do if you want to understand practical scaling algorithms is to read the Turkowski paper I referenced earlier in the thread and read the code for Paul Heckbert's "Zoom" (it's not the cleanest code in the world, but understanding why he does various things is a real education). Start there and then branch out.
And if you have questions, you can always email me. My email address is not hard to find. I can't always promise I can answer every question, as among other things some of the stuff I'm doing right now with scaling is about to be patented, but I'm happy to steer folks in the right direction.
As for the questions about the best ways to use ffdshow, I have to admit that I don't know. At home I use the scaler built into the VW10HT, which looks to me like a basic bicubic, which is perfectly fine. At work I use my own scaler. I don't really have time to evaluate ffdshow extensively.
I will say that I am adamantly against filters with more than 2 lobes when scaling images. Their advantage is extra sharpness, but they ring like crazy. In my own scaler, I adjust sharpness by combining a modified sharpening kernel with the basic scaling kernel. It makes the image just as sharp as a multi-lobe Lanczos, but without the multi-lobe ringing.
source: AVSforum post of dmunsil.