If I read from a mipmap level rather than the "full" texture, is it faster? For example, is col = tex2Dlod(_MainTex, float4(i.uv, 0, 1)); faster than col = tex2D(_MainTex, i.uv); because tex2Dlod is reading from mipmap level 1, which is 1/4 the size of the full texture? Or does tex2Dlod have to read the whole texture anyway? Or have I completely misunderstood the whole thing?
The size of the texture being read doesn't really impact performance much. It's a bit like asking whether reading index 8 of a 100-element array is faster than reading index 8 of a 10,000-element array: both just point to a location in memory. tex2Dlod can be slightly faster because it doesn't need to compute the mip level to sample from using implicit derivatives. Mip-maps aren't really a sampling performance feature; they're a GPU memory-saving feature when only the lower mips need to be resident (as with mip-streaming, or when the user lowers the texture quality setting), and they offer a way to have a texture be filtered, or look slightly different, at different distances.
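To make "implicit derivatives" concrete, here is a side-by-side sketch of the sampling variants in Cg/HLSL (assuming a Unity fragment shader with _MainTex and i.uv as in the question; this is illustrative, not a drop-in snippet):

```hlsl
// Implicit mip selection: the GPU picks the mip level from how fast
// the UVs change between neighboring pixels (its own ddx/ddy of i.uv).
fixed4 a = tex2D(_MainTex, i.uv);

// Explicit mip selection: the w component of the coordinate is the mip
// level, so no derivative math is needed. Level 1 is half the width and
// height of level 0, i.e. 1/4 the texels.
fixed4 b = tex2Dlod(_MainTex, float4(i.uv, 0, 1));

// Explicit derivatives: the hardware still chooses the mip level, but
// from the gradients you pass in rather than ones it computes itself.
fixed4 c = tex2Dgrad(_MainTex, i.uv, ddx(i.uv), ddy(i.uv));
```

Note that ddx/ddy (and therefore tex2D's implicit selection) are only available in fragment shaders, which is why vertex-shader texture reads must use tex2Dlod.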
Like @Invertex said, using tex2Dlod instead of tex2D is theoretically slightly more efficient because the GPU doesn't have to compute the mip level. If you're doing something like a full-screen, screen-space texture sample (like for a post-process effect) where the texture is only ever going to need a specific mip level, then yes, this could be faster. How much faster? Honestly, on most modern hardware you won't even be able to measure a difference. It'll likely be on the order of a few microseconds (millionths of a second), or less.

Otherwise, if you're talking about a 3D scene where this shader is applied to a variety of surfaces at different distances and angles to the camera, then tex2D is going to be way, way faster. Measurably so on desktop and console, and very likely visibly so on mobile if the texture is large enough.

When a GPU samples a texture, it loads only small chunks of it at a time (usually called blocks: some number of contiguous texels, like a 4x4 group) into small, fast memory (cache) that sits near the hardware that actually reads textures (the texture mapping unit). If multiple pixels next to each other all need the same few blocks, the GPU can reuse the blocks already in that fast cache. If a pixel needs a block that's not in the cache, the GPU has to fetch it from main memory, and that takes time, which means the shader waits longer for that sample.

Let's imagine an infinite floor plane that stretches to the horizon. In the foreground, just below your feet, the texture might be displayed at a size larger than the source texture. Large portions of the screen reuse the same blocks in the cache, so this is super fast to render, and tex2Dlod vs tex2D isn't going to matter much here. Sure, forcing a lower mip level might be slightly cheaper, but the cache is already being reused a lot, so it isn't a major bottleneck. Now look into the distance.
Far at the horizon, every pixel is sampling a different part of the texture. With tex2Dlod you've locked it to a specific mip level, and the block needed by each pixel is different, so suddenly the GPU has to fetch a new set of blocks from main GPU memory for every single pixel being rendered. With tex2D the GPU has already calculated that it only needs a smaller mip level, potentially one of the smallest, each of which is the size of (or smaller than) a single block of memory. The blocks for those mip levels all fit into the cache at once, so all of the pixels in the distance, out to the horizon, can reuse the data already in the cache.
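The mip level that tex2D effectively picks can be approximated with the standard derivative-based formula. This is a simplified sketch (real hardware also applies clamping, LOD bias, and anisotropic filtering), and ComputeLod and texSize are made-up names for illustration:

```hlsl
// Approximate the mip level the hardware would select, given the UV
// and the texture's dimensions in texels (e.g. float2(512, 512)).
float ComputeLod(float2 uv, float2 texSize)
{
    float2 dx = ddx(uv) * texSize; // texel-space change across one pixel in x
    float2 dy = ddy(uv) * texSize; // texel-space change across one pixel in y

    // Squared length of the larger footprint; 0.5 * log2(d^2) == log2(d).
    float maxDeltaSq = max(dot(dx, dx), dot(dy, dy));
    return 0.5 * log2(maxDeltaSq);
}
```

At the horizon of the floor-plane example, the UVs change a lot between adjacent pixels, the derivatives are large, and this value is high, which is exactly why tex2D lands on a tiny, cache-friendly mip there while a hard-coded tex2Dlod level does not.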