6/17/24: Modern Depth Estimators, Part 1: Scenery
(Images: three sets of cross-view stereopair, parallel-view stereopair, and depth map)

Since my 2021 3D-Con workshop on 3D tools for AI (read/watch HERE!), there have been numerous updates to the field of 2D-to-3D conversion. As always, there are more promises and research papers than actual usable tools, but our toolbox for estimating depth maps is nonetheless larger and better than before. MiDaS, which I featured in the workshop, is now at version 3.1; other depth estimators now available include Depth Anything, ZoeDepth, PatchFusion, and Marigold. All five are free for the public to use, and with the exception of MiDaS v3.1 (which needs to be run through a somewhat janky Google Colab notebook), they are simple to use! Test them out yourself with the above links.

In this 4-part series, I will compare the five free depth estimators listed above and attempt to suss out their relative advantages and disadvantages. After this Part 1: Scenery, I’ll look at portraits, then more artistic works, and finally real photographs.

In addition to these five free depth estimators, LeiaPix’s converter, now rebranded as Immersity, has a recently updated depth estimator engine within its one-stop, beginner-friendly 3D animator, although it is a paid option. For comparison, I will include only a few Immersity outputs here and there in this series (where it performs well), but will cover it fully in Part 4, on photographs.

For this Part 1: Scenery, I’ll use three AI images: the first realistic, the second realistic but with extreme perspective, and the third more fanciful, with sci-fi elements. All depth maps shown are rendered in the Inferno colormap palette for visibility. All stereopairs are created with StereoPhoto Maker from the original 2D image and depth map, with disparity calibrated to be approximately the same across all stereopairs.
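StereoPhoto Maker’s own depth-to-stereo conversion isn’t described in this post, so the following is only a minimal sketch of the general idea (depth-image-based rendering), assuming an 8-bit grayscale depth map where white is near: each pixel is shifted sideways in proportion to its depth, the nearer pixel wins where shifts collide, and the gaps opened behind near objects are filled crudely from the neighboring background. The function names and the `max_disparity` parameter are hypothetical.

```python
import numpy as np

def render_view(image, depth, max_disparity, direction):
    """Forward-warp `image` horizontally using a per-pixel shift from `depth`.

    image: (H, W) or (H, W, C) array; depth: (H, W) uint8, 255 = nearest,
    0 = farthest (zero shift). Each eye gets half of max_disparity.
    direction: +1 shifts near pixels right (left-eye view),
               -1 shifts near pixels left (right-eye view).
    """
    h, w = depth.shape
    shift = np.rint(depth / 255.0 * max_disparity / 2).astype(int)
    out = np.zeros_like(image)
    zbuf = np.full((h, w), -1)  # depth of the source pixel already written
    for y in range(h):
        for x in range(w):
            tx = x + direction * shift[y, x]
            # Z-buffer: when two pixels land on one spot, the nearer one wins.
            if 0 <= tx < w and depth[y, x] > zbuf[y, tx]:
                out[y, tx] = image[y, x]
                zbuf[y, tx] = depth[y, x]
        # Fill holes with the nearest written pixel on the side away from the
        # shift, which is usually the background revealed behind an object.
        cols = range(w) if direction > 0 else range(w - 1, -1, -1)
        prev = None
        for tx in cols:
            if zbuf[y, tx] >= 0:
                prev = tx
            elif prev is not None:
                out[y, tx] = out[y, prev]
    return out

def make_stereopair(image, depth, max_disparity=20, cross=True):
    """Side-by-side pair: cross-view puts the right-eye view on the left."""
    left = render_view(image, depth, max_disparity, +1)
    right = render_view(image, depth, max_disparity, -1)
    views = (right, left) if cross else (left, right)
    return np.concatenate(views, axis=1)
```

For example, `make_stereopair(img, depth, max_disparity=30, cross=False)` would give a parallel-view pair. Dedicated tools use far better hole filling than this naive copy, which leaves streaks at depth boundaries.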


Image 1: Coastal Arches

2D image made with: Stable Diffusion XL (DreamShaper XL Turbo DPM++ SDE), Detail Tweaker XL LoRA, Dissolve Style LoRA, editing

(Images: cross-view stereopair, parallel-view stereopair, depth map)

MiDaS v3.1

To my knowledge, MiDaS is the oldest of these five depth estimators, but impressively, version 3.1 still consistently ranks among the best, especially with regard to overall scene composition. Here, the scene looks largely correct and clean; only the large rocks are perhaps a bit flat, as is the crashing wave.

SCORE: 5 of 5. Usable on its own if need be.


(Images: cross-view stereopair, parallel-view stereopair, depth map)

Depth Anything

This second-oldest estimator (again AFAIK) also does a generally good job. The rocks have more depth than with MiDaS. However, there are some ragged edges visible, especially at the high-contrast boundaries between the arches and sky.

Note: unlike MiDaS, Depth Anything (at least the demo version I have linked here) outputs colorized depth maps, using what looks like the Inferno colormap palette. The process of converting such a colored map back to grayscale unfortunately introduces some artifacts, which you can see here.

SCORE: 4 of 5. Usable if ragged edges are cleaned up.
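Converting a colorized depth map back to grayscale amounts to finding, for each pixel, the closest color in the palette and using that color’s position in the palette as the depth value. Here is a minimal sketch of that nearest-color lookup, assuming the palette is supplied as an ordered list of RGB colors running from far (dark end) to near (bright end). The function name and `chunk` parameter are hypothetical, and this is not necessarily how any particular tool does the conversion.

```python
import numpy as np

def colormap_to_gray(rgb, lut, chunk=4096):
    """Invert a colorized depth map by nearest-color lookup in `lut`.

    rgb: (H, W, 3) uint8 image.
    lut: (N, 3) array of the colormap's RGB colors (0-255), ordered from
         index 0 (far, dark end) to N-1 (near, bright end).
    Returns an (H, W) uint8 map with 0 = far and 255 = near.
    """
    lut = np.asarray(lut, dtype=np.float64)
    pixels = rgb.reshape(-1, 3).astype(np.float64)
    idx = np.empty(len(pixels), dtype=np.int64)
    # Process in chunks so the pixel-by-LUT distance table stays small.
    for start in range(0, len(pixels), chunk):
        block = pixels[start:start + chunk]
        d2 = ((block[:, None, :] - lut[None, :, :]) ** 2).sum(axis=2)
        idx[start:start + chunk] = d2.argmin(axis=1)
    # Rescale palette positions to 0..255 whatever the palette length.
    gray = np.rint(idx * 255.0 / (len(lut) - 1)).astype(np.uint8)
    return gray.reshape(rgb.shape[:2])
```

Assuming the demo used Matplotlib’s Inferno (it only looks like Inferno), a matching palette can be built with `lut = matplotlib.colormaps["inferno"](np.linspace(0, 1, 256))[:, :3] * 255`. Pixels whose colors have drifted from the palette (through JPEG compression, resizing, or blending at edges) simply snap to whichever palette entry is closest, which is one plausible source of the artifacts noted above.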


(Images: cross-view stereopair, parallel-view stereopair, depth map)

ZoeDepth

Here we see arguably impressive details, such as in the arches and to some extent the wave, but the overall scene has problems, like a noticeable bulge/curve in the foreground and the oddly flat pillar between the two arches, a flaw readily apparent in the depth map. There are also slightly ragged edges around the tops of the arches.

SCORE: 2 of 5. Good details but several major flaws.


(Images: cross-view stereopair, parallel-view stereopair, depth map)

PatchFusion*

The raw output depth map from PatchFusion is so bright—that is, so far skewed towards close depth values—that it is unusable as is. Simply making the map less bright does not solve the problem—as too many depth values are essentially crowded together near the white end of the spectrum. A partial remedy—call it a patch?—is to equalize the depth map: to spread out its brightness values evenly between black and white. While this is artificial, it can make the depth map salvageable.

In this case, the equalized depth map (with editing represented throughout by the * asterisk) has some impressive details around the arches, but like ZoeDepth, has an incorrectly curved foreground, and too-flat center pillar. In addition, the foreground of the depth map has poor resolution with too few gradations (a result of the equalization fix), which is visible in the stereopairs. Edges, however, are clean.

SCORE: 2 of 5. Good details but several major flaws.
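The equalization step is described only as spreading the brightness values evenly between black and white. Here is a minimal sketch, assuming standard histogram equalization of an 8-bit grayscale depth map (the function name is hypothetical; image editors and OpenCV’s `cv2.equalizeHist` apply essentially the same remapping).

```python
import numpy as np

def equalize_depth(depth):
    """Histogram-equalize an 8-bit grayscale depth map (0 = far, 255 = near)."""
    hist = np.bincount(depth.ravel(), minlength=256)
    cdf = hist.cumsum()                 # pixels at or below each level
    cdf_min = cdf[cdf > 0][0]           # count at the darkest occupied level
    total = depth.size
    if total == cdf_min:                # flat map: nothing to spread out
        return depth.copy()
    # Map each level to its cumulative share of pixels, stretched to 0..255.
    lut = np.rint((cdf - cdf_min) * 255.0 / (total - cdf_min))
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[depth]
```

Because the remap is monotonic, it preserves depth order but cannot create new levels: if the raw foreground occupies only a handful of near-white values, equalization spreads those few values apart rather than adding gradations between them. That is consistent with the stepped, low-resolution foregrounds described above.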


(Images: cross-view stereopair, parallel-view stereopair, depth map)

Marigold*

Like PatchFusion, the Marigold depth map required equalization to be usable. In this case, we get the same impressive detailing in the arches, but the foreground is more correctly flat than with PatchFusion and ZoeDepth, and the shore rocks have depth to them. The only general problems are the midground, which is too curved, and slightly ragged edges, most notably inside the arches.

SCORE: 3 of 5. Good details, but with a major flaw in midground depth.


(Images: cross-view stereopair, parallel-view stereopair, depth map)

The Winner: Composite of MiDaS + Marigold*

The best depth map here turned out to be a 2:1 mix of MiDaS and (equalized) Marigold: the former for the overall scene depth, and the latter to add a bit more detail. I also applied a slight edge dilation to clean up the ragged boundaries. If you compare these stereopairs to those from MiDaS and Marigold* alone, you will see the composite is better balanced than either.
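The mixing method isn’t stated (in an image editor it could be a second layer at reduced opacity: a 2:1 mix is the same as the Marigold layer at one-third opacity over MiDaS). Here is a minimal sketch of a weighted blend, assuming each map is first stretched to the full range so neither dominates just by being brighter. The function names are hypothetical.

```python
import numpy as np

def normalize(depth):
    """Stretch a depth map to the 0..1 range (a constant map becomes all 0)."""
    d = depth.astype(np.float64)
    lo, hi = d.min(), d.max()
    return (d - lo) / (hi - lo) if hi > lo else np.zeros_like(d)

def blend_depth(maps, weights):
    """Weighted average of same-size depth maps, returned as uint8 (255 = near)."""
    total = float(sum(weights))
    acc = sum(w * normalize(m) for m, w in zip(maps, weights)) / total
    return np.rint(acc * 255).astype(np.uint8)
```

With hypothetical arrays `midas` and `marigold_eq`, this image’s mix would be `blend_depth([midas, marigold_eq], (2, 1))`; the 1:1 ZoeDepth + PatchFusion mix for Image 3 would use weights `(1, 1)`.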



Image 2: “Nap Time is Over”

2D image made with: Stable Diffusion 1.5 (epiCPhotoGasm Last Unicorn), Detail Tweaker LoRA, editing

(Images: cross-view stereopair, parallel-view stereopair, depth map)

MiDaS v3.1

MiDaS once again handles the overall scene depth extremely well. The only flaws are around the sleeper’s feet (granted, the original artwork is a bit weird there) and some ragged edges around the hat and the left side of the inner tube.

SCORE: 4 of 5. Usable with minor adjustments.


(Images: cross-view stereopair, parallel-view stereopair, depth map)

Depth Anything

The output here is very close to that of MiDaS above, but it improves details of the sleeper, and corrects the feet. The same ragged edges are present.

SCORE: 5 of 5. Usable as is if need be.


(Images: cross-view stereopair, parallel-view stereopair, depth map)

ZoeDepth

Overall scene depth is off here, again with a strange curvature from mid to foreground. Although the ragged edges seen with MiDaS and Depth Anything are slightly improved, the center background is oddly flat, and the sleeper’s shorts are too close.

SCORE: 2 of 5. Some good details but multiple major flaws.


(Images: cross-view stereopair, parallel-view stereopair, depth map)

PatchFusion*

Once again this depth map required equalization, which results in far too few depth layers for the foreground. On the other hand, the distant background in the top third of the scene is arguably the best so far.

SCORE: 2 of 5. Good details but multiple major flaws.


(Images: cross-view stereopair, parallel-view stereopair, depth map)

Marigold*

Once again, this required equalization. While there are some impressive details, the overall scene depth in the lower two-thirds of the image is incorrectly curved.

SCORE: 3 of 5. Good details but major flaw in foreground depth.


(Images: cross-view stereopair, parallel-view stereopair, depth map)

The Winner: Depth Anything*

The best option here is Depth Anything, but with added edge dilation to reduce the ragged edges.
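“Edge dilation” isn’t defined in the post. A minimal sketch, assuming it means grayscale dilation (a maximum filter) on a map where white is near: near regions grow outward by a pixel or two, so a ragged foreground edge becomes a slightly padded but clean one, and isolated dark specks inside it disappear. The function name and `radius` parameter are hypothetical; `scipy.ndimage.grey_dilation` and OpenCV’s `cv2.dilate` provide the same operation.

```python
import numpy as np

def dilate_depth(depth, radius=1):
    """Grayscale dilation: each pixel takes the max of its (2r+1) x (2r+1) window.

    With 255 = near, this pushes near-object boundaries outward by `radius`
    pixels. Borders are handled by repeating the edge pixels.
    """
    h, w = depth.shape
    padded = np.pad(depth, radius, mode="edge")
    out = depth.copy()
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            np.maximum(out, padded[dy:dy + h, dx:dx + w], out=out)
    return out
```

A radius of 1 or 2 is usually a “slight” dilation; larger values visibly fatten foreground objects.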



Image 3: Martian Canyon

2D image made with: Stable Diffusion XL (DreamShaper XL Turbo DPM++ SDE), Detail Tweaker XL LoRA, editing

(Images: cross-view stereopair, parallel-view stereopair, depth map)

MiDaS v3.1

As usual, MiDaS does a good job with overall scene depth, but could use more detail, such as on the small figures at the bottom. The small canyon-spanning bridge or crane is a bit strangely curved.

SCORE: 5 of 5. Usable as is if need be.


(Images: cross-view stereopair, parallel-view stereopair, depth map)

Depth Anything

Depth Anything gives significantly more depth to the canyon, more details to the buildings, vehicles, and figures at bottom, and straightens out the bridge/crane. The only major complaint is ragged edges around the periphery of the foreground framing canyon.

SCORE: 5 of 5. Usable as is if need be.


(Images: cross-view stereopair, parallel-view stereopair, depth map)

ZoeDepth

ZoeDepth gives even more depth and details to the canyon, buildings, and bottom elements (though the largest vehicle is a bit off). But the bridge/crane is slightly warped, and the distant background flattens out too quickly.

SCORE: 3 of 5. Generally good, but flat background is a major flaw.


(Images: cross-view stereopair, parallel-view stereopair, depth map)

PatchFusion*

(Equalized) Here we see the most relative depth and detail given to the canyon and buildings. The bridge/crane is straight. Bottom elements are picked out in detail, but the bottom canyon floor is a bit too tilted, and some clouds in the far distance are at odd depths.

SCORE: 4 of 5. Usable (arguably) with small adjustments.


(Images: cross-view stereopair, parallel-view stereopair, depth map)

Marigold*

(Equalized) As with PatchFusion, we get plenty of (and arguably too much) detail overall, but ragged edges are widespread, and like PatchFusion, the very bottom of the canyon floor is needlessly sloped.

SCORE: 3 of 5. Probably usable if ragged edges are cleaned up, but it would be laborious.


(Images: cross-view stereopair, parallel-view stereopair, depth map)

The Winner: Composite of ZoeDepth + PatchFusion*

Though ZoeDepth on its own would arguably be fine, I decided to mix it 1:1 with PatchFusion for more detail, to help straighten the bridge/crane, and to slightly improve the large vehicle. Notice that the widespread ragged edges of PatchFusion have thankfully been covered up in mixing with ZoeDepth, which is how things tend to work.


(Images: cross-view stereopair, parallel-view stereopair)

Bonus: Immersity

As it is a paid option, I only tested Immersity briefly with this set of images, and opted not to pay to export depth maps. Here, though, you can see Immersity’s depth estimator did quite a good job with this image—what would be a 5 of 5 (immediately usable) score. However, it performed much less well with the first two images, producing depth maps I would rate as unusable.



Conclusions

With these 3 scenic images, here is how the 5 depth estimators ‘scored,’ including their victories, along with apparent trends for each:

MiDaS v3.1: 14 out of 15 + 1/2 win. Consistently great at overall scene depth, but can lack in details. Its one victory was a split win, but it could have been fine on its own as well, scoring 5 of 5 for that image plus another.

Depth Anything: 14 out of 15 + 1 win. Like MiDaS, it is good at overall scene depth, and it adds extra detail, but at the cost of ragged edges. The latter problem can sometimes be fixed easily with edge dilation, but not always. Its one full win required such a fix.

ZoeDepth: 7 out of 15 + 1/2 win. It had trouble with the flat-surface perspectives of the first 2 images, but was OK with the 3rd. It adds more details than Depth Anything, with fewer ragged edges, but has the tendency to flatten out backgrounds, which is a major flaw. Its 1 win was a split win, and would not quite have been usable alone. It did not score any 5 of 5 results—i.e. was not ever usable as is.

PatchFusion: 8 out of 15 + 1/2 win. After equalization, it produces some great detail with zero ragged edges, but, perhaps because it requires equalization, image foregrounds tend to be either at the wrong depths or with too few depth gradations, and these are major flaws. Its one win was split, and it would not have been usable alone. It did not score any 5s (usable as is), but managed one 4 (usable with minor fixes).

Marigold: 9 out of 15 + 1/2 win. After equalization, this had major issues with foreground and/or midground depth, and also tends towards ragged edges. It consistently scored 3 of 5—usable only with a major fix. Its one victory was shared.
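The tallies above follow directly from the per-image scores; this small sketch sums them and counts a shared (composite) win as 1/2. The data layout is just one way to record the results.

```python
from fractions import Fraction

# Per-image scores: Coastal Arches, "Nap Time is Over", Martian Canyon.
SCORES = {
    "MiDaS v3.1":     [5, 4, 5],
    "Depth Anything": [4, 5, 5],
    "ZoeDepth":       [2, 2, 3],
    "PatchFusion":    [2, 2, 4],
    "Marigold":       [3, 3, 3],
}
# Winners per image; a composite splits the win evenly among its parts.
WINNERS = [["MiDaS v3.1", "Marigold"], ["Depth Anything"], ["ZoeDepth", "PatchFusion"]]

def tally(scores, winners):
    """Return {estimator: (total score, wins)} with shared wins as fractions."""
    wins = {name: Fraction(0) for name in scores}
    for group in winners:
        for name in group:
            wins[name] += Fraction(1, len(group))
    return {name: (sum(s), wins[name]) for name, s in scores.items()}

for name, (total, w) in tally(SCORES, WINNERS).items():
    print(f"{name}: {total} out of {5 * len(SCORES[name])} + {w} win")
```

The first line printed is `MiDaS v3.1: 14 out of 15 + 1/2 win`.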

Overall, at least with this set of scenic images, it is obvious that MiDaS and Depth Anything, though the oldest two estimators here, were the most dependable by a large margin. They are by far the best bets for ‘usable out of the box’ depth maps. Marigold and PatchFusion, meanwhile, were useful mostly to add detail, via blending, to MiDaS or Depth Anything outputs. ZoeDepth was somewhere in between.

Next, we’ll look at portraits, and see if these trends and conclusions hold.



copyright © 2024 Gordon Au