It's very difficult, if not impossible, for us humans to understand how robots see the world.
Their cameras work much like our eyes do, but between the image a camera captures and any actionable information about that image lies a black box of machine learning algorithms, which try to translate patterns of features into something the system recognizes.
Training these algorithms usually involves showing them a set of different pictures of something (like a stop sign), and then seeing if they can extract enough common features from those pictures to reliably identify stop signs that aren’t in their training set.
Adversarial images exploit this process: by adding a carefully computed pattern of noise, imperceptible to us, an attacker can change what the algorithm thinks it's looking at. Here's an example of the kind of adversarial image we're used to seeing:
Obviously, it's totally, uh, obvious to us that both images feature a panda.
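The core trick behind examples like this can be sketched in a few lines. This is a hedged illustration, not the exact method used for any particular image: it stands in a toy linear "classifier" for a real neural network, but it follows the same fast-gradient-sign idea, where every pixel is nudged a tiny step epsilon in the direction that most changes the classifier's score.

```python
import numpy as np

# Toy linear "classifier": score = w . x; a higher score means "panda".
# (A stand-in for a real network; the weights and "image" are random.)
rng = np.random.default_rng(0)
w = rng.normal(size=64)   # learned weights
x = rng.normal(size=64)   # flattened pixels of the input image

# For a linear model, the gradient of the score with respect to the
# input is just w, so the adversarial step is epsilon * sign(gradient).
epsilon = 0.01
x_adv = x - epsilon * np.sign(w)  # nudge every pixel to lower the score

print(w @ x, w @ x_adv)           # the perturbed image scores lower...
print(np.max(np.abs(x_adv - x)))  # ...yet no pixel moved more than epsilon
```

Because each pixel changes by at most epsilon, the two images look identical to a human, while the classifier's output shifts as much as the perturbation can manage.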
The same trick works on street signs: a sign that looks perfectly ordinary to us can look like something completely different to the vision system of an autonomous car, which is dangerous for obvious reasons.