Confronted with a chaotic and rapidly changing world, we humans constantly perform computations on the torrent of visual input we receive, deciding whether the shape in front of us is a building or a car (what), where that car is (where), and whether it is currently moving toward us (what it is doing). To perform these computations quickly, efficiently, and ideally in parallel, the human visual system is thought to be divided into multiple, distinct processing streams. In the experiments described here, I use functional magnetic resonance imaging (fMRI), diffusion magnetic resonance imaging (dMRI), and neural networks to investigate how structure and function are linked to produce the organization of visual cortex into differentiated streams. In Study 1, I focus on the case study of the well-characterized face-processing system, asking how the most basic computations differ across processing streams, and then how anatomical connections from the earliest stages of the cortical visual system to higher-level face-selective areas reflect the different computational demands of those streams. In Study 2, I zoom out to ask what underlying principles drive the organization of visual cortex into processing streams. While the streams are typically discussed in terms of the behaviors each may serve, I find that their emergence can instead be explained by optimization pressures to learn generally useful visual representations while fitting those representations into the constrained physical space of the skull. Together, these findings provide insight into why and how the functional organization of human visual cortex into specialized processing streams emerges, and may have implications for efficiently building more general artificial intelligence systems.