Is conceptual knowledge derived from sensory/motor experience (embodiment) or from language? Empirical neuroscience suggests that it is grounded in sensory embodiment, with conceptual processing organized along modality-specific channels of the human brain such as vision, hearing, and movement. On the other hand, the recent success of large language models such as OpenAI's has made clear the powerful ability of language to abstract and compress the structure of the world. In the typical human brain, the two are difficult to disentangle: language is mainly used to describe what we see, hear, and otherwise perceive, the sensory modalities are strongly correlated with one another, and the content and structure of the information obtained from different modalities are related as well. Because of this general cross-modal correlation, we do not know whether the knowledge spaces constructed by these modalities are in fact identical. If they differ, which information channel do humans use to construct knowledge?
Shape, a particularly basic property for identifying objects, is one of the most important pieces of information we use to perceive and understand the world. We can see shapes through vision, feel shapes through touch, and acquire shape knowledge through language. Although the vocabulary used to describe shapes is relatively simple, a great deal of complex shape information may be embedded in the relations among those words (many hidden structures). In the typical human brain, and even in the brains of congenitally blind people (who lack visual experience), the visual cortex and its ventral and dorsal pathways can all decode object shape information. Although there is mounting evidence for a shared cognitive and neural representation space between visual and tactile shape, previous research has tended to rely on dissimilarity structures between objects and has not examined the detailed properties of shape representation in the absence of vision. So, without vision, how does the human brain represent shapes? Are shape representations formed through touch the same as those obtained through vision? What happens to shape representations when touch is also unavailable? To what extent can language play a role in shape representation?
To answer these questions, we selected three domains of familiar objects with varying levels of tactile exposure, namely tools, large nonmanipulable objects, and animals, and conducted three explicit shape-knowledge production experiments (verbal feature generation, clay modelling with Play-Doh, and drawing) with congenitally blind and sighted participants. In a further experiment, we selected two categories of familiar non-object concrete concepts with little or almost no tactile exposure (fig-weather/fig-scenes and nonfig-weather/nonfig-scenes) and asked participants to produce 2D drawings of these concepts. Using multiple methods, including feature coding, large language model text analysis, subjective similarity evaluation, computer vision, and multidimensional scaling (MDS), we conducted four studies to reveal the effects of vision, touch, and language on the shape representation of objects and non-object concrete concepts, from the perspectives of the shape attributes of verbal features, the overall quality of shape knowledge production, and inter-subject consistency.
Study 1 focused on the impact of visual deprivation on the shape space of verbal features. We categorized the generated features by manual coding, quantified the shape attributes of the verbal features using an AI language model, and then analyzed the distribution of feature types, the semantic distance of features from shape words, and inter-subject consistency. We found that the blind group generated more multimodal sensory features than the sighted group, and that the features generated by blind participants were less strongly associated with shape words in language than those generated by sighted participants. For tools, however, the blind group showed higher inter-subject consistency than the sighted group, whereas for large objects the pattern was reversed; for animals, there was no group difference in inter-subject consistency. These findings indicate that the space of verbal shape features may not be fully immune to the absence of vision.
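The paper does not spell out the exact computation; a minimal sketch of the distance-from-shape-words analysis and a leave-one-out inter-subject consistency measure, assuming off-the-shelf GloVe embeddings loaded via gensim and hypothetical feature data (the study's actual language model and shape-word list may differ), could look like this:

```python
import numpy as np
import gensim.downloader as api

# Pretrained GloVe vectors stand in for the language model used in the
# study (an assumption; the original model is not reproduced here).
wv = api.load("glove-wiki-gigaword-300")

# Illustrative reference set of shape words, not the study's actual list.
SHAPE_WORDS = ["round", "square", "flat", "long", "curved", "pointed"]

def shape_distance(features):
    """Mean cosine distance between a subject's generated features and
    the shape words; smaller values = more shape-associated features."""
    sims = [wv.similarity(f, s)
            for f in features if f in wv
            for s in SHAPE_WORDS]
    return 1.0 - float(np.mean(sims))

def loo_consistency(freq):
    """Leave-one-out inter-subject consistency over per-subject feature
    frequency vectors (rows = subjects, columns = coded feature types):
    correlate each subject with the mean of all other subjects."""
    rs = []
    for i in range(freq.shape[0]):
        rest = np.delete(freq, i, axis=0).mean(axis=0)
        rs.append(np.corrcoef(freq[i], rest)[0, 1])
    return float(np.mean(rs))
```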
Study 2 explored the shape representation of objects with different levels of tactile exposure through a direct shape manifestation experiment (3D model making). We used subjective evaluation to measure the goodness of the models and a computer graphics approach to measure (dis)agreement within and across subject groups, and we further visualized the shape representation space for each object domain using MDS. The results showed that, at the group-average level, the absence of visual experience led to poorer 3D models for animals but did not affect performance for tools or large objects. However, the blind participants showed greater variation in their 3D models of tools than the sighted, and the tool models made by the blind group were shorter. The lack of visual experience thus did lead to specific changes in the shape representation of manipulable objects.
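As an illustration of the MDS step, the sketch below embeds a hypothetical precomputed model-to-model dissimilarity matrix into 2D with scikit-learn; the mesh-comparison metric that would actually produce the dissimilarities is not reproduced here, so random values stand in for it:

```python
import numpy as np
from sklearn.manifold import MDS

# Hypothetical pairwise dissimilarities between subjects' 3D models of
# the same object (in the study these would come from a computer
# graphics mesh-comparison measure, not random numbers).
rng = np.random.default_rng(0)
d = rng.random((20, 20))
dissim = (d + d.T) / 2          # symmetrize
np.fill_diagonal(dissim, 0.0)   # zero self-dissimilarity

# Project the models into 2D so within- and between-group spread
# in the shape space can be visualized.
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(dissim)
print(coords.shape)  # (20, 2)
```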
Study 3 examined how the absence of visual experience affects the production of 2D object shapes through another shape manifestation experiment (2D drawing). We recruited an independent group of raters to name and score the drawings so as to quantify their goodness, and used a multi-arrangement task to test whether blindness affects inter-subject alignment in object representation. The results showed that the lack of visual experience had the strongest impact on drawing quality for animals. For tools, with their rich tactile/manipulation experience, the blind group drew with quality similar to the sighted group, although vision still appeared to help align tool shape representations across individuals. The level of individual differences was comparable between the blind and sighted groups, and there were systematic differences between the groups, but it is difficult to determine whether these differences stem from visual experience or from drawing experience.
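For the multi-arrangement analysis, inter-subject alignment can be summarized as the mean pairwise correlation between subjects' representational dissimilarity matrices (RDMs); the sketch below makes that assumption, with simulated RDMs standing in for the behavioural distance data:

```python
import numpy as np
from itertools import combinations
from scipy.stats import spearmanr
from scipy.spatial.distance import squareform

# Hypothetical data: one RDM per subject, e.g. item-to-item distances
# recovered from the multi-arrangement task.
rng = np.random.default_rng(1)
n_items, n_subjects = 15, 10
rdms = []
for _ in range(n_subjects):
    d = rng.random((n_items, n_items))
    d = (d + d.T) / 2            # symmetrize
    np.fill_diagonal(d, 0.0)     # zero diagonal
    rdms.append(d)

def alignment(rdms):
    """Mean pairwise Spearman correlation between subjects' RDMs
    (upper-triangle entries only); higher = better alignment."""
    vecs = [squareform(r, checks=False) for r in rdms]
    rs = [spearmanr(a, b).correlation for a, b in combinations(vecs, 2)]
    return float(np.mean(rs))

# Comparing alignment(blind_rdms) with alignment(sighted_rdms) would
# give the group-level contrast described above.
print(alignment(rdms))
```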
Study 4 used non-object concrete concepts of varying visual importance as experimental materials, to examine whether the findings about object shapes extend to conceptual knowledge beyond traditional object shapes. We asked participants to complete a drawing task and analyzed drawing quality and inter-subject consistency. The results showed that the lack of visual experience affected the quality of shape production for non-object concrete concepts, and that despite differing tactile experience there was no significant difference in drawing quality between the two types of concepts. This suggests that, for the shape representation of non-object concrete concepts, whether a concept has an explicit shape that can be described in language may be the more important factor.
In summary, we found that even the representation of shape, a classic supramodal property, reflects an intricate orchestration of vision, touch, and language. The human brain draws on all of these sources of information together to construct an internal model of shape. These findings not only speak to the modality specificity of shape representation, but also highlight the limitations of current theories of general conceptual knowledge representation, and they have broader implications for the development of machine intelligence and for the neurorehabilitation of people with visual or hearing impairments.