1 Current Paradigms in Social Robotics: A Critical Survey

Social robots are designed to assist human beings in real physical environments. In current designs, social interaction plays a key role and enables the robot to perform its function adequately. Social skill thus does not merely improve the functioning of social robots; it is central to their performance. A pivotal question for the design of these robots is what constitutes the core of social skill. Because the aim is to build socially sophisticated robots, that core provides the guiding principle for the design.

That skills for social interaction are essential to the proper functioning of social robots does not necessarily entail that social robots have to be, or can be, like humans. In the end, what kind of skill a given robot requires to function properly is determined by its task environment, the setting for which it is programmed. To illustrate, cleaning robots require different social skills in a hotel than at home, a receptionist robot draws on different skills in a hotel than in an emergency ward, and care-taking robots demand different skills than cleaning robots. In contrast, the social skills of human beings, while varied, are entwined and lose their meaning and motivation when considered individually; emerging organically, they are defined and develop together relative to the surrounding socio-culture.

Social robots that are designed to imitate human sociality ideally allow for mutually adaptive interaction that involves the robot as a partner or friend as opposed to a tool. This implies that the robot should be capable of dynamic and variable behavior and of responding to fluctuations in the environment while coordinating with the human. Because humans attune to robot behavior, and volunteer more information and help in the company of a robot partner than a robot tool, the expected benefits of a design that involves the robot as partner are many. Among the important benefits are fluent, efficient, and effortless communication; reliable and knowledgeable robots; robots that monitor and promote their own learning; trust or confidence in the robot and willingness to make use of its technology; individualized responses that decrease the experienced distance between self and robot; personal engagement and feelings of connectedness that enhance interest and motivation to interact and learn; and improved information exchange and mutual assistance in task-related contexts.

To be human-like in the strong sense associated with partnership between humans, social robots need to be susceptible to similar environmental cues and behavior patterns as humans are, display similar ways of responding, and be able to take the initiative and behave pro-actively. This places strong demands on social robotics to build robots that understand others’ actions, intentions, and emotions and show emotions themselves; that know when to listen to the human or act on their own preferences; that develop social competence, can keep up a normal conversation, form social relationships, learn from experience, and perhaps have a personality (see, e.g., Breazeal 2002; Dautenhahn 2007; Fong et al. 2003).

To avoid friction, social robots are in many cases made to blend, as far as possible, into the environments for which they have been designed. What counts as “blending in” varies. The popular robot Pepper (SoftBank Robotics) has a plastic white body and looks exactly like a robot. On the other hand, Pepper can be programmed to display humanlike behavior, e.g., to perform a series of tai chi movements or wave a hand in response to a given cue. Few people in the robotics field aim to build robots that are social in the very same sense as human beings. The goal is to build robots that function seamlessly and are successful by appearing to have social capacities and behaving as if they were social. In the robotics field, that robots are social means that they are designed to be used in social contexts to interact with humans and display behavior that involves other agents. It does not mean that the robots are intrinsically social, preferring social interaction to other forms or enjoying it.

Social robots are intended to function as informants and guides in the hospitality and tourism industries, and as tutors and companions in the service and healthcare sectors, in educational contexts, at day care centers, and in retirement homes. At present, they mainly occur in pilot operations, with limited success. Telepresence robots are remotely controlled, wheeled platforms with video and audio facilities. They allow people to participate in organized social activities by being virtually present through a display mechanism, e.g., attending a meeting or class or consulting a doctor. In contrast, autonomous robots have local artificial intelligence and can interact independently in response to cues from people and the physical environment, which makes them more interesting from a global perspective and promises wider application than telepresence robots.

There are two major approaches in contemporary research on human-robot interaction (HRI) that involves autonomous robots. Both are based on the assumption that people prefer to engage with physically humanlike robots capable of interacting on human terms. The first approach promotes android robots that look and act as much like humans as possible. This research area is represented by the work of Hiroshi Ishiguro and co-workers with robots such as Erica (Glas et al. 2016) and Geminoid (Nishio et al. 2007), a teleoperated copy of the researcher himself. Another example that has attracted much media attention is the robot Sophia, designed by Hanson Robotics, which also aims at mimicking a human. A similar approach is taken by researchers who build robots that are less humanlike in overall appearance but still made to imitate human gestures (Zecca et al. 2004) and gaze (Mutlu et al. 2006). Today, androids can walk in modified environments, be programmed to coordinate gesture with voice, and have fluid movements, and the mechanics inside them makes little noise; nevertheless, their capacities are very limited and they appear unnatural. Building androids has proven extremely difficult, costly, and time-consuming, and the goal is still estimated to lie decades ahead. Moreover, it is uncertain that HRI requires robots that are human replicas; people today seem happy to interact with less human-like robots too. The evidence that HRI is more successful with androids than with, say, traditional social robots is not conclusive. Some of it dates back to surveys of people’s feelings toward robots made in the 1980s and 1990s. Society has changed a lot since then due to the integration of internet technology, ubiquitous computing, mobile phones, AI, and apps into people’s daily life, and attitudes too are likely to have changed considerably.

The second approach builds childlike, cute, sociable robots that appeal to the users’ emotions and engage with humans by imitating behavior, pioneered with the Kismet robot (Breazeal 2002). It is the basis of several commercial robots such as Jibo (Jibo, Inc.) and Pepper that reproduce basic social competences, e.g., displaying (graphic) facial expressions of emotion, reacting to human displays of emotion, and making eye contact or coordinating eye gaze. Whereas this research avoids the problems associated with the android approach, sociable robots rarely have functions beyond engaging humans emotionally, which means their ability to assist is limited. They are intended to provide users with positive feelings: to boost self-confidence, decrease distress, and broadly encourage overall activity and increase arousal. Because the design relies on imitation and yields predictable behavior, sociable robots can be made to cue positive feelings and trigger learning in users. They are not able to engage users reciprocally and flexibly in the manner of human-human interaction, something that genuine assistance is bound to require, because it presupposes real-time collaboration between human and robot. A recurring problem is that while sociable robots initially evoke interest and pleasure, the interest in them fades when the exchange of emotions does not issue in further interaction. Accordingly, their benefit also fades.

It can be argued that the way in which sociable robots engage with humans is unethical, because it exploits human emotional openness and vulnerability for the mere purpose of functionality, that is, to improve the technical quality of the interaction. The fact that they mimic human emotion and interact via bodily and facial expressions of emotion encourages users to grow emotionally attached to them, whereas the robots themselves do not have feelings of the human kind but display cue-based behavior. Users who invest themselves in the robot and become emotionally dependent on it risk being hurt, suffering depression, and developing mental and physical illness.

We believe that the present ethical problem is created by the violation of users’ integrity, and arises from failure to satisfy the users’ needs and respect their independence as autonomous beings. In our view, personal integrity is essentially related to recognition and the normative framework in which it is embedded. To clarify the nature of the problem, we turn to social philosophy.

Recognition is a relational notion that concerns the way in which an individual is identified and responded to by other individuals, and how the nature of the response affects the individual’s self-image and identity. Much of the contemporary discussion about recognition has its historical origin in Hegel’s philosophy (Hegel 1977, 1991). In his influential interpretation of Hegel’s system of ethical life, Honneth (1995, 2007) extends Hegel’s distinction between three fundamental social relationships in the direction of the personal sphere, to the individual’s relationship to his or her self via others’ recognition. To Honneth, recognition as love concerns physical needs and emotions being met by others in primary relationships of family and friends, and provides the individual with basic self-confidence. Recognition as rights concerns the development of moral responsibility and self-respect and has its origin in the individual’s mutual relations to others, by which the individual learns to take a second-person perspective on him- or herself as having equal rights. Finally, recognition as solidarity relates to the community’s recognition of the individual as a unique and independent person with valued capacities, and grounds self-worth and self-esteem.

Honneth’s account of recognition takes its starting-point in universal human values, placing the person and his or her needs at center stage. Like Honneth (1995), we think that forms of misrecognition systematically correspond to forms of recognition and can therefore be used to identify the major types of negative experiences that emotional dependence on sociable robots causes in users. Determining the nature of users’ negative experiences permits appreciation of the magnitude of the present problem and also allows for thinking of measures to counter it.

Accordingly, we derive the respects in which social robots can be (held to be) unethical from Honneth’s tripartite analysis of recognition: first, psychologically, in deceptively offering humans an emotional, feeling-based relationship; second, morally, in treating humans disrespectfully and irresponsibly by playing with their feelings and denying them the second-person perspective that entails an equal standing; and third, existentially and socially, in placing humans in an undignified position, humiliating and degrading them, not recognizing their personal human value or respecting their integrity. The resulting analysis suggests that the effects of unethical behavior in robots can be quite serious. Such behavior is likely to afflict the very core of a person’s existence and threaten the user’s self-confidence, self-respect, or self-esteem, depending on the form it takes in the specific case.

The way we see it, the present problem has its roots in misunderstanding or neglect of fundamental human values in much of robotics, and is symptomatic of research that promotes technologically driven values rather than knowledge about human conditions (biological, psychological, and cultural). In our view, it is not the robots that are blameworthy but the people behind a design that by its very nature allows for unethical behavior. To avoid the problem, social robotics can construe the interaction relation in a manner that allows for degrees of involvement and does not require users to invest themselves emotionally in the interaction. HRI should not be made to depend on underlying emotional mechanisms that exploit the human propensity to form emotional relationships with others. To emphasize, this criticism does not exclude that social robots may communicate by means of emotion displays.

In contrast to contemporary approaches, traditional social robotics emphasizes usability and functionality instead of physical appearance and emotional experience, which means the robots have few social abilities. The goal is not to influence the mental state or mood of users but to have robots assume tasks normally performed by humans in order to offload or distribute the workload. The relation between human and robot is approached from an engineering perspective that considers robots as tools, and emphasizes solutions that enable robots to operate in human environments alongside humans without engaging in social interaction. Evidently, traditional social robotics demands other capacities than contemporary approaches do, and relies on different mechanisms that call for other programming and technology than androids and sociable robots. Traditional social robots can interact safely with humans on dedicated tasks in technical industries and in the manufacturing sector, as exemplified by Baxter (Rethink Robotics), an industrial robot with some social features, such as animated eyes that indicate what the robot will do next by gazing at an object before it is manipulated.

To conclude, social robotics has not yet found a principled solution to realizing its ultimate goal, viz. to build robots that can assist humans in real social environments and work together with them on joint tasks. There is strong demand from consumers and the market for such robots, which would free time for humans to engage in activities that expressly could benefit from human-human interaction (HHI) within, e.g., healthcare and education, and improve user experiences in the service and tourism sectors. The time for contact and dialog in the public and private social sectors is shrinking to a minimum to increase cost-efficiency, although the positive effects of HHI on health, learning, mental well-being, etc. are documented. Somewhat paradoxically, whereas the advent of functioning social robots would entail a massive increase of technology in everyday life, it may be expected to multiply the opportunities for technology-independent social interaction.

2 From Tool to Collaborator

In a recent interview, Cynthia Breazeal declares that “[R]obots that engage with people are absolutely the future” (Weir 2018). Given that people have been working on social robots in large-scale projects since the 1980s, the remark is telling. The great number of pilot projects that involve androids and sociable robots, and the relatively high attention these pilots receive in the media, may seem to suggest that social robots will soon appear in our homes, at work, and in social institutions at large; however, such scenarios are still distant.

It is not unusual for robots to react to people in a manner that does not involve any components of social interaction. For example, Takayama and Pantofaru (2009) noted that “until recently, our PR2 robots have seen people as mere obstacles in the environment”. Personal space differs from other spatial zones, e.g., the safe buffer zone around a table: invading personal space is rude and can be unsafe, since a person can move unpredictably. Operating in populated environments, social robots need to respect human norms and etiquette, including those pertaining to personal space, to gain acceptance and trust. To deal with personal space, Takayama and Pantofaru (2009) turned to proxemics and research about risks and unsafe zones, and built people-specific sensory systems and detection algorithms; however, their strategy seems a detour.
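
To fix ideas, the sketch below shows the kind of solution this strategy leads to: a proxemic cost layer added to an ordinary navigation planner. It is a minimal illustration of ours, not the system of Takayama and Pantofaru (2009); the function, its parameters, and the zone radii (which loosely follow Hall’s proxemic zones) are assumptions.

```python
import math

# Hypothetical proxemic cost layer for a grid planner (illustration only).
# Zone radii loosely follow Hall's proxemics; all values are assumptions.
INTIMATE_M = 0.45   # approximate intimate-zone radius (meters)
PERSONAL_M = 1.20   # approximate personal-zone radius (meters)

def social_cost(px, py, person_x, person_y, person_heading_rad):
    """Extra planning cost at point (px, py) near a detected person.

    Cost decays with distance and is inflated in front of the person,
    where an approach is most likely to feel intrusive or unsafe.
    """
    dx, dy = px - person_x, py - person_y
    dist = math.hypot(dx, dy)
    if dist < INTIMATE_M:
        return float("inf")              # never plan through the intimate zone
    bearing = math.atan2(dy, dx)         # direction from person to the point
    rel = math.atan2(math.sin(bearing - person_heading_rad),
                     math.cos(bearing - person_heading_rad))
    frontal = 1.0 + math.cos(rel)        # 2 directly ahead, 0 directly behind
    return (1.0 + frontal) * math.exp(-(dist / PERSONAL_M) ** 2)

# A planner would add social_cost(...) to its usual obstacle cost for every
# detected person, so that people shape routes rather than merely block them.
print(social_cost(1.0, 0.0, 0.0, 0.0, 0.0))   # point directly in front
```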

Within our framework, others are resources that help you navigate space in real time, which offers a shortcut for the design. Instead of treating people as obstacles, the robot should detect social and other cues that reveal where the human is heading and whether he or she has seen the robot or will react to it, and choose its behavior accordingly. This is even more evident in joint action, where the personal spaces of collaborators coincide as a consequence of emergent dynamic coupling between them, creating a “we-space” (Krueger 2011). Coupling involves the continuous reciprocal coordination and patterning of behavior in time. For example, in jointly carrying an object, each agent’s movement needs to depend on the movement of the other as well as on the properties of the object (Agravante et al. 2013). We-space gives the agents access to social affordances as well as to social norms and values built into artifacts and material infrastructure (Brinck 2014), with new forms of interactive behavior as the result (Brinck et al. 2016; Brinck et al. 2017). The emergence of we-space becomes apparent in situations where the space clearly is created by the interaction. Donner et al. (2017) looked at a collaborative task in which a human and a robot together manipulate an unknown flexible object. The dynamics of the task depends on both co-actors and on the unknown object. The joint space only exists once the manipulation starts and the two agents become coupled through a haptic communication channel (Groten et al. 2013). Similar results are obtained from studies of collaborative sawing, where the dynamics of the task is learned during ongoing joint action (Peternel et al. 2014).
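
The kind of coupling involved in jointly carrying an object can be made concrete with a simple admittance scheme, in which the robot yields to forces the human transmits through the shared object. The one-dimensional sketch below is our own illustration, not the controllers of the cited studies; the virtual mass and damping values are assumptions.

```python
# Minimal admittance-style coupling sketch (illustration only): the robot
# yields to forces the human transmits through a jointly carried object,
# so each agent's motion comes to depend on the other's.

def admittance_step(v, f_sensed, dt, mass=8.0, damping=20.0):
    """One integration step of a 1-D virtual mass-damper.

    v        -- robot velocity along the carrying axis (m/s)
    f_sensed -- force measured at the grasp point (N); sign = human push/pull
    Returns the updated velocity: the human leads through the haptic channel.
    """
    a = (f_sensed - damping * v) / mass
    return v + a * dt

v = 0.0
for f in [0.0, 12.0, 12.0, 6.0, 0.0, -4.0]:    # sample force readings (N)
    v = admittance_step(v, f, dt=0.05)
    print(f"force {f:+5.1f} N -> velocity {v:+.3f} m/s")
```

On this scheme the haptic channel is the only communication needed: when the human slows down, the sensed force changes sign and the robot decelerates without any explicit signal.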

Marsh and Meagher (2016) present evidence that others provide resources for interacting with the world, arguing that in humans “the pull to be a coordinating unit with other individuals is fundamental”. They propose that social robots should “tap into the human ability to naturally engage with others”. The view that robots should adapt to humans in order to facilitate intuitive interaction guides much work in social robotics. What this means in practice, however, is uncertain.

Adjustment to others can be passive, active, or pro-active, ranging from the mere mirroring of others’ behavior, over probing or testing behavior, to prompting. Kühnlenz et al. (2013) regret that social robots are designed to passively mimic human behavior instead of being responsive and reacting to behavior on-line. For instance, robots sometimes continue taking turns without realizing that the human did not respond (Thomaz and Chao 2012). Indeed, although the technology has been developed to help people, people have to adjust to the technology, because it is not well adapted to human preferences and conditions for interaction. In the words of Breazeal et al. (2005, p. 1):

…so many current technologies (animated agents, computers, etc.) interact with us in a manner characteristic of socially impaired people. In the best cases they know what to do, but often lack the social intelligence to do it in a socially appropriate manner.

As a result, people get frustrated and do not want to continue using the robots. Kühnlenz et al. (2013) stress that it is important that the robot can anticipate human action and proactively trigger behavior that will facilitate interaction and enable the robot to achieve its task.

According to Marsh and Meagher (2016), much work in social robotics relies on a conception of interaction that entails acting on somebody as opposed to with somebody, and that pictures the robot as an intelligent tool. They suggest thinking of interaction as collaboration, which inherently implies working with others (Bratman 1992). This seems reasonable, granted that people address robots in the manner in which the robots address them.

Fischer et al. (2011) show that human-robot interaction is shaped by the kind of feedback the robot provides. To exemplify, in the absence of linguistic cues, people do not know how to adapt their speech. The functionality of the robot will have a direct impact on the quality of the interaction, and impoverished communicative skills in the robot will cause impoverished interaction. A robot that has been designed as a tool does not invite reciprocity or sharing on the part of the user. In contrast, a collaborative robot that can make use of the human impulse to imitate the robot will have a distinct advantage.

A parallel mechanism can be found in the work of Davidson (1987, 1992) for whom the maxim to act so as to make yourself interpretable by the addressee is key to interpretation and provides the norm for interaction. Davidson’s maxim suggests that an agent’s behavior is shaped relative to the other’s anticipated preparedness to act on the agent’s action relative to the overall goal. It is complementary to Fischer et al.’s mechanism. The latter captures the nature of the influence of the other’s address on one’s own response, whereas Davidson’s maxim covers the nature of the influence of one’s own address on the other’s response.

In their study, Fischer et al. (2011) point to a central mechanism in human social behavior that promises to be highly important for the design of HRI, because it permits dispensing with emotional engagement as a means for initiating and maintaining interaction with the robot. It predicts that once the human has entered into the interactive loop with the robot, this loop will perpetuate itself and does not need to be boosted emotionally or propped up externally. We submit that given a collaborative robot and an imitation function in both human and robot that automatically prolongs collaborative interaction, attentional engagement by gaze, vocalization, and touch is sufficient for HRI. If we are right, the drawbacks of emotional investment related to the design of sociable robots can be avoided.

Below, we will present an alternative approach to social robotics that repudiates the central assumption of the android and sociable-robot paradigms that successful HRI requires humanlike robots displaying physical and emotional behavior similar to that of humans. Both approaches can be argued to misconstrue the interaction relation around which social robotics revolves. We submit that HRI essentially draws on human recognitional capacities, much like any interaction that involves humans. The present challenge is to determine which capacities are enabling and therefore cannot be left out of the design. We prefer a deflationary approach that allows for flexible bottom-up design and for situated and contextualized robotic solutions in real environments, that avoids unrealistic goals, and that meets the ethical and practical demands of users. Therefore we suggest taking embodied recognition as a starting-point for HRI. That recognition is embodied means that it is dependent on the physical constitution of the body and based in sensorimotor processes. To clarify the notion of embodied recognition and flesh out its function in the context of HRI, we start by comparing it to philosophical notions of recognition.

3 Normative Attitude in Embodied Recognition

Hegel’s (1977, 1991) tripartite analysis of intersubjective recognition in terms of love, rights, and solidarity has exerted a strong influence on phenomenology as well as analytic moral philosophy and interdisciplinary social and political philosophy. Hegel argued that self-consciousness arises by mutual recognition, intersubjectivity being constitutively related to the development of subjectivity on the individual, group, and community levels. He furthermore connected the denial of social recognition to social inequality and struggle.

Many philosophers have argued that recognition, or the acknowledgement of the other as an individual, is fundamental to self-respect and dignity. It is common to distinguish at least two aspects of recognition that are both considered essential to the self and personhood, one psychological and one normative. The distinction is conceptual (as opposed to empirical), and it is not clear that the two aspects actually can be separated. The former concerns the origin of the self in others’ recognition of its existence and the development of self-consciousness through the progressive dialectic between self and other (Brinck et al. 2017). The normative aspect has its origin in the other person, who responds to the first person’s bids for attention in a manner that causes her to recognize herself first as loved by others, then as having rights equal to others’, and finally as having universal human value while being a unique individual. In sum, we become who we are by seeing ourselves in the eyes of others who respond to our call for attention. Eventually it will become clear that this view has important consequences for how to model the interaction relation between human and robot.

Brandom (2007) is among those who consider recognition an inherently normative attitude, one that signifies a certain way of being together among equals. To recognize someone is to take her to be the subject of commitments and entitlements, capable of undertaking responsibilities and exercising authority (ibid., p. 136). Stronger views hold respect to be of the essence of equality (see, e.g., discussions in Honneth 2007 and Margalit 1996). This may be respect for the humanity in each person, for a person’s capability of autonomous agency, or for the equal moral standing of persons. Scanlon (1998) maintains that respect has a certain internal appeal to those who stand in the relation of mutual recognition to each other, which makes it worth seeking in itself.

Generally, recognition is connected to reason and to the ability to respond to blame and reproach by rational argument. Being an autonomous subject in the sense that respect, dignity, and similar concepts presuppose is thought to essentially demand capacities for reason-giving, such that a person’s authority legitimately can be challenged by asking her for reasons and questioning the ones she gives (cf. Satne 2014).

It is difficult to understand mutual recognition in HRI in the terms of analytic philosophy, which demands quite sophisticated cognitive capacities of the subject. To compare with Scanlon’s position, robots are prima facie unable to seek moral values such as respect for their own sake. One might concede that it would be possible for them to seek the instrumental value of reaching a certain goal. But to posit an intuitive appeal of value seems irrelevant; it is doubtful that adding it to the system would have any consequences for its function. Moreover, it is not likely that such a strong concept of equal respect follows naturally from basic forms of mutual recognition, which means additional arguments are needed to increase its plausibility. Indeed, it has been argued that norms and normative behavior can be independent of verbal reason-giving practices (Brinck 2014; Roessler and Perner 2015).

Generally, the philosophical concept of recognition is substantial and goes far beyond what reasonably can be expected to ground HRI, at least as we know HRI today. From an ethical perspective, it seems reasonable to opt for a leaner concept that does not entail personal or moral involvement rather than a substantial one. We think that a pragmatic approach to HRI is preferable: one that, first, connects values to the needs of end-users and then considers them relative to societal needs and practices, and, second, takes the functional aspects of robots into account, discarding values that lead to the design of robots that cannot be realized in a foreseeable future. The deflationary approach we advocate is intended to avoid the ethical problems that surround existing approaches.

From the design perspective, it is not an end in itself to make robots as human-like as possible—quite the opposite. Robots are artifacts, means to an end for which functionality is key; however, what might seem to complicate the picture is that making the interaction between humans and robots run as smoothly as possible will require designing social robots as equals, although in a restricted sense only. Nevertheless, embodied recognition between human and robot will entail certain desiderata concerning what it means to recognize the other as an equal.

Before we go on to explain what normative attitude can be relevant to HRI, let us first briefly introduce the form that embodied recognition can be expected to take in contexts of HRI. Within the social sciences and language and gesture studies, recognition is typically held to occur on the fly upon encountering another agent—unexpectedly and on the threshold of interaction, as it were. It involves focusing the attention on the other and attending to his or her body as it manifests an overall attentional state. The act of recognition signals awareness of the other’s bodily (and mental, if applicable) presence to the self, where the other is just another agent, much like oneself, with overall powers similar to those of people in general. Moving from individual to mutual recognition changes the nature of the relation between self and other from disengaged to participatory, ending in attention contact and mutual recognition. Mutual recognition signals acceptance of the other’s presence and includes addressing the other in the second person, as an equal in a restricted sense relative to a certain task and environment, viz. somebody who can be co-opted for specific purposes, invited to take part in joint action, and addressed and responded to in a similar way as oneself.

Recognition is a key psychological mechanism in human-human interaction that can be strongly motivating, and is known to enhance learning and performance, increase trust, resilience, and interest, and promote joint action or collaboration. Mutual recognition is grounded in face-face encounters, the agents being physically present to each other, co-located in space and time and facing one another. Certain behavior appears to have quite specific meaning in these encounters (Brinck 2008). Searching to make and actually making eye contact means inviting the other to engage and respond. Responding to attempts at eye contact by making contact means acknowledging the other’s presence as agent and somebody with whom to communicate. Holding gaze means agreeing to interact, and repeated cursory eye contact while interacting indicates that the stance is upheld. Responding to another’s attention by engaging in turn-taking seems to involve a pragmatic presupposition of interaction on equal terms.

Streeck (2013) describes mutual gaze in the strong terms of a social contract, which implies that participants have the concept of a social act of communication, and moreover takes them to be moral agents who understand what obligations and rights a contract engenders. An advantage of Streeck’s interpretation is that once the contract is in place, there is no need for the agents to invest themselves personally or emotionally to maintain the interaction; it is simply supposed to go on until they have reached the goal. It is an open question whether interaction in fact entails such a strong bond between the participants as a contract would suggest.

While there is a tendency in the research on recognition to focus on eye contact, recognition is multimodal and may involve, e.g., vocalization, as in accentuated imitation of sound or prosody, and touch, as in taking each other’s hands, hugging, or letting shoulders and arms rub against each other while calibrating body movements. On an implicit, unaware level of embodied interaction, it is realized in behavioral entrainment and the coordination of sensory and motor action.

We submit that normative attitude has several manifestations in HRI. First, human and robot need to mutually recognize each other as functionally equal with respect to the activity or task and the environment in which they are intended to cooperate. We might talk about functional or practical equality, which establishes certain manners of interacting as desirable or adequate and others as undesirable or inadequate. To illustrate, smooth interaction may require gaze coordination in one case, say, a robot that is fetching things, and in another case moving in pace with a human who is walking or running, e.g., a robot that assists with personal training or jogging tours. Thus, the task definition and the environment determine what is required of the robot to be functional. The exact types of behavior are not pre-set but develop in the course of the process; however, behavior or features that go beyond the task description are not desirable—computationally, practically, ethically, or economically (incurring higher costs of research, design, and production). Moreover, functional equality eventually would adapt the interaction so as to maximize the robot’s learning and adaptability. Thus we arrive at a functional or practical sense of equality, distinct from the universal normative and epistemological concepts of moral and political philosophy that are pertinent to and recur in philosophical discussions of mutual recognition between humans.

Second, the notion of respect becomes pertinent in the discussion of HRI, because it is necessary to avoid behavior that appears disrespectful to users and thereby to avoid a negative spiral. The more the means and channels for interaction are limited, the easier it becomes to restrict the domain in which internal breakdown of the interaction can occur, but this may also restrict functionality. Here we find an externally imposed normativity that can be dealt with practically.

Third, it seems that even the most fundamental form of recognition will be normative if it is mutual: the core notions of reciprocity and responsiveness resist reduction to non-normative terms. Reciprocity, at least, can be described in structural terms of, e.g., turn-taking, measured as to timing, and quantified as to behavior. Yet such a description will not capture the very mechanism that makes the wheel turn and the interaction perpetuate itself, that makes the participants act so as to prepare and improve the conditions for upcoming behavior, as opposed to merely reacting to previous events. This is the difference, noted by Schutz (1967), between human and robot acting into the future together and acting individually, on one’s own, out of causal force. In this third case, we are dealing with normativity that is internal to the very form of second-person interaction.

In the philosophical framework, agency is a normative matter of responding appropriately to reasons. In the framework of HRI, agency inherits its normativity from an underlying framework, viz. face-face interaction. We suggest that while what counts as responding appropriately to reasons is socially constituted through social practices, what counts as an appropriate response in nonverbal multimodal communication is at least partly constituted by tacit patterns that develop very early in life and, according to phenomenology, emerge in intersubjectivity via the second-person perspective (e.g., Husserl 2012; Zahavi 2001). If true, this would mean, on the positive side, that HRI has somewhere to look for the grounding of its normativity, and, on the negative side, that designers need to be careful, because tacit norms run the risk of escaping rational attempts at modeling intersubjective behavior.

To exemplify, gaze behavior has a normative dimension, as in averting the gaze when another attempts to make eye contact, or breaking eye contact and turning in another direction. In a coherent context of previous and anticipated actions and events, such behavior can be appropriate. In a number of other contexts, it will imply not just a mechanical breakdown but that something is awry, and if repeated too many times it may be experienced as disrespectful by humans.

Thus, face-face interaction being the paradigmatic form of interaction between social robot and human, a deflationary approach nonetheless will have to consider how equality enters into HRI, looking for minimal conditions that do not prematurely restrict the scope of interaction.

It seems reasonable to suggest that the link from type of recognition to type of interaction is a key parameter in HRI: the more demanding the act of recognition that the robot can perform, the more complex the ensuing interaction between robot and human. The rationale for this line of thought is that what you recognize in another individual sets the standard for the ensuing interaction. It constrains how you will respond and what kind of response your own actions anticipate in the other—perhaps you expect a rational self that is tracking objective truth, and thus you engage in verbal dialog and logical reasoning, or a subject of normative statuses along the lines of Brandom (2007). While acknowledgement of the other as an autonomous agent may constitute the baseline for recognition in HHI, weaker forms that downgrade the interpersonal component nevertheless may occur. For instance, the other agent might be approached in an instrumental manner, as a means for one’s own personal goals; pragmatically, as a causal agent whose behavior is determined by previous events; or, again, emotionally, as an agent driven by affect and desire, adapting to others’ attitudes but not their reasons. What we need to remember is that robots draw on our relational skills and attitudes, and mirror our own behavior—in fact, much like humans do.

Granted the central position of recognition in HHI and the massive work on it in philosophy and phenomenology, it is surprising that the notion has not drawn more attention in research on HRI, particularly in research about social robots that are designed to physically interact with humans. Although in the last decade the intersubjective or interpersonal aspects of interaction have come to the fore in social robotics, its focus on interaffectivity has drawn attention away from other skills, e.g., perceptual and motor abilities, that might underlie more productive forms of HRI than those couched in interaffectivity.

There is not much work on recognition in robotics, and we do not know of any robots that fully satisfy the requirements for recognitional capacities. Although many robots today show some aspects of recognition, it is not an inherent property of their control systems. Instead, recognition has been added in specific situations. For example, a robot may greet a human because that is part of a behavioral script, but if this particular behavior is not a component in trying to establish mutual recognition, the greeting serves no joint purpose. This method reflects a very different conception of the role of recognition in HRI than ours. We take recognition to be key to social interaction and joint action. The act of mutual recognition marks the onset of interaction by proposing a first attempt at equality that will set the stage for the ensuing process.

Recognition is key to successful cooperation with robots because once mutual recognition has been established between agents: (1) the others become resources for me in a way that promotes equality and sharing, which in turn enables (2) more and other information to be shared and exchanged in HRI. We submit that acknowledging that recognition is key to successful interaction will lead to significant advantages for HRI, partly because it will change the expectations that the human has of the robot, from instrumental and empirical to normative (Bicchieri 2006). Normative expectation entails an understanding of the social robot as somebody you can engage meaningfully with, and involves taking a certain stance toward the other as worthy of listening and adjusting to—based, as it seems, on the principle that you treat the other in the way you want the other to treat you.

HRI that rests on empirical expectations about how the robot will behave together with beliefs about instrumental and causal properties of the target will place the robot in the role of tool instead of collaborator. This has negative consequences: Specifically, the human

  • will not care to tailor his or her actions to the robot so as to facilitate its next action,

  • will calculate robot behavior by observation instead of engagement, independently of how the robot may perceive the human’s behavior,

  • will not aim to reach the goal together, which means missing an opportunity to speed up and simplify the interaction,

  • will not take the robot seriously as a cooperating partner, co-worker, or co-assistant, which may affect the overall attitude and result in less effort being put into achieving the goal.

Next we will describe recognition from two separate points of view: the first concerns what might be called the cognitive aspects of recognition, and the second what might be called its phenomenological aspects. This approach reflects the empirical research on recognition, which tends to focus on two separate sides of recognition. Reciprocity is central to both. The cognitive aspects capture the one side of recognition, resulting in identification and entrainment. The phenomenological aspects capture the other side, having to do with the responsiveness that leads to mutual engagement. The aim is to ground the preceding theoretical discussion in an explanation of how mutual recognition is empirically realized in behavior and can fulfill its function. It is of crucial importance to our approach that such behavior is already within reach for today’s autonomous robots.

4 The Cognitive Dimension of Mutual Recognition—Recognition in the Robotics Perspective

Mutual recognition starts with identification: the process that assigns to the other individual certain properties that ground future interaction and allow for expectations about how the other will engage in a mutual activity. There are two ways in which this can occur.

The first and most important one is when somebody is identified based on immediately perceptible attributes and dynamic properties, which are used to form expectations about the forthcoming interaction. This type of identification is essentially reactive and depends on information available here and now. Typically it is multimodal and includes the perception of movement and action, gaze, vocalizations, and emotion. It can be used to infer the goals of the other individual on a shorter or longer time scale.

The second type of identification is anticipatory and depends on previous interactions and on the other individual’s belonging to a particular group or having a particular role in the current situation. Initial expectations of the interaction based on previous encounters can potentially contribute to faster entrainment. Accordingly, identification is fundamentally related to predictive perception (Cavanagh 1997; Nijhawan 2002). To perceive the present, you need to predict from recent sensory information to compensate for delays in the perceptual system. Identification is thus not unique to the recognition of others but is an intrinsic function of perception. You see what you expect, and by extension what you hope to achieve in the next step. This insight has been adopted in robotics by systems that perceive actions (Demiris and Johnson 2003). Identification does not require that every aspect of the other individual be identified, but is highly context-dependent and shaped by the kind of interaction that will or may occur.
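
A minimal sketch of such predictive identification is given below, assuming a simple constant-velocity tracker; it is our illustration, not the system of Demiris and Johnson (2003), and the delay value, function names, and candidate goals are assumptions. The robot first extrapolates over its perceptual delay, then scores candidate goals by how well they align with the person’s heading.

```python
import math

# Sketch (illustration only): compensate for perceptual delay, then infer
# likely goals from the person's direction of motion.

def predict(pos, vel, delay_s):
    """Constant-velocity extrapolation over the sensing latency."""
    return (pos[0] + vel[0] * delay_s, pos[1] + vel[1] * delay_s)

def goal_scores(pos, vel, goals):
    """Score each candidate goal by its alignment with the walking direction
    (cosine of the angle between velocity and the direction to the goal)."""
    speed = math.hypot(*vel)
    scores = {}
    for name, (gx, gy) in goals.items():
        dx, dy = gx - pos[0], gy - pos[1]
        denom = speed * math.hypot(dx, dy) or 1e-9   # guard against zeros
        scores[name] = (vel[0] * dx + vel[1] * dy) / denom
    return scores

pos = predict((2.0, 1.0), (0.6, 0.1), delay_s=0.2)   # 200 ms pipeline lag
print(goal_scores(pos, (0.6, 0.1), {"door": (6.0, 1.5), "desk": (2.0, 5.0)}))
```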

Identification is followed by confirmation to the other that identification has taken place. Confirmation can range from explicit signals, such as a greeting, to very subtle signals, such as slightly altered movements that show that the other is now part of what the Gestaltists would call a common fate. In some cases, it may be necessary to probe the other to establish confirmation. This is the role of a quick “Hello” directed toward somebody who does not appear to react to your presence. Confirmation entails attending to the other while simultaneously being attended to.

Mutual recognition has been achieved when one individual shows that its behavior can be influenced by the other. It depends on the individual’s willingness to be moved. Common signals are mirroring of posture, movement, or voice. To illustrate, Kühnlenz et al. (2013) showed that people increased their helpfulness toward a robot if it tried to match the mood of the human. Achieving mutual recognition is also crucial for establishing a sensitivity to another’s social intentions and social affordances (Becchio et al. 2010; Krueger 2011). Subtle movements and facial expressions, as well as exact timing, can be detected only once a baseline of dynamic interaction has been established, since it is the deviation from that baseline that constitutes the communication.
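
Because it is the deviation from the baseline that carries the communication, a robot needs to estimate that baseline online. The sketch below is our illustration, under assumed parameters (window size, threshold) and an assumed interaction signal (the gap between turns, in seconds); it flags samples that depart notably from the running baseline.

```python
from collections import deque

# Sketch (illustration only): estimate an interaction baseline online and
# flag communicative deviations, e.g., a suddenly lengthened pause.

class BaselineDetector:
    def __init__(self, window=10, threshold=2.0):
        self.samples = deque(maxlen=window)   # recent signal values
        self.threshold = threshold            # deviation in std deviations

    def update(self, x):
        """Return True if x deviates notably from the current baseline."""
        deviant = False
        if len(self.samples) >= 3:            # need a minimal baseline first
            mean = sum(self.samples) / len(self.samples)
            var = sum((s - mean) ** 2 for s in self.samples) / len(self.samples)
            deviant = abs(x - mean) > self.threshold * (var ** 0.5 + 1e-9)
        self.samples.append(x)
        return deviant

det = BaselineDetector()
for gap in [0.4, 0.5, 0.45, 0.5, 0.4, 1.6]:   # inter-turn gaps (s); long pause last
    print(f"gap {gap:.2f} s -> deviant: {det.update(gap)}")
```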

Turn-taking is of great importance for the third and final step. When successful, the ongoing regulation of who is leading and who is following reinforces recognition, and in the ideal case this regulation is included in the expectations that the two individuals have about each other. Recognition leads to a dynamic coupling of human and robot such that they become one system. When two individuals simultaneously are able to identify each other, they are no longer two separate cognition-action systems but are more accurately seen as one. At this stage, it is no longer possible to consider each individual on its own, since the behavior and actions of each will depend on those of the other.
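
The three steps just described (identification, confirmation, and mutual recognition sustained by turn-taking) can be summarized schematically as a state machine. The sketch below is our own shorthand, not an existing control system, and the event names are hypothetical.

```python
# Schematic state machine for the three steps (illustration only):
# identification -> confirmation -> mutual recognition, the last sustained
# by turn-taking. Event names are hypothetical.

class RecognitionProcess:
    def __init__(self):
        self.state = "UNAWARE"

    def on_event(self, event):
        if self.state == "UNAWARE" and event == "person_identified":
            self.state = "IDENTIFIED"        # step 1: identification
        elif self.state == "IDENTIFIED" and event == "greeting_returned":
            self.state = "CONFIRMED"         # step 2: confirmation
        elif self.state == "CONFIRMED" and event == "behavior_mirrored":
            self.state = "MUTUAL"            # step 3: mutual recognition
        elif self.state == "MUTUAL" and event == "turn_taken":
            pass                             # turn-taking sustains the coupling
        elif event == "contact_lost":
            self.state = "UNAWARE"           # the coupled system dissolves
        return self.state

rp = RecognitionProcess()
for e in ("person_identified", "greeting_returned",
          "behavior_mirrored", "turn_taken", "contact_lost"):
    print(f"{e:18s} -> {rp.on_event(e)}")
```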

5 The Phenomenological Dimension of Recognition—Recognition in the Human Perspective

As both a phenomenological and a psychological experience, recognition in human face-face interaction is set apart by its immediacy and presence. It is primordial, or foundational, and pre-reflexive, or independent of reflection. Immediacy concerns the speed with which it appears—or does not, when an expected response fails to materialize. It pertains to the subject’s direct physical and bodily reaction to the other’s attention and interest as physically manifested in, e.g., direct gaze to the face or eyes, a change in facial color, body orientation, posture, and intention movement. Its absolute presence and (epistemological) transparency make the act of recognition a strong social signal, and furthermore explain its vocative and imperative aspects (Brinck 2008), viz. the fact that it summons the other agent personally to take part in the interaction with you. It is in your face (Gallagher 2014): you cannot deny it (Lipari 2012).

Presence manifests itself in two ways: First, the experience of being recognized by another individual tends to be strong and embodied and rarely goes unnoticed: it is a lived quality. Second, once mutual recognition has occurred, as long as it stays operative, it will be bodily manifested in the manner in which the agents are addressing each other, as demonstrated by vocalization, bodily orientation, mutually adaptive intention movements, eye contact, and more. Developing perceptual sensitivity to the timing of social contingencies is a critical social skill and the bedrock of social interaction (Crown et al. 2002).

Engagement grants the agents access to fine-grained social information that is not available from a detached point of view and makes them aware of subtle social contingencies, such as degree of attentiveness, duration of a pause, or coordination of expression, movement, and motion. Reddy and Morris (2004, p. 658) refer to “widening of the eyes, partial opening of the mouth, sudden stilling of the limbs, the quality of the attention directed to us—in invitation or response to us”. These subtle contingencies signal an agent’s expectations and function to ground contextualized or idiosyncratic routines, procedures, and values that streamline or facilitate interaction (Brinck 2014; Rączaszek-Leonardi and Nomikou 2015). Failure or neglect to respond to them in the appropriate way, viz., as expected by the other, will disrupt the interaction. Thus while shared expectations incur a significant gain for the quality of interaction, breakdown is likely to bring on an equally important cost.

Reciprocal mutual recognition leads to attentional engagement that can be observed, perceived, enacted, and engaged in (Reddy 2008, 2011). In making eye contact, I will attend to you attending to me and vice versa, both of us attending to the way in which we attend to each other, thus implicitly agreeing to engage and jointly committing to interact. This chain of events illustrates that recognition is a process in time with different phases that involves a rapid alternation between stances. Arguing that immediate face-to-face communication can take the form of an I-you relation, Husserl (1973a, 1973b) observes the motivational component of mutual addressing. Each is aware of himself or herself as being attended to or addressed by the other in the second person, as a “You”, and conversely is aware of himself or herself as addressing the other as his or her You in the first person. This motivates them to interact and eventually establish a unity. Husserl maintains that this unity is of a particular kind that excludes that the participants would be given to each other as opposite, or as other and different.

Several factors contribute to making face-face interaction open, reciprocal, and responsive. First, it implicates each participant personally and unconditionally in relation to each of the others. Response tends to be immanent: the physical presence of the first agent literally resonates in the other’s body. Second, each participant sees (hears, feels, etc.) that he or she is perceived in some way by the other(s), very likely is seen to be seeing this, and furthermore can see that he or she is so seen (Goffman 1963). What takes place between the participants will be mutual knowledge among them; everything between them literally is out in the open (Carpenter and Liebal 2011; Peacocke 2005). Third, the participants automatically take the corresponding roles of agent and receiver, speaker and hearer, giver and taker, etc., and mutually adjust their actions (Goffman 1963). Together, these factors sustain the chain of overlapping actions the agents produce together and perceive as meaningful, and as a rule enhance the efficiency and quality of the interaction.

Goffman (1963, p. 95) maintains that once a set of participants have avowedly opened up for engagement, this maximizes the opportunities to monitor one another’s perceiving. This is evidenced by Kendon’s (1990) studies of patterns of behavior in focused encounters that show how people co-present in a given activity and behavior setting tend to organize themselves into spatial patterns and cooperate to maintain the patterns relative to the shared purpose or style of the interaction. The patterns establish a shared (“we”) space between the participants to which they have equal access both physically and in the epistemic sense.

6 Summary of Conclusions

We have argued that mutually adaptive interaction relies on fundamental recognitional capacities. It involves the robot as an equal as opposed to a tool, and requires that the robot be susceptible to similar situational cues and behavior patterns as humans are. Recognition, or the acknowledgement of the other as an individual, is fundamental to mutually adaptive interaction. In physical environments, recognition is embodied: it depends on sensorimotor cognition and perceptual attention. Embodied mutual recognition lies at the bottom of simple forms of face-face interaction, and arguably is the original form of recognition from which more complex notions can be derived. It leads to a dynamic coupling of human and robot behavior, built around timing and behavioral contingencies, such that they become one system. Recognition can be described from both a cognitive and a phenomenological perspective. Its cognitive aspects result in identification and entrainment. Its phenomenological aspects ground responsiveness and complementarity and end in mutual engagement. All the required behaviors are attainable in social robotics today. To end, we propose that embodied recognition is key to successful cooperation with robots, and that HRI would benefit from implementing recognition as a fundamental ability of any social robot.