Testing and AI – Thosha's AI Blog

Mady Delvaux, in her draft report on robotics, advises the EU that robots should be carefully tested in real-life scenarios, beyond the lab. In this and future articles, I will examine different aspects of social robot requirements, quality and testing, and try to determine what is still needed in these areas.

Why test social robots?

In brief, I will define robot quality as follows: does the robot do what it's supposed to do, and does it avoid doing what it shouldn't? For example, when you press the robot's power button while it is offline, does the robot turn on and does the indicator light turn green? If you press the button twice in quick succession, does the robot still behave acceptably? Testing is the analysis we carry out to determine the quality level of what we have produced: is it good enough for its intended purpose?
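
The two power-button questions can be written as automated test cases. The sketch below is only an illustration: `ToyRobot`, `press_power` and the `indicator` values are hypothetical stand-ins for a real robot and its SDK, and treating "acceptable behaviour" as "the power state and the indicator light agree" is an assumption made for the example.

```python
from dataclasses import dataclass


# Hypothetical stand-in for a real robot; a real test would drive the
# hardware or the vendor SDK instead of this toy model.
@dataclass
class ToyRobot:
    powered: bool = False
    indicator: str = "off"  # "off" or "green"

    def press_power(self) -> None:
        # Toggle power and keep the indicator light consistent with it.
        self.powered = not self.powered
        self.indicator = "green" if self.powered else "off"


def test_power_on_from_offline() -> None:
    # "Does what it's supposed to do": one press from offline turns it on.
    robot = ToyRobot()
    robot.press_power()
    assert robot.powered and robot.indicator == "green"


def test_double_press_is_acceptable() -> None:
    # "Doesn't do what it shouldn't": a quick double press must leave the
    # robot in a consistent state (assumed definition of "acceptable"),
    # never powered on with the light off, or the other way round.
    robot = ToyRobot()
    robot.press_power()
    robot.press_power()
    assert (robot.powered, robot.indicator) in {(True, "green"), (False, "off")}


if __name__ == "__main__":
    test_power_on_from_offline()
    test_double_press_is_acceptable()
    print("all power-button checks passed")
```

Phrasing each quality question as an assertion makes it repeatable: the same checks can be rerun after every software update.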

Since social robots will interact closely with people, they will have to comply with strict standards to ensure that they don't have unintended negative effects. Some standards already exist, such as ISO 13482:2014 on safety requirements for personal care robots, but we will need many more to help companies ensure they have done their duty to protect consumers and society. Testing will show whether these robots meet the standards, and new test methods will have to be defined.

What are the core features of the robot?

The first aspect of quality we should measure is whether the robot fulfils its basic functional requirements or purpose. For example, a chef robot like the robotic kitchen by Moley would need to be able to take orders, check ingredient availability, order or request ingredients, plan cooking activities, operate the stove or oven, put food into pots and pans, stir, time the cooking, check readiness, serve dishes and possibly clean up.

A robot at an airport which helps people find their gate and facilities must be able to identify when someone needs help, determine where they are trying to go (perhaps by talking to them, or scanning a boarding pass), plan a route, communicate the route by talking, indicating with gestures, or printing a map, and know when the interaction has ended.

With KLM’s guide robot Spencer at Schiphol airport, benchmarking was used to verify the quality of each function separately. The robot was later deployed in live situations at Schiphol and tracked to see whether it was planning its movements correctly. It was evaluated using a metric comparing the distance it travelled autonomously with the distance it travelled non-autonomously. Autonomy will probably be an important characteristic to test, and to make users aware of, in the future.
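
The post doesn't say exactly how Spencer's autonomy metric was computed. One simple reading, assumed here, is the share of the total distance that was covered autonomously. The sketch uses a hypothetical log of `(distance_m, autonomous)` segments, and the numbers are made up for illustration, not Spencer data.

```python
from typing import Iterable, Tuple

# Each segment: (distance in metres, True if travelled autonomously).
# This log format is an assumption for illustration only.
Segment = Tuple[float, bool]


def autonomy_ratio(segments: Iterable[Segment]) -> float:
    """Fraction of the total distance travelled autonomously (0.0 to 1.0)."""
    autonomous = 0.0
    total = 0.0
    for distance, is_autonomous in segments:
        if distance < 0:
            raise ValueError("distance cannot be negative")
        total += distance
        if is_autonomous:
            autonomous += distance
    if total == 0:
        raise ValueError("no distance recorded")
    return autonomous / total


log = [(120.0, True), (15.0, False), (300.0, True), (65.0, False)]
print(f"autonomous share: {autonomy_ratio(log):.0%}")  # prints "autonomous share: 84%"
```

A ratio rather than a raw distance makes runs of different lengths comparable, for example a busy day against a quiet one.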

Two user evaluation studies were done with Spencer, and feedback was collected about the robot’s effectiveness at guiding people around the airport. Some people, for example, found the speed of the robot too slow, especially in quiet periods, while others found the robot too fast, especially for families to follow.

Different environments and social partners

How can we ensure robots function correctly in the wide variety of environments and interaction situations that we encounter every day? Amazon’s Alexa, for example, has some communication limitations, such as knowing whether she is taking orders from the right user, and conversing with children.

At our family gatherings, our SoftBank NAO robot, Peppy, cannot quite make out instructions over the noise of conversation and cooking. He also has a lot of trouble determining whom to focus on when interacting with a group. SoftBank tests its robots by isolating them in a room and feeding them recorded input to check that they behave correctly, but it can be difficult to simulate large public spaces this way. The Pepper robots seem to perform better in these noisy, crowded conditions. In the MuMMER project, tests are done with Pepper in shopping malls to determine which social behaviours a robot needs to interact effectively in public spaces.
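
Testing with recorded input amounts to replaying fixed scenarios and comparing the behaviour the robot chooses with the behaviour expected. The sketch below is a hypothetical harness, not SoftBank's actual tooling: each recording is reduced to a transcript plus a background noise level, and `toy_behaviour` with its 70 dB threshold is an invented policy standing in for the robot's real software.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Scenario:
    name: str
    utterance: str   # what the recording contains (hypothetical transcript)
    noise_db: float  # background noise level in the recording
    expected: str    # behaviour we expect the robot to choose


def toy_behaviour(utterance: str, noise_db: float) -> str:
    # Hypothetical policy: ask the user to repeat when it is too noisy to
    # trust speech recognition, otherwise act on the instruction.
    if noise_db > 70:
        return "ask_to_repeat"
    return f"do:{utterance}"


def replay(behaviour: Callable[[str, float], str],
           scenarios: List[Scenario]) -> List[str]:
    """Replay each recorded scenario and return the names of those that failed."""
    failures = []
    for s in scenarios:
        if behaviour(s.utterance, s.noise_db) != s.expected:
            failures.append(s.name)
    return failures


scenarios = [
    Scenario("quiet room", "wave", 35.0, "do:wave"),
    Scenario("family dinner", "wave", 78.0, "ask_to_repeat"),
    Scenario("kitchen noise", "sit down", 65.0, "do:sit down"),
]
print(replay(toy_behaviour, scenarios))  # prints []
```

Because the inputs are fixed, the same scenarios can be replayed after every change; the weakness, as noted above, is that a handful of recordings cannot capture a crowded public space.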

The Pepper robot at the London Science Museum’s History of Robots exhibition was hugely popular and constantly surrounded by a crowd. Following a script, it seemed to cope well in these conditions, as did the Pepper at the European Robotics Forum 2017.

When society becomes the lab

Kristian Esser, founder of the Technolympics, an Olympic Games for cyborgs, suggests that in these times society itself becomes the test lab. For technologies that are made for close contact with people but can also have negative effects on us, the paradox is that we must be present to test them, yet the very act of testing them is risky.

Consider self-driving vehicles, which must eventually be tested on the road. The human driver must remain aware of what is happening and correct the car when needed, as we saw in Tesla’s first self-driving car fatality: “The … collision … raised concerns about the safety of semi-autonomous systems, and the way in which Tesla had delivered the feature to customers.” Assisted driving will probably reduce the overall number of traffic-related fatalities in the future, which is why it is a goal worth pursuing.

For social robots, we will likely have to follow a similar approach: first reach a certain level of quality in the lab, then work with informed users who guide the robot, perhaps in a semi-autonomous mode. The perceived value of the robot should be weighed against the risks of testing it. With KLM’s Spencer robot, a combination of lab tests and real-life tests is used to build the robot up to a level of quality at which it can be exposed to people in a supervised way.

Training robots

Over lunch the other day, my boss suggested teaching social robots as we teach children: by observing or reviewing their behaviour and correcting it afterwards. There is research supporting this idea, like this study on robots learning from humans by imitation and goal inference. One problem with letting the public train social robots is that people might teach them unethical or unpleasant behaviour, as happened with Microsoft’s Tay chatbot.

To ensure that robots do not learn undesirable behaviours, perhaps we could have a ‘foster parent’ system: trained and approved robot trainers who build up experience over time and can be held accountable for the training outcome. To prevent the robot from accidentally picking up bad behaviours, it could have distinct learning and executing phases.
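
The ‘foster parent’ idea and the separate learning and executing phases could be combined in a robot's training interface. The following is a minimal sketch under assumed, hypothetical names (`Phase`, `TrainableRobot`, `teach`, the trainer IDs); it only shows the gating logic: teaching is ignored outside the learning phase or from unapproved trainers, and every learned rule records which trainer is accountable for it.

```python
from enum import Enum, auto
from typing import Dict, Set


class Phase(Enum):
    LEARNING = auto()
    EXECUTING = auto()


class TrainableRobot:
    """Toy model: rules are learned only in the LEARNING phase, only from approved trainers."""

    def __init__(self, approved_trainers: Set[str]) -> None:
        self.phase = Phase.EXECUTING
        self.approved_trainers = set(approved_trainers)
        self.rules: Dict[str, str] = {}       # situation -> behaviour
        self.provenance: Dict[str, str] = {}  # situation -> accountable trainer

    def teach(self, trainer: str, situation: str, behaviour: str) -> bool:
        # Outside the learning phase, ignore teaching so the robot cannot
        # pick up behaviour by accident while it is simply doing its job.
        if self.phase is not Phase.LEARNING:
            return False
        # Only approved, accountable "foster parents" may change behaviour.
        if trainer not in self.approved_trainers:
            return False
        self.rules[situation] = behaviour
        self.provenance[situation] = trainer
        return True

    def act(self, situation: str) -> str:
        # Fall back to a default when nothing has been learned for this situation.
        return self.rules.get(situation, "default_behaviour")


robot = TrainableRobot(approved_trainers={"trainer-42"})
print(robot.teach("trainer-42", "guest arrives", "greet"))  # False: still executing
robot.phase = Phase.LEARNING
print(robot.teach("stranger", "guest arrives", "insult"))   # False: not approved
print(robot.teach("trainer-42", "guest arrives", "greet"))  # True
robot.phase = Phase.EXECUTING
print(robot.act("guest arrives"))                           # greet
```

Recording provenance alongside each rule is what makes the accountability part of the idea concrete: a bad rule can be traced back to the trainer who taught it.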

The robot might have different ways of getting its tasks, behaviours or conclusions validated, relying on the user’s judgement to approve or correct its behaviour. New rules could be sent to a cloud repository for further inspection and compared with similar rules learned by other robots to find consensus. Perhaps new rules should only be applied if they have been learned and confirmed in multiple households, or examined by a technician.
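
The consensus step could be as simple as counting independent confirmations per rule. This sketch of a hypothetical `RuleRepository` assumes that rules are identified by a string and that each report is a rule a user has approved in one household; the threshold of three households is an arbitrary choice, and a technician's approval overrides it.

```python
from typing import Dict, Set


class RuleRepository:
    """Toy cloud repository that releases a learned rule only after independent confirmation."""

    def __init__(self, min_households: int = 3) -> None:
        self.min_households = min_households          # hypothetical threshold
        self.confirmations: Dict[str, Set[str]] = {}  # rule -> households that confirmed it
        self.technician_approved: Set[str] = set()

    def report(self, rule: str, household: str) -> None:
        # Called when a user has approved a rule their robot learned.
        # A set ignores repeat reports from the same household.
        self.confirmations.setdefault(rule, set()).add(household)

    def approve_by_technician(self, rule: str) -> None:
        self.technician_approved.add(rule)

    def is_deployable(self, rule: str) -> bool:
        households = self.confirmations.get(rule, set())
        return rule in self.technician_approved or len(households) >= self.min_households


repo = RuleRepository(min_households=3)
rule = "speak more quietly after 22:00"
for home in ["home-a", "home-b", "home-b"]:
    repo.report(rule, home)
print(repo.is_deployable(rule))  # False: only two distinct households
repo.report(rule, "home-c")
print(repo.is_deployable(rule))  # True
```

Counting distinct households rather than raw reports means that a single enthusiastic, or malicious, household cannot push a rule through on its own.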

To conclude, I think social robots will be tested in phases, as many other products are. There is a limit to what we can achieve in a lab, so there should always be some controlled testing in real-life scenarios. As consumers, we should be aware of our robots’ limitations and conscious of their learning process and our role in it.
