Dataset: One-To-Many Human-Human Interaction

1 Overview

This webpage contains supplementary data and files to accompany “Autonomously Learning One-To-Many Social Interaction Logic from Human-Human Interaction Data” (see Reference). Notably, it contains: (1) the raw dataset from our in-lab one-to-many shopkeeper-customer interaction (in Japanese); and (2) our neural network code.

2 One-to-Many Camera Shop Dataset

We set up an in-lab data collection scenario to gather data on how a shopkeeper interacts with multiple customers. Specifically, we set up a lab space to look like a camera shop, invited participants to role-play shopkeeper-customer interactions, and recorded their audio and movements. Every scenario had one shopkeeper and between 2 and 5 customers, each assigned a group and role. The customers and shopkeeper were instructed to role-play realistic camera shop interactions, while our tracking system captured their positions and our speech recognition system captured their audio. This section describes the files included in the open-source dataset; readers are encouraged to read the associated paper (see Reference) for more context. Note that data collection was conducted in Japanese, and hence all shopkeeper and customer speech is in Japanese.

2.1 File Descriptions

  • dataset.sql contains the raw data. It includes the 10 tables below (in alphabetical order). Note that each experiment had at most 6 research participants, assigned unique IDs 1-6, with 1 always being the shopkeeper.
    • camera_locations contains the name, location, and size of the three camera stands (Nikon, Canon, and Sony) and the counter in the room.
    • experiments_metadata contains the experiment ID, startTime, endTime, and the number of customers assigned to that experiment. In addition, it contains the mapping from uniqueID within that experiment (1-6) to participantID. Per customer, it also contains a groupID, which specifies who they came to the shop with, and a role_id, which specifies their assigned purpose for coming to the shop (see Customer_Role_Cards.pdf). The possible roles are:
      • L: Landscape Photographer
      • P: Portrait Photographer
      • N: Novice Photographer
      • I: Interested in Cameras
      • W: Window Shopping
      • B: Bored
    • human_tracker_data includes the x (mm), y (mm), and velocity (mm/s) over time for every person (identified by their uniqueID) perceived by our tracking system. Our tracking system operated at a frequency of roughly 50 Hz.
    • participants_metadata includes the participantID, gender, and age for every participant in this research.
    • speech_{1-6} contains the time, transcribed text, and duration (milliseconds) of every utterance made by the participant with the uniqueID specified in the table name. Note that time is when the participant began recording their utterance and time+duration is when they stopped the recording; this may not exactly match the period during which they were talking.
  • Customer_Role_Cards.pdf includes the text for the role and group cards we provided participants before every role-playing scenario. Note that these were only used to induce a variety of behaviors from the customers, and did not factor into our model training or analysis.
  • Information_Sheet_Shopkeeper.pdf lists the features of each camera that the shopkeeper can talk about.
  • Information_Sheet_Customer.pdf lists the features of each camera that customers can ask about.
  • map.jpg shows a top-down view of the room, and map_annotated_{japanese/english}.jpg shows the same view labeled with important locations in the room. The Python function below converts from world coordinates (in mm) to map coordinates (where the origin is the upper-left corner).
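def worldToMap(worldX, worldY):
    # Shift so the map's upper-left corner is the origin, then divide by 50 (mm per map unit).
    mapX = (worldX - (-15376.0)) / 50.0
    # The map's y-axis points down, so the world y-axis is flipped.
    mapY = -1 * (worldY - 10449.0) / 50.0
    return mapX, mapY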
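
As a rough illustration of how the tracker and speech tables might be combined with this conversion, the sketch below averages one person's map position over the recording window of a single utterance. It is only a sketch under stated assumptions: the helper names (world_to_map, utterance_interval, mean_map_position), the tuple layout of the samples, and the use of millisecond timestamps are all hypothetical and are not taken from dataset.sql, whose exact column names and time format should be checked before adapting it. The sample values are made up.

```python
from statistics import mean

MM_PER_MAP_UNIT = 50.0    # divisor used by the conversion function above
MAP_ORIGIN_X = -15376.0   # world x (mm) of the map's upper-left corner
MAP_ORIGIN_Y = 10449.0    # world y (mm) of the map's upper-left corner


def world_to_map(world_x, world_y):
    """Convert world coordinates (mm) to map coordinates (origin upper-left)."""
    map_x = (world_x - MAP_ORIGIN_X) / MM_PER_MAP_UNIT
    map_y = -(world_y - MAP_ORIGIN_Y) / MM_PER_MAP_UNIT
    return map_x, map_y


def utterance_interval(start_ms, duration_ms):
    """Return (start, end) of the recording window for one utterance.

    Assumption: the start time is expressed in milliseconds, like the duration.
    """
    return start_ms, start_ms + duration_ms


def mean_map_position(track, start_ms, end_ms):
    """Average map position of one person over [start_ms, end_ms].

    `track` is a list of (time_ms, x_mm, y_mm) samples (hypothetical layout).
    Returns None if no sample falls inside the window.
    """
    points = [world_to_map(x, y) for t, x, y in track if start_ms <= t <= end_ms]
    if not points:
        return None
    return mean(p[0] for p in points), mean(p[1] for p in points)


# Made-up samples spaced 20 ms apart (about 50 Hz); not taken from the dataset.
track = [
    (0, -15376.0, 10449.0),
    (20, -15326.0, 10399.0),
    (40, -15276.0, 10349.0),
]
start, end = utterance_interval(start_ms=0, duration_ms=40)
print(mean_map_position(track, start, end))
```

Because time+duration marks the end of the recording rather than the end of speech, a window computed this way may be slightly wider than the period during which the participant was actually talking.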

2.2 Dataset (ZIP) [188MB]

3 Neural Network Code

The open-source neural network code is intended to provide additional model architecture details that were not included in the paper (e.g., hyperparameters). Therefore, the code is not meant to be run; we open-source only the code that defines the neural network architecture. Readers who would like additional code and/or processed data in order to actually train/test the networks should contact us (below).

In this code, the bulk of the work of setting up the overall neural network architectures takes place in the initializeTF method. Although the paper presents multiple neural networks, they share several architectural components. Therefore, the generateAttentionNetwork function lays out the architecture for a general attention network (which is used in several of the networks presented in the paper) and generateInteractionNetwork lays out the architecture for a general interaction network.

The code has multiple cases to be run. Only two are used in the networks presented in the paper. self.caseToRunInteractionNetworkAllCustomerInfoAttentionOverCustomersCoupled refers to the case where first the Causal Inference Network is trained, and then the Attention Network is trained. self.caseToRunInteractionNetwork refers to the case where only the Interaction Network is trained. The former case actually generates two neural networks (stored in a dictionary, specified by the following keys): “filteredNetworkKey” refers to the Causal Inference Network (so named because the inputs are “filtered” to only include the timestamps before the shopkeeper acts); and “fullNetworkAttentionKey” refers to the Attention Network. The user must specify which network to use (by providing the key) when they call the train and predict functions.
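
The key-based selection described above can be pictured with a toy sketch. This is not the released code: ToyNetwork, ToyCoupledModel, and filter_before_action are hypothetical stand-ins invented for illustration, and only the two dictionary keys come from the description above. The choice to keep timestamps strictly before the action time, and to pass all timestamps to the attention network, are also assumptions.

```python
FILTERED_KEY = "filteredNetworkKey"        # Causal Inference Network
ATTENTION_KEY = "fullNetworkAttentionKey"  # Attention Network


def filter_before_action(timestamps, action_time):
    """Keep only timestamps before the shopkeeper acts (strictly, by assumption)."""
    return [t for t in timestamps if t < action_time]


class ToyNetwork:
    """Hypothetical stand-in for a real network; it only records calls."""

    def __init__(self, name):
        self.name = name
        self.trained = False

    def train(self, inputs):
        self.trained = True
        return len(inputs)

    def predict(self, inputs):
        if not self.trained:
            raise RuntimeError(f"{self.name} has not been trained")
        return [self.name] * len(inputs)


class ToyCoupledModel:
    """Hypothetical container that stores two networks in a dictionary."""

    def __init__(self):
        self.networks = {
            FILTERED_KEY: ToyNetwork("causal_inference"),
            ATTENTION_KEY: ToyNetwork("attention"),
        }

    def train(self, key, inputs):
        # The caller chooses which network to train by passing its key.
        return self.networks[key].train(inputs)

    def predict(self, key, inputs):
        # The caller chooses which network to query by passing its key.
        return self.networks[key].predict(inputs)


model = ToyCoupledModel()
timestamps = [0.0, 0.5, 1.0, 1.5]
filtered = filter_before_action(timestamps, action_time=1.0)

# Train the Causal Inference Network first, then the Attention Network.
model.train(FILTERED_KEY, filtered)
model.train(ATTENTION_KEY, timestamps)
print(model.predict(ATTENTION_KEY, timestamps[:2]))
```

Passing the key explicitly keeps the two networks independent: either one can be trained or queried without touching the other.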

4 Usage

This dataset and code are free to use for research purposes only. This is not a production version; please use it at your own risk. If you use either the dataset or the code in your work, please be sure to cite the reference below.

5 Reference

Amal Nanavati, Malcolm Doering, Dražen Brščić, and Takayuki Kanda. 2020. Autonomously Learning One-To-Many Social Interaction Logic from Human-Human Interaction Data. In Proceedings of the 2020 ACM/IEEE International Conference on Human-Robot Interaction (HRI’20), March 23-26, 2020, Cambridge, United Kingdom. ACM, New York, NY, USA, 9 pages.

6 Related Work

Readers are also encouraged to look at the following works from our lab about learning interaction techniques from human-human data:

6.1 Publications

  • Doering, M., Glas, D.F. and Ishiguro, H., 2019. Modeling Interaction Structure for Robot Imitation Learning of Human Social Behavior. IEEE Transactions on Human-Machine Systems.
  • Liu, P., Glas, D.F., Kanda, T. and Ishiguro, H., 2018. Learning proactive behavior for interactive social robots. Autonomous Robots, pp.1-19.
  • Glas, D.F., Doering, M., Liu, P., Kanda, T. and Ishiguro, H., 2017, March. Robot’s Delight: A Lyrical Exposition on Learning by Imitation from Human-human Interaction. In Proceedings of the Companion of the 2017 ACM/IEEE International Conference on Human-Robot Interaction (pp. 408-408). ACM.
  • Liu, P., Glas, D. F., Kanda, T., & Ishiguro, H., Learning Interactive Behavior for Service Robots – the Challenge of Mixed-Initiative Interaction, Workshop on Behavior Adaptation, Interaction and Learning for Assistive Robotics (BAILAR), 25th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN 2016), New York, NY, USA, August 2016.
  • Liu, P., Glas, D. F., Kanda, T., & Ishiguro, H., Data-Driven HRI: Learning Social Behaviors by Example from Human-Human Interaction, in IEEE Transactions on Robotics, Vol. 32, No. 4, pp. 988-1008, 2016.
  • Liu, P., Glas, D. F., Kanda, T., Ishiguro, H., & Hagita, N., How to Train Your Robot – Teaching service robots to reproduce human social behavior, in Robot and Human Interactive Communication, 2014 RO-MAN: The 23rd IEEE International Symposium on, pp. 961-968, Edinburgh, Scotland, 25-29 Aug. 2014. doi:10.1109/roman.2014.6926377

6.2 Datasets

Readers can find a similar human-human camera shop dataset with one-to-one interaction data in English here:

7 Contact

Please contact amaln [at] uw [dot] edu with any questions, comments, concerns, and/or inquiries about this dataset, code, or research project more generally.