Overview
This page introduces the ShopPoint Dataset, a dataset of pointing gestures in customer-shopkeeper interactions collected as full-body skeleton information.
The dataset was collected in a mock-up camera shop scenario and contains 61 one-to-one customer-shopkeeper interactions with 2959 annotated pointing gestures. It consists of two parts: 1) the skeleton data of the original interactions, captured by 15 sensors; 2) a skeleton-based pointing recognition dataset of pointing gestures and non-pointing actions. Additionally, videos and audio from 9 interactions are provided as examples of the interaction procedure.
The figure below shows the different forms of pointing gestures contained in the dataset: straight-arm pointing (left), bent-arm pointing (middle) and hand-only pointing (right).
Data collection and annotation
The collection and annotation of ShopPoint aimed to accurately capture pointing gestures, both for analyzing how different pointing forms are used and for evaluating the recognition of these diverse pointing forms in service scenarios.
A camera shop was selected as a representative example of the service scenario. Participants were hired to role-play as shopkeepers and customers to have one-to-one camera-selling interactions, which allowed us to observe realistic customer-service interactions and capture pointing gestures within a controlled setting.
Six main categories of information were annotated for each pointing gesture, as shown in the following table.
Download links
(Downloads will be available soon.)
Skeleton data from original interactions (3.67G)
Skeleton data for pointing recognition (pickle file, 527M) (hdf5 file, 238M)
Interaction Examples (39.6G)
Data description
- Skeleton data from original interactions
The skeleton data of the original interactions contains 61 files, each containing data from one interaction. The data is saved in hdf5 format with the following key-value pairs.
Two lists containing the annotated timings and labels of the pointing gestures:
- ‘pointing_window_times’: [start_of_raising, end_of_holding, end_of_raising]
- ‘pointing_window_annotations’: str
Fused skeletons of the shopkeeper and the customer in this interaction. We averaged the data from all sensors to obtain the fused skeletons:
- ‘skp_timestamps_fused’: ndarray, shape: [t,]
- ‘skp_skeletons_fused’: ndarray, shape: [t, 32, 3]
- ‘cus_timestamps_fused’: ndarray, shape: [t,]
- ‘cus_skeletons_fused’: ndarray, shape: [t, 32, 3]
Original data from each sensor, where ‘azureXX’ indicates which sensor the data comes from. Note that the timestamps for each sensor may not be continuous:
- ‘skp_timestamps_azure01’ to ‘skp_timestamps_azure15’: ndarray, shape [t,]
- ‘skp_skeletons_azure01’ to ‘skp_skeletons_azure15’: ndarray, shape [t,32,3]
- ‘skp_confidences_azure01’ to ‘skp_confidences_azure15’: ndarray, shape [t,32]
- ‘cus_timestamps_azure01’ to ‘cus_timestamps_azure15’: ndarray, shape [t,]
- ‘cus_skeletons_azure01’ to ‘cus_skeletons_azure15’: ndarray, shape [t,32,3]
- ‘cus_confidences_azure01’ to ‘cus_confidences_azure15’: ndarray, shape [t,32]
Each skeleton consists of 32 joints, each represented by its 3D x, y, and z coordinates. Check the order of the joints here.
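As a minimal sketch of how one of these interaction files could be read with h5py (the file name below is hypothetical; the keys follow the structure listed above):
import h5py

# Hypothetical file name; use any of the 61 interaction files.
with h5py.File('ShopPoint_interaction_01.h5', 'r') as f:
    # Fused shopkeeper skeletons, shape [t, 32, 3], and their timestamps, shape [t,]
    skp_skeletons = f['skp_skeletons_fused'][()]
    skp_timestamps = f['skp_timestamps_fused'][()]
    # Annotated pointing windows and their labels
    pointing_times = f['pointing_window_times'][()]
    pointing_labels = [a.decode() if isinstance(a, bytes) else a
                       for a in f['pointing_window_annotations'][()]]
print(skp_skeletons.shape, len(pointing_labels))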
- Skeleton data for pointing recognition
We provide two data formats for the skeleton-based pointing recognition dataset: an hdf5 file and a pickle file. The pickle file has the following structure, which can be used directly with mmaction2.
{
    'split': {splitID: list},
    'annotations': [{
        'frame_dir': str,
        'keypoint': np.ndarray,
        'label': int,
        'total_frames': int
    }]
}
‘split’ – the person splits of the data; each split is a list of frame_dir values.
Each split is named like fv_skp1_train, indicating whose data (skp1 ~ cus10) belongs to which set (train, val, or test).
‘annotations’ – the skeleton data, each entry identified by its frame_dir
‘frame_dir’ – the action’s ID
‘label’ – 0 for non-pointing actions and 1 for pointing gestures.
‘total_frames’ – the length of the skeleton sequence; we use a fixed length of 60 frames.
‘keypoint’ – the skeleton joints as an ndarray, shape [60, 32, 3]
If you cannot load the pickle file correctly, please use the hdf5 file instead. The hdf5 file contains the following key-value pairs:
‘fv_skp1_train’ ~ ‘fv_cus10_train’: list
‘fv_skp1_val’ ~ ‘fv_cus10_val’: list
‘fv_skp1_test’ ~ ‘fv_cus10_test’: list
These are the person splits for the LOOCV training. Each split is a list of frame_dir values.
‘frame_dir’: list, length 5298
‘keypoint’: ndarray, shape [5298,60,32,3]
‘labels’: list, length 5298
These are the skeleton annotations; 5298 is the number of action samples (the length of the frame_dir list). ‘labels’ uses 0 for non-pointing actions and 1 for pointing gestures.
- Interaction Examples
Videos and audio from 9 interactions, for which the participants agreed to the release, are provided with annotations. The videos and audio are saved in rosbag files, while the annotations are saved in tmv files in YAML format. Each bag file corresponds to the skeleton pickle file with the same name.
Topics for videos:
Name: /video_1/image ~ /video_5/image
Type: sensor_msgs/CompressedImage
Topic for audio:
Name: /audo_topic
Type: rospy_tutorials/HeaderString
Topics for skeletons:
Name: /skeleton_32
Type: sensor_msgs/PointCloud
Name: /skeleton_32_maker
Type: visualization_msgs/Marker
Please check the How to use section below for instructions on how to load and check the videos.
How to use
(1) Below is a short Python example of how to load the skeleton data.
import h5py
import pickle

# Option 1: load the hdf5 file (byte strings, e.g. frame_dir entries, are decoded to str)
with h5py.File('ShopPoint_skeleton_for_pointing_recognition.h5', 'r') as f:
    data = {key: [item.decode() if isinstance(item, bytes) else item for item in f[key][()]]
            for key in f.keys()}

# Option 2: load the pickle file
with open('ShopPoint_skeleton_for_pointing_recognition.pkl', 'rb') as f:
    data = pickle.load(f)
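For example, one of the person splits can be used to select the corresponding samples. The snippet below is a minimal sketch that assumes the data was loaded from the hdf5 file (keys ‘frame_dir’, ‘keypoint’ and ‘labels’ as described above) and uses the fv_skp1_train split:
import numpy as np

# Select the training samples of the fv_skp1 split
train_ids = set(data['fv_skp1_train'])
train_mask = [fd in train_ids for fd in data['frame_dir']]
train_keypoints = np.asarray(data['keypoint'])[train_mask]  # shape [n_train, 60, 32, 3]
train_labels = np.asarray(data['labels'])[train_mask]       # 0 = non-pointing, 1 = pointing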
(2) For the video and audio data, we recommend replaying with the TAMSVIZ tool. You can load the bags and the annotations directly in TAMSVIZ. To play the audio, use the following script.
playAudioStream.py (Download will be available soon.)
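Alternatively, the bags can be read programmatically. The snippet below is a minimal sketch using the standard rosbag Python API; the bag file name is hypothetical:
import rosbag

# Hypothetical bag file name; use one of the 9 released interaction bags.
with rosbag.Bag('ShopPoint_interaction_example.bag', 'r') as bag:
    # Iterate over the compressed frames of the first camera
    for topic, msg, t in bag.read_messages(topics=['/video_1/image']):
        # msg is a sensor_msgs/CompressedImage; msg.data holds the encoded frame
        print(topic, t.to_sec(), len(msg.data))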
License
The datasets are free to use for research purposes only. If you use the data in your work, please be sure to cite the reference paper below.
Reference
(To be made available soon.)
Contact
yongqiang (at) robot.i.kyoto-u.ac.jp