NCTU-GTAV360: Spherical Video Action Recognition Dataset

Commlab NCTU


Abstract

NCTU-GTAV360 is a new 360° action recognition video dataset captured from the game Grand Theft Auto V (GTA V) with RiotMode enabled, which gives the pedestrians in the game weapons and makes them attack each other. Although many action recognition video datasets are available, none of them are in spherical projection. The spherical video is obtained by capturing views from various angles and stitching them into a single video. Script Hook V and Script Hook V .NET were used to control the camera rotation and position automatically. The benefit of a 360° camera is that it captures more angles than a normal camera. The spherical videos were captured at 200 locations within Los Santos (the city in GTA V). We believe this dataset will also help other research in the spherical domain, especially on action recognition.

Video Samples

Note: the videos shown above may be compressed by YouTube. The full-resolution videos are available here:

Resources

  • Paper:
    S. Ardianto and H-M. Hang. "NCTU-GTAV360: A 360° Action Recognition Video Dataset". IEEE 21st International Workshop on Multimedia Signal Processing. 2019.
  • Data:

Data Format

Video Files:
The videos are stored as mp4 files at 20 FPS, encoded with FFmpeg. The file name format is <location number>.mp4; for example, 123.mp4 is the video for location 123.
The video resolution is 4096x2048.

Label Files:
  • Pedestrian:
    Each frame has one txt file with n lines, where n is the number of nearby pedestrians. The file name format is <frame number>.txt.
    Every line contains the name of the action followed by 26 x 3 values: the (x,y,z) locations of 26 skeleton joints, in meters.

    Example:
    Fleeing 78.72278 26.1908 0.6010818 78.66736 26.16058 0.5071983 78.55249 26.11688 0.08034134 78.54089 26.10248 -0.02093887 78.58105 26.13025 0.2761879 78.68774 26.17072 0.4654961 78.58105 26.13025 0.2761879 78.59106 26.35889 0.4388542 78.46826 26.41992 0.15522 78.75415 26.41547 0.1192818 78.77661 26.39716 0.1379128 78.81555 26.35748 0.146553 78.84436 26.33582 0.1467934 78.74158 25.96515 0.4465485 78.57336 25.73566 0.3207016 78.72839 25.79449 0.09486008 78.75525 25.8125 0.08260345 78.78296 25.84119 0.04288483 78.80005 25.85321 0.0136261 78.48596 26.17224 -0.1030998 78.34131 26.00775 -0.4455719 78.01257 25.75787 -0.4383545 77.96021 25.69379 -0.5770416 78.58215 26.00708 -0.08285522 78.82849 26.16333 -0.3659515 78.82947 26.22046 -0.7749634
    Running -41.27747 -36.43158 -22.17011 -41.25598 -36.45203 -22.28863 -41.23596 -36.46753 -22.70668 -41.24634 -36.46429 -22.77118 -41.24829 -36.45447 -22.53161 -41.29114 -36.43036 -22.32133 -41.24829 -36.45447 -22.53161 -41.35803 -36.58032 -22.36565 -41.34436 -36.67072 -22.63333 -41.50012 -36.58838 -22.80621 -41.51758 -36.59198 -22.81009 -41.55774 -36.60358 -22.83668 -41.59277 -36.60132 -22.86186 -41.16589 -36.31537 -22.32129 -41.18188 -36.10687 -22.53466 -41.37317 -36.2724 -22.63134 -41.3844 -36.28125 -22.61987 -41.39966 -36.32819 -22.62388 -41.41809 -36.36261 -22.64219 -41.28174 -36.55688 -22.83057 -41.276 -36.56982 -23.26937 -41.17969 -36.61194 -23.65543 -41.30688 -36.58838 -23.7084 -41.19495 -36.38867 -22.84185 -41.3324 -36.32819 -23.25438 -41.18396 -36.45209 -23.60468
  • Vehicle:
    Each frame has one txt file with n lines, where n is the number of nearby vehicles. The file name format is <frame number>.txt.
    Every line contains the type of the vehicle, its position (x,y,z), and its dimensions (width, length, height), all in meters.
    Example:
    Car 581.0228 39.24371 92.82274 2.010509 5.55837 1.650319
    Bike 502.468 19.51715 88.9951 2.000296 4.716197 1.666888
  • Weather:
    Each location has one weather file, stored as txt, containing a single line that names the weather at that location. The file name format is <location number>.txt.
    Example: Overcast
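
The label formats above can be read with a few lines of Python. The sketch below is only an illustration, not the authors' tooling: the names Pedestrian, Vehicle, parse_pedestrian_line, and parse_vehicle_line are hypothetical, and it assumes fields are separated by whitespace, as in the examples.

```python
from dataclasses import dataclass
from typing import List, Tuple

NUM_JOINTS = 26  # from the format description: 26 joints, each (x, y, z) in meters


@dataclass
class Pedestrian:
    action: str
    joints: List[Tuple[float, ...]]  # one (x, y, z) tuple per joint


@dataclass
class Vehicle:
    kind: str
    position: Tuple[float, ...]   # (x, y, z)
    dimension: Tuple[float, ...]  # (width, length, height)


def parse_pedestrian_line(line: str) -> Pedestrian:
    """Parse one line of a pedestrian label file (hypothetical helper)."""
    fields = line.split()
    values = [float(v) for v in fields[1:]]
    if len(values) != NUM_JOINTS * 3:
        raise ValueError(f"expected {NUM_JOINTS * 3} coordinates, got {len(values)}")
    # Group the flat list of numbers into consecutive (x, y, z) triples.
    joints = [tuple(values[i:i + 3]) for i in range(0, len(values), 3)]
    return Pedestrian(action=fields[0], joints=joints)


def parse_vehicle_line(line: str) -> Vehicle:
    """Parse one line of a vehicle label file (hypothetical helper)."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    numbers = [float(v) for v in fields[1:]]
    return Vehicle(kind=fields[0],
                   position=tuple(numbers[:3]),
                   dimension=tuple(numbers[3:]))


if __name__ == "__main__":
    # Vehicle example line taken from the format description above.
    print(parse_vehicle_line("Car 581.0228 39.24371 92.82274 2.010509 5.55837 1.650319"))
```

A weather file holds a single word, so reading it needs no parser beyond stripping whitespace from its only line.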

Capture Method

  1. Capture 24 views
    Since the game does not support capturing 360° images, we built a program that moves the camera to look at various angles. The camera rotation range in the game is -60° to 60° for the X-axis and -180° to 180° for the Z-axis; the Y-axis is used only for direction, not rotation. We used a 40° interval for the X-axis and a 60° interval for the Z-axis, which gives 4 x 6 = 24 views. The camera cannot be moved while the game is paused. Hence, we slow down the game time and repeatedly pause and resume the game while moving the camera to capture the scenes.

  2. Stitch into equirectangular projection
    We used PTGui Pro to stitch the 24 images into an equirectangular projection. Many spherical projections are available; however, this one is the most popular for representing 360° images and video. Many companies, such as Facebook, YouTube, GoPro, and Ricoh, use this projection in their products. Figure 3 shows the position of the images in the equirectangular projection.

    Images from the previous step stitched into an equirectangular projection, with guidelines:


    Details on each image warped into equirectangular projection:
    No  Yaw (°)  Pitch (°)  Roll (°)
    0   0        -60        -0.06
    1   -59.9    -59.9      -0.02
    2   -119.8   -59.9      0.2
    3   -179.6   -60        -0.3
    4   120.2    -60.1      -0.2
    5   60.1     -60.1      -0.03
    6   0.07     -20        0.06
    7   -59.9    -19.9      0.02
    8   -119.9   -19.9      -0.08
    9   -179.9   -20        -0.1
    10  120.1    -20.1      -0.07
    11  60.1     -20        0.03
    12  0.1      20         0.1
    13  -59.9    20.1       0.05
    14  -119.9   20.1       0.006
    15  -179.9   20         -0.05
    16  120.1    19.9       0.0005
    17  60.1     19.9       0.08
    18  0.3      60         0.1
    19  -59.9    60         0.09
    20  -119.8   60.1       0.2
    21  -180     60.1       -0.1
    22  120      59.9       -0.2
    23  60.3     60         0.4
  3. Combine into video
    Many video formats exist, such as mp4, mkv, and avi. The images from the previous steps are stored in jpg format. We chose mp4 and used FFmpeg to combine the images into a video. The resulting video has the same resolution as the equirectangular images, 4096x2048.
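
The view grid from step 1 and the placement of each view in the stitched frame from step 2 can be sketched as follows. This is an illustration, not the capture program itself: view_angles and equirect_pixel are hypothetical names, the X-axis and Z-axis rotations are taken to be pitch and yaw, and the order of the views is arbitrary. The pixel mapping assumes the common equirectangular convention with yaw 0° at the horizontal center and positive pitch pointing up, which this page does not state.

```python
from typing import List, Tuple


def view_angles(pitch_step: int = 40, yaw_step: int = 60) -> List[Tuple[int, int]]:
    """Enumerate (pitch, yaw) pairs in degrees for the nominal capture grid."""
    pitches = range(-60, 61, pitch_step)  # -60, -20, 20, 60
    yaws = range(-180, 180, yaw_step)     # -180, -120, -60, 0, 60, 120 (180 equals -180)
    return [(pitch, yaw) for pitch in pitches for yaw in yaws]


def equirect_pixel(yaw_deg: float, pitch_deg: float,
                   width: int = 4096, height: int = 2048) -> Tuple[float, float]:
    """Map a viewing direction to (x, y) pixel coordinates in an equirectangular frame.

    Assumed convention: yaw in [-180, 180] spans the width left to right,
    pitch in [-90, 90] spans the height bottom to top.
    """
    x = (yaw_deg + 180.0) / 360.0 * width
    y = (90.0 - pitch_deg) / 180.0 * height
    return x, y


if __name__ == "__main__":
    views = view_angles()
    print(len(views), "views")
    for pitch, yaw in views:
        print(pitch, yaw, equirect_pixel(yaw, pitch))
```

The yaw and pitch values reported by the stitcher in the table above differ from this nominal grid by a few tenths of a degree.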

Dataset Statistics

  • Weather statistics

    Weather # location Percentage (%)
    Overcast 87 43.5
    Smog 56 28
    ExtraSunny 43 21.5
    Clouds 12 6
    Raining 2 1
  • Vehicle statistics
    Type        Number of vehicles  Percentage (%)
    Car         15,472,691          97.81
    Bike        143,256             0.91
    Bicycle     107,384             0.68
    Helicopter  39,135              0.25
    Quadbike    26,777              0.17
    Boat        20,683              0.13
    Vehicle     5,028               0.03
    Train       4,481               0.03
  • Actions statistics
    Action              Close (<5m)     Near (5m-20m)    Far (>20m)          Full
                             #      %         #      %            #      %            #      %
    OnFoot              35,810  39.50   459,858  27.44   13,778,508  49.53   14,274,176  48.25
    AimingFromCover      8,563   9.44   475,618  28.38    4,790,407  17.22    5,274,588  17.83
    Prone               10,802  11.91   183,809  10.97    3,154,112  11.34    3,348,723  11.32
    Running              4,458   4.92    67,596   4.03    2,225,398   8.00    2,297,452   7.77
    Walking              8,845   9.76    67,865   4.05    1,479,794   5.32    1,556,504   5.26
    Fleeing              1,005   1.11    81,203   4.84      731,302   2.63      813,510   2.75
    Stopped             19,978  22.04   296,497  17.69      718,589   2.58    1,035,064   3.50
    Swimming                 0   0.00         0   0.00      412,305   1.48      412,305   1.39
    GoingIntoCover       1,006   1.11    22,210   1.33      250,953   0.90      274,169   0.93
    Falling                  0   0.00        81   0.00       66,347   0.24       66,428   0.22
    Unknown                177   0.20    17,517   1.05       58,851   0.21       76,545   0.26
    Ragdoll                  0   0.00     3,164   0.19       45,590   0.16       48,754   0.16
    Climbing                 0   0.00       285   0.02       45,033   0.16       45,318   0.15
    BeingStunned             0   0.00         0   0.00       25,899   0.09       25,899   0.09
    GettingUp                0   0.00       188   0.01       11,221   0.04       11,409   0.04
    OnBike                   0   0.00        20   0.00       10,997   0.04       11,017   0.04
    SwimmingUnderWater       0   0.00         0   0.00        4,654   0.02        4,654   0.02
    Reloading               20   0.02        90   0.01        3,844   0.01        3,954   0.01
    Vaulting                 0   0.00        92   0.01        3,122   0.01        3,214   0.01
    GettingIntoAVehicle      0   0.00         0   0.00          531   0.00          531   0.00
    Jumping                  0   0.00         0   0.00           56   0.00           56   0.00
    DoingDriveBy             0   0.00         0   0.00            1   0.00            1   0.00
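
The distance ranges and percentage columns in these statistics can be reproduced with a short sketch. The function names distance_bin and column_percentages are hypothetical. It assumes that distance is measured from the camera and that both 5 m and 20 m fall in the Near range; the page states neither.

```python
from typing import Dict


def distance_bin(distance_m: float) -> str:
    """Assign a distance in meters to the Close / Near / Far ranges of the table."""
    if distance_m < 5.0:
        return "Close"
    if distance_m <= 20.0:  # boundary handling is an assumption
        return "Near"
    return "Far"


def column_percentages(counts: Dict[str, int]) -> Dict[str, float]:
    """Express each count as a percentage of the column total, rounded to 2 decimals."""
    total = sum(counts.values())
    return {name: round(100.0 * count / total, 2) for name, count in counts.items()}


if __name__ == "__main__":
    # Vehicle counts copied from the vehicle statistics table.
    vehicles = {
        "Car": 15_472_691, "Bike": 143_256, "Bicycle": 107_384,
        "Helicopter": 39_135, "Quadbike": 26_777, "Boat": 20_683,
        "Vehicle": 5_028, "Train": 4_481,
    }
    for name, pct in column_percentages(vehicles).items():
        print(name, pct)
```

Each percentage in the tables is the share of its own column, so every percentage column sums to roughly 100.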