@ArrowLLL 2018-02-27T02:44:16.000000Z

Study-Note: Pedestrian Trajectory Prediction Based on Bidirectional LSTM Classification

Study-Note OPTIMAL

Original paper: Bi-Prediction: Pedestrian Trajectory Prediction Based on Bidirectional LSTM Classification

Summary (my understanding)

Problem: given the observed past trajectory of a target, $X_i^{obs} = [(x_1^i, y_1^i), ..., (x_{obs}^i, y_{obs}^i)]$, predict its trajectory over a future period,
$X_i^{pred} = [(x_{obs+1}^i, y_{obs+1}^i), ... , (x_{pred}^i, y_{pred}^i)]$

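The prediction method is named Bi-Prediction, which has two meanings: a bidirectional LSTM is used during trajectory prediction, and the whole process has two stages.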

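The algorithm first preprocesses the scene by manually partitioning it into several regions, which serve as the start and end points of the trajectories in the scene. All routes are then classified by their (start, end) pair. Because a bidirectional LSTM is used, $(S_x, S_y)$ and $(S_y, S_x)$ can be treated as the same class, which gives n(n-1)/2 classes in total for n regions.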
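The class count can be checked with a small sketch. The region labels and the helper names `route_classes` and `route_class_of` are hypothetical and only serve the illustration; the note does not say how regions are labelled.

```python
from itertools import combinations


def route_classes(regions):
    """Return one route class per unordered pair of distinct regions."""
    return list(combinations(sorted(regions), 2))


def route_class_of(start, end):
    """Map an ordered (start, end) pair to its direction-free route class."""
    return tuple(sorted((start, end)))


regions = ["A", "B", "C", "D"]  # hypothetical region labels, n = 4
classes = route_classes(regions)
print(len(classes))              # n(n-1)/2 = 6
print(route_class_of("C", "A"))  # same class as ("A", "C")
```

Sorting the pair is one simple way to make (start, end) and (end, start) collapse into a single class.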

The first stage is illustrated in the figure below:
image_1c78k7jm78g8bsp16c8uvl1cfgg.png-194.9kB
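In this stage, the algorithm combines a bidirectional LSTM with a CNN, exploiting the predictive power of the former and the classification ability of the latter, and uses a softmax function to obtain the probability of reaching each region.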

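The second stage uses an LSTM-based encoder-decoder network. The LSTM encoder takes an observed trajectory as input and generates a hidden state sequence; the LSTM decoder then decodes this hidden state sequence to produce the output. Consider the trade-off: if every trajectory had its own LSTM, the computational cost would be very high; if all trajectories shared a single LSTM, the network could not learn the movement patterns of all pedestrians. To balance the two, the algorithm uses a separate LSTM only for each route class.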

As shown in the figure below:

image_1c795pja81m1rqu71t7u79n8bn9.png-106.7kB

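The first stage yields a probability for each route class; if that probability exceeds a threshold $\tau$, the LSTM for the corresponding class is used to predict the trajectory.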

Introduction

Existing research work on pedestrian trajectory prediction can be divided into model-based and Long Short-Term Memory (LSTM) based methods.

Ideally, with multiple entry and exit points present in most scenes, a trajectory prediction method should predict multiple possible trajectories heading toward different destinations, and each predicted trajectory should have a probability measure indicating how likely the trajectory would be taken.

We propose a novel two-stage trajectory prediction method to overcome the above shortcomings. After partitioning the scene into several regions, the first stage of our method uses bidirectional LSTM classification to predict a pedestrian's possible destination regions, with a short observed trajectory as input. The second stage chooses differently trained LSTMs to predict the trajectory for each destination region.

This method is named Bi-Prediction, which has two meanings: using a bidirectional LSTM in trajectory prediction and having two stages in the whole process.

Proposed Model

Problem

Given: Observed trajectories $X_i^{obs} = [(x_1^i, y_1^i), ..., (x_{obs}^i, y_{obs}^i)], \forall i$

Objective: Predict future trajectories $X_i^{pred} = [(x_{obs+1}^i, y_{obs+1}^i), ... , (x_{pred}^i, y_{pred}^i)], \forall i$

Our proposed two-stage Bi-Prediction method can learn the potential destinations in a scene and can generate multiple trajectory predictions.

First stage: predict the destination candidates $D_j$ and the probability of choosing each $D_j$.

Second stage: generate different sequences $X_{i,j}^{pred}$ based on these candidate destinations $D_j$.

Stage 1: Bidirectional LSTM Classification

image_1c78k7jm78g8bsp16c8uvl1cfgg.png-194.9kB

To deal with the complex movement patterns of pedestrians, the scene is first partitioned into regions.

It is important to note that the route is not well-defined if the trajectory is very short. One way to overcome this is to design a special network that combines a recurrent neural network with a convolutional neural network, whereby the former develops the "foreseeing" power and the latter takes charge of the classification task, outputting the probabilities of the different destination regions.

For the bidirectional LSTM network architecture:

$$
\begin{aligned}
\overrightarrow{h}_t &= LSTM(x_t, \overrightarrow{h}_{t-1}; \overrightarrow{W}) \\
\overleftarrow{h}_t &= LSTM(x_t, \overleftarrow{h}_{t-1}; \overleftarrow{W}) \\
y_t &= W_{\overrightarrow{h}y}\overrightarrow{h}_t + W_{\overleftarrow{h}y}\overleftarrow{h}_t + b_y
\end{aligned}
$$

Specifically, $\overrightarrow{h}_t, \overrightarrow{W}$ and $\overleftarrow{h}_t, \overleftarrow{W}$ are the hidden states and weight matrices of the forward and backward layers respectively.
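To make the data flow of the two directions concrete, here is a minimal scalar sketch. It is not the paper's network: a plain `tanh` recurrence stands in for the LSTM cell to keep the example short, and all weights and the function names `run_direction` and `bidirectional_outputs` are made-up assumptions. The only point is that the backward layer reads the sequence in reverse order, so the output at each step combines a summary of the past with a summary of the future.

```python
import math


def run_direction(xs, w_x, w_h):
    """Run a toy scalar recurrent cell over xs and return all hidden states.

    The tanh cell is only a stand-in for a real LSTM cell.
    """
    h, states = 0.0, []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h)
        states.append(h)
    return states


def bidirectional_outputs(xs, fwd=(0.5, 0.3), bwd=(0.4, 0.2),
                          w_fy=1.0, w_by=1.0, b_y=0.0):
    """Combine forward and backward hidden states as y_t = W_f h_f + W_b h_b + b."""
    h_fwd = run_direction(xs, *fwd)
    # The backward layer reads the reversed sequence; flip its states back
    # so that index t lines up with the forward states.
    h_bwd = run_direction(xs[::-1], *bwd)[::-1]
    return [w_fy * hf + w_by * hb + b_y for hf, hb in zip(h_fwd, h_bwd)]


ys = bidirectional_outputs([1.0, 2.0, 3.0])
print(len(ys))  # one output per time step
```

Changing the last input changes the first output through the backward states, while the forward state at the first step is unaffected.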

Stage 2: Prediction with Classification

The goal of this stage is to predict future trajectories corresponding to different route classes.

For future trajectory prediction, existing LSTM-based methods that follow a one-LSTM-per-pedestrian policy would suffice, but this policy makes both training and prediction computationally very expensive. Conversely, if a single LSTM were used to predict the trajectories of all pedestrians, the network would not be able to learn all kinds of pedestrian movement. As a trade-off, our method uses one sub-LSTM for each route class.

image_1c795pja81m1rqu71t7u79n8bn9.png-106.7kB

If the probability of a route class is larger than a pre-defined threshold $\tau$, the system automatically selects the corresponding trained sub-LSTM to output the predicted trajectory for each pedestrian.
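The selection rule can be sketched as follows. This is an illustration under assumptions, not the authors' code: `softmax` turns hypothetical stage-1 scores into probabilities, and `sub_predictors` is a hypothetical list of callables standing in for the trained sub-LSTMs, one per route class.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_route_classes(probs, tau=0.01):
    """Return (class_index, probability) for every class with probability > tau."""
    return [(k, p) for k, p in enumerate(probs) if p > tau]


def predict_candidates(observed, probs, sub_predictors, tau=0.01):
    """Run the sub-predictor of each selected class on the observed trajectory."""
    return {k: (p, sub_predictors[k](observed))
            for k, p in select_route_classes(probs, tau)}


probs = softmax([2.0, 1.0, -5.0])  # hypothetical stage-1 scores for 3 classes
print(select_route_classes(probs))  # the third class falls below tau = 0.01
```

Each selected class yields one candidate trajectory together with its probability, which is how the method produces several weighted predictions per pedestrian.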

For each sub-LSTM, we use a general encoder-decoder network in which the LSTM encoder layer receives an observed trajectory as input and generates a hidden state sequence. From the hidden state sequence, the LSTM decoder layer outputs a predicted trajectory. The input size and output size of the sub-LSTMs are determined by the lengths of the observed and predicted trajectories respectively.

Implementation Details

The forward and backward LSTM layers have a fixed hidden state dimension of 128.
We use a one-dimensional convolution layer followed by a one-dimensional max pooling layer. The size of the max pooling window is 4.
The hidden state is also 128-dimensional for each sub-LSTM in stage 2.
The model is built in Python using Keras with a TensorFlow backend.
The learning rate is set to 0.003.
Each sub-LSTM is trained on 20-frame observed trajectories and predicts trajectories 20 frames long, i.e. $T_{obs} = 20$ and $T_{pred} = 40$.
The pre-defined threshold $\tau$ is set to 0.01.
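The frame counts can be illustrated with a small helper that cuts one track into an observed part and a prediction target. It reads $T_{pred} = 40$ as the index of the last predicted frame, so the target spans frames 21 to 40; that reading is my interpretation of the numbers above, and `split_trajectory` is a hypothetical helper name.

```python
def split_trajectory(track, t_obs=20, t_pred=40):
    """Split a track into (observed frames 1..t_obs, target frames t_obs+1..t_pred)."""
    if len(track) < t_pred:
        raise ValueError("track is shorter than t_pred frames")
    return track[:t_obs], track[t_obs:t_pred]


# Hypothetical track: 45 frames of a pedestrian walking along the x axis.
track = [(float(t), 0.0) for t in range(45)]
observed, target = split_trajectory(track)
print(len(observed), len(target))  # 20 observed frames, 20 target frames
```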

Experiments

Datasets:

New York Grand Central dataset (NYGC)

Edinburgh Informatics Forum (EIF)

Metrics:

Average displacement error (ADE)

$ADE = \frac {\sum_{i=1}^n \sum_{t=T_{obs}+1}^{T_{pred}}\|X_{i,t}^{pred} - X_{i, t}^{obs}\|} {n(T_{pred} - (T_{obs} + 1))}$

Final displacement error (FDE)

$FDE = \frac {\sum_{i=1}^n\|X_{i, T_{pred}}^{pred} - X_{i, T_{pred}}^{obs}\|} n$
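Both metrics can be sketched in a few lines of plain Python. The inputs are assumed to be lists of per-pedestrian trajectories covering only the predicted frames, each a list of `(x, y)` points; the function names are hypothetical. One caveat: this sketch divides ADE by the number of points actually summed, whereas the ADE denominator in these notes is written slightly differently, so treat the normalisation here as an assumption.

```python
import math


def displacement(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def ade(predicted, ground_truth):
    """Mean displacement over all pedestrians and all predicted frames."""
    errors = [displacement(p, q)
              for pred, true in zip(predicted, ground_truth)
              for p, q in zip(pred, true)]
    return sum(errors) / len(errors)


def fde(predicted, ground_truth):
    """Mean displacement at the final predicted frame of each pedestrian."""
    errors = [displacement(pred[-1], true[-1])
              for pred, true in zip(predicted, ground_truth)]
    return sum(errors) / len(errors)


# One pedestrian, two predicted frames; only the final point is off, by 5 units.
predicted = [[(0.0, 0.0), (3.0, 4.0)]]
ground_truth = [[(0.0, 0.0), (0.0, 0.0)]]
print(ade(predicted, ground_truth))  # 2.5
print(fde(predicted, ground_truth))  # 5.0
```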

Comparison:

The two baseline methods and three state-of-the-art pedestrian trajectory prediction methods against which we compare our method are:

image_1c79al9opoec1sfu1k3o1r4g11d8m.png-117.4kB

Linear Prediction

LSTM

Social Force (SF)

Social-LSTM

Attention-LSTM

Bi-Prediction-1

Bi-Prediction-3