IFSeg: Image-free Semantic Segmentation via Vision-Language Model

We propose a new image-free semantic segmentation model, referred to as IFSeg.

K-centered Patch Sampling for Efficient Video Recognition

We propose a patch sampling method, referred to as K-centered patch sampling, which uses greedy K-center search for video transformers.
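
For readers unfamiliar with greedy K-center search, the following is a minimal sketch of the generic algorithm (also known as farthest-point sampling), not the paper's implementation. It assumes each patch is represented as a plain feature vector and that Euclidean distance is used; the function name `greedy_k_center` and the `first` parameter are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence


def greedy_k_center(points: Sequence[Sequence[float]], k: int, first: int = 0) -> List[int]:
    """Return indices of k points picked by greedy K-center (farthest-point) search.

    Assumption: points are feature vectors compared with Euclidean distance.
    """
    n = len(points)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and len(points)")
    if not 0 <= first < n:
        raise ValueError("first must be a valid index into points")

    selected = [first]
    # min_dist[i] holds the distance from point i to its nearest selected center.
    min_dist = [math.dist(p, points[first]) for p in points]

    while len(selected) < k:
        # Greedy step: take the point farthest from all current centers.
        nxt = max(range(n), key=min_dist.__getitem__)
        selected.append(nxt)
        # Update nearest-center distances with the newly added center.
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


if __name__ == "__main__":
    # Toy "patch features": two tight clusters plus one isolated point.
    patches = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (10.0, 1.0), (5.0, 5.0)]
    print(greedy_k_center(patches, k=3))
```

Because each step picks the point farthest from everything already chosen, the selected indices tend to spread across the feature space rather than concentrating in one dense region.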

LaPred: Lane-Aware Prediction of Multi-Modal Future Trajectories of Dynamic Agents

We propose a novel prediction model, referred to as the lane-aware prediction (LaPred) network, which uses the instance-level lane entities extracted from a semantic map to predict the multi-modal future trajectories.

Diverse and Admissible Trajectory Forecasting through Multimodal Context Understanding

We propose a model that synthesizes multiple input signals from the multimodal world (the environment’s scene context and interactions between multiple surrounding agents) to best model all diverse and admissible trajectories.

Sequence-to-Sequence Prediction of Vehicle Trajectory via LSTM Encoder-Decoder Architecture

We propose a deep-learning-based vehicle trajectory prediction technique that can generate the future trajectory sequence of surrounding vehicles in real time.