[Day 152] SVR & STTCM - Two architectures for taxi demand prediction

 Hello :)
Today is Day 152!


A quick summary of today (papers read):


Updated graph of read papers from Obsidian:

The first paper introduces Spatial-Temporal Tree Convolution Model (STTCM)

Introduction

Spatial-temporal dependencies in taxi demand reflect changes across different locations and times. Temporally, demand follows city rhythms, being higher during rush hours and lower late at night, with variations between weekdays and weekends. Spatially, demand varies by area type, such as high demand in business districts during rush hours and in nightlife zones late at night. These dependencies interact; for instance, high morning demand in business districts is linked to the time of day, just as late-night demand in hotspots depends on it being late night. Taxi demand is influenced by multiple historical sequences and spatial dependencies. Temporally, demand is related to previous adjacent moments and shows periodic changes. Historical inflows also impact future demand, as passengers often return by taxi. Spatially, regions close to each other tend to have similar demand patterns and more taxi flows (picture). However, demand can also be affected by distant areas with potential spatial dependencies, such as high morning demand from residential to work areas and evening demand in the opposite direction on weekdays.






Definition of a tree matrix


Model design
The design consists of four modules that capture the above defined four spatio-temporal characteristics.


Spatial-temporal convolution module
For spatial characteristics, GCN is widely used but its performance falls off when it comes to discerning the local path features and the hierarchical features among different regions. In terms of temporal characteristics - the LSTM is widely used. However, its iterative training process is time-consuming.
This paper introduces a novel block - spatio-temporal convolutional block. It consists of a temporal gate convolution layer (for temporal characteristics) and a tree convolution layer (for spatial characteristics).

Temporal Gated Convolution Layer
The CNN captures the influence of adjacent time-step features on the current time-step features, and then the GLU further improves the non-linearity of the data.

Spatial Tree Convolution Layer
Tree convolution aims to construct a multi-layer convolutional structure for tree matrices to fully explore spatial relationships between nodes. The spatial tree matrix consists of multiple plane tree matrices, each processed independently by convolution. The process starts with inputting the bottom two rows of a plane tree matrix into a CNN to obtain hidden layer node features. These features are then weighted and fused with the original features to preserve information. This process repeats until all layers' features are fused, resulting in high-level features for each plane tree matrix. The final output combines these high-level features from all plane tree matrices.

Tree convolution uses a [2 × 1] convolution kernel to translate from the bottom to the top of a tree matrix, capturing local path and hierarchical features among nodes. Each column in the tree matrix, generated by the BFS algorithm, stores connectivity paths from the root to leaf nodes. Each path undergoes an independent convolution process, transforming connectivity features into high-level features without intersecting with other paths. The high-level features from all paths are concatenated to represent local path features. This method ensures independent convolutional space for each path, preventing noise from other paths and enabling accurate analysis of latent attributes. This approach offers a more precise understanding of potential features compared to traditional aggregate-style convolution methods. 

Tree convolution captures hierarchical features using a plane tree matrix generated by BFS. This matrix clearly stores hierarchical relationships among nodes, with the root node in the first row, directly connected nodes in the second row, and indirectly connected nodes in subsequent rows based on the number of indirect connections. The convolution kernel translates row-by-row, starting from layers with more indirect connections. This process allows tree convolution to interpret both direct and indirect hierarchical relationships effectively. 
During conventional convolution, some information loss is inevitable, especially noticeable when processing data from root nodes and their direct connections in a graph structure. Root nodes and directly connected nodes retain more relevant information due to their limited participation in the convolution process, leveraging the convolution's inherent properties to reduce information loss and enhance accuracy and robustness. This hierarchical processing strategy forms higher-level features through tree convolution, incorporating varying degrees of hierarchical relationships between nodes. This approach allows for granular feature extraction, effectively revealing complex hidden information within node data in graph structures.

Forecasting process

Experiment

Results

Future research
  • There is inherent sparsity in the constructed tree matrix.
  • Incorporating auxiliary factors like holidays and weather conditions, can also form ideas for future improvements upon STTCM.
  • Incorporating new structures that can better capture and reflect time-varying trends in inter-regional travel flows and traffic flows within urban road networks.


The second paper was referenced in the above one so I decided to read it as well - talking about Support Vector Regression

Introduction

Taxi-out time refers to the duration an aircraft spends moving on the ground from the departure gate to the takeoff runway. It starts when the aircraft is pushed back from the gate and ends when the aircraft's wheels lift off the ground during takeoff.

This period includes various activities such as taxiing to the runway, waiting in line for takeoff clearance, and any other ground movements necessary before departure. It's an essential factor in understanding airport operations and flight efficiency since longer taxi-out times can contribute to delays and congestion, affecting both airport capacity and overall flight punctuality.

Taxi-out time prediction techniques

Generalised Linear Model

GLM is a flexible generalisation of OLS that allows for response variables with error-distribution models other than normal distribution. GLM relates the linear model to the response variables through a link function and by allowing the magnitute of the variance of each measurement to be a function of its predicted value.

Softmax Regression Model

SR is a generalisation of logistic regression allowing for multiple (more than two) output classes.

Artificial Neural Network

Improved Swarm Intelligence Algorithm Based Prediction Approaches

Support Vector Regression

SVR is a non-linear regression forecasting method where the input variables are mapped into a high-dimensional linear feature space, commonly through a kernel function. In this higher dimensional space, the training data can be approximated to a linear function, and the global optimal solution is obtained by training of the finite sample.

Particle Swarm Optimization

PSO solves an optimization problem by moving the particles (candidate solutions) over those particles' velocities and positions according to simple mathematical formulae. The position of each particle is updated towards the better-known position driven by its neighbours’, and the global, best performance.

Improved Firefly Algorithm Optimization

The Firefly Algorithm (FA) is a new approach for solving optimization problems efficiently. It works by simulating the behavior of fireflies. In this algorithm, fireflies are attracted to each other based on their brightness and attraction levels. Brightness depends on their location and target value, with higher brightness indicating a better location. Fireflies with higher brightness also have a stronger attraction. If fireflies have similar brightness, they move randomly. Overall, FA is effective in solving optimization problems and can outperform traditional algorithms like Genetic Algorithms (GA).

PSO/IFA Based Support Vector Regression

This study combines swarm intelligence algorithms with SVR for better prediction accuracy. Since the learning data has many more samples than features, the input variables are transformed into Hilbert space using the RBF kernel, which is more effective than other kernels. To accurately predict departure taxi-out time, SVR models need optimal values for penalty factor, RBF kernel parameter, and epsilon. PSO and IFA are used to determine these values beforehand. Incorrect settings could affect training errors, induce overfitting, or lead to less effective learning. 

Performance Measures

RMSE, MAPE, squared correlation coefficient, prediction accuracy


Data

Results



That is all for today!

See you tomorrow :)

Popular posts from this blog

[Day 198] Transactions Data Streaming Pipeline Porject [v1 completed]

[미리 공부] 기초 통계 복습 (Day 1는 1월2일)

[Day 61] Stanford CS224N (NLP with DL): Machine translation, seq2seq + a side CDCGAN mini project