Metro Regression.

Data imputation example for a Santiago Metro station.

Delete the number of exits at a randomly chosen metro station and recover it from the rest of the network. Then repeat this leave-one-out procedure for every station and plot the absolute error of the regression. Tikhonov regression is used here. Lastly, a one-hop estimate built from the neighboring nodes is used as a baseline to compare the error.
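
As a minimal sketch of the idea, assuming pygsp2 keeps the formulation documented for PyGSP's regression_tikhonov, the recovered signal minimizes ||Mx - y||^2 + tau x^T L x, where M is the diagonal 0/1 mask of observed nodes and L is the combinatorial Laplacian. Setting the gradient to zero gives the linear system (M + tau L)x = My. The helper tikhonov_impute below is hypothetical, written in plain NumPy on a three-node path graph rather than on the metro network.

```python
import numpy as np


def tikhonov_impute(W, y, mask, tau=0.5):
    """Hypothetical helper: solve argmin_x ||M x - y||^2 + tau * x^T L x.

    W: symmetric adjacency matrix, y: signal, mask: True where y is observed.
    """
    L = np.diag(W.sum(axis=1)) - W          # combinatorial Laplacian L = D - W
    M = np.diag(mask.astype(float))         # 1 on observed nodes, 0 on missing ones
    y_obs = np.where(mask, y, 0.0)          # missing entries must not stay NaN
    # Zero gradient of the objective gives (M + tau L) x = M y
    return np.linalg.solve(M + tau * L, M @ y_obs)


# Toy path graph 0 - 1 - 2 with the middle value missing
W = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
y = np.array([10.0, np.nan, 30.0])
mask = np.array([True, False, True])

x = tikhonov_impute(W, y, mask, tau=0.5)
print(np.round(x, 2))
```

The missing middle node is estimated as 20, the mean of its neighbors. The observed endpoints are also pulled toward each other (about 13.33 and 26.67 instead of 10 and 30) because the data term is a penalty rather than a hard constraint; a smaller tau keeps them closer to the measurements.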

To run this example, you need three data files. The script reads them from a subdirectory named data inside the current working directory.

  1. Download the file Tablas de subidas y bajadas nov23.zip from this link:

https://www.dtpm.cl/descargas/modelos_y_matrices/Tablas%20de%20subidas%20y%20bajadas%20nov23.zip

Then, uncompress the zip file and copy 2023.11 Matriz_baj_SS_MH.xlsb into the data directory.

  2. Download the file santiago_metro_stations_coords.geojson from this link:

https://zenodo.org/records/11637462/files/santiago_metro_stations_coords.geojson

  3. Download the file santiago_metro_stations_connections.txt from this link:

https://zenodo.org/records/11637462/files/santiago_metro_stations_connections.txt
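
A small, hypothetical helper can check that the three files listed above are in place before running the script. The directory name data matches the path used in the code; the names REQUIRED_FILES and missing_files are illustrative only and not part of pygsp2.

```python
from pathlib import Path

# File names as listed in the download steps (hypothetical constant)
REQUIRED_FILES = [
    '2023.11 Matriz_baj_SS_MH.xlsb',
    'santiago_metro_stations_coords.geojson',
    'santiago_metro_stations_connections.txt',
]


def missing_files(data_dir):
    """Return the required file names that are not present in data_dir."""
    data_dir = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (data_dir / name).is_file()]


if __name__ == '__main__':
    missing = missing_files('data')
    if missing:
        print('Missing files:', *missing, sep='\n  ')
    else:
        print('All data files found.')
```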

import os
import sys

import matplotlib.pyplot as plt
import networkx as nx
import numpy as np
import pandas as pd
from unidecode import unidecode

from pygsp2 import graphs, learning
from pygsp2.utils_examples import (fetch_data, make_metro_graph, metro_database_preprocessing, plot_signal_in_graph)

# Data files are read from the data subdirectory of the working directory
current_dir = os.getcwd()
assets_dir = os.path.join(current_dir, 'data')
fetch_data(assets_dir, 'metro')

try:
    commutes = pd.read_excel(os.path.join(assets_dir, '2023.11 Matriz_baj_SS_MH.xlsb'), header=1,
                             sheet_name='bajadas_prom_laboral')
except FileNotFoundError:
    print(f'Data file was not found in:\n {assets_dir}')
    print('Download it from:\n' +
          r'https://www.dtpm.cl/descargas/modelos_y_matrices/Tablas%20de%20subidas%20y%20bajadas%20nov23.zip')
    sys.exit(1)
G, pos = make_metro_graph(edgesfile=os.path.join(assets_dir, 'santiago_metro_stations_connections.txt'),
                          coordsfile=os.path.join(assets_dir, 'santiago_metro_stations_coords.geojson'))
pos_list = [(G.nodes[node]['y'], G.nodes[node]['x']) for node in G.nodes]

# Extract adjacency matrix
W = nx.adjacency_matrix(G).toarray()
G_pygsp = graphs.Graph(W)

# Node degree matrix
D = np.diag(G_pygsp.d)
# Inverse degree matrix, used by the one-hop estimate below (assumes no isolated stations)
D_inv = np.linalg.inv(D)

# Normalize station names: convert to uppercase and strip accents
stations = [name.upper() for name in list(G.nodes)]
stations = [unidecode(station) for station in stations]
metro_commutes, signal = metro_database_preprocessing(commutes, stations)
signal2 = signal.copy()
station_idx = np.random.randint(0, len(signal2))
# station_idx = 24

print(f'Deleted Station: {stations[station_idx]}')
signal2[station_idx] = np.nan
mask = np.ones(len(signal), dtype=bool)
mask[station_idx] = False

# Use Tikhonov regression to recover the signal; the masked entry is zeroed
# so that the NaN placeholder cannot propagate into the solve
recovered_signal = learning.regression_tikhonov(G_pygsp, np.where(mask, signal2, 0.0), mask, tau=0.5)

# One-hop estimate y = A D^{-1} x (A = W), evaluated at the deleted station
average = (W @ D_inv @ signal)[station_idx]

print(f'Estimated: {recovered_signal[station_idx]:.2f}')
print(f'One-hop Average: {average:.2f}')
print(f'Real: {signal[station_idx]:.2f}')
Deleted Station: ECUADOR
Estimated: nan
One-hop Average: 22676.90
Real: 13824.00
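
The Estimated: nan line above is consistent with the NaN placeholder reaching the solver. We have not inspected pygsp2's internals, but the sketch below shows the general floating-point effect: multiplying by a 0/1 mask does not clear a NaN (0 * nan is nan), whereas replacing the missing entry explicitly, for example with np.where, does.

```python
import numpy as np

y = np.array([5.0, np.nan, 7.0])        # signal with a NaN placeholder
mask = np.array([True, False, True])    # False marks the deleted entry

# Masking by multiplication keeps the NaN in the deleted position
masked = mask * y
print(masked)

# Zeroing the deleted entry explicitly gives a NaN-free vector
safe = np.where(mask, y, 0.0)
print(safe)
```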
tikhonov_estimation = np.zeros_like(signal)
average_estimation = W @ D_inv @ signal

for i in range(len(signal)):
    # Copy the original signal
    signal2 = signal.copy()
    print(f'Deleted Station: {stations[i]}')

    # Delete value in the signal
    signal2[i] = np.nan
    mask = np.ones(len(signal), dtype=bool)
    mask[i] = False

    # Use Tikhonov regression to recover the signal (masked entry zeroed to avoid NaN propagation)
    recovered_signal = learning.regression_tikhonov(G_pygsp, np.where(mask, signal2, 0.0), mask, tau=0.5)
    tikhonov_estimation[i] = recovered_signal[i]

abs_err = np.abs(tikhonov_estimation - signal)
Deleted Station: TOESCA
Deleted Station: RONDIZZONI
Deleted Station: ESCUELA MILITAR
Deleted Station: ALCANTARA
Deleted Station: EL GOLF
Deleted Station: TOBALABA
Deleted Station: PEDRO DE VALDIVIA
Deleted Station: MANUEL MONTT
Deleted Station: UNIVERSIDAD CATOLICA
Deleted Station: SANTA LUCIA
Deleted Station: REPUBLICA
Deleted Station: UNION LATINOAMERICANA
Deleted Station: ESTACION CENTRAL
Deleted Station: CRISTOBAL COLON
Deleted Station: FRANCISCO BILBAO
Deleted Station: PRINCIPE DE GALES
Deleted Station: SIMON BOLIVAR
Deleted Station: BELLAS ARTES
Deleted Station: CUMMING
Deleted Station: UNIVERSIDAD DE SANTIAGO
Deleted Station: SAN ALBERTO HURTADO
Deleted Station: ECUADOR
Deleted Station: LAS REJAS
Deleted Station: PAJARITOS
Deleted Station: SANTA ISABEL
Deleted Station: MIRADOR
Deleted Station: PEDRERO
Deleted Station: CAMINO AGRICOLA
Deleted Station: CARLOS VALDOVINOS
Deleted Station: RODRIGO DE ARAYA
Deleted Station: PARQUE O'HIGGINS
Deleted Station: QUINTA NORMAL
Deleted Station: LOS ORIENTALES
Deleted Station: GRECIA
Deleted Station: LOS PRESIDENTES
Deleted Station: QUILIN
Deleted Station: LAS TORRES
Deleted Station: MACUL
Deleted Station: ROJAS MAGALLANES
Deleted Station: TRINIDAD
Deleted Station: LOS QUILLAYES
Deleted Station: ELISA CORREA
Deleted Station: HOSPITAL SOTERO DEL RIO
Deleted Station: PROTECTORA DE LA INFANCIA
Deleted Station: BELLAVISTA DE LA FLORIDA
Deleted Station: CEMENTERIOS
Deleted Station: ZAPADORES
Deleted Station: DORSAL
Deleted Station: VESPUCIO NORTE
Deleted Station: EL LLANO
Deleted Station: SAN MIGUEL
Deleted Station: LO VIAL
Deleted Station: DEPARTAMENTAL
Deleted Station: CIUDAD DEL NINO
Deleted Station: LO OVALLE
Deleted Station: EL PARRON
Deleted Station: SANTA JULIA
Deleted Station: LA GRANJA
Deleted Station: SANTA ROSA
Deleted Station: SAN RAMON
Deleted Station: MANQUEHUE
Deleted Station: HERNANDO DE MAGALLANES
Deleted Station: LOS DOMINICOS
Deleted Station: BARRANCAS
Deleted Station: LAGUNA SUR
Deleted Station: LAS PARCELAS
Deleted Station: MONTE TABOR
Deleted Station: LAS MERCEDES
Deleted Station: SAN JOSE DE LA ESTRELLA
Deleted Station: PARQUE BUSTAMANTE
Deleted Station: GRUTA DE LOURDES
Deleted Station: BLANQUEADO
Deleted Station: PUDAHUEL
Deleted Station: LO PRADO
Deleted Station: SANTIAGO BUERAS
Deleted Station: SANTA ANA
Deleted Station: PLAZA DE PUENTE ALTO
Deleted Station: SALVADOR
Deleted Station: DEL SOL
Deleted Station: PRESIDENTE PEDRO AGUIRRE CERDA
Deleted Station: BIO BIO
Deleted Station: INES DE SUAREZ
Deleted Station: NUBLE
Deleted Station: SAN PABLO
Deleted Station: PATRONATO
Deleted Station: CERRO BLANCO
Deleted Station: EINSTEIN
Deleted Station: BAQUEDANO
Deleted Station: MONSENOR EYZAGUIRRE
Deleted Station: CHILE ESPANA
Deleted Station: VILLA FREI
Deleted Station: LOS HEROES
Deleted Station: LOS LIBERTADORES
Deleted Station: CERRILLOS
Deleted Station: LA MONEDA
Deleted Station: PLAZA DE MAIPU
Deleted Station: LO VALLEDOR
Deleted Station: ESTADIO NACIONAL
Deleted Station: NEPTUNO
Deleted Station: FRANKLIN
Deleted Station: EL BOSQUE
Deleted Station: OBSERVATORIO
Deleted Station: COPA LO MARTINEZ
Deleted Station: HOSPITAL EL PINO
Deleted Station: FERROCARRIL
Deleted Station: LO CRUZAT
Deleted Station: PLAZA QUILICURA
Deleted Station: PARQUE ALMAGRO
Deleted Station: SAN JOAQUIN
Deleted Station: LOS LEONES
Deleted Station: LA CISTERNA
Deleted Station: MATTA
Deleted Station: HOSPITALES
Deleted Station: PLAZA CHACABUCO
Deleted Station: CONCHALI
Deleted Station: VIVACETA
Deleted Station: CARDENAL CARO
Deleted Station: PLAZA DE ARMAS
Deleted Station: PUENTE CAL Y CANTO
Deleted Station: UNIVERSIDAD DE CHILE
Deleted Station: NUNOA
Deleted Station: PLAZA EGANA
Deleted Station: FERNANDO CASTILLO VELASCO
Deleted Station: VICUNA MACKENNA
Deleted Station: IRARRAZAVAL
Deleted Station: VICENTE VALDES
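
The leave-one-out loop above can be sketched on a toy graph to see how a Tikhonov estimate and a neighbor-based estimate compare. This is a hedged, self-contained sketch: tikhonov_impute and leave_one_out_errors are hypothetical helpers, the Tikhonov step assumes the objective ||Mx - y||^2 + tau x^T L x, and the baseline here is the row-normalized neighbor mean D^{-1}Wx rather than the AD^{-1}x used by the script.

```python
import numpy as np


def tikhonov_impute(W, y, mask, tau):
    """Hypothetical helper: solve (M + tau L) x = M y with L = D - W."""
    L = np.diag(W.sum(axis=1)) - W
    M = np.diag(mask.astype(float))
    return np.linalg.solve(M + tau * L, M @ np.where(mask, y, 0.0))


def leave_one_out_errors(W, x, tau=0.5):
    """Absolute errors of Tikhonov and neighbor-mean estimates, hiding one node at a time."""
    n = len(x)
    neighbor_mean = (W @ x) / W.sum(axis=1)   # D^{-1} W x; W has no self-loops
    tik_err = np.empty(n)
    for i in range(n):
        mask = np.ones(n, dtype=bool)
        mask[i] = False                        # hide node i only
        tik_err[i] = abs(tikhonov_impute(W, x, mask, tau)[i] - x[i])
    return tik_err, np.abs(neighbor_mean - x)


# Path graph 0 - 1 - 2 - 3 - 4 carrying a linear signal
W = np.diag(np.ones(4), k=1)
W = W + W.T
x = np.arange(5, dtype=float)

tik_err, mean_err = leave_one_out_errors(W, x)
print('Tikhonov MAE:', tik_err.mean().round(3))
print('Neighbor-mean MAE:', mean_err.mean().round(3))
```

On a linear path signal the neighbor mean is exact at interior nodes and only errs at the two endpoints, which have a single neighbor each.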
fig, ax = plot_signal_in_graph(G, abs_err, title='Error of Tikhonov Regression', label='Absolute error')
# fig.savefig('metro_regression_tikhonov_error.png', dpi=300)
Error of Tikhonov Regression

Next, compute the absolute error of the one-hop estimate y = AD^{-1}x over the whole network.

abs_err = np.abs(average_estimation - signal)

fig, ax = plot_signal_in_graph(G, abs_err, title=r'Error of $y = AD^{-1}x$', label='Absolute error')
# fig.savefig('metro_regression_error_abs.png', dpi=300)
plt.show()
Error of $y = AD^{-1}x$
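
For a symmetric adjacency matrix A with degrees d, the two possible normalizations differ: (AD^{-1}x)_i = sum_j A_ij x_j / d_j weights each neighbor by its own degree and is not divided by d_i, while (D^{-1}Ax)_i = (1/d_i) sum_j A_ij x_j is the mean over the neighbors of i. The star graph below is an illustrative toy example that makes the gap visible at the hub.

```python
import numpy as np

# Star graph: node 0 (hub) connected to leaves 1, 2 and 3
W = np.array([[0, 1, 1, 1],
              [1, 0, 0, 0],
              [1, 0, 0, 0],
              [1, 0, 0, 0]], dtype=float)
x = np.array([0.0, 10.0, 20.0, 30.0])
d = W.sum(axis=1)                  # degrees: [3, 1, 1, 1]

a_dinv = W @ (x / d)               # y = A D^{-1} x, the form used in the script
dinv_a = (W @ x) / d               # y = D^{-1} A x, mean over neighbors

print(a_dinv[0], dinv_a[0])        # 60.0 20.0
```

AD^{-1} is column-stochastic (each column sums to 1), so it preserves the total sum of x and behaves like one step of a random walk spreading mass, whereas D^{-1}A is row-stochastic and averages. Which one suits an imputation baseline depends on the intended interpretation.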

Total running time of the script: (0 minutes 9.404 seconds)

Estimated memory usage: 329 MB
