New Jersey train delay prediction and on-time rate improvement

Sijia Zhang & Yixuan Zhou
2023-11-26

For the product introduction, visit "NJT - Connect is Ready!"

1. Use Case

Our initiatives aim to effectively address and minimize delays for NJ Transit, ultimately reducing the financial and time-related setbacks experienced by New Jersey Transit and its passengers.

In 2019, New Jersey Transit had more outages than any other system in the country, as reported by the Federal Transit Administration, and its on-time performance rate this year has fallen below the target.

Figure 1.1. NJ Transit Train

The reliability of New Jersey’s train services is paramount for commuters and travelers, with disruptions potentially leading to legal consequences, damages, and increased operating costs. These challenges are exacerbated by negative word of mouth, diminished ridership, and the risk of losing market share, especially amid fierce competition with Amtrak.

Our app, NJ Connect, empowers NJ Transit staff by providing advance notice of potential delays and enabling proactive measures to mitigate or avoid disruptions. NJ Connect comes in two versions: a computer version for the command room and a mobile version for all station staff.

Figure 1.2 Web Interface Workflow

The computer version for the command room enables real-time monitoring and prediction of delays, as well as the management of trains and staff. Potential management measures include:

  1. Train Management
    • Schedule Adjustments: Introduce shuttle buses during peak passenger flow periods or adjust train speeds and dwell times.
    • Priority Management: Assign priority status to certain trains or routes to minimize the impact of delays on critical services or high-capacity routes.
    • Rerouting: Rearrange spare tracks based on the time needed for track fault repairs.
  2. Staff Management
    • Issue Instructions
    • Receive Reports
  3. Passenger Communication
    • Announcements
    • Digital Displays: Manage real-time information displayed on digital screens regarding train schedules, delays, and platform changes.

Figure 1.3 Delay Prediction for NJ Transit (Web Interface)

Figure 1.4 Delay Prediction for NJ Transit (Web Interface)

The mobile version for staff enables NJ Transit employees to:

  1. View real-time and predicted delays.

  2. Receive commands from the command center and indicate their execution or completion with a simple click.

  3. Report to the command center any issues that could cause delays.

   

2. Data Wrangling

In this section, we present the data sources involved in this report and the initial data cleaning and integration methods used to prepare for subsequent statistical analysis and model construction.

2.1 NJ Station Data

To depict the spatial layout of the New Jersey Transit rail network in New Jersey and surrounding areas, we used location data for New Jersey Transit stations and lines. We also incorporated New Jersey’s county boundaries to present the network and its 14 rail lines as a spatial pattern.

Figure 2.1. New Jersey Transit Lines and Stations Map

2.2 Delays by station and line

Delay patterns vary with the characteristics of each station and rail line. In this section, we use a Kaggle dataset that lists origin/destination delays for Amtrak and NJ Transit trains. The dataset contains monthly records of performance information for nearly all train trips across the NJ Transit rail network. We extracted the rail data for October and November 2019, which include each trip's train, line, and delay time.

2.3 Weather Data

Next, we explore weather data pulled with the riem package in R, covering four features: temperature (Fahrenheit), precipitation in the past hour (inches), visibility (miles), and wind speed (knots). Weather can affect how a rail network operates. In general, high temperatures, substantial precipitation, and low visibility do not severely disrupt normal rail operations. However, extreme and atypical weather such as typhoons, blizzards, heavy downpours, and hailstorms can prevent trains from running normally. We therefore included weather data as reference features in our predictive models.

Figure 2.2 Weather data - Newark Liberty International Airport (EWR) - Oct & Nov 2019

2.5 Create Space-Time Panel and Time Lag Features

To better understand delays on New Jersey’s rail lines, we build a space-time panel and analyze delays as a spatio-temporal series. We hypothesize that a delay at one station on a line causes subsequent delays at other stations on that line. We therefore model the delay at each station on every line for each hourly interval and each day of the study period, using time-lagged delays to capture how delays propagate.
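
The panel and time-lag construction can be sketched in a few lines. This is a minimal illustration in plain Python, not the code behind the report: the record layout `(station, hour_index, delay_minutes)`, the function names, and the choice to give station-hours with no trains a mean delay of 0 are all assumptions made for the example.

```python
from collections import defaultdict


def build_panel(records, stations, hours):
    """Return {(station, hour): mean delay} for every station-hour pair.

    records: iterable of (station, hour_index, delay_minutes) tuples.
    Station-hours with no trains get 0.0 (an assumption of this sketch).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for station, hour, delay in records:
        sums[(station, hour)] += delay
        counts[(station, hour)] += 1

    panel = {}
    for station in stations:
        for hour in hours:
            key = (station, hour)
            panel[key] = sums[key] / counts[key] if counts[key] else 0.0
    return panel


def add_lags(panel, stations, hours, lags=(1, 2, 3, 4, 6, 8, 10, 12)):
    """Attach the mean delay at the same station k hours earlier to each row."""
    rows = []
    for station in stations:
        for hour in hours:
            row = {"station": station, "hour": hour,
                   "mean_delay": panel[(station, hour)]}
            for k in lags:
                # None marks hours that fall before the start of the panel.
                row[f"lag{k}"] = panel.get((station, hour - k))
            rows.append(row)
    return rows
```

Rows whose lags are `None` correspond to the start of the study period, where no earlier observation exists; they would need to be dropped or imputed before modeling.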

Figure 2.3. Mean Delays of Each Line

3. Exploratory Analysis

Before developing models, we explored the delay data over time, across space, and across time and space jointly, as well as the influence of time lags and weather.

3.1 On-time performance overview

We began with an overview of delay frequency. As the graph below shows, the number of trains falls as delay duration increases, and few trains are delayed by more than 30 minutes.

Figure 3.1 Overview of NJ Transit Delays at All Stations

To provide a quantitative assessment, we computed the percentage of trains adhering to their scheduled arrival times. Merely 68% of trains managed to arrive within five minutes of the expected time, and only 89% achieved this within a ten-minute window. When extending the timeframe to 20 minutes, a substantial improvement is observed, with almost all (97%) trains arriving punctually. Remarkably, all trains successfully reach their destinations within a two-hour timeframe. This analysis underscores the severity of the train delay issue, posing potential setbacks for both NJ Transit and its patrons.

Time_Frame | Percentage_of_trains
Arrived within 5 minutes | 68.00
Arrived within 10 minutes | 88.67
Arrived within 20 minutes | 97.07
Arrived within 30 minutes | 98.68
Arrived within 1 hour | 99.64
Arrived within 2 hours | 100.00
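
For readers who want to reproduce this kind of table, the calculation is a cumulative share of trips at or below each delay threshold. The sketch below is illustrative only: the function name is hypothetical, and treating "within" as "less than or equal to" is an assumption, since the report does not state how boundary values are counted.

```python
def on_time_share(delays, thresholds=(5, 10, 20, 30, 60, 120)):
    """Percentage of trips whose delay (in minutes) is at most each threshold."""
    if not delays:
        raise ValueError("delays must not be empty")
    n = len(delays)
    return {
        t: round(100 * sum(d <= t for d in delays) / n, 2)
        for t in thresholds
    }


# Toy delays in minutes, not the report's data.
print(on_time_share([0, 3, 7, 12, 45]))
```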

The plot also reveals an alarming trend: nearly all on-time rates fall below 0.5, and more than half fall below 0.1. In other words, trains seldom arrive on schedule.

Figure 3.2 On-Time rate of Trains by NJ Transit Stations

Moving on to the on-time rate maps for trains, a geographical pattern emerges. Stations in Gloucester and Atlantic exhibit markedly lower on-time rates compared to those in the northern regions of New Jersey. This geographic disparity in on-time performance raises concerns about the consistency of service across different areas.

Figure 3.3 On-Time rate of Trains by NJ Transit Stations within 5/10/20/30 minutes

3.2 Delays across time

Secondly, we delved into the temporal dimension, scrutinizing delays on both a weekly and daily basis.

When examining the total train stops and average delay minutes over a week, a discernible cyclical pattern emerges. This cycle, however, becomes more intricate when analyzed on a daily scale.

Figure 3.4 Average Train Delays of Each Line

Upon closer inspection of day-to-day variations, we observe that the average delay time exhibits a remarkably consistent fluctuation pattern from Monday to Friday. The shortest delay occurs at 3:30 in the morning, gradually increasing until it peaks around 9:00. Subsequently, there is a modest decrease until 15:00, followed by a continuous extension until 19:00 when it reaches the day’s highest peak.

During the weekends, the overall pattern of delay time fluctuations mirrors that of weekdays. However, the timing of the lowest and highest peaks differs, occurring later than the corresponding times on weekdays. Notably, both the overall average delay time and the peak value during weekends surpass those recorded from Monday to Friday.

Figure 3.5 Average Train Delays per hr. by Day

3.3 Delays across space

Third, we explore delays across space, by line and by station.

(1) Delays by lines

The table below shows notable variation in average delay across transit lines. The Atl. City Line has the largest mean delay, about 17 minutes. The Princeton Shuttle has the smallest, only 0.4 minutes, which is nearly negligible.

Table 3.3.1 Summary of Delays by Line
line | Mean_Delay_Minutes | Mean_Delay_Hours | Median_Delay_Minutes | Median_Delay_Hours | Min_Delay_Minutes | Max_Delay_Minutes
Atl. City Line | 16.5593655 | 0.2759894 | 0.0750000 | 0.0012500 | 0 | 121.08333
Bergen Co. Line | 4.3744556 | 0.0729076 | 3.0833333 | 0.0513889 | 0 | 99.00000
Gladstone Branch | 4.4351968 | 0.0739199 | 2.7166667 | 0.0452778 | 0 | 100.21667
Main Line | 4.4068482 | 0.0734475 | 3.1833333 | 0.0530556 | 0 | 100.00000
Montclair-Boonton | 4.9462978 | 0.0824383 | 3.2166667 | 0.0536111 | 0 | 90.00000
Morristown Line | 5.4243676 | 0.0904061 | 3.4333333 | 0.0572222 | 0 | 100.50000
No Jersey Coast | 4.7192779 | 0.0786546 | 3.1500000 | 0.0525000 | 0 | 91.43333
Northeast Corrdr | 3.8530196 | 0.0642170 | 2.2000000 | 0.0366667 | 0 | 69.33333
Pascack Valley | 3.4937946 | 0.0582299 | 2.4166667 | 0.0402778 | 0 | 98.00000
Princeton Shuttle | 0.3816861 | 0.0063614 | 0.0166667 | 0.0002778 | 0 | 11.46667
Raritan Valley | 4.3672723 | 0.0727879 | 3.0666667 | 0.0511111 | 0 | 81.00000
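
A summary like the one above is a grouped aggregation: collect the delays for each line, then take the mean, median, minimum, and maximum. The following is a rough sketch using only the Python standard library; the input layout `(line, delay_minutes)` and the output keys are assumptions for illustration, and the hour columns are simply the minute values divided by 60.

```python
from collections import defaultdict
from statistics import mean, median


def summarize_by_line(rows):
    """rows: iterable of (line, delay_minutes). Returns per-line summary stats."""
    delays = defaultdict(list)
    for line, delay in rows:
        delays[line].append(delay)

    summary = {}
    for line, values in delays.items():
        summary[line] = {
            "mean_minutes": mean(values),
            "mean_hours": mean(values) / 60,
            "median_minutes": median(values),
            "median_hours": median(values) / 60,
            "min_minutes": min(values),
            "max_minutes": max(values),
        }
    return summary
```

The same function works for the station-level summary if the first element of each row is the station name instead of the line.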

To enhance comprehension, we translated these findings into visual representations using bar plots and a map. These visualizations offer a clear depiction of the severity of delays on each line and their respective geographic locations.

Figure 3.6 Average Delay by Lines in NJ

(2) Delays by stations

Because line-level averages cannot pinpoint where specific delays occur, we next examine delays by station. As Table 3.3.2 below shows, most stations have a mean delay of approximately 5 minutes. However, 8 stations have mean delays above 15 minutes, with the highest, about 18 minutes, at Cherry Hill (18.04) and Lindenwold (17.98).

Table 3.3.2 Summary of Delays by Stations
from | Mean_Delay_Minutes | Mean_Delay_Hours | Median_Delay_Minutes | Median_Delay_Hours | Min_Delay_Minutes | Max_Delay_Minutes
Aberdeen-Matawan | 5.4631657 | 0.0910528 | 4.0333333 | 0.0672222 | 0.0000000 | 67.31667
Absecon | 16.4318971 | 0.2738650 | 0.0000000 | 0.0000000 | 0.0000000 | 116.00000
Allendale | 4.2223364 | 0.0703723 | 3.0500000 | 0.0508333 | 0.0000000 | 98.00000
Allenhurst | 5.4804540 | 0.0913409 | 4.2000000 | 0.0700000 | 0.0000000 | 50.00000
Anderson Street | 4.2098418 | 0.0701640 | 3.2250000 | 0.0537500 | 0.0000000 | 67.00000
Annandale | 3.5718182 | 0.0595303 | 1.4916667 | 0.0248611 | 0.0000000 | 50.00000
Asbury Park | 5.1515504 | 0.0858592 | 4.1000000 | 0.0683333 | 0.0000000 | 49.00000
Atco | 16.8217885 | 0.2803631 | 0.1333333 | 0.0022222 | 0.0000000 | 118.05000
Atlantic City Rail Terminal | 16.7359309 | 0.2789322 | 8.0166667 | 0.1336111 | 0.0000000 | 85.38333
Avenel | 5.1985250 | 0.0866421 | 4.0500000 | 0.0675000 | 0.0000000 | 64.00000
Basking Ridge | 4.9240414 | 0.0820674 | 3.0500000 | 0.0508333 | 0.0000000 | 98.00000
Bay Head | 1.8547227 | 0.0309120 | 1.2083333 | 0.0201389 | 0.0000000 | 40.15000
Bay Street | 5.1028701 | 0.0850478 | 3.4333333 | 0.0572222 | 0.0000000 | 88.00000
Belmar | 6.2048588 | 0.1034143 | 5.0833333 | 0.0847222 | 0.0000000 | 49.00000
Berkeley Heights | 5.7971439 | 0.0966191 | 4.1666667 | 0.0694444 | 0.0000000 | 98.00000
Bernardsville | 5.5911142 | 0.0931852 | 4.0833333 | 0.0680556 | 0.0000000 | 100.21667
Bloomfield | 4.6073370 | 0.0767889 | 3.2000000 | 0.0533333 | 0.0000000 | 71.86667
Boonton | 5.8770526 | 0.0979509 | 3.4666667 | 0.0577778 | 0.0000000 | 69.00000
Bound Brook | 4.8422389 | 0.0807040 | 3.2333333 | 0.0538889 | 0.0000000 | 78.00000
Bradley Beach | 5.5062292 | 0.0917705 | 4.2166667 | 0.0702778 | 0.0000000 | 49.00000
Brick Church | 5.6478944 | 0.0941316 | 4.1000000 | 0.0683333 | 0.0000000 | 98.00000
Bridgewater | 5.2014232 | 0.0866904 | 3.4666667 | 0.0577778 | 0.0000000 | 81.00000
Broadway Fair Lawn | 3.8357583 | 0.0639293 | 2.4333333 | 0.0405556 | 0.0000000 | 53.11667
Campbell Hall | 7.5154343 | 0.1252572 | 6.1500000 | 0.1025000 | 0.0000000 | 99.00000
Chatham | 5.6988135 | 0.0949802 | 3.9916667 | 0.0665278 | 0.0000000 | 98.00000
Cherry Hill | 18.0412492 | 0.3006875 | 2.8666667 | 0.0477778 | 0.0000000 | 121.08333
Clifton | 5.0168803 | 0.0836147 | 4.2000000 | 0.0700000 | 0.0000000 | 65.05000
Convent Station | 5.8465857 | 0.0974431 | 4.1333333 | 0.0688889 | 0.0000000 | 99.00000
Cranford | 4.5691091 | 0.0761518 | 3.1666667 | 0.0527778 | 0.0000000 | 77.00000
Delawanna | 4.6027406 | 0.0767123 | 3.6583333 | 0.0609722 | 0.0000000 | 63.33333
Denville | 5.0666106 | 0.0844435 | 3.0166667 | 0.0502778 | 0.0000000 | 99.00000
Dover | 3.3204245 | 0.0553404 | 2.0166667 | 0.0336111 | 0.0000000 | 71.45000
Dunellen | 4.8293326 | 0.0804889 | 3.2500000 | 0.0541667 | 0.0000000 | 78.00000
East Orange | 5.4308598 | 0.0905143 | 3.6916667 | 0.0615278 | 0.0000000 | 98.00000
Edison | 3.9794028 | 0.0663234 | 2.1833333 | 0.0363889 | 0.0000000 | 49.35000
Egg Harbor City | 17.5627215 | 0.2927120 | 1.1333333 | 0.0188889 | 0.0000000 | 119.16667
Elberon | 4.3321162 | 0.0722019 | 3.0500000 | 0.0508333 | 0.0000000 | 50.00000
Elizabeth | 4.7750499 | 0.0795842 | 3.3000000 | 0.0550000 | 0.0000000 | 65.00000
Emerson | 3.7367694 | 0.0622795 | 3.0666667 | 0.0511111 | 0.0000000 | 69.00000
Essex Street | 4.0836080 | 0.0680601 | 3.1500000 | 0.0525000 | 0.0000000 | 98.00000
Fanwood | 4.8693558 | 0.0811559 | 3.3500000 | 0.0558333 | 0.0000000 | 77.00000
Far Hills | 5.4844938 | 0.0914082 | 3.5583333 | 0.0593056 | 0.0000000 | 85.00000
Garfield | 4.6685518 | 0.0778092 | 3.3166667 | 0.0552778 | 0.0000000 | 55.05000
Garwood | 5.3126951 | 0.0885449 | 3.7416667 | 0.0623611 | 0.0000000 | 77.00000
Gillette | 4.7553597 | 0.0792560 | 2.8583333 | 0.0476389 | 0.0000000 | 98.00000
Gladstone | 0.8500000 | 0.0141667 | 0.4500000 | 0.0075000 | 0.0000000 | 12.15000
Glen Ridge | 5.1392524 | 0.0856542 | 3.5166667 | 0.0586111 | 0.0000000 | 88.00000
Glen Rock Boro Hall | 4.4068755 | 0.0734479 | 3.2833333 | 0.0547222 | 0.0000000 | 50.23333
Glen Rock Main Line | 4.1949211 | 0.0699154 | 3.2333333 | 0.0538889 | 0.0000000 | 70.03333
Hackettstown | 1.2590847 | 0.0209847 | 0.4250000 | 0.0070833 | 0.0000000 | 16.13333
Hamilton | 2.1174674 | 0.0352911 | 0.5500000 | 0.0091667 | 0.0000000 | 58.00000
Hammonton | 16.2168675 | 0.2702811 | 0.0750000 | 0.0012500 | 0.0000000 | 108.63333
Harriman | 7.8354281 | 0.1305905 | 6.2583333 | 0.1043056 | 0.0000000 | 98.00000
Hawthorne | 4.4457846 | 0.0740964 | 3.5166667 | 0.0586111 | 0.0000000 | 70.15000
Hazlet | 5.1892691 | 0.0864878 | 3.3833333 | 0.0563889 | 0.0000000 | 68.13333
High Bridge | 0.7634021 | 0.0127234 | 0.0000000 | 0.0000000 | 0.0000000 | 31.28333
Highland Avenue | 5.5215899 | 0.0920265 | 3.7500000 | 0.0625000 | 0.0000000 | 98.00000
Hillsdale | 3.8776495 | 0.0646275 | 3.1166667 | 0.0519444 | 0.0000000 | 68.00000
Hoboken | 2.8702549 | 0.0478376 | 2.1000000 | 0.0350000 | 0.0000000 | 91.43333
Jersey Avenue | 3.6976969 | 0.0616283 | 2.1500000 | 0.0358333 | 0.0000000 | 51.00000
Kingsland | 4.3889168 | 0.0731486 | 3.3000000 | 0.0550000 | 0.0000000 | 61.06667
Lake Hopatcong | 4.3083333 | 0.0718056 | 3.1083333 | 0.0518056 | 0.0000000 | 45.00000
Lebanon | 5.0425038 | 0.0840417 | 2.1833333 | 0.0363889 | 0.0000000 | 57.00000
Lincoln Park | 6.5301404 | 0.1088357 | 4.4333333 | 0.0738889 | 0.0000000 | 69.00000
Linden | 4.7829857 | 0.0797164 | 3.2833333 | 0.0547222 | 0.0000000 | 65.00000
Lindenwold | 17.9770103 | 0.2996168 | 2.1500000 | 0.0358333 | 0.0000000 | 120.08333
Little Falls | 6.4842105 | 0.1080702 | 5.0666667 | 0.0844444 | 0.0833333 | 68.00000
Little Silver | 3.7825894 | 0.0630432 | 2.3833333 | 0.0397222 | 0.0000000 | 85.00000
Long Branch | 2.0822455 | 0.0347041 | 0.2666667 | 0.0044444 | 0.0000000 | 70.13333
Lyndhurst | 4.5456708 | 0.0757612 | 3.6000000 | 0.0600000 | 0.0000000 | 63.13333
Lyons | 6.2031404 | 0.1033857 | 4.1500000 | 0.0691667 | 0.0000000 | 98.00000
Madison | 5.7825121 | 0.0963752 | 4.0833333 | 0.0680556 | 0.0000000 | 99.00000
Mahwah | 3.8536238 | 0.0642271 | 2.3333333 | 0.0388889 | 0.0000000 | 98.00000
Manasquan | 4.4067414 | 0.0734457 | 2.2333333 | 0.0372222 | 0.0000000 | 49.00000
Maplewood | 5.4635546 | 0.0910592 | 4.0666667 | 0.0677778 | 0.0000000 | 100.41667
Metropark | 4.1932944 | 0.0698882 | 3.0500000 | 0.0508333 | 0.0000000 | 56.35000
Metuchen | 4.0716912 | 0.0678615 | 2.3916667 | 0.0398611 | 0.0000000 | 49.00000
Middletown NJ | 5.4959119 | 0.0915985 | 4.0166667 | 0.0669444 | 0.0000000 | 84.00000
Middletown NY | 6.6444242 | 0.1107404 | 4.5666667 | 0.0761111 | 0.0000000 | 98.00000
Millburn | 5.3709676 | 0.0895161 | 3.5750000 | 0.0595833 | 0.0000000 | 98.00000
Millington | 4.9956363 | 0.0832606 | 3.1500000 | 0.0525000 | 0.0000000 | 99.00000
Monmouth Park | 9.6904167 | 0.1615069 | 9.0000000 | 0.1500000 | 0.0000000 | 31.01667
Montclair Heights | 4.8627932 | 0.0810466 | 3.1000000 | 0.0516667 | 0.0000000 | 90.00000
Montclair State U | 2.5546321 | 0.0425772 | 1.2583333 | 0.0209722 | 0.0000000 | 71.10000
Montvale | 3.5182448 | 0.0586374 | 2.3000000 | 0.0383333 | 0.0000000 | 68.00000
Morris Plains | 5.2196153 | 0.0869936 | 3.2666667 | 0.0544444 | 0.0000000 | 99.08333
Morristown | 5.7324428 | 0.0955407 | 4.1000000 | 0.0683333 | 0.0000000 | 100.00000
Mount Arlington | 5.8279595 | 0.0971327 | 4.1500000 | 0.0691667 | 0.0000000 | 53.00000
Mount Olive | 3.6589620 | 0.0609827 | 2.2000000 | 0.0366667 | 0.0000000 | 45.00000
Mount Tabor | 4.3404744 | 0.0723412 | 2.9500000 | 0.0491667 | 0.0000000 | 84.00000
Mountain Avenue | 5.8009993 | 0.0966833 | 4.0166667 | 0.0669444 | 0.0000000 | 88.00000
Mountain Lakes | 5.5762807 | 0.0929380 | 3.3333333 | 0.0555556 | 0.0000000 | 69.00000
Mountain Station | 5.4338421 | 0.0905640 | 3.4333333 | 0.0572222 | 0.0000000 | 100.03333
Mountain View | 6.5078246 | 0.1084637 | 5.0500000 | 0.0841667 | 0.0500000 | 69.00000
Murray Hill | 3.5357428 | 0.0589290 | 1.4166667 | 0.0236111 | 0.0000000 | 98.00000
Nanuet | 3.2924045 | 0.0548734 | 2.2166667 | 0.0369444 | 0.0000000 | 73.00000
Netcong | 5.2204019 | 0.0870067 | 3.4166667 | 0.0569444 | 0.0000000 | 45.00000
Netherwood | 4.5553783 | 0.0759230 | 3.1666667 | 0.0527778 | 0.0000000 | 77.00000
New Bridge Landing | 3.4179578 | 0.0569660 | 2.4500000 | 0.0408333 | 0.0000000 | 67.00000
New Brunswick | 3.2671977 | 0.0544533 | 1.9500000 | 0.0325000 | 0.0000000 | 57.00000
New Providence | 4.8228021 | 0.0803800 | 3.1166667 | 0.0519444 | 0.0000000 | 98.00000
New York Penn Station | 3.1828261 | 0.0530471 | 1.1500000 | 0.0191667 | 0.0000000 | 90.13333
Newark Airport | 4.5656323 | 0.0760939 | 3.1833333 | 0.0530556 | 0.0000000 | 69.33333
Newark Broad Street | 5.4070034 | 0.0901167 | 3.5166667 | 0.0586111 | 0.0000000 | 99.00000
Newark Penn Station | 5.0741233 | 0.0845687 | 3.4166667 | 0.0569444 | 0.0000000 | 76.00000
North Branch | 6.7292906 | 0.1121548 | 4.1166667 | 0.0686111 | 0.0000000 | 58.00000
North Elizabeth | 4.8466839 | 0.0807781 | 3.3833333 | 0.0563889 | 0.0000000 | 65.00000
Oradell | 3.3879580 | 0.0564660 | 2.3500000 | 0.0391667 | 0.0000000 | 68.00000
Orange | 5.5007296 | 0.0916788 | 4.0333333 | 0.0672222 | 0.0000000 | 98.00000
Otisville | 6.4622680 | 0.1077045 | 4.1166667 | 0.0686111 | 0.0000000 | 100.00000
Park Ridge | 3.9080446 | 0.0651341 | 3.1166667 | 0.0519444 | 0.0000000 | 68.00000
Passaic | 4.4787440 | 0.0746457 | 3.5583333 | 0.0593056 | 0.0000000 | 64.23333
Paterson | 4.5937419 | 0.0765624 | 4.0666667 | 0.0677778 | 0.0000000 | 64.10000
Peapack | 4.9507037 | 0.0825117 | 2.1333333 | 0.0355556 | 0.0000000 | 99.00000
Pearl River | 3.5678114 | 0.0594635 | 2.2000000 | 0.0366667 | 0.0000000 | 75.00000
Pennsauken | 17.7121026 | 0.2952017 | 2.1333333 | 0.0355556 | 0.0000000 | 121.00000
Perth Amboy | 5.2422320 | 0.0873705 | 3.5000000 | 0.0583333 | 0.0000000 | 67.23333
Philadelphia | 11.3932609 | 0.1898877 | 0.0000000 | 0.0000000 | 0.0000000 | 109.38333
Plainfield | 4.7577553 | 0.0792959 | 3.2500000 | 0.0541667 | 0.0000000 | 78.00000
Plauderville | 3.7120678 | 0.0618678 | 2.2833333 | 0.0380556 | 0.0000000 | 54.13333
Point Pleasant Beach | 2.9806799 | 0.0496780 | 1.2500000 | 0.0208333 | 0.0000000 | 50.00000
Port Jervis | 1.8889984 | 0.0314833 | 0.0000000 | 0.0000000 | 0.0000000 | 77.26667
Princeton | 0.3820283 | 0.0063671 | 0.0166667 | 0.0002778 | 0.0000000 | 11.46667
Princeton Junction | 3.4141258 | 0.0569021 | 1.4833333 | 0.0247222 | 0.0000000 | 55.00000
Radburn Fair Lawn | 4.5040875 | 0.0750681 | 3.2333333 | 0.0538889 | 0.0000000 | 98.00000
Rahway | 4.9242468 | 0.0820708 | 3.2666667 | 0.0544444 | 0.0000000 | 66.00000
Ramsey Main St | 4.4770522 | 0.0746175 | 3.1166667 | 0.0519444 | 0.0000000 | 98.00000
Ramsey Route 17 | 4.8100658 | 0.0801678 | 3.1500000 | 0.0525000 | 0.0000000 | 98.00000
Raritan | 3.4545129 | 0.0575752 | 2.2000000 | 0.0366667 | 0.0000000 | 56.00000
Red Bank | 5.6495212 | 0.0941587 | 4.0500000 | 0.0675000 | 0.0000000 | 85.00000
Ridgewood | 4.5506652 | 0.0758444 | 3.2916667 | 0.0548611 | 0.0000000 | 98.00000
River Edge | 3.3353687 | 0.0555895 | 2.2833333 | 0.0380556 | 0.0000000 | 68.00000
Roselle Park | 4.7237920 | 0.0787299 | 3.2000000 | 0.0533333 | 0.0000000 | 76.00000
Rutherford | 5.1104014 | 0.0851734 | 4.0666667 | 0.0677778 | 0.0000000 | 98.00000
Salisbury Mills-Cornwall | 7.8822236 | 0.1313704 | 6.2166667 | 0.1036111 | 0.0000000 | 98.00000
Secaucus Concourse | 0.1000000 | 0.0016667 | 0.1000000 | 0.0016667 | 0.1000000 | 0.10000
Secaucus Lower Lvl | 3.7083083 | 0.0618051 | 3.0000000 | 0.0500000 | 0.0000000 | 99.00000
Secaucus Upper Lvl | 4.2567090 | 0.0709452 | 2.3500000 | 0.0391667 | 0.0000000 | 79.00000
Short Hills | 5.5491160 | 0.0924853 | 3.4666667 | 0.0577778 | 0.0000000 | 98.00000
Sloatsburg | 6.8826422 | 0.1147107 | 5.2166667 | 0.0869444 | 0.0000000 | 98.00000
Somerville | 3.8435295 | 0.0640588 | 2.3666667 | 0.0394444 | 0.0000000 | 80.00000
South Amboy | 5.5509681 | 0.0925161 | 4.0666667 | 0.0677778 | 0.0000000 | 67.11667
South Orange | 5.8126665 | 0.0968778 | 4.1166667 | 0.0686111 | 0.0000000 | 99.06667
Spring Lake | 5.3182724 | 0.0886379 | 4.1333333 | 0.0688889 | 0.0000000 | 49.00000
Spring Valley | 1.6377672 | 0.0272961 | 1.2000000 | 0.0200000 | 0.0000000 | 56.30000
Stirling | 4.5160907 | 0.0752682 | 2.6833333 | 0.0447222 | 0.0000000 | 98.00000
Suffern | 3.4095299 | 0.0568255 | 1.4666667 | 0.0244444 | 0.0000000 | 98.00000
Summit | 5.1305265 | 0.0855088 | 3.2833333 | 0.0547222 | 0.0000000 | 100.50000
Teterboro | 4.0396877 | 0.0673281 | 3.2000000 | 0.0533333 | 0.0000000 | 42.56667
Towaco | 5.5052982 | 0.0917550 | 3.3166667 | 0.0552778 | 0.0000000 | 69.00000
Trenton | 2.1226712 | 0.0353779 | 1.1833333 | 0.0197222 | 0.0000000 | 43.05000
Tuxedo | 7.1140173 | 0.1185670 | 5.3500000 | 0.0891667 | 0.0000000 | 98.00000
Union | 4.2109883 | 0.0701831 | 3.0000000 | 0.0500000 | 0.0000000 | 79.00000
Upper Montclair | 5.8551133 | 0.0975852 | 4.1500000 | 0.0691667 | 0.0000000 | 89.00000
Waldwick | 3.2202804 | 0.0536713 | 2.1666667 | 0.0361111 | 0.0000000 | 69.00000
Walnut Street | 5.3643026 | 0.0894050 | 3.5666667 | 0.0594444 | 0.0000000 | 88.00000
Watchung Avenue | 5.6386605 | 0.0939777 | 4.1166667 | 0.0686111 | 0.0000000 | 88.00000
Watsessing Avenue | 5.5358640 | 0.0922644 | 4.2083333 | 0.0701389 | 0.0000000 | 76.00000
Wayne-Route 23 | 6.8869474 | 0.1147825 | 5.1333333 | 0.0855556 | 0.1000000 | 68.00000
Wesmont | 4.3013032 | 0.0716884 | 3.1666667 | 0.0527778 | 0.0000000 | 56.15000
Westfield | 4.5479696 | 0.0757995 | 3.1416667 | 0.0523611 | 0.0000000 | 77.00000
Westwood | 4.0529744 | 0.0675496 | 3.2000000 | 0.0533333 | 0.0000000 | 69.00000
White House | 4.5695205 | 0.0761587 | 1.3416667 | 0.0223611 | 0.0000000 | 56.00000
Wood Ridge | 4.5781898 | 0.0763032 | 3.6000000 | 0.0600000 | 0.0000000 | 66.00000
Woodbridge | 4.9260991 | 0.0821017 | 3.2000000 | 0.0533333 | 0.0000000 | 53.06667
Woodcliff Lake | 4.0913741 | 0.0681896 | 3.2000000 | 0.0533333 | 0.0000000 | 47.30000

Figure 3.7 Average Delay by Stations in NJ

3.4 Delays across time and space

Fourth, we examine delays jointly across time (weekly and daily) and space.

The plot below shows mean delays for each day of the week. Average delays in the northern region of New Jersey stay consistently below 10 minutes from Monday to Sunday. In contrast, the mean delay on the Atl. City Line exceeds 10 minutes, and it peaks at 20 minutes on weekends.

Figure 3.8 Mean Delay Minutes per Day by Line

Additionally, we examine the hourly delays categorized by line.

Figure 3.9 Mean delay minutes per hour by line

3.6 Delays impacted by weather

In this section, we examine the relationship between train delays and weather conditions and find no significant correlation. A likely reason is that only severe weather causes significant delays: our model was trained on a limited time period, and no noteworthy weather events occurred during it. Although their impact was marginal, our final model includes temperature, precipitation, visibility, and wind speed because they slightly improve predictions.

Figure 3.9 Train delays by weather conditions

4. Regression model building and training

In this section, we split the data into training and test sets and build the regression models we compare.

4.1 Create Train and Test Set

Our delay data span week 40 to week 46. Because computing the lag features leaves the first week incomplete, complete data are available from week 41 to week 46. We use the first four of these weeks as the training set and the last two as the test set.
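
A week-based split of this kind might look like the sketch below. It assumes each observation is a dict with a `date` field and uses ISO week numbers from the Python standard library; the report does not say which week convention its R code used, so the week boundaries here are an assumption, as are the function and field names.

```python
from datetime import date


def split_by_week(rows, train_weeks=range(41, 45), test_weeks=range(45, 47)):
    """Split rows into train/test sets by ISO week number of row['date']."""
    train, test = [], []
    for row in rows:
        week = row["date"].isocalendar()[1]  # ISO week number (1-53)
        if week in train_weeks:
            train.append(row)
        elif week in test_weeks:
            test.append(row)
        # Rows outside both ranges (e.g. week 40) are dropped.
    return train, test


rows = [
    {"date": date(2019, 10, 1)},   # ISO week 40: dropped
    {"date": date(2019, 10, 7)},   # ISO week 41: train
    {"date": date(2019, 11, 4)},   # ISO week 45: test
]
train, test = split_by_week(rows)
```

Splitting by whole weeks keeps the test period strictly after the training period, which avoids leaking future delays into the lag features.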

Figure 3.10 NJ Transit Rail Trips by Week

4.2 Correlation Matrix

The correlation matrix shows the relationships among all numeric variables. As Figure 3.11 below shows, no pair of variables is strongly correlated, so we kept all variables for the models we built later.
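
As a reminder of what each cell in such a matrix contains, the sketch below computes pairwise Pearson correlations for a set of numeric columns. It assumes Pearson's coefficient, which the report does not name explicitly, and the column names in the usage lines are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return float("nan")  # correlation is undefined for a constant column
    return cov / (sd_x * sd_y)


def correlation_matrix(columns):
    """columns: {name: list of values}. Returns nested {name: {name: r}}."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names}
            for a in names}


matrix = correlation_matrix({
    "delay": [1.0, 2.0, 3.0, 4.0],
    "lag1Hour": [2.0, 4.0, 6.0, 8.0],
    "visibility": [4.0, 3.0, 2.0, 1.0],
})
```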

Figure 3.11 Correlation Matrix

4.3 Build Model

During model development, we organized our features into the broad groups listed below. We tested these groups in various combinations and present the resulting models, including the one we ultimately selected.

Here are our five feature categories:

  • Time: week (Week number), dotw (Weekday), hour (Time of a day)
  • Space: ST_NAME (Name of stations), LINE_CODE (Abbreviation for lines)
  • Time lag: lag1Hour, lag2Hours, lag3Hours, lag4Hours, lag6Hours, lag8Hours, lag10Hours, lag12Hours
  • Weather: temperature, precipitation, visibility, windspeed
  • Holidays: holidayLag

Here are the four models, labeled A to D as in the results:

    A. Time and Space
    B. Time, Space and Time Lag
    C. Time, Space, Time Lag and Weather
    D. Time, Space, Time Lag, Weather and HolidayLag
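
One way to keep nested feature sets like these consistent is to define each group once and assemble the model specifications from them. The sketch below does this and prints R-style formula strings. The labels A to D follow the order of the list above and the model letters used in the results; the response name `delay_minutes` and the helper name are hypothetical, and the report's actual model code is not shown.

```python
FEATURE_GROUPS = {
    "time": ["week", "dotw", "hour"],
    "space": ["ST_NAME", "LINE_CODE"],
    "time_lag": ["lag1Hour", "lag2Hours", "lag3Hours", "lag4Hours",
                 "lag6Hours", "lag8Hours", "lag10Hours", "lag12Hours"],
    "weather": ["temperature", "precipitation", "visibility", "windspeed"],
    "holidays": ["holidayLag"],
}

# Each model adds one feature group to the previous one.
MODELS = {
    "A": ["time", "space"],
    "B": ["time", "space", "time_lag"],
    "C": ["time", "space", "time_lag", "weather"],
    "D": ["time", "space", "time_lag", "weather", "holidays"],
}


def model_formula(model, response="delay_minutes"):
    """Build an R-style formula string for the given model label."""
    features = [f for group in MODELS[model] for f in FEATURE_GROUPS[group]]
    return f"{response} ~ " + " + ".join(features)


for label in MODELS:
    print(label, model_formula(label))
```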

5. Results

5.1 Modeling for Training Set

5.2 Predict for Test Set

5.3 MAEs

We validate the models using the Mean Absolute Error (MAE) over time. Models B and C have the smallest MAE, whether it is calculated by week and model or by line.
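
MAE is the average of the absolute differences between observed and predicted delays. A grouped version, such as MAE by week and model, can be sketched as follows; the row layout `(week, model, observed, predicted)` and the toy numbers are assumptions for illustration, not the report's results.

```python
from collections import defaultdict


def mae(pairs):
    """Mean absolute error over (observed, predicted) pairs."""
    pairs = list(pairs)
    return sum(abs(obs - pred) for obs, pred in pairs) / len(pairs)


def mae_by_group(rows):
    """rows: iterable of (week, model, observed, predicted) tuples."""
    groups = defaultdict(list)
    for week, model, observed, predicted in rows:
        groups[(week, model)].append((observed, predicted))
    return {key: mae(pairs) for key, pairs in groups.items()}


toy_rows = [
    (45, "A", 5.0, 3.0),
    (45, "A", 1.0, 2.0),
    (45, "B", 5.0, 4.5),
    (46, "B", 2.0, 2.0),
]
print(mae_by_group(toy_rows))
```

Replacing `week` with the line name in each row gives the per-line breakdown instead.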

Figure 5.1 MAE: Delays by week and model

Figure 5.2 MAE: Delays of more than 20 minutes by model and lines

5.4 Observed and Predicted Mean Hourly Delays

In the predictions for two weeks of data, models B, C and D show a marked improvement in predicting the highest delay peaks. They also predict low-delay periods and overall trends accurately, which indicates that incorporating time lags improves prediction accuracy. However, the models do not pinpoint the timing of high delays, possibly because of unforeseen events.

Figure 5.3 Predicted and Observed mean delay minutes by hour

5.5 Cross Validation

We run leave-one-group-out cross-validation (LOGO-CV) by day of the week on our linear regression model. Each day of the week is held out in turn as the test set, while the remaining days form the training set. The mean MAE across folds is 0.493 minutes, and the maximum daily MAE is about 1.2. Most MAE values fall between 0.2 and 0.6, which indicates a consistently low prediction error across folds and suggests good predictive accuracy. Even so, there is room for improvement, especially in reducing the maximum error.
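
The mechanics of leave-one-group-out cross-validation can be sketched as below. To stay self-contained, the example swaps in a one-predictor least-squares line for the report's multi-feature linear regression; the field names `dotw`, `lag1Hour`, and `delay` and the function names are assumptions for illustration.

```python
from statistics import mean, stdev


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x, mean_y = mean(xs), mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def logo_cv(rows, group_key="dotw", x_key="lag1Hour", y_key="delay"):
    """Hold out each group in turn; return {group: MAE on the held-out rows}."""
    fold_mae = {}
    for group in sorted({row[group_key] for row in rows}):
        train = [r for r in rows if r[group_key] != group]
        test = [r for r in rows if r[group_key] == group]
        slope, intercept = fit_line([r[x_key] for r in train],
                                    [r[y_key] for r in train])
        errors = [abs(r[y_key] - (intercept + slope * r[x_key])) for r in test]
        fold_mae[group] = mean(errors)
    return fold_mae


def summarize(fold_mae):
    """Mean and standard deviation of the per-fold MAEs."""
    values = list(fold_mae.values())
    return mean(values), stdev(values)
```

`summarize` returns the two quantities reported in a cross-validation results table: the mean and the standard deviation of the per-fold MAEs.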

Figure 5.4 LOGO-CV: Mean Delay Duration by Day in a Week

Table 5. Cross Validation Results
CrossValidation | Mean_MAE | SD_MAE
LOGO CV: day in a week | 0.493 | 0.39

6. Conclusion

The model trained for the App reveals the primary factors leading to train delays, providing a multidimensional periodic delay prediction for NJ Transit that considers both temporal and spatial dimensions. This enables our model to proactively forecast delays by analyzing New Jersey’s traffic hub dynamics.

The utility and efficacy of our model are evident, yet its refinement depends on an extensive, precise, and publicly accessible database from governmental and specialized sectors. By reviewing literature and historical delay instances, we aim to mitigate unpredictable factors such as mechanical failures, funding shortages, and unexpected events. In 2023, for instance, the Morris & Essex line saw significant commuter disruptions due to ongoing overhead line issues. Such disruptions, especially from signal problems in critical areas like tunnels, can lead to notable delays due to safety and operational concerns.

To enhance our model, we need to integrate potential factors that may cause train delays or cancellations as independent variables. Another key to improving our delay prediction model’s accuracy is to include comprehensive Amtrak data, such as actual and estimated arrival times and train IDs, which enrich our independent variable observations.

Despite these considerations, we are confident that our model, which includes a wide array of important predictors, is robust enough for NJ Transit’s application. The train delay prediction model we’ve developed for NJ Transit aims to optimize operations, improve cost-effectiveness, and elevate the passenger experience and service quality. This data-driven approach not only assists NJ Transit in better understanding and managing its operational challenges but also supports its sustainable development goals over the long term. Moving forward, we are committed to continually enhancing our prediction model to offer even more valuable solutions to NJ Transit and other public transit operators.
