Sijia Zhang & Yixuan Zhou
2023-11-26
For the product introduction, visit "NJT - Connect is Ready!"
1. Use Case
Our initiatives aim to effectively address and minimize delays for NJ Transit, ultimately reducing the financial and time-related setbacks experienced by New Jersey Transit and its passengers.
In 2019, New Jersey Transit had more outages than any other transit system in the country, according to the Federal Transit Administration. In addition, its on-time performance this year has fallen below the target.
Figure 1.1. NJ Transit Train
The reliability of New Jersey’s train services is paramount for commuters and travelers, with disruptions potentially leading to legal consequences, damages, and increased operating costs. These challenges are exacerbated by negative word of mouth, diminished ridership, and the risk of losing market share, especially amid fierce competition with Amtrak.
Our app, NJ Connect, empowers NJ Transit staff by providing advance notice of potential delays and enabling proactive measures to mitigate or avoid disruptions. NJ Connect comes in two versions: a computer version for the command room and a mobile version for all station staff.
Figure 1.2 Web Interface Workflow
The computer version for the command room enables real-time monitoring and prediction of delays, as well as the management of trains and staff. Potential management measures include:
- Train Management
  - Schedule Adjustments: Introduce shuttle buses during peak passenger flow periods or adjust train speeds and dwell times.
  - Priority Management: Assign priority status to certain trains or routes to minimize the impact of delays on critical services or high-capacity routes.
  - Rerouting: Rearrange spare tracks based on the time needed for track fault repairs.
- Staff Management
  - Issue Instructions
  - Receive Reports
- Passenger Communication
  - Announcements
  - Digital Displays: Manage real-time information displayed on digital screens regarding train schedules, delays, and platform changes.
Figure 1.3 Delay Prediction for NJ Transit (Web Interface)
Figure 1.4 Delay Prediction for NJ Transit (Web Interface)
The mobile version for staff enables NJ Transit employees to:
- View real-time and predicted delays.
- Receive commands from the command center and indicate their execution or completion with a simple click.
- Report to the command center any issues that could cause delays.
2. Data Wrangling
In this section, we present the data sources involved in this report and the initial data cleaning and integration methods used to prepare for subsequent statistical analysis and model construction.
2.1 NJ Station Data
To depict the spatial layout of the New Jersey Transit rail network in New Jersey and surrounding areas, we used data on the locations of New Jersey Transit stations and lines. We also incorporated New Jersey's county boundaries to show the spatial pattern of the network's 14 rail lines.
Figure 2.1. New Jersey Transit Lines and Stations Map
2.2 Delays by station and line
Delays are shaped by the particular characteristics of each station and each railway line. In this section, we use a Kaggle dataset that lists origin/destination delays for Amtrak and NJ Transit trains. The dataset contains monthly records detailing the performance of nearly all train trips across the NJ Transit rail network. We extract the rail data for October and November 2019, which include information on each train, its line, and its delay time.
2.3 Weather Data
Next, we explore weather data pulled with the riem package in R, using four features: temperature (°F), precipitation in the past hour (inches), visibility (miles), and wind speed (knots). Weather conditions can affect operational decisions on railway networks. In general, high temperatures, substantial precipitation, and low visibility do not severely disrupt normal rail operations. However, extreme and atypical conditions such as typhoons, blizzards, heavy downpours, and hailstorms can prevent trains from running normally. We therefore include the weather data as predictors in our subsequent models.
Figure 2.2 Weather data - Newark Liberty International Airport (EWR) - Oct & Nov 2019
2.5 Create Space-Time Panel and Time Lag Features
To better understand delays on New Jersey's transit railways, we build a space-time panel and analyze train delays from a spatio-temporal perspective. We hypothesize that a delay at one station on a line causes subsequent delays at other stations on the same line. We therefore build predictive models that forecast the delay at each station on every line for each hourly and daily interval during the study period, capturing the spatio-temporal process of delays.
Figure 2.3. Mean Delays of Each Line
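The panel construction and hourly lag features described in this section can be sketched as follows. This is a minimal stdlib Python sketch, not the authors' code: the `(unit, timestamp, delay_minutes)` input layout, the `build_panel` and `hour_start` names, and filling hours without recorded stops with 0.0 are all assumptions. A "unit" may be a station name or a `(station, line)` tuple; the lag offsets mirror the `lag1Hour` to `lag12Hours` features listed in Section 4.3.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hourly lag offsets, mirroring lag1Hour, lag2Hours, ..., lag12Hours.
LAGS = [1, 2, 3, 4, 6, 8, 10, 12]


def hour_floor(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def lag_name(k: int) -> str:
    return "lag1Hour" if k == 1 else f"lag{k}Hours"


def build_panel(records, start: datetime, end: datetime):
    """Build a unit x hour panel of mean delay plus lagged delay features.

    records: iterable of (unit, timestamp, delay_minutes).
    start, end: bounds of the study period (end is exclusive).
    Hours with no recorded stops are filled with 0.0 (an assumption).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    units = set()
    for unit, ts, delay in records:
        key = (unit, hour_floor(ts))
        sums[key] += delay
        counts[key] += 1
        units.add(unit)

    # Enumerate every hour so each unit gets a complete, gap-free series.
    hours = []
    h = hour_floor(start)
    while h < end:
        hours.append(h)
        h += timedelta(hours=1)

    panel = []
    for unit in sorted(units):
        series = [sums[(unit, h)] / counts[(unit, h)] if counts[(unit, h)] else 0.0
                  for h in hours]
        for i, h in enumerate(hours):
            row = {"unit": unit, "hour_start": h, "mean_delay": series[i]}
            for k in LAGS:
                # A lag is undefined (None) until the series has k hours of history.
                row[lag_name(k)] = series[i - k] if i >= k else None
            panel.append(row)
    return panel
```

Because every unit gets one row per hour, a lag of k hours is simply the value k positions earlier in that unit's series; the `None` values at the start of the series are why the earliest period lacks complete lag data.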
3. Exploratory Analysis
Before developing models, we explored the delay data over time, across space, and across time and space jointly, as well as the influence of time lags and weather.
3.1 On-time performance overview
We began with an overview of delay frequency. As the graph below shows, the number of delays decreases as delay duration increases, and very few trains are delayed by more than 30 minutes.
Figure 3.1 Overview of NJ Transit Delays at All Stations
To quantify this, we computed the percentage of trains arriving within several thresholds of their scheduled arrival times. Only 68% of trains arrived within five minutes of schedule, and only 89% within ten minutes. Extending the window to 20 minutes captures almost all trains (97%), and every train reached its destination within two hours. These figures underscore the severity of the delay problem for both NJ Transit and its riders.
Time frame | Share of trains (%) |
---|---|
Arrived within 5 minutes | 68.00 |
Arrived within 10 minutes | 88.67 |
Arrived within 20 minutes | 97.07 |
Arrived within 30 minutes | 98.68 |
Arrived within 1 hour | 99.64 |
Arrived within 2 hours | 100.00 |
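The cumulative percentages in the table above can be reproduced with a short helper. This is a hypothetical Python sketch, not the authors' code; it assumes "within X minutes" means a delay of at most X minutes, since the report does not say whether the boundary is inclusive.

```python
# Thresholds (minutes) matching the rows of the table above.
THRESHOLDS = [5, 10, 20, 30, 60, 120]


def share_within(delays, thresholds=THRESHOLDS):
    """Return {threshold: % of trains with delay <= threshold}, rounded to 2 dp."""
    n = len(delays)
    if n == 0:
        raise ValueError("need at least one delay observation")
    return {t: round(100.0 * sum(1 for d in delays if d <= t) / n, 2)
            for t in thresholds}


print(share_within([0, 3, 5, 7, 12, 25, 45, 90]))
# {5: 37.5, 10: 50.0, 20: 62.5, 30: 75.0, 60: 87.5, 120: 100.0}
```

Because each threshold includes every train counted by the smaller ones, the percentages can only increase down the table, which is why the 2-hour row reaches 100%.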
The plot below also reveals an alarming pattern: nearly all station-level on-time rates fall below 0.5, and more than half fall below 0.1. In other words, trains rarely arrive on schedule.
Figure 3.2 On-Time rate of Trains by NJ Transit Stations
The on-time rate maps reveal a geographic pattern: stations in Gloucester and Atlantic counties have markedly lower on-time rates than those in northern New Jersey. This disparity raises concerns about the consistency of service across regions.
Figure 3.3 On-Time rate of Trains by NJ Transit Stations within 5/10/20/30 minutes
3.2 Delays across time
Secondly, we delved into the temporal dimension, scrutinizing delays on both a weekly and daily basis.
When examining the total train stops and average delay minutes over a week, a discernible cyclical pattern emerges. This cycle, however, becomes more intricate when analyzed on a daily scale.
Figure 3.4 Average Train Delays of Each Line
Looking more closely at day-to-day variation, the average delay follows a remarkably consistent pattern from Monday to Friday. It is shortest around 3:30 a.m., rises gradually to a peak around 9:00, declines modestly until 15:00, and then rises steadily until 19:00, when it reaches the day's highest peak.
During the weekends, the overall pattern of delay time fluctuations mirrors that of weekdays. However, the timing of the lowest and highest peaks differs, occurring later than the corresponding times on weekdays. Notably, both the overall average delay time and the peak value during weekends surpass those recorded from Monday to Friday.
Figure 3.5 Average Train Delays per hr. by Day
3.3 Delays across space
Thirdly, we explore delays across space, by line and by station.
(1) Delays by lines
The table below shows notable variation in average delay across transit lines. The Atl. City Line has by far the longest mean delay, about 16.6 minutes, although its median delay is under 0.1 minutes, which indicates that a minority of very long delays drives the mean. At the other extreme, the Princeton Shuttle has a mean delay of only 0.4 minutes, which is nearly negligible.
Line | Mean delay (min) | Mean delay (h) | Median delay (min) | Median delay (h) | Min delay (min) | Max delay (min) |
---|---|---|---|---|---|---|
Atl. City Line | 16.5593655 | 0.2759894 | 0.0750000 | 0.0012500 | 0 | 121.08333 |
Bergen Co. Line | 4.3744556 | 0.0729076 | 3.0833333 | 0.0513889 | 0 | 99.00000 |
Gladstone Branch | 4.4351968 | 0.0739199 | 2.7166667 | 0.0452778 | 0 | 100.21667 |
Main Line | 4.4068482 | 0.0734475 | 3.1833333 | 0.0530556 | 0 | 100.00000 |
Montclair-Boonton | 4.9462978 | 0.0824383 | 3.2166667 | 0.0536111 | 0 | 90.00000 |
Morristown Line | 5.4243676 | 0.0904061 | 3.4333333 | 0.0572222 | 0 | 100.50000 |
No Jersey Coast | 4.7192779 | 0.0786546 | 3.1500000 | 0.0525000 | 0 | 91.43333 |
Northeast Corrdr | 3.8530196 | 0.0642170 | 2.2000000 | 0.0366667 | 0 | 69.33333 |
Pascack Valley | 3.4937946 | 0.0582299 | 2.4166667 | 0.0402778 | 0 | 98.00000 |
Princeton Shuttle | 0.3816861 | 0.0063614 | 0.0166667 | 0.0002778 | 0 | 11.46667 |
Raritan Valley | 4.3672723 | 0.0727879 | 3.0666667 | 0.0511111 | 0 | 81.00000 |
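The columns of the table above can be computed per group with a small helper. This is a hypothetical Python sketch, not the authors' code; the `(group, delay_minutes)` input layout and the output key names are assumptions, and the hour columns are simply the minute columns divided by 60 (for example, 16.5593655 / 60 = 0.2759894 for the Atl. City Line).

```python
from statistics import mean, median


def summarize_delays(records):
    """Per-group delay statistics in the layout of the table above.

    records: iterable of (group, delay_minutes), e.g. (line, delay).
    Hour columns are the minute columns divided by 60.
    """
    groups = {}
    for group, delay in records:
        groups.setdefault(group, []).append(delay)
    summary = {}
    for group, delays in sorted(groups.items()):
        m, md = mean(delays), median(delays)
        summary[group] = {
            "mean_min": m, "mean_h": m / 60,
            "median_min": md, "median_h": md / 60,
            "min_min": min(delays), "max_min": max(delays),
        }
    return summary
```

With toy input such as three on-time stops and one 60-minute delay, the mean is 15 minutes while the median is 0, the same mean-versus-median gap seen on the Atl. City Line.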
To enhance comprehension, we translated these findings into visual representations using bar plots and a map. These visualizations offer a clear depiction of the severity of delays on each line and their respective geographic locations.
Figure 3.6 Average Delay by Lines in NJ
(2) Delays by stations
Because average delays by line cannot pinpoint where specific delays occur, we next examine delays by station. As Table 3.3 below shows, most stations have mean delays of roughly 3 to 6 minutes. However, 8 stations, all on the Atlantic City Line, have mean delays above 15 minutes, and the longest mean delay, about 18 minutes, occurs at Cherry Hill (18.04 minutes) and Lindenwold (17.98 minutes).
Station (from) | Mean delay (min) | Mean delay (h) | Median delay (min) | Median delay (h) | Min delay (min) | Max delay (min) |
---|---|---|---|---|---|---|
Aberdeen-Matawan | 5.4631657 | 0.0910528 | 4.0333333 | 0.0672222 | 0.0000000 | 67.31667 |
Absecon | 16.4318971 | 0.2738650 | 0.0000000 | 0.0000000 | 0.0000000 | 116.00000 |
Allendale | 4.2223364 | 0.0703723 | 3.0500000 | 0.0508333 | 0.0000000 | 98.00000 |
Allenhurst | 5.4804540 | 0.0913409 | 4.2000000 | 0.0700000 | 0.0000000 | 50.00000 |
Anderson Street | 4.2098418 | 0.0701640 | 3.2250000 | 0.0537500 | 0.0000000 | 67.00000 |
Annandale | 3.5718182 | 0.0595303 | 1.4916667 | 0.0248611 | 0.0000000 | 50.00000 |
Asbury Park | 5.1515504 | 0.0858592 | 4.1000000 | 0.0683333 | 0.0000000 | 49.00000 |
Atco | 16.8217885 | 0.2803631 | 0.1333333 | 0.0022222 | 0.0000000 | 118.05000 |
Atlantic City Rail Terminal | 16.7359309 | 0.2789322 | 8.0166667 | 0.1336111 | 0.0000000 | 85.38333 |
Avenel | 5.1985250 | 0.0866421 | 4.0500000 | 0.0675000 | 0.0000000 | 64.00000 |
Basking Ridge | 4.9240414 | 0.0820674 | 3.0500000 | 0.0508333 | 0.0000000 | 98.00000 |
Bay Head | 1.8547227 | 0.0309120 | 1.2083333 | 0.0201389 | 0.0000000 | 40.15000 |
Bay Street | 5.1028701 | 0.0850478 | 3.4333333 | 0.0572222 | 0.0000000 | 88.00000 |
Belmar | 6.2048588 | 0.1034143 | 5.0833333 | 0.0847222 | 0.0000000 | 49.00000 |
Berkeley Heights | 5.7971439 | 0.0966191 | 4.1666667 | 0.0694444 | 0.0000000 | 98.00000 |
Bernardsville | 5.5911142 | 0.0931852 | 4.0833333 | 0.0680556 | 0.0000000 | 100.21667 |
Bloomfield | 4.6073370 | 0.0767889 | 3.2000000 | 0.0533333 | 0.0000000 | 71.86667 |
Boonton | 5.8770526 | 0.0979509 | 3.4666667 | 0.0577778 | 0.0000000 | 69.00000 |
Bound Brook | 4.8422389 | 0.0807040 | 3.2333333 | 0.0538889 | 0.0000000 | 78.00000 |
Bradley Beach | 5.5062292 | 0.0917705 | 4.2166667 | 0.0702778 | 0.0000000 | 49.00000 |
Brick Church | 5.6478944 | 0.0941316 | 4.1000000 | 0.0683333 | 0.0000000 | 98.00000 |
Bridgewater | 5.2014232 | 0.0866904 | 3.4666667 | 0.0577778 | 0.0000000 | 81.00000 |
Broadway Fair Lawn | 3.8357583 | 0.0639293 | 2.4333333 | 0.0405556 | 0.0000000 | 53.11667 |
Campbell Hall | 7.5154343 | 0.1252572 | 6.1500000 | 0.1025000 | 0.0000000 | 99.00000 |
Chatham | 5.6988135 | 0.0949802 | 3.9916667 | 0.0665278 | 0.0000000 | 98.00000 |
Cherry Hill | 18.0412492 | 0.3006875 | 2.8666667 | 0.0477778 | 0.0000000 | 121.08333 |
Clifton | 5.0168803 | 0.0836147 | 4.2000000 | 0.0700000 | 0.0000000 | 65.05000 |
Convent Station | 5.8465857 | 0.0974431 | 4.1333333 | 0.0688889 | 0.0000000 | 99.00000 |
Cranford | 4.5691091 | 0.0761518 | 3.1666667 | 0.0527778 | 0.0000000 | 77.00000 |
Delawanna | 4.6027406 | 0.0767123 | 3.6583333 | 0.0609722 | 0.0000000 | 63.33333 |
Denville | 5.0666106 | 0.0844435 | 3.0166667 | 0.0502778 | 0.0000000 | 99.00000 |
Dover | 3.3204245 | 0.0553404 | 2.0166667 | 0.0336111 | 0.0000000 | 71.45000 |
Dunellen | 4.8293326 | 0.0804889 | 3.2500000 | 0.0541667 | 0.0000000 | 78.00000 |
East Orange | 5.4308598 | 0.0905143 | 3.6916667 | 0.0615278 | 0.0000000 | 98.00000 |
Edison | 3.9794028 | 0.0663234 | 2.1833333 | 0.0363889 | 0.0000000 | 49.35000 |
Egg Harbor City | 17.5627215 | 0.2927120 | 1.1333333 | 0.0188889 | 0.0000000 | 119.16667 |
Elberon | 4.3321162 | 0.0722019 | 3.0500000 | 0.0508333 | 0.0000000 | 50.00000 |
Elizabeth | 4.7750499 | 0.0795842 | 3.3000000 | 0.0550000 | 0.0000000 | 65.00000 |
Emerson | 3.7367694 | 0.0622795 | 3.0666667 | 0.0511111 | 0.0000000 | 69.00000 |
Essex Street | 4.0836080 | 0.0680601 | 3.1500000 | 0.0525000 | 0.0000000 | 98.00000 |
Fanwood | 4.8693558 | 0.0811559 | 3.3500000 | 0.0558333 | 0.0000000 | 77.00000 |
Far Hills | 5.4844938 | 0.0914082 | 3.5583333 | 0.0593056 | 0.0000000 | 85.00000 |
Garfield | 4.6685518 | 0.0778092 | 3.3166667 | 0.0552778 | 0.0000000 | 55.05000 |
Garwood | 5.3126951 | 0.0885449 | 3.7416667 | 0.0623611 | 0.0000000 | 77.00000 |
Gillette | 4.7553597 | 0.0792560 | 2.8583333 | 0.0476389 | 0.0000000 | 98.00000 |
Gladstone | 0.8500000 | 0.0141667 | 0.4500000 | 0.0075000 | 0.0000000 | 12.15000 |
Glen Ridge | 5.1392524 | 0.0856542 | 3.5166667 | 0.0586111 | 0.0000000 | 88.00000 |
Glen Rock Boro Hall | 4.4068755 | 0.0734479 | 3.2833333 | 0.0547222 | 0.0000000 | 50.23333 |
Glen Rock Main Line | 4.1949211 | 0.0699154 | 3.2333333 | 0.0538889 | 0.0000000 | 70.03333 |
Hackettstown | 1.2590847 | 0.0209847 | 0.4250000 | 0.0070833 | 0.0000000 | 16.13333 |
Hamilton | 2.1174674 | 0.0352911 | 0.5500000 | 0.0091667 | 0.0000000 | 58.00000 |
Hammonton | 16.2168675 | 0.2702811 | 0.0750000 | 0.0012500 | 0.0000000 | 108.63333 |
Harriman | 7.8354281 | 0.1305905 | 6.2583333 | 0.1043056 | 0.0000000 | 98.00000 |
Hawthorne | 4.4457846 | 0.0740964 | 3.5166667 | 0.0586111 | 0.0000000 | 70.15000 |
Hazlet | 5.1892691 | 0.0864878 | 3.3833333 | 0.0563889 | 0.0000000 | 68.13333 |
High Bridge | 0.7634021 | 0.0127234 | 0.0000000 | 0.0000000 | 0.0000000 | 31.28333 |
Highland Avenue | 5.5215899 | 0.0920265 | 3.7500000 | 0.0625000 | 0.0000000 | 98.00000 |
Hillsdale | 3.8776495 | 0.0646275 | 3.1166667 | 0.0519444 | 0.0000000 | 68.00000 |
Hoboken | 2.8702549 | 0.0478376 | 2.1000000 | 0.0350000 | 0.0000000 | 91.43333 |
Jersey Avenue | 3.6976969 | 0.0616283 | 2.1500000 | 0.0358333 | 0.0000000 | 51.00000 |
Kingsland | 4.3889168 | 0.0731486 | 3.3000000 | 0.0550000 | 0.0000000 | 61.06667 |
Lake Hopatcong | 4.3083333 | 0.0718056 | 3.1083333 | 0.0518056 | 0.0000000 | 45.00000 |
Lebanon | 5.0425038 | 0.0840417 | 2.1833333 | 0.0363889 | 0.0000000 | 57.00000 |
Lincoln Park | 6.5301404 | 0.1088357 | 4.4333333 | 0.0738889 | 0.0000000 | 69.00000 |
Linden | 4.7829857 | 0.0797164 | 3.2833333 | 0.0547222 | 0.0000000 | 65.00000 |
Lindenwold | 17.9770103 | 0.2996168 | 2.1500000 | 0.0358333 | 0.0000000 | 120.08333 |
Little Falls | 6.4842105 | 0.1080702 | 5.0666667 | 0.0844444 | 0.0833333 | 68.00000 |
Little Silver | 3.7825894 | 0.0630432 | 2.3833333 | 0.0397222 | 0.0000000 | 85.00000 |
Long Branch | 2.0822455 | 0.0347041 | 0.2666667 | 0.0044444 | 0.0000000 | 70.13333 |
Lyndhurst | 4.5456708 | 0.0757612 | 3.6000000 | 0.0600000 | 0.0000000 | 63.13333 |
Lyons | 6.2031404 | 0.1033857 | 4.1500000 | 0.0691667 | 0.0000000 | 98.00000 |
Madison | 5.7825121 | 0.0963752 | 4.0833333 | 0.0680556 | 0.0000000 | 99.00000 |
Mahwah | 3.8536238 | 0.0642271 | 2.3333333 | 0.0388889 | 0.0000000 | 98.00000 |
Manasquan | 4.4067414 | 0.0734457 | 2.2333333 | 0.0372222 | 0.0000000 | 49.00000 |
Maplewood | 5.4635546 | 0.0910592 | 4.0666667 | 0.0677778 | 0.0000000 | 100.41667 |
Metropark | 4.1932944 | 0.0698882 | 3.0500000 | 0.0508333 | 0.0000000 | 56.35000 |
Metuchen | 4.0716912 | 0.0678615 | 2.3916667 | 0.0398611 | 0.0000000 | 49.00000 |
Middletown NJ | 5.4959119 | 0.0915985 | 4.0166667 | 0.0669444 | 0.0000000 | 84.00000 |
Middletown NY | 6.6444242 | 0.1107404 | 4.5666667 | 0.0761111 | 0.0000000 | 98.00000 |
Millburn | 5.3709676 | 0.0895161 | 3.5750000 | 0.0595833 | 0.0000000 | 98.00000 |
Millington | 4.9956363 | 0.0832606 | 3.1500000 | 0.0525000 | 0.0000000 | 99.00000 |
Monmouth Park | 9.6904167 | 0.1615069 | 9.0000000 | 0.1500000 | 0.0000000 | 31.01667 |
Montclair Heights | 4.8627932 | 0.0810466 | 3.1000000 | 0.0516667 | 0.0000000 | 90.00000 |
Montclair State U | 2.5546321 | 0.0425772 | 1.2583333 | 0.0209722 | 0.0000000 | 71.10000 |
Montvale | 3.5182448 | 0.0586374 | 2.3000000 | 0.0383333 | 0.0000000 | 68.00000 |
Morris Plains | 5.2196153 | 0.0869936 | 3.2666667 | 0.0544444 | 0.0000000 | 99.08333 |
Morristown | 5.7324428 | 0.0955407 | 4.1000000 | 0.0683333 | 0.0000000 | 100.00000 |
Mount Arlington | 5.8279595 | 0.0971327 | 4.1500000 | 0.0691667 | 0.0000000 | 53.00000 |
Mount Olive | 3.6589620 | 0.0609827 | 2.2000000 | 0.0366667 | 0.0000000 | 45.00000 |
Mount Tabor | 4.3404744 | 0.0723412 | 2.9500000 | 0.0491667 | 0.0000000 | 84.00000 |
Mountain Avenue | 5.8009993 | 0.0966833 | 4.0166667 | 0.0669444 | 0.0000000 | 88.00000 |
Mountain Lakes | 5.5762807 | 0.0929380 | 3.3333333 | 0.0555556 | 0.0000000 | 69.00000 |
Mountain Station | 5.4338421 | 0.0905640 | 3.4333333 | 0.0572222 | 0.0000000 | 100.03333 |
Mountain View | 6.5078246 | 0.1084637 | 5.0500000 | 0.0841667 | 0.0500000 | 69.00000 |
Murray Hill | 3.5357428 | 0.0589290 | 1.4166667 | 0.0236111 | 0.0000000 | 98.00000 |
Nanuet | 3.2924045 | 0.0548734 | 2.2166667 | 0.0369444 | 0.0000000 | 73.00000 |
Netcong | 5.2204019 | 0.0870067 | 3.4166667 | 0.0569444 | 0.0000000 | 45.00000 |
Netherwood | 4.5553783 | 0.0759230 | 3.1666667 | 0.0527778 | 0.0000000 | 77.00000 |
New Bridge Landing | 3.4179578 | 0.0569660 | 2.4500000 | 0.0408333 | 0.0000000 | 67.00000 |
New Brunswick | 3.2671977 | 0.0544533 | 1.9500000 | 0.0325000 | 0.0000000 | 57.00000 |
New Providence | 4.8228021 | 0.0803800 | 3.1166667 | 0.0519444 | 0.0000000 | 98.00000 |
New York Penn Station | 3.1828261 | 0.0530471 | 1.1500000 | 0.0191667 | 0.0000000 | 90.13333 |
Newark Airport | 4.5656323 | 0.0760939 | 3.1833333 | 0.0530556 | 0.0000000 | 69.33333 |
Newark Broad Street | 5.4070034 | 0.0901167 | 3.5166667 | 0.0586111 | 0.0000000 | 99.00000 |
Newark Penn Station | 5.0741233 | 0.0845687 | 3.4166667 | 0.0569444 | 0.0000000 | 76.00000 |
North Branch | 6.7292906 | 0.1121548 | 4.1166667 | 0.0686111 | 0.0000000 | 58.00000 |
North Elizabeth | 4.8466839 | 0.0807781 | 3.3833333 | 0.0563889 | 0.0000000 | 65.00000 |
Oradell | 3.3879580 | 0.0564660 | 2.3500000 | 0.0391667 | 0.0000000 | 68.00000 |
Orange | 5.5007296 | 0.0916788 | 4.0333333 | 0.0672222 | 0.0000000 | 98.00000 |
Otisville | 6.4622680 | 0.1077045 | 4.1166667 | 0.0686111 | 0.0000000 | 100.00000 |
Park Ridge | 3.9080446 | 0.0651341 | 3.1166667 | 0.0519444 | 0.0000000 | 68.00000 |
Passaic | 4.4787440 | 0.0746457 | 3.5583333 | 0.0593056 | 0.0000000 | 64.23333 |
Paterson | 4.5937419 | 0.0765624 | 4.0666667 | 0.0677778 | 0.0000000 | 64.10000 |
Peapack | 4.9507037 | 0.0825117 | 2.1333333 | 0.0355556 | 0.0000000 | 99.00000 |
Pearl River | 3.5678114 | 0.0594635 | 2.2000000 | 0.0366667 | 0.0000000 | 75.00000 |
Pennsauken | 17.7121026 | 0.2952017 | 2.1333333 | 0.0355556 | 0.0000000 | 121.00000 |
Perth Amboy | 5.2422320 | 0.0873705 | 3.5000000 | 0.0583333 | 0.0000000 | 67.23333 |
Philadelphia | 11.3932609 | 0.1898877 | 0.0000000 | 0.0000000 | 0.0000000 | 109.38333 |
Plainfield | 4.7577553 | 0.0792959 | 3.2500000 | 0.0541667 | 0.0000000 | 78.00000 |
Plauderville | 3.7120678 | 0.0618678 | 2.2833333 | 0.0380556 | 0.0000000 | 54.13333 |
Point Pleasant Beach | 2.9806799 | 0.0496780 | 1.2500000 | 0.0208333 | 0.0000000 | 50.00000 |
Port Jervis | 1.8889984 | 0.0314833 | 0.0000000 | 0.0000000 | 0.0000000 | 77.26667 |
Princeton | 0.3820283 | 0.0063671 | 0.0166667 | 0.0002778 | 0.0000000 | 11.46667 |
Princeton Junction | 3.4141258 | 0.0569021 | 1.4833333 | 0.0247222 | 0.0000000 | 55.00000 |
Radburn Fair Lawn | 4.5040875 | 0.0750681 | 3.2333333 | 0.0538889 | 0.0000000 | 98.00000 |
Rahway | 4.9242468 | 0.0820708 | 3.2666667 | 0.0544444 | 0.0000000 | 66.00000 |
Ramsey Main St | 4.4770522 | 0.0746175 | 3.1166667 | 0.0519444 | 0.0000000 | 98.00000 |
Ramsey Route 17 | 4.8100658 | 0.0801678 | 3.1500000 | 0.0525000 | 0.0000000 | 98.00000 |
Raritan | 3.4545129 | 0.0575752 | 2.2000000 | 0.0366667 | 0.0000000 | 56.00000 |
Red Bank | 5.6495212 | 0.0941587 | 4.0500000 | 0.0675000 | 0.0000000 | 85.00000 |
Ridgewood | 4.5506652 | 0.0758444 | 3.2916667 | 0.0548611 | 0.0000000 | 98.00000 |
River Edge | 3.3353687 | 0.0555895 | 2.2833333 | 0.0380556 | 0.0000000 | 68.00000 |
Roselle Park | 4.7237920 | 0.0787299 | 3.2000000 | 0.0533333 | 0.0000000 | 76.00000 |
Rutherford | 5.1104014 | 0.0851734 | 4.0666667 | 0.0677778 | 0.0000000 | 98.00000 |
Salisbury Mills-Cornwall | 7.8822236 | 0.1313704 | 6.2166667 | 0.1036111 | 0.0000000 | 98.00000 |
Secaucus Concourse | 0.1000000 | 0.0016667 | 0.1000000 | 0.0016667 | 0.1000000 | 0.10000 |
Secaucus Lower Lvl | 3.7083083 | 0.0618051 | 3.0000000 | 0.0500000 | 0.0000000 | 99.00000 |
Secaucus Upper Lvl | 4.2567090 | 0.0709452 | 2.3500000 | 0.0391667 | 0.0000000 | 79.00000 |
Short Hills | 5.5491160 | 0.0924853 | 3.4666667 | 0.0577778 | 0.0000000 | 98.00000 |
Sloatsburg | 6.8826422 | 0.1147107 | 5.2166667 | 0.0869444 | 0.0000000 | 98.00000 |
Somerville | 3.8435295 | 0.0640588 | 2.3666667 | 0.0394444 | 0.0000000 | 80.00000 |
South Amboy | 5.5509681 | 0.0925161 | 4.0666667 | 0.0677778 | 0.0000000 | 67.11667 |
South Orange | 5.8126665 | 0.0968778 | 4.1166667 | 0.0686111 | 0.0000000 | 99.06667 |
Spring Lake | 5.3182724 | 0.0886379 | 4.1333333 | 0.0688889 | 0.0000000 | 49.00000 |
Spring Valley | 1.6377672 | 0.0272961 | 1.2000000 | 0.0200000 | 0.0000000 | 56.30000 |
Stirling | 4.5160907 | 0.0752682 | 2.6833333 | 0.0447222 | 0.0000000 | 98.00000 |
Suffern | 3.4095299 | 0.0568255 | 1.4666667 | 0.0244444 | 0.0000000 | 98.00000 |
Summit | 5.1305265 | 0.0855088 | 3.2833333 | 0.0547222 | 0.0000000 | 100.50000 |
Teterboro | 4.0396877 | 0.0673281 | 3.2000000 | 0.0533333 | 0.0000000 | 42.56667 |
Towaco | 5.5052982 | 0.0917550 | 3.3166667 | 0.0552778 | 0.0000000 | 69.00000 |
Trenton | 2.1226712 | 0.0353779 | 1.1833333 | 0.0197222 | 0.0000000 | 43.05000 |
Tuxedo | 7.1140173 | 0.1185670 | 5.3500000 | 0.0891667 | 0.0000000 | 98.00000 |
Union | 4.2109883 | 0.0701831 | 3.0000000 | 0.0500000 | 0.0000000 | 79.00000 |
Upper Montclair | 5.8551133 | 0.0975852 | 4.1500000 | 0.0691667 | 0.0000000 | 89.00000 |
Waldwick | 3.2202804 | 0.0536713 | 2.1666667 | 0.0361111 | 0.0000000 | 69.00000 |
Walnut Street | 5.3643026 | 0.0894050 | 3.5666667 | 0.0594444 | 0.0000000 | 88.00000 |
Watchung Avenue | 5.6386605 | 0.0939777 | 4.1166667 | 0.0686111 | 0.0000000 | 88.00000 |
Watsessing Avenue | 5.5358640 | 0.0922644 | 4.2083333 | 0.0701389 | 0.0000000 | 76.00000 |
Wayne-Route 23 | 6.8869474 | 0.1147825 | 5.1333333 | 0.0855556 | 0.1000000 | 68.00000 |
Wesmont | 4.3013032 | 0.0716884 | 3.1666667 | 0.0527778 | 0.0000000 | 56.15000 |
Westfield | 4.5479696 | 0.0757995 | 3.1416667 | 0.0523611 | 0.0000000 | 77.00000 |
Westwood | 4.0529744 | 0.0675496 | 3.2000000 | 0.0533333 | 0.0000000 | 69.00000 |
White House | 4.5695205 | 0.0761587 | 1.3416667 | 0.0223611 | 0.0000000 | 56.00000 |
Wood Ridge | 4.5781898 | 0.0763032 | 3.6000000 | 0.0600000 | 0.0000000 | 66.00000 |
Woodbridge | 4.9260991 | 0.0821017 | 3.2000000 | 0.0533333 | 0.0000000 | 53.06667 |
Woodcliff Lake | 4.0913741 | 0.0681896 | 3.2000000 | 0.0533333 | 0.0000000 | 47.30000 |
Figure 3.7 Average Delay by Stations in NJ
3.4 Delays across time and space
Fourthly, we investigate how delays vary jointly across time (weekly and daily) and space.
The plot below shows the daily mean delays throughout the week by line. The average delays in northern New Jersey remain consistently below 10 minutes from Monday to Sunday. In contrast, the mean delay on the AC line exceeds 10 minutes, peaking at about 20 minutes on weekends.
Figure 3.8 Mean Delay Minutes per Day by Line
Additionally, we examine the hourly delays categorized by line.
Figure 3.9 Mean delay minutes per hour by line
3.6 Delays impacted by weather
In this section, we find no significant correlation between train delays and any of the weather variables. A likely reason is that only severe weather tends to cause significant train delays: our model was trained on a short period, and no noteworthy weather events occurred during it that would show up clearly in the data. Although their impact was marginal, our final model includes the temperature, precipitation, visibility, and wind speed features because they slightly improved predictions.
Figure 3.9 Train delays by weather conditions
4. Regression model building and training
In this part, we split the data into training and test sets and build our models.
4.1 Create Train and Test Set
Our delay data cover weeks 40 through 46. Because the lag features require earlier observations, complete data are available only for weeks 41 through 46. We use the first four of these weeks (41 to 44) as the training set and the last two (45 and 46) as the test set.
Figure 3.10 NJ Transit Rail Trips by Week
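The week-based split described above can be sketched as follows. This is a hypothetical Python sketch, not the authors' code: it assumes each panel row carries a `hour_start` datetime and uses ISO week numbers from `isocalendar()`, which may differ from the week convention used in the report.

```python
from datetime import datetime

TRAIN_WEEKS = range(41, 45)  # weeks 41 to 44
TEST_WEEKS = range(45, 47)   # weeks 45 and 46


def week_of(ts: datetime) -> int:
    """ISO week number; the report's own week convention may differ."""
    return ts.isocalendar()[1]


def split_by_week(rows, time_key="hour_start"):
    """Split panel rows into train/test sets by week number.

    Week 40, whose lag features are incomplete, falls in neither range
    and is therefore dropped, as are any weeks after 46.
    """
    train, test = [], []
    for row in rows:
        week = week_of(row[time_key])
        if week in TRAIN_WEEKS:
            train.append(row)
        elif week in TEST_WEEKS:
            test.append(row)
    return train, test
```

Splitting by time rather than at random keeps the test weeks strictly after the training weeks, so the lag features never leak information from the evaluation period into training.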
4.2 Correlation Matrix
The correlation matrix shows the relationships among all numeric variables. As Figure 3.11 below shows, no two variables are obviously correlated, so we kept all variables in the models built later.
Figure 3.11 Correlation Matrix
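The check behind this decision can be sketched as a pairwise Pearson screen. This is a hypothetical Python sketch, not the authors' code: `strongly_correlated` and its 0.8 cut-off are assumptions for illustration, since the report does not state a numeric threshold.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)  # undefined if either variable is constant


def strongly_correlated(columns, threshold=0.8):
    """Return variable pairs whose |r| >= threshold.

    columns: {name: list of values}; threshold is a hypothetical cut-off.
    """
    return [(a, b) for a, b in combinations(sorted(columns), 2)
            if abs(pearson(columns[a], columns[b])) >= threshold]
```

An empty result corresponds to the situation described above, where no pair of predictors is redundant enough to justify dropping one.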
4.3 Build Model
During model development, we grouped the features we incorporated into five broad categories. We tested these categories systematically in various combinations, leading to the models presented below, among them the one we ultimately selected.
Here are our five feature categories:
- Time: `week` (week number), `dotw` (day of the week), `hour` (hour of the day)
- Space: `ST_NAME` (station name), `LINE_CODE` (line abbreviation)
- Time lag: `lag1Hour`, `lag2Hours`, `lag3Hours`, `lag4Hours`, `lag6Hours`, `lag8Hours`, `lag10Hours`, `lag12Hours`
- Weather: `temperature`, `precipitation`, `visibility`, `windspeed`
- Holidays: `holidayLag`
Here are our four models:
- Model A: Time and Space
- Model B: Time, Space and Time Lag
- Model C: Time, Space, Time Lag and Weather
- Model D: Time, Space, Time Lag, Weather and HolidayLag
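One plausible way to write the fullest of these specifications is shown below. It assumes an ordinary least-squares linear regression (Section 5.5 refers to a linear regression model) with the categorical features entered as fixed effects; the notation is ours, not the authors'. Here $y_{s,\ell,t}$ is the mean delay at station $s$ on line $\ell$ in hour $t$, $\mathbf{w}_t$ collects the four weather variables, and $K = \{1, 2, 3, 4, 6, 8, 10, 12\}$ is the set of lag offsets in hours.

$$
y_{s,\ell,t} = \beta_0 + \alpha_{\mathrm{week}(t)} + \delta_{\mathrm{dotw}(t)} + \eta_{\mathrm{hour}(t)} + \sigma_{s} + \lambda_{\ell}
+ \sum_{k \in K} \gamma_k \, y_{s,\ell,t-k}
+ \boldsymbol{\theta}^{\top} \mathbf{w}_t
+ \phi \, \mathrm{holidayLag}_t
+ \varepsilon_{s,\ell,t}
$$

Under this reading, the time-and-space specification keeps only the terms up to $\lambda_{\ell}$, and each subsequent specification adds, in order, the lag sum, the weather term $\boldsymbol{\theta}^{\top} \mathbf{w}_t$, and the holiday term.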
5. Results
5.1 Modeling for Training Set
5.2 Predict for Test Set
5.3 MAEs
We validate the models using the Mean Absolute Error (MAE) over time. Models B and C have the smallest MAE, whether it is calculated by week and model or by line.
Figure 5.1 MAE: Delays by week and model
Figure 5.2 MAE: Delays of more than 20 minutes by model and lines
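Both MAE breakdowns use the standard definition $\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i|$, computed within each group. The hypothetical Python sketch below is not the authors' code; the row field names (`observed`, `predicted`, `model`, `week`, `line`) are assumptions, and the optional `min_observed` filter mimics the "more than 20 minutes" comparison.

```python
from collections import defaultdict


def mae_by_group(rows, keys, min_observed=None):
    """MAE of predictions grouped by the given row fields.

    rows: dicts with "observed" and "predicted" delays (minutes) plus the
    grouping fields (hypothetical names such as "model", "week", "line").
    min_observed: if set, keep only rows whose observed delay exceeds it.
    """
    errors = defaultdict(list)
    for r in rows:
        if min_observed is not None and r["observed"] <= min_observed:
            continue
        group = tuple(r[k] for k in keys)
        errors[group].append(abs(r["observed"] - r["predicted"]))
    return {g: sum(e) / len(e) for g, e in errors.items()}
```

Calling it with `keys=("model", "week")` gives the by-week breakdown, while `keys=("model", "line")` with `min_observed=20` gives the breakdown restricted to long delays.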
5.4 Observed and Predicted Mean Hourly Delays
Across the two test weeks, models B, C, and D predict the highest delay peaks markedly better. They also accurately capture the low points and overall trends, indicating that incorporating time lags improves prediction accuracy. However, the models fail to predict the exact timing of high delays, possibly because such delays stem from unforeseen incidents.
Figure 5.3 Predicted and Observed mean delay minutes by hour
5.5 Cross Validation
We run leave-one-group-out (LOGO) cross-validation on our linear regression model, grouping by day of the week: each day is held out in turn as the test set while the model is trained on the remaining days. The average Mean Absolute Error (MAE) is 0.493 minutes, and the largest daily MAE is about 1.2 minutes. Most MAE values fall between 0.2 and 0.6, indicating a relatively consistent and low prediction error across the held-out days. This consistency suggests that the model predicts delays well. Still, despite the low average error, there is room for improvement, especially in reducing the largest errors.
Figure 5.4 LOGO-CV: Mean Delay Duration by Day in a Week
Cross-validation | Mean MAE (min) | SD of MAE (min) |
---|---|---|
LOGO CV: day in a week | 0.493 | 0.39 |
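The LOGO procedure can be sketched as follows. This is a deliberately simplified, hypothetical Python sketch, not the authors' code: it fits a one-predictor least-squares line (`lag1Hour` predicting `mean_delay`) instead of the full model, and the field names and grouping by `dotw` are assumptions. The three returned values correspond to the per-day MAEs and the two columns of the table above.

```python
from statistics import mean, stdev


def fit_ols_1d(x, y):
    """Ordinary least squares for y = b0 + b1 * x with a single predictor."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - b1 * mx, b1


def logo_cv(rows, group_key="dotw", x_key="lag1Hour", y_key="mean_delay"):
    """Leave-one-group-out CV: hold out each group in turn, train on the rest.

    Returns (per-group MAE, mean MAE, standard deviation of MAE).
    """
    maes = {}
    for g in sorted({r[group_key] for r in rows}):
        train = [r for r in rows if r[group_key] != g]
        test = [r for r in rows if r[group_key] == g]
        b0, b1 = fit_ols_1d([r[x_key] for r in train], [r[y_key] for r in train])
        maes[g] = mean(abs(r[y_key] - (b0 + b1 * r[x_key])) for r in test)
    values = list(maes.values())
    return maes, mean(values), stdev(values)
```

Because each fold's model never sees the held-out day, the spread of the per-day MAEs (the SD column) indicates how stable the model is from one day to another.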
6. Conclusion
The model trained for the app reveals the primary factors behind train delays and provides multidimensional, periodic delay predictions for NJ Transit that account for both temporal and spatial dimensions. This lets the model forecast delays proactively by analyzing the dynamics of New Jersey's transit hubs.
The utility and efficacy of our model are evident, yet its refinement depends on an extensive, precise, and publicly accessible database from governmental and specialized sectors. By reviewing literature and historical delay instances, we aim to mitigate unpredictable factors such as mechanical failures, funding shortages, and unexpected events. In 2023, for instance, the Morris & Essex line saw significant commuter disruptions due to ongoing overhead line issues. Such disruptions, especially from signal problems in critical areas like tunnels, can lead to notable delays due to safety and operational concerns.
To enhance our model, we need to integrate potential causes of train delays or cancellations as independent variables. Another key to improving the accuracy of our delay prediction model is to include comprehensive Amtrak data, such as actual and estimated arrival times and train IDs, which would enrich the observations of our independent variables.
Despite these considerations, we are confident that our model, which includes a wide array of important predictors, is robust enough for NJ Transit’s application. The train delay prediction model we’ve developed for NJ Transit aims to optimize operations, improve cost-effectiveness, and elevate the passenger experience and service quality. This data-driven approach not only assists NJ Transit in better understanding and managing its operational challenges but also supports its sustainable development goals over the long term. Moving forward, we are committed to continually enhancing our prediction model to offer even more valuable solutions to NJ Transit and other public transit operators.