NYC Animated Traffic Maps and Branch Analysis | Congestion Zone
July 28th 2024
Updated 10/25/24 with Directional Traffic Flow Map
By Gabriel del Valle
www.linkedin.com/in/gabrielxdelvalle

Traffic Congestion Visualized with Imputed Data, Manhattan below 60th Street
Introduction
This past year, a sweeping economic change loomed over Manhattan, creating new obstacles for some businesses while promising a boom to residents and public transportation. It had been sold to the public as the best solution to concerns about the environment, quality of life, and the budget gap in the MTA. But it didn’t happen.
Manhattan's Congestion Relief Zone embodied a policy long advocated for, long prepared for, and specially approved by the president to be a first of its kind transportation policy in the United States. It was set to come into effect on June 30, 2024. Warning signs of the impending fees throughout entry points to the city had already gone up in the spring of that year.
But it didn’t happen. On June 5th, Governor of New York, Kathy Hochul, put in the order to postpone it indefinitely. Governor Hochul's rationale for the last minute veto was that it was too soon after the pandemic to strain businesses and the public.
What was the Congestion Relief Zone?
The core of the Congestion Relief Zone program was a once-daily $15 charge for drivers who enter Manhattan below 60th Street during peak hours, with the following conditions:
- Non peak hours cost $3.50
- Larger vehicles (trucks) pay more
- Smaller vehicles (motorcycles) pay less
- Cost of existing tolls into Manhattan subtracted from $15 charge
- Discounts available for income below $60,000 or disability
- FDR Drive, West Street, and Hugh L Carey Tunnel are not part of the toll zone
- The policy targeted ride-hire apps, adding a toll of $2.50 to each trip in the zone
In the years leading up to the Congestion Relief Zone plan, NYC traffic studies had identified ride-hire apps as a recent major contributor to congestion, with drivers not only taking extra road space to compete with taxis, but also often driving around high demand areas just to look for passengers.
Core Value Question
In addition to producing revenue for the MTA, the aim of the Congestion Relief Zone policy is to increase the mobility (efficiency of transport) of downtown streets by decreasing congestion. Thus, the core value question which concerns all manner of businesses is -- What change in mobility can be expected for each street?
Stakeholders affected by congestion pricing policies
- Businesses that manage a fleet of vehicles such as ride-hire and delivery services
- Professional drivers
- Businesses that manage a labor pool of commuters
- Businesses which rely on trucked goods
- Businesses and residents concerned with real estate prices
- NYC government officials
- Urban researchers, and other groups concerned with understanding and improving the health of urban environments
The information most valuable to residents, businesses, and urbanism-oriented stakeholders is how effectively, and on which specific streets, the Congestion Zone achieves its goal of reducing traffic volume in order to increase mobility and transport speeds.
The core value question -- What change in mobility can be expected for each street? -- is key to many other economic questions, including:
Project Aims
In order to contribute meaningful insights to the core value question, this project sought to:
Because the Congestion Relief Zone is designed to target the entry points into the congestion zone with a toll, it’s essential to understand the impact of source streets when assessing how successful the program is in terms of the goal of reducing traffic volume to streets within the zone.
Map illustrating source streets into the Congestion Relief Zone:

Note: In this graphic, 6 Avenue is excluded because it is the one source avenue not present in the traffic dataset. The Lincoln Tunnel, Holland Tunnel, and Hugh L Carey Tunnel are pictured here, but they are likewise not available in the dataset.
Measuring Congestion
Volume as a measure alone is not enough to compare the rate of traffic between streets, because a volume of 100 may indicate low traffic on Broadway, but it would rank as high traffic on Water Street.
To function as a relative measure, congestion = current street traffic volume / max traffic volume for each street
As a relative measure congestion can also be normalized and displayed with a color code. This allows the mobility of each street to be seen at a glance in a map or a bar graph.
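The relative measure above can be sketched in a few lines of plain Python. This is a minimal illustration, not the notebooks' actual code: the row layout, the function name `congestion_by_row`, and the sample volumes are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical sample rows: (street, datetime, volume). The values are invented.
rows = [
    ("BROADWAY", "2019-10-01 08:00", 400),
    ("BROADWAY", "2019-10-01 08:15", 100),
    ("WATER STREET", "2019-10-01 08:00", 100),
    ("WATER STREET", "2019-10-01 08:15", 50),
]


def congestion_by_row(rows):
    """Return (street, datetime, congestion), where congestion = volume / street max."""
    # First pass: find the maximum recorded volume for each street.
    max_volume = defaultdict(float)
    for street, _, volume in rows:
        max_volume[street] = max(max_volume[street], volume)

    # Second pass: divide each reading by its own street's maximum.
    return [
        (street, when, volume / max_volume[street] if max_volume[street] else 0.0)
        for street, when, volume in rows
    ]


result = congestion_by_row(rows)
for street, when, congestion in result:
    print(street, when, congestion)
```

In this invented sample, a volume of 100 gives a congestion of 0.25 on Broadway but 1.0 on Water Street, which is the contrast the relative measure is meant to capture.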
Maps Produced by this Project
Each of these maps visualizes the full range of the available dates within the Octobers of 2016 to 2019, down to a 15 minute interval, coloring each street on a spectrum from green, to yellow, to red, to indicate its normalized level of congestion.
- Open NYC Automated Traffic Counts data stopped recording in 2020 and has not resumed, though the datasets remain updated.
- Data recording is highly concentrated in the month of October.
In addition to using color to represent a congestion value, each map is accompanied by an animated bar graph that labels streets in descending order of traffic volume.
The maps are drawn using geojson datasets that were refined to the geography of the Congestion Relief Zone and formatted to share a naming scheme with the NYC Automated Traffic Counts Dataset. A custom script colors each street according to its congestion value at a given date and time.
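One way such a coloring step could work is a linear blend from green through yellow to red. The sketch below is an assumption about the general approach, not the project's custom script; the function name `congestion_to_hex` and the exact color endpoints are hypothetical.

```python
def congestion_to_hex(congestion):
    """Map a congestion value in [0, 1] to a green -> yellow -> red hex color."""
    # Clamp so that out-of-range inputs still produce a valid color.
    c = min(max(congestion, 0.0), 1.0)
    if c <= 0.5:
        # Lower half: green stays full while red ramps up (green -> yellow).
        red, green = round(255 * (c / 0.5)), 255
    else:
        # Upper half: red stays full while green ramps down (yellow -> red).
        red, green = 255, round(255 * ((1.0 - c) / 0.5))
    return f"#{red:02x}{green:02x}00"


for value in (0.0, 0.5, 1.0):
    print(value, congestion_to_hex(value))
```

A two-segment blend keeps yellow exactly at the midpoint, so a street at half of its maximum volume reads as clearly between the two extremes.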
Average Congestion per Street per 15 Minute Interval
This map visualizes average congestion per street because the dataset often contains multiple rows per street per datetime interval. These multiple rows are the result of multiple data recording stations per street, as well as variations in the values of the 'direction', 'toSt', and 'fromSt' columns for each street.
Averaging these rows together allows a street to be represented by a single value and color. However, it's important to recognize that this aggregation makes this map a descriptive statistic rather than a true simulation of the dataset. Coloring each street a single color is an effective way to visualize the data, but highlighting an entire street can hide the fact that data is often recorded from only a single point along it, and it obscures where congestion may be higher or lower at different points along the street.
For example: FDR Drive has the highest average volume of all streets in the dataset, with a maximum volume over 2100. However, the junctions where congestion is the highest are averaged into long stretches of fast moving highway traffic. As a result, FDR Drive never appears red in this dataset because the average congestion never approaches the maximum congestion point.
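The averaging step described above can be sketched as a group-and-mean over (street, datetime) pairs. This is a plain-Python illustration under assumed field names; the records, the direction codes, and the function name `average_per_street_interval` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: two readings for one street in the same interval
# (for example, two directions or two recording stations) plus one other street.
records = [
    {"street": "FDR DRIVE", "datetime": "2019-10-01 08:00", "direction": "NB", "volume": 2000},
    {"street": "FDR DRIVE", "datetime": "2019-10-01 08:00", "direction": "SB", "volume": 1000},
    {"street": "BROADWAY", "datetime": "2019-10-01 08:00", "direction": "SB", "volume": 300},
]


def average_per_street_interval(records):
    """Collapse all rows sharing a street and datetime into one mean volume."""
    grouped = defaultdict(list)
    for record in records:
        grouped[(record["street"], record["datetime"])].append(record["volume"])
    return {key: mean(volumes) for key, volumes in grouped.items()}


averages = average_per_street_interval(records)
print(averages)
```

In this invented sample the busy reading of 2000 is pulled down to 1500 by the quieter one, which is the same flattening effect described for FDR Drive.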
This is a short clip of the full 1 hour 29 minute video representing all October dates 2016-2019. Full video: https://youtu.be/aNUBE7UanMY?si=E4KKpVrh1n4oHyNZ
Congestion per Data Recording Node per 15 Minute Interval
In contrast to the previous map, this map provides a discrete depiction of the traffic data, rather than an average. Instead of coloring streets with a congestion value, it plots a marker at the location where the data was recorded, illustrating both the source of the data on the map and the data readings. The congestion value of each recording location is portrayed by the color of its node marker.
To do so, this map uses the Wkt Geometry coordinates that were stored in each entry of the NYC Automated Traffic Counts dataset. Plotting these points on the geojson map places them in context (by default they didn’t include a reference map). Using the Wkt Geometry with the geojson map required aligning their coordinate systems.
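As a small illustration of the first step, a WKT point string can be turned into a numeric coordinate pair before plotting. The sketch below assumes the geometry is stored as standard `POINT (x y)` text; the helper name `parse_wkt_point` and the sample coordinates are invented, and the reprojection between coordinate systems is not shown because it depends on the systems involved and would normally be handled by a dedicated projection library.

```python
import re

# Matches text such as "POINT (984792.1 208500.5)" and captures the two numbers.
_POINT_RE = re.compile(
    r"^\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)


def parse_wkt_point(text):
    """Parse a WKT POINT string into an (x, y) tuple of floats."""
    match = _POINT_RE.match(text)
    if match is None:
        raise ValueError(f"not a WKT POINT: {text!r}")
    return float(match.group(1)), float(match.group(2))


# Hypothetical sample value, not taken from the dataset.
x, y = parse_wkt_point("POINT (984792.1 208500.5)")
print(x, y)
```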
This map reveals important insights into the data derived from the exact location where the data was recorded. In most cases, street data is recorded from a single point on the street. In a few cases, recording sites are densely packed along a street, providing highly precise data.
In the few cases where several recording sites are packed together along a single street, the animation reveals detailed traffic patterns, with shifts in congestion swaying forward and back across the street. An increase in congestion can be seen clogging the street with a wave which grows red, until a wave of first yellow, then green, represents a backup gaining speed and clearing the street until a new source of congestion is introduced.
To build on the Data Node animation's ability to portray the flow of traffic, I've developed a Directional Traffic Flow map (added 10/25/24), which indicates the direction in which traffic flows. It is described in its own section further on.
This is a short clip of the full 1 hour 29 minute video representing October dates 2016-2019. Full video: https://www.youtube.com/watch?v=x_-C6Yj3dFg
Average Congestion per Street per 15 Minute Interval (Imputed Data)
This map visualizes the same average congestion per street per datetime interval as the first map, but uses imputed data so that all streets are portrayed in each frame. Streets that remain gray are missing from the dataset. Missing data was imputed according to the average volume per street per hour in order to fit the imputation to the dominant trend in the data.
This dominant trend is, essentially, that the city that never sleeps at least nods off for a bit, decreasing activity between midnight and 4am before the rising swell of morning activities becomes a mountain by midday. That activity begins to diminish slowly after rush hour ends at 7pm.
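The imputation rule described above (fill a missing reading with that street's average volume for the same hour) can be sketched as follows. The data layout, the function name `impute_by_street_hour`, and the sample volumes are assumptions for illustration only.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical observations keyed by (street, date, hour); None marks a gap.
observed = {
    ("BROADWAY", "2019-10-01", 8): 400,
    ("BROADWAY", "2019-10-02", 8): 600,
    ("BROADWAY", "2019-10-03", 8): None,
    ("BROADWAY", "2019-10-03", 3): 50,
    ("6 AVENUE", "2019-10-03", 8): None,  # a street with no readings at all
}


def impute_by_street_hour(observed):
    """Fill gaps with the mean of the same street's readings at the same hour."""
    by_street_hour = defaultdict(list)
    for (street, _, hour), volume in observed.items():
        if volume is not None:
            by_street_hour[(street, hour)].append(volume)

    filled = {}
    for key, volume in observed.items():
        street, _, hour = key
        known = by_street_hour.get((street, hour))
        # Leave the gap in place when the street has no data for that hour.
        filled[key] = mean(known) if volume is None and known else volume
    return filled


filled = impute_by_street_hour(observed)
print(filled[("BROADWAY", "2019-10-03", 8)])
```

A street with no readings at all stays empty in this sketch, which corresponds to the streets that remain gray on the map.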

#Average Volume per hour per street, used as the pattern to impute missing data.

#Imputed Data compared to Original Data, Average volume per hour per day
Full video length: 1:38:46. Full video: https://youtu.be/xhVpZdrdkjQ?si=B4VgzAhrXLscI7iO
Branches
The analysis starts from a chosen source street from which traffic enters the congestion relief zone. The average volume of the source street at the selected hour is the basis for the estimate. With each branch added to the analysis, an estimate is made for how much traffic volume is passed from the source street at the selected hour to each street of the branch.
Thus with this method, one can create an estimate of how much traffic from a given source street at a given hour impacts the traffic volume of the entire congestion zone. By quantifying how much traffic volume each source street contributes to the rest of the streets in the region, this analysis method provides a baseline to compare and measure the mobility effects from the implementation of a congestion zone. This method also makes it possible to estimate which streets will receive the greatest gains in mobility by identifying which streets receive the most traffic from outside the Congestion Relief Zone.
Analysis assumptions:
- The time frame of an hour is used under the assumption that the average driver should be able to reach any destination within the Congestion Relief Zone within an hour of entering it.
- Each branch excludes previously listed streets under the assumption that the average driver will not backtrack to a previous street.
Note: the values of 'toSt' include streets which are not themselves tracked in the Automated Traffic Counts Dataset and, therefore, are not mapped, though they remain as a valid element in the branch analysis.
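To make the branch idea concrete, here is one possible way to pass a source street's volume down to its branches. The text does not spell out the exact splitting rule, so the proportional split below is an assumption for illustration, as are the function name `distribute_volume`, the weights, and the street connections in the sample.

```python
def distribute_volume(source_volume, branch_weights, visited=()):
    """Split source_volume across branches in proportion to their weights.

    Streets listed in `visited` are skipped, mirroring the assumption that
    drivers do not backtrack to a previously listed street.
    """
    eligible = {
        street: weight
        for street, weight in branch_weights.items()
        if street not in visited and weight > 0
    }
    total = sum(eligible.values())
    if total == 0:
        return {}
    return {street: source_volume * weight / total for street, weight in eligible.items()}


# Hypothetical example: a source street averaging 1000 vehicles at the chosen hour.
first = distribute_volume(1000, {"EAST 59 STREET": 300, "EAST 57 STREET": 100})

# Second branch level: split what reached EAST 59 STREET, excluding streets already seen.
second = distribute_volume(
    first["EAST 59 STREET"],
    {"2 AVENUE": 1, "EAST 57 STREET": 1},
    visited={"EAST 59 STREET", "EAST 57 STREET"},
)
print(first, second)
```

Whatever splitting rule is used, the volumes assigned to one level of branches should add back up to the volume of the street they came from.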
Directional Volume Flow Animated Map (Added 10/25/24)
To build on the Data Node animation's ability to portray the flow of traffic using WktGeom markers from the Automated Traffic Counts dataset, I've developed a new system to portray the to-street and from-street relationships in the data and the directional exchanges of volume they represent.
My goal was to portray these directional volume relationships using markers systematically placed along the length of each relevant street. These markers both clarify where directional volume flow has been recorded in the Automated Traffic Counts dataset and allow me to use a color gradient to portray the direction of the traffic.
- I located exchanges of volume on the map by identifying intersections where volume was passed between two different streets, which could be matched to geometric intersections between streets in the base_map.geojson dataset.
- An intersection marker is generated for each intersection recorded in the data by comparing the geometric intersections in the base_map.geojson dataset with the names of the streets that meet at them.

Map of geometric intersections in the base map that correspond to exchanges of volume in the Automated Traffic Counts Dataset
- In all rows of the Automated Traffic Counts dataset, the flow of traffic volume exchanged between intersections is uni-directional.
- The grey-scale street markers are placed between intersections that have an exchange of volume recorded between them in the Automated Traffic Counts dataset.
- Street index values indicate the direction of flow: Index values towards 0 connect to "gain_streets" whereas index values at the end of the list connect to "lose_streets".
- Greyscale colors are assigned to street markers, normalized to the number of markers between two intersections.
The greyscale colors do not imply volume quantities, as there is no data to support the assignment of specific volumes to the street markers between known intersections. Rather, the greyscale colors are purely meant to indicate the directional flow of the traffic.
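A gradient normalized to the number of markers could be generated along these lines. This is only a sketch: the function name `greyscale_gradient` is hypothetical, and running from pure black to pure white is an assumption. Following the description above, index 0 (the end that connects to the gain street) gets the darkest shade.

```python
def greyscale_gradient(marker_count):
    """Return one hex grey per marker, darkest at index 0 and lightest at the end."""
    if marker_count <= 0:
        return []
    if marker_count == 1:
        return ["#808080"]  # a single marker gets a neutral mid grey
    colors = []
    for index in range(marker_count):
        # Spread the markers evenly across the 0-255 grey range.
        level = round(255 * index / (marker_count - 1))
        colors.append(f"#{level:02x}{level:02x}{level:02x}")
    return colors


print(greyscale_gradient(5))
```

Because the step size depends on the marker count, a short block and a long avenue both span the full dark-to-light range, so the direction reads the same regardless of street length.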
While intersection markers could be colored, as they have known to_volume, from_volume, net_volume, absolute_volume, and congestion values per datetime, I chose not to color them, since an intersection marker is most often very near the WktGeom marker that is the source of its data. Letting the WktGeom markers alone reflect the volume / congestion values makes it clearer and more precise where the recorded volumes were measured.
This is a short clip of the full 48 minute video representing the datetimes that are relevant to both WktGeom markers and intersections.
Full Video: https://youtu.be/5sDdGmy6npU?si=YYcW0UaiX13JVEjv
Note for the MTA Open Data competition 2024: The Full Video was uploaded to Youtube 10/26/24
To further develop what I have started here, I could do more to communicate the relationship between intersection markers and WktGeom markers, such as:
- coloring intersection markers (with a nonclashing colormap like blue to purple), to portray the volumes gained or lost at each intersection
- connecting streets between intersection markers and WktGeom markers with street markers if there is a corresponding flow of data between them. Although WktGeom markers are rarely perfectly aligned with an intersection (alignment helps to generate and match street markers), a WktGeom marker could be replaced by the nearest intersection marker on the map. This method has been proven to work but is not integrated into the current output of this project.
- One weakness which this new map reveals is that there are WktGeom markers beyond the bounds of the Congestion Zone that have been averaged into the volumes of the other maps. They could be removed by using the matplotlib axis to select markers based on geometric boundaries. These markers can then be matched to rows that can be excluded from the dataset.
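The boundary-based cleanup mentioned in the last item could look roughly like the sketch below. It swaps the matplotlib axis selection for a plain rectangular bounding box, which is a cruder stand-in since the zone is not a rectangle; the function names, marker ids, coordinates, and bounds are all invented for illustration.

```python
def within_bounds(point, bounds):
    """Return True when (x, y) lies inside (min_x, min_y, max_x, max_y)."""
    x, y = point
    min_x, min_y, max_x, max_y = bounds
    return min_x <= x <= max_x and min_y <= y <= max_y


def split_markers(markers, bounds):
    """Separate markers into those inside and outside the bounding box."""
    inside = {key: p for key, p in markers.items() if within_bounds(p, bounds)}
    outside = {key: p for key, p in markers.items() if not within_bounds(p, bounds)}
    return inside, outside


# Hypothetical marker ids and coordinates; the bounds are illustrative only.
markers = {"node_a": (2.0, 3.0), "node_b": (9.5, 3.0), "node_c": (4.0, 11.0)}
inside, outside = split_markers(markers, bounds=(0.0, 0.0, 8.0, 10.0))
print(sorted(inside), sorted(outside))
```

The ids in `outside` could then be used to look up and drop the matching rows before volumes are averaged.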
Overall, this new system has advanced my goal of portraying the maximum amount of traffic flow information available in the Automated Traffic Counts dataset in an animated form. Potentially, the system for systematically generating and coordinating markers could be combined with more information on traffic movement patterns to portray a simulation of traffic flow.
Analysis Conclusion - Identifying integral street networks and insightful data
To the extent that the available data can represent the volume passed between branches, and acknowledging the data's many limitations in this regard, this part of the analysis aims to predict which streets would gain the most mobility as a result of the Congestion Relief Zone, based on which streets receive most of their volume from source streets.
The graphs, ranking, and volume distribution estimates in this portion of the analysis are made by summing the volume passed from each source street to each of its branches. This creates a direct one-to-one node connection for each source street and each of its branches.
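The summing step can be sketched as an aggregation over (source street, branch) pairs, followed by a ranking of branches by total volume received. The tuple layout, function names, and volumes below are assumptions made for the example, not values from the dataset.

```python
from collections import defaultdict

# Hypothetical (source_street, branch_street, volume_passed) estimates.
transfers = [
    ("2 AVENUE", "EAST 34 STREET", 120.0),
    ("2 AVENUE", "EAST 34 STREET", 80.0),
    ("2 AVENUE", "EAST 42 STREET", 50.0),
    ("5 AVENUE", "EAST 42 STREET", 70.0),
]


def sum_transfers(transfers):
    """Sum volume per (source, branch) pair: one edge per pair."""
    totals = defaultdict(float)
    for source, branch, volume in transfers:
        totals[(source, branch)] += volume
    return dict(totals)


def rank_branches(totals):
    """Rank branches by the total volume they receive from all source streets."""
    received = defaultdict(float)
    for (_, branch), volume in totals.items():
        received[branch] += volume
    return sorted(received.items(), key=lambda item: item[1], reverse=True)


totals = sum_transfers(transfers)
ranking = rank_branches(totals)
print(totals)
print(ranking)
```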
Network graph of traffic volume distribution -- Source Street in Red
Reading this graph:
The shape of this graph is configured with the spring layout algorithm. It aims to position nodes so that the connecting lines are as similar in length as possible and so that there are as few crossing edges as possible.
For the distribution of nodes, this makes those that share an edge closer together; nodes that are not connected are farther apart.
- Source streets are clustered towards the middle, as source streets are connected to the variety of branches more than the branches are connected to each other.
- A similar position shared by two source streets towards the center may indicate that the two source streets are connected and/or that they share a similar set of connected branches
- Though nodes towards the center tend to have more connections, a branch with an average number of connections may also be positioned in the interior of the circle if its connections are widely distributed across the circle.
The following link is an interactive Plotly graph:
Limitations of the data for branch volume distribution estimates:
If the data were sufficient to support predictions, the branch volume distribution estimates would have provided predictive value by identifying which streets receive the most volume from the entry points targeted by the toll. However, the small number of connections recorded for many major source streets eliminates them from branch analysis.
In addition to the holes in the timescale and streets recorded in the traffic dataset, there are additional holes in the connections recorded with the toSt and fromSt columns. For example, Williamsburg Bridge and West Street, both major traffic sources, have only one connection each, and neither connection leads to a street recorded in the dataset.
Only a source street with a sufficiently complete set of recorded branches can be used to estimate volume distributions, because the volumes distributed to its branches must sum to the average volume of the source street.
While most of the source streets in the dataset do not have enough connections to provide insight in this dataset, the most integral branch networks can be identified by ranking their number of connections.
Identifying integral streets:
This ranking exposes that out of all the source streets, 5 AVENUE, 2 AVENUE, 3 AVENUE, and 1 AVENUE have a much higher number of connections. This qualifies the volume distribution estimates for the branches of these streets as much more reliable than the others in the dataset.
Thus, I have identified where one would look in this dataset for the most integral estimate of volume distribution through branch analysis.
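A connection ranking of this kind can be sketched by counting the distinct branches recorded for each source street. The edge list and the function name `rank_by_connections` below are invented for illustration; they are not the ranking produced by the project.

```python
def rank_by_connections(edges):
    """Rank source streets by their number of distinct connected branches."""
    connections = {}
    for source, branch in edges:
        # A set ignores repeated rows for the same source/branch pair.
        connections.setdefault(source, set()).add(branch)
    # Sort by count (descending), then by name so ties are stable.
    return sorted(
        ((source, len(branches)) for source, branches in connections.items()),
        key=lambda item: (-item[1], item[0]),
    )


# Hypothetical (source_street, branch_street) pairs.
edges = [
    ("5 AVENUE", "EAST 42 STREET"),
    ("5 AVENUE", "EAST 34 STREET"),
    ("5 AVENUE", "EAST 34 STREET"),
    ("5 AVENUE", "WEST 23 STREET"),
    ("WEST STREET", "CANAL STREET"),
]

connection_ranking = rank_by_connections(edges)
print(connection_ranking)
```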
Data Insight:
Branch volume (daily average) distribution estimation for integral streets ranked (summarize first 20 rows):
Project Process
See how these maps were made on Github:
https://github.com/Corriande/Manhattan-Congestion-Zone-Animation-Analysis
Data Sources:
There are two publicly available datasets which must be used to run these notebooks, both from OpenNYC:
Automated Traffic Counts Dataset
As of 07/11/24, last updated -- 04/02/24
https://data.cityofnewyork.us/Transportation/Automated-Traffic-Volume-Counts/7ym2-wayt/about_data
Dates Queried: 2016 to 2019, Note: Data collection ceased after 2019
NYC Street Centerline (CSCL) -- Geojson shapefile
As of 07/11/24, last updated -- 07/08/24
https://data.cityofnewyork.us/City-Government/NYC-Street-Centerline-CSCL-/exjm-f27b
More information on these data sources is detailed in the README.
In my repository you will find...
Seven sequential Python Jupyter notebooks outline step by step with detailed comments how to produce three kinds of animated traffic maps, as well as how to perform and graph a branch analysis.
01_Initial_Congestion_Zone_Data_Framing.ipynb
- Refine both traffic data and map data to area of interest and align their naming schemes
02_AzureSynapse_czone_missing_dates.ipynb
- A Microsoft Azure Synapse Analytics Notebook, made to work with Apache PySpark. Distributed computing is used to impute NA for the unrecorded intervals of time per street per datetime in the traffic counts dataset. While this process was useful in my investigation for data density to make the public data more integral, this information is only strictly necessary for imputing missing data, which forms the basis for only one of the three animated maps.
03_Post_Process_Data_Exploration.ipynb
- Find the distribution of data and missing data
- Create Descriptive Statistics
- Explore Mapping Methods
- Impute Missing Data
04_Congestion_Map_and_Bargraph_Animation.ipynb
- This Python Jupyter notebook contains several functions that can be used together to produce a video of a Manhattan Congestion Zone Map, displaying average volume per street per datetime, as well as a corresponding animated bar graph.
05_Imputed_Congestion_Map_and_Bargraph_Animation.ipynb
- This notebook is the same as the previous one, though it includes the imputed_congestion.csv dataset
06_Animate_WktGeom_Map.ipynb
- Maps the WktGeom data included in the original Automated Traffic Counts Dataset from OpenNYC, which indicates with coordinates the location at which data was recorded. When plotted on their own, these coordinates create circular markers but not a map. The maps produced by this notebook, on the other hand, are a much closer representation of the dataset and come close to simulating the elastic nature of traffic movement.
07_Traffic_Routes_Branch_Analysis.ipynb
- Mappable Branch analysis system: Identifies the possible next streets one could take from a given street. The analysis starts from a chosen source street, from which traffic enters the congestion relief zone. The average volume of the source street at the selected hour is the basis for the estimate. With each branch added to the analysis, an estimate is made for how much traffic volume is passed from the source street at the selected hour to each street of the branch.
- Create a dataframe which sums the volume passed to each branch from each source street
- Rank streets by their volume received from source streets to predict which would gain the most mobility from the congestion zone (within limits of the data's integrity)
- Identify which streets are more integral to draw insights from branch analysis based on whether their number of connections is sufficient to describe the average daily volume of the source street. Rank by number of connections.
- Graph network connections between source streets and branches
New Additions 10/25/24:
08_Intersections.ipynb
- Identifies geometric intersections in the basemap.geojson dataset and the names of intersecting streets
- Identifies exchanges of volume between two different streets using the toSt and fromSt columns of the Automated Traffic Counts Dataset
- Locates these exchanges of volumes on the map, giving them a precise spatial coordinate where previously there was none
- Creates the sorted_intersection_dates dataset, with information on all intersectional exchanges of volume per datetime, info on both intersecting streets, and an intersection marker to plot this data on the map. This dataset is used in 09_Directional_Volume_Flow_Map
09_Directional_Volume_Flow_Map.ipynb
- Maps the WktGeom data included in the Automated Traffic Counts dataset, and portrays the direction of traffic volume exchanges between known intersections (recorded in sorted_intersection_dates) by systematically fitting markers to each street and coloring the markers between intersections with a greyscale gradient. The gradient becomes darker towards the "gain street" and whiter towards the "lose street" between two intersections that exchange traffic volume.
If you'd like access to datasets, map images, or map videos, please feel free to reach out on LinkedIn (www.linkedin.com/in/gabrielxdelvalle).
Discussion
Positive and Negative Considerations of the Congestion Relief Zone Policy
The sum of the policy's changes could raise the cost of living, working, and operating a business in Manhattan. In addition to the cost placed on all who drive into Manhattan daily for work and ride-hire apps, trucks of all kinds would receive the highest fees. Trucks are the source for nearly all retail goods and supplies the city depends on, and thus this policy could be expected to raise retail prices in the congestion zone across the board.
And yet, congestion relief zones implemented in cities across the world have reportedly been well received by their residents and well studied by urban economists. Before I knew of NYC's plan to implement a congestion zone, I learned of the practice through urban economist Alain Bertaud's 2018 book, Order Without Design: How Markets Shape Cities (MIT Press).
Some Perspectives "Order Without Design" Can Offer About Manhattan's Congestion Relief Zone:
Manhattan's streets are all at once rare, extremely difficult to expand, and also a vital economic resource:
Manhattan takes in 900,000 vehicles daily, and the extreme congestion keeps their speeds down to an average of just 7 miles per hour across the island. That includes the trucks that supply the vast majority of retail goods and food to both stores and individual homes. No matter how astronomical the demand, increasing the size of the streets in Manhattan is simply not feasible. Real estate prices are too high to justify sacrificing built space for more roads. And of course, the density of New York City has important cultural value as well: the rich urban experience it creates draws many residents and supports a diverse and highly desirable market of local businesses. Manhattan residents already rose in defiance against the intrusion of elevated highways through their historic neighborhoods in 1955, when the grassroots pro-urban movement of Jane Jacobs defended Greenwich Village against the rampant highway developer City Commissioner Robert Moses.
What happens when the demand for streets outpaces the supply?
If a collective cost remains invisible, it becomes an externality.
It's easy to accept the idea of constant traffic in Manhattan as inevitable, but it's possible to reach a limit at which streets become so overburdened that they fail to do their most important jobs: supporting a labor force, supporting a consumption market, and supplying resources efficiently. In the case of for-hire vehicle companies, which increase congestion by competing with existing taxis and circulating in high demand areas to look for customers, a lack of disincentives leads to overuse of the public resource. The cost of this overuse is paid by the rest of the city, whose mobility is diminished.
Road space and the efficiency with which it is used is a vital indicator of the health of an urban economy. It determines the reach of the labor pool that a city can support, the economic activities it can sustain, the energy used, and the quality of life of its citizens.
Notice that the Congestion Relief Zone plan priced the congestion fee according to the size of the vehicle, with motorcycles paying less, and large vehicles paying more. This reflects the intent to have drivers consider road space as an economic resource consumed by their use of a vehicle. The more space you take up, the more you pay. The policy reflects the economic ideal that when drivers are made to consider the space they take in the city street as a form of resource consumption with an associated cost, the driver will learn to consume space more efficiently with their vehicle use choices and their decisions for what purposes to drive. Properly priced and regulated, the resource scarcity of street space in Manhattan can be balanced, putting a limit on overuse which diminishes the mobility of all.
The Argument for the Congestion Relief Zone
Singapore, Stockholm, London, and Milan have reported, thanks to their congestion pricing programs, efficiency increases to the point that drivers recover the cost of the fee from savings in gas money, and overall save time from spending less time slowed by traffic.
It's this history of positive economic results from which an NYC study projected the following results:
https://rpa.org/work/reports/congestion-pricing-in-nyc
Regional Plan Association - NYC Congestion Pricing: Getting it Right (Campaign Report):
- $1.01 billion yearly revenue for the Metropolitan Transportation Authority (MTA)
- (-7%) CO2 emissions
- 10.3% weekday peak speed increase
- 58,900 reduced auto trips on weekdays
- Reduction in air pollution and noise
The support for Congestion Zones from urban economists like Alain Bertaud is based on the potential to establish a win-win scenario where public transportation funds can be raised, the value delivered to drivers can be improved, and the key urban resource of road space could be efficiently moderated. Cost to the public, and the equality of the impact of these costs are a central concern with such a policy, but in the best case scenario the cost to drivers is offset by their savings in transportation time and energy costs. Despite NYC’s efforts to make these concerns central to the policy, critics remained skeptical that the policy could actually deliver traffic and environmental improvements to the public. For example, Ana Ley writing for the New York Times warned that pollution and traffic could be simply displaced from wealthy downtown neighborhoods to regions which bypass the Congestion Zone, such as the Bronx:
https://www.nytimes.com/2022/09/12/nyregion/nyc-congestion-pricing-manhattan-bronx.html
Data Opportunity for future MTA projects
In the context of the Congestion Relief Zone's failure to take effect, the identification of holes in NYC's traffic data is one of the major values created by this project. However obvious a win-win the Congestion Relief Zone appeared to its advocates, in the end a lack of faith in its benefits, and the difficulty of communicating its tradeoffs to the public, won out over the assurances of the plan's positive track record. As a result, the Metropolitan Transportation Authority (MTA) loses its expected $1.01 billion annual revenue after investing $507 million in new traffic cameras for the program. For the opportunity to regain this revenue stream, the MTA would need to present a more convincing case for the benefits of the program, or a more advanced version of the program with a more nuanced pricing system that optimizes the desired result while decreasing the cost to the public.
For the opportunity to regain its Congestion Zone revenue stream, the MTA could make use of its new and now purposeless traffic cameras to collect more data. For this task this project provides a means of identifying holes in data and illustrating findings.
Perspective: This project sheds light on the difference between NYC's available data collection methods and those used by various hubs of private traffic data, including Google, Apple, and Uber. These private companies have the ability to track their users’ phones directly for traffic information, as they are able to include these permissions in their users' terms of service. Meanwhile, the government doesn't have the right to track its citizens for traffic data, and is forced to rely on cameras and other recording technologies, like traffic counting cables, that can be located on public property.
Identified Missing Data
The mapping systems developed in this project make the missing data easily and intuitively identifiable. A major step in the project was imputing the dates and time intervals per street where data was not recorded. I used distributed computing with Azure Synapse Notebooks and PySpark to compute the 30,000,000+ unrecorded datetime intervals per street. However, this was before I recognized the density of the data in October. If I had imputed only the unrecorded rows for October, the task would have been much more feasible to compute on a single machine.
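On a single machine, finding the unrecorded intervals for one street over a limited window could be sketched as below. This is a plain-Python illustration rather than the PySpark notebook's code; the function name `missing_intervals` and the sample timestamps are assumptions.

```python
from datetime import datetime, timedelta


def missing_intervals(recorded, start, end, step=timedelta(minutes=15)):
    """List every step-aligned datetime in [start, end) that has no recording."""
    seen = set(recorded)
    missing = []
    current = start
    while current < end:
        if current not in seen:
            missing.append(current)
        current += step
    return missing


# Hypothetical recordings for one street during a single hour.
recorded = [datetime(2019, 10, 1, 8, 0), datetime(2019, 10, 1, 8, 30)]
gaps = missing_intervals(
    recorded,
    start=datetime(2019, 10, 1, 8, 0),
    end=datetime(2019, 10, 1, 9, 0),
)
print(gaps)
```

Restricting `start` and `end` to a single October keeps the grid to 31 days of 96 intervals per street, which is far smaller than a grid spanning every year of the dataset.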
Fin.
Thank you for reading. If you are interested in any aspects of this, my first data science project, please feel free to reach out:
www.linkedin.com/in/gabrielxdelvalle