
New York City (NYC) is a global financial center, and its transportation system has been studied from many angles. Since 2009, the NYC Taxi and Limousine Commission has published information on NYC taxi operations, which makes detailed analysis possible. This research project therefore investigates taxi operations in New York City using big data analysis. The paper discusses the correlation between taxi operations and several types of weather: precipitation, snow depth, and snowfall. It also evaluates the distribution of taxi trips across Neighborhood Tabulation Areas (NTAs) using GeoPandas and presents their density on an NYC map.

As a global financial center, New York City is frequently studied by researchers, and its transportation system has become an increasingly important topic. The large amount of transportation data released by the NYC Taxi and Limousine Commission makes more sophisticated analysis possible. Using big data analysis of taxi operations in New York City, this paper explores taxi payment types, daily and monthly patterns of taxi operation, their long-term trend, and the impact of weather. Econometric methods, chiefly linear regression, are used to analyze the data.

Chris Whong and Todd W. Schneider have conducted similar research on taxi operations. In 2013, Whong studied 170 million NYC taxi trips and collected each trip's tip, total payment, number of passengers, start point, and end point [

This paper evaluates the impact of different types of weather, including precipitation, snow depth, and snowfall, to determine which factors affect taxi operations. In addition, it studies the distribution of taxi trips across NTAs, processed with GeoPandas, and shows their density on a plotted NYC map.

This research uses taxi operation data from 2009 to 2015 published by the NYC Taxi and Limousine Commission, focusing on the most recent data, from 2015 [

The analysis is programmed in Python, with code cells run in Jupyter notebooks. NumPy, pandas, GeoPandas, and Matplotlib are used to process the data: NumPy and pandas handle array and data-frame data, GeoPandas handles geometric data, and Matplotlib plots the graphs. Linear regression and linear algebra are then used to determine the functional relationships among the selected data.

First, basic information on taxi operations in January 2015 is studied. Several relevant columns, such as pickup time, are selected, and the raw data is read into a Jupyter notebook with pandas' “read_csv” function. The data is grouped by day, since daily trip counts are the main target. The result, as plotted in …, shows a sharp drop of around 150 thousand trips on …^{th} January, after which the count rises again to 500 thousand on 30th January.
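The reading and grouping step can be sketched as follows. This is a minimal illustration, not the author's code: the column name `tpep_pickup_datetime` is assumed from the TLC yellow-taxi schema, and the tiny in-memory CSV stands in for the real monthly file.

```python
import io

import pandas as pd

# Hypothetical stand-in for a TLC trip file (the real January 2015 file is ~2 GB).
# The column names are an assumption based on the yellow-taxi schema.
SAMPLE_CSV = """tpep_pickup_datetime,passenger_count,total_amount
2015-01-01 00:15:00,1,12.5
2015-01-01 13:40:00,2,8.0
2015-01-02 09:05:00,1,20.3
2015-01-02 18:30:00,3,15.0
2015-01-02 23:59:00,1,7.8
"""


def daily_trip_counts(source) -> pd.Series:
    """Read only the pickup-time column and count trips per calendar day."""
    trips = pd.read_csv(
        source,
        usecols=["tpep_pickup_datetime"],      # load only the column we need
        parse_dates=["tpep_pickup_datetime"],  # parse strings into datetimes
    )
    # Floor each timestamp to midnight, then count rows per day.
    days = trips["tpep_pickup_datetime"].dt.floor("D")
    return days.value_counts().sort_index()


daily = daily_trip_counts(io.StringIO(SAMPLE_CSV))
print(daily)
```

Restricting `usecols` to the needed columns keeps memory use down, which matters for files of this size.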

Next, the average number of trips per hour is computed: January trips are grouped into the 24 hours of the day, and each hourly total is divided by 31, the number of days in January. This gives the average number of trips in each hour of a January day. As shown in
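A minimal sketch of this hourly averaging, assuming the pickup times are already parsed into a pandas datetime Series; the four timestamps below are hypothetical.

```python
import pandas as pd


def average_hourly_trips(pickups: pd.Series, n_days: int) -> pd.Series:
    """Average trips per hour of day: total trips in each hour (0-23) divided by n_days."""
    hours = pickups.dt.hour
    # Count trips per hour and make sure all 24 hours appear, even empty ones.
    totals = hours.value_counts().reindex(range(24), fill_value=0)
    return totals / n_days


# Hypothetical pickup times; the study would use every January 2015 trip.
pickups = pd.Series(pd.to_datetime([
    "2015-01-01 08:10", "2015-01-01 08:45", "2015-01-02 08:05",
    "2015-01-03 19:30",
]))
hourly = average_hourly_trips(pickups, n_days=31)  # January has 31 days
print(hourly[8], hourly[19])
```

Reindexing over `range(24)` keeps hours with no trips in the result as zeros instead of dropping them.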

The impact of weather is then considered. Daily average snow depth in NYC for January 2015 is read into the Jupyter notebook and ordered by date, so that each value corresponds to one day of January 2015. Linear regression is applied to model the relationship between snow depth and daily trips in January.
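A one-variable least-squares fit of daily trips on snow depth can be sketched with NumPy's `polyfit`. The numbers below are invented purely to exercise the code; they are not the paper's measurements.

```python
import numpy as np


def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept; returns (slope, intercept)."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return slope, intercept


# Hypothetical values for illustration only (not the study's data).
snow_depth = np.array([0.0, 0.0, 1.0, 3.0, 6.0])
daily_trips = np.array([420_000, 410_000, 400_000, 380_000, 350_000])

slope, intercept = fit_line(snow_depth, daily_trips)
print(f"trips = {slope:.0f} * depth + {intercept:.0f}")
```

A negative slope would indicate fewer trips on days with deeper snow; the sign and size in the real data are what the paper's regression measures.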

In

Following the analysis of January 2015, data for the whole year is studied. Taxi trip data from February to December 2015 and the weather data are each read into the Jupyter notebook. After the daily trip counts are extracted and combined into one data frame, linear regression is applied to show the relationship between snow depth and the number of daily trips in 2015. According to
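Combining per-month daily counts and aligning them with weather could look like the sketch below. It assumes each month's counts are a date-indexed Series and that the weather table is indexed by date; the column names `snow_depth`, `snowfall`, and `precipitation` are hypothetical.

```python
import pandas as pd


def build_year_frame(monthly_counts, weather: pd.DataFrame) -> pd.DataFrame:
    """Stack per-month daily trip counts and align them with daily weather by date."""
    trips = pd.concat(monthly_counts).rename("trips").to_frame()
    trips.index.name = "date"
    # Inner join keeps only days present in both the trip and the weather data.
    return trips.join(weather, how="inner")


# Hypothetical inputs for illustration only.
jan = pd.Series([2, 3], index=pd.to_datetime(["2015-01-01", "2015-01-02"]))
feb = pd.Series([4], index=pd.to_datetime(["2015-02-01"]))
weather = pd.DataFrame(
    {
        "snow_depth": [0.0, 1.0, 5.0],
        "snowfall": [0.0, 0.5, 2.0],
        "precipitation": [0.0, 0.1, 0.4],
    },
    index=pd.DatetimeIndex(["2015-01-01", "2015-01-02", "2015-02-01"], name="date"),
)
year = build_year_frame([jan, feb], weather)
print(year)
```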

Using the same method, linear regression is applied to test the relationship between snowfall and the number of daily trips in 2015.

Another linear regression tests the relationship between daily trips and precipitation in 2015. In
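The three regressions (snow depth, snowfall, precipitation) follow the same pattern, so one helper can fit each in turn. This sketch solves the least-squares problem with `numpy.linalg.lstsq` on a design matrix `[1, x]` and reports R²; the data frame and its column names are hypothetical.

```python
import numpy as np
import pandas as pd


def regress_on(df: pd.DataFrame, column: str, target: str = "trips"):
    """Fit target = b0 + b1 * column by least squares; return (b1, b0, r_squared)."""
    x = df[column].to_numpy(dtype=float)
    y = df[target].to_numpy(dtype=float)
    X = np.column_stack([np.ones_like(x), x])  # design matrix with intercept column
    (b0, b1), *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ np.array([b0, b1])
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    r_squared = 1.0 - residuals @ residuals / np.sum((y - y.mean()) ** 2)
    return b1, b0, r_squared


# Hypothetical daily data for illustration only.
year = pd.DataFrame({
    "trips": [420_000, 410_000, 400_000, 380_000, 350_000],
    "snow_depth": [0.0, 0.0, 1.0, 3.0, 6.0],
    "snowfall": [0.0, 0.5, 0.0, 2.0, 1.0],
    "precipitation": [0.0, 0.2, 0.1, 0.5, 0.3],
})
for col in ["snow_depth", "snowfall", "precipitation"]:
    slope, intercept, r2 = regress_on(year, col)
    print(f"{col}: slope={slope:.1f}, R^2={r2:.3f}")
```

Comparing R² across the three variables is one simple way to judge which weather measure tracks trip counts most closely.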

In addition to daily trips, the trend in monthly average trips in 2015 is examined and graphed. To make months of different lengths comparable, each month's trip total is normalized to a 30-day month. As shown in
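The 30-day normalization amounts to (monthly total / days in month) × 30. A minimal sketch, assuming daily trip counts in a date-indexed Series; the constant 10 trips per day is hypothetical and chosen so the expected result is easy to check.

```python
import numpy as np
import pandas as pd


def monthly_per_30_days(daily_trips: pd.Series) -> pd.Series:
    """Scale each month's total trips to a common 30-day month: total / days_in_month * 30."""
    monthly_total = daily_trips.resample("MS").sum()  # one row per month, indexed by month start
    days_in_month = np.asarray(monthly_total.index.days_in_month)
    return monthly_total / days_in_month * 30


# Hypothetical data: a constant 10 trips per day in January and February 2015.
days = pd.date_range("2015-01-01", "2015-02-28", freq="D")
daily = pd.Series(10, index=days)
normalized = monthly_per_30_days(daily)
print(normalized)
```

With a constant daily rate, January (31 days) and February (28 days) both normalize to the same value, which is the point of the adjustment.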

In

Similarly, the pattern of average hourly trips over the whole year is found to be almost the same as that in January.

In

Payment data for 2015 is also broken down by day of the week to analyze how payments vary across weekdays.
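A breakdown of payments by weekday could be sketched as below. The column names `tpep_pickup_datetime` and `total_amount` are assumed from the TLC yellow-taxi schema, and the three trips are hypothetical (5 January 2015 was a Monday, 10 January a Saturday).

```python
import pandas as pd

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def payment_by_weekday(trips: pd.DataFrame) -> pd.DataFrame:
    """Total and mean payment per day of week (column names are assumed)."""
    # pandas numbers weekdays Monday=0 ... Sunday=6.
    weekday = trips["tpep_pickup_datetime"].dt.dayofweek.rename("weekday")
    summary = trips.groupby(weekday)["total_amount"].agg(["sum", "mean"])
    summary.index = [WEEKDAYS[i] for i in summary.index]
    return summary


# Hypothetical trips for illustration only.
trips = pd.DataFrame({
    "tpep_pickup_datetime": pd.to_datetime(
        ["2015-01-05 08:00", "2015-01-05 09:00", "2015-01-10 22:00"]
    ),
    "total_amount": [10.0, 20.0, 30.0],
})
summary = payment_by_weekday(trips)
print(summary)
```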

In

The location of taxi trips is then studied in detail. A map of NYC is read into the Jupyter notebook and converted to GeoPandas format so that it can be analyzed in Python. To find the number of trips in each NTA, a function determines which NTA each trip belongs to. The result is shown in

This research paper analyzed basic information on taxi trips in New York City in 2015. The daily trip count and the average hourly trips in January were studied first. The impact of weather on trip counts was then discussed, using linear regression to relate snow depth, snowfall, and precipitation to the number of trips, and the different effects of snowfall and snow depth were compared. Furthermore, the monthly trend, weekday trip counts, average hourly trips, and weekday payments were examined using the 2015 data. In addition, the distribution of taxi trips across NTAs was discussed and their density shown on an NYC map. Finally, by looking at the trips from 8 a.m. to 10 a.m. in

A limitation of this research is the long processing time for such a large volume of data (2 GB per month). Running a single cell over a full year's data takes an afternoon, which greatly restricts the amount of data used in the research. Another limitation is the visualization: results are presented mainly as static graphs rather than animations, which could help readers understand them more fully and directly.

The author declares no conflicts of interest regarding the publication of this paper.

Tang, Y.X. (2019) Big Data Analytics of Taxi Operations in New York City. American Journal of Operations Research, 9, 192-199. https://doi.org/10.4236/ajor.2019.94012