Exploratory Data Analysis of the 2014 New York City 311 Call Data
The aim of this EDA is to understand the NYC 311 call data and observe any noticeable trends that we can make predictions on.

Author: Ikponmwosa Felix Ogbeide

Questions to answer through this project:

Which boroughs have the most call incidents?

Which agency gets the most incidents?

Which borough has the fastest incident resolution time?

How do the incidents vary month to month?

PS
I'm also on the lookout for interesting trends and observations beyond the questions above!

Before I begin exploring the data, it is essential to understand what type of data I am working with. This will enable me to decide what kind of data
cleaning or wrangling is necessary for this project.

In [1]: # Import Python libraries for data manipulation


import numpy as np
import pandas as pd

In [2]: # Read 311 data from CSV file into pandas dataframe
datafile = '311_Service_Requests_from_2014.csv'
df = pd.read_csv(datafile)
df.head(3)

C:\Users\Ikp\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py:2698: DtypeWarning: Columns (8,17,40,41,42,43,44,45,46,47,48,49) have mixed types. Specify dtype option on import or set low_memory=False.
  interactivity=interactivity, compiler=compiler, result=result)
Out[2]:
   Unique Key  Created Date            Closed Date             Agency  Agency Name                      Complaint Type      Descriptor         Location Type    Incident Zip  Incident Address
0  28457271    07/11/2014 03:08:58 PM  08/05/2014 12:41:37 PM  DOT     Department of Transportation     Sidewalk Condition  Defacement         Sidewalk         11368         37-73 104 STREET
1  28644314    08/08/2014 02:06:22 PM  08/12/2014 11:33:34 AM  DCA     Department of Consumer Affairs   Consumer Complaint  False Advertising  NaN              10014         113 WASHINGTON PLACE
2  29306886    11/18/2014 12:52:40 AM  11/18/2014 01:35:22 AM  NYPD    New York City Police Department  Blocked Driveway    No Access          Street/Sidewalk  11358         42-10 159 STREET

3 rows × 53 columns

In [3]: # Set the pandas maximum columns display so we can view all columns at once.
pd.set_option('display.max_columns', None)

In [4]: # Since the datafile has a Unique Key column, it's better to use that as the index
# Re-read datafile into df pandas dataframe, setting Unique Key as the index
df = pd.read_csv(datafile, index_col='Unique Key')

C:\Users\Ikp\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py:2698: DtypeWarning: Columns (8,17,40,41,42,43,44,45,46,47,48,49) have mixed types. Specify dtype option on import or set low_memory=False.
  interactivity=interactivity, compiler=compiler, result=result)
C:\Users\Ikp\Anaconda3\lib\site-packages\numpy\lib\arraysetops.py:463: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  mask |= (ar1 == a)

From the warning, we know we have mixed types in some columns. It's best to investigate what the mixed types are.
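As an aside, the warning itself points at one remedy: declaring dtypes up front when reading the file. A sketch of that approach (treating only 'Incident Zip' as a string is my assumption here; the notebook instead inspects the mixed-type columns directly):

# Declare the troublesome column's dtype at read time to silence the DtypeWarning
df = pd.read_csv(datafile, index_col='Unique Key',
                 dtype={'Incident Zip': str}, low_memory=False)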

In [5]: # I'll create a list to store the columns with mixed data types.
        # Later on in the project, I'll revisit this list of mixed data types
        df_mixed_dt = [df.columns[7], df.columns[16], df.columns[39], df.columns[40], df.columns[41],
                       df.columns[42], df.columns[43], df.columns[44], df.columns[45], df.columns[46],
                       df.columns[47], df.columns[48]]

In [6]: # View the data type of each column

df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2099056 entries, 28457271 to 29607189
Data columns (total 52 columns):
Created Date object
Closed Date object
Agency object
Agency Name object
Complaint Type object
Descriptor object
Location Type object
Incident Zip object
Incident Address object
Street Name object
Cross Street 1 object
Cross Street 2 object
Intersection Street 1 object
Intersection Street 2 object
Address Type object
City object
Landmark object
Facility Type object
Status object
Due Date object
Resolution Description object
Resolution Action Updated Date object
Community Board object
Borough object
X Coordinate (State Plane) float64
Y Coordinate (State Plane) float64
Park Facility Name object
Park Borough object
School Name object
School Number object
School Region object
School Code object
School Phone Number object
School Address object
School City object
School State object
School Zip object
School Not Found object
School or Citywide Complaint float64
Vehicle Type object
Taxi Company Borough object
Taxi Pick Up Location object
Bridge Highway Name object
Bridge Highway Direction object
Road Ramp object
Bridge Highway Segment object
Garage Lot Name object
Ferry Direction object
Ferry Terminal Name object
Latitude float64
Longitude float64
Location object

dtypes: float64(5), object(47)
memory usage: 848.8+ MB

In [7]: # Count of non-null values for every column


df_columns = list(df.columns)
df[df_columns].count()

Out[7]: Created Date 2099056


Closed Date 2053916
Agency 2099056
Agency Name 2099056
Complaint Type 2099056
Descriptor 2086658
Location Type 1546759
Incident Zip 1952035
Incident Address 1641505
Street Name 1641401
Cross Street 1 1675745
Cross Street 2 1668168
Intersection Street 1 323622
Intersection Street 2 323523
Address Type 1994598
City 1953071
Landmark 810
Facility Type 536769
Status 2099053
Due Date 870518
Resolution Description 2085477
Resolution Action Updated Date 2060578
Community Board 2099056
Borough 2099056
X Coordinate (State Plane) 1888333
Y Coordinate (State Plane) 1888333
Park Facility Name 2099056
Park Borough 2099056
School Name 2099056
School Number 2098598
School Region 2088711
School Code 2088711
School Phone Number 2099056
School Address 2099055
School City 2099056
School State 2099056
School Zip 2099054
School Not Found 857159
School or Citywide Complaint 0
Vehicle Type 481
Taxi Company Borough 1689
Taxi Pick Up Location 14906
Bridge Highway Name 9223
Bridge Highway Direction 9216
Road Ramp 9177
Bridge Highway Segment 10678
Garage Lot Name 624
Ferry Direction 479
Ferry Terminal Name 1596
Latitude 1888333
Longitude 1888333
Location 1888333
dtype: int64

Columns with a lot of missing data won't be useful to my analysis. Also, some columns in the data just aren't useful to this analysis; it's
best to remove these columns.
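Before hand-picking them, a quick programmatic pass can flag the sparse columns. A minimal sketch (the 50% completeness threshold is an arbitrary choice for illustration):

# Flag columns where more than half the values are missing (threshold is illustrative)
sparse_cols = df.columns[df.count() < 0.5 * len(df)]
print(list(sparse_cols))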

In [8]: # A list of columns to remove from the dataframe

df_cols_rmv = ['Incident Address', 'Street Name', 'Cross Street 1', 'Cross Street 2',
               'Intersection Street 1', 'Intersection Street 2', 'Landmark', 'Facility Type',
               'Due Date', 'Resolution Description', 'Community Board', 'X Coordinate (State Plane)',
               'Y Coordinate (State Plane)', 'Park Facility Name', 'Park Borough', 'School Name',
               'School Number', 'School Region', 'School Code', 'School Phone Number', 'School Address',
               'School City', 'School State', 'School Zip', 'School Not Found', 'School or Citywide Complaint',
               'Vehicle Type', 'Taxi Company Borough', 'Taxi Pick Up Location', 'Bridge Highway Name',
               'Bridge Highway Direction', 'Road Ramp', 'Bridge Highway Segment', 'Garage Lot Name',
               'Ferry Direction', 'Ferry Terminal Name', 'Location', 'Address Type', 'Agency Name',
               'Resolution Action Updated Date', 'Descriptor', 'Location Type']

In [9]: # Remove the columns added to the df_cols_rmv list from df dataframe
df.drop(df_cols_rmv, inplace=True, axis=1)

In [10]: df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2099056 entries, 28457271 to 29607189
Data columns (total 10 columns):
Created Date object
Closed Date object
Agency object
Complaint Type object
Incident Zip object
City object
Status object
Borough object
Latitude float64
Longitude float64
dtypes: float64(2), object(8)
memory usage: 176.2+ MB

In [11]: #Time to investigate df columns with mixed data types


df_mixed_dt

Out[11]: ['Incident Zip',


'Landmark',
'Vehicle Type',
'Taxi Company Borough',
'Taxi Pick Up Location',
'Bridge Highway Name',
'Bridge Highway Direction',
'Road Ramp',
'Bridge Highway Segment',
'Garage Lot Name',
'Ferry Direction',
'Ferry Terminal Name']

In [12]: # Drop columns in df_mixed_dt that aren't in df anymore


df_mixed_dt = [x for x in df_mixed_dt if x in list(df.columns)]

In [13]: df_mixed_dt

Out[13]: ['Incident Zip']

In [14]: #Explore Incident Zip


df['Incident Zip'].unique()

Out[14]: array(['11368', '10014', '11358', '10018', nan, '10466', '11357', '10309',


'10030', '10029', '11417', '11369', '10465', '10473', '10461',
'11101', '10010', '11217', '11355', '10019', '10128', '10462',
'10456', '11385', '11356', '11225', '10028', '11104', '11203',
'10024', '11249', '11103', '11211', '11364', '11432', '10023',
'10003', '11375', '11354', '11361', '10009', '10314', '11412',
'10036', '10012', '10163-4668', '11429', '10451', '11426', '10011',
'11233', '10457', '10460', '10305', '10021', '11237', '10007',
'10016', '10306', '11210', '11213', '11372', '11226', '11218',
'11221', '11414', '11205', '10307', '11204', '11418', '11234',
'11230', '11223', '10004', '11377', '11370', '11423', '10017',
'11367', '10031', '10002', '11222', '10027', '11207', '10025',
'11235', '11421', '10469', '11216', '11219', '11434', '10312',
'11238', '11411', '11214', '10005', '11365', '11373', '10452',
'11201', '10468', '10038', '11228', '10282', '10303', '11105',
'10472', '10000', '10308', '11378', '11236', '10001', '10475',
'10020', '10034', '11004', '11102', '11374', '10464', '11428',
'11416', '10304', '10453', '11208', '11209', '10022', '11379',
'11435', '11360', '11212', '10026', '10065', '11001', '11106',
'07114', '11215', '07512', '10032', '11419', '11427', '11096',
'07631', '10035', '10463', '11220', '11436', '10467', '11422',
'11229', '10301', '10013', '10040', '10033', '11366', '11433',
'11362', '10075', '11363', '11040', '10458', '10470', '11206',

'11231', '11413', '10454', '11239', '10455', '10459', '10704',
'10302', '11232', '11514', '11420', '10037', '10310', '11415',
'10474', '11224', '10710', '10006', '11691', '10039', '11692',
'11693', '10044', '11694', '11430', '29616-0759', '10471', '07017',
'10069', '11746', '10280', '94524', '11109', '00000', '10048',
'23450', '85044', '10804', 'OOOOO', 'UNKNOWN', '07601', '11749',
'10533', '11566', '11359', '11716', '11516', '00083', '32789',
'11803', '11559', '55426-1066', '11706', '11710', '10281', '10701',
'11553', '23541', '55438', '11580', '11757', '37615', '11697',
'07054', '11042', '11802-9060', '10116', '11554', '08857', '10805',
'12602-0550', '07030', '07642', '77072', '92123', '11735', '10801',
'10119', '11520', '10163', '11003', '55614', '92046-3023', '95476',
'10129', '11005', '10601', '08817', '07087', '11452', '10153',
'NEW YORK A', '11581', '33319', '12550', '11590', '10402',
'61702-3068', '08691', '11579', '07024', '07981', '07305', '11550',
'44139', '000000', '11747', '07047', '10548', '11501', '98057',
'10591', '23502', '10538', '75081', '10803', '77081', '07302',
'11776', '11021', '75244', '75201-4234', '10589', '77057', '1000',
'10112', '91752', '07306', '11598', '89703', '11242', '07652',
'63301', '32824', '11563', '07010', '66211', '11572', '10122',
'10603', '11556', '10606', '10705', '11791', '11758', '11030',
'07666', '07310', '10103', '07351', '75007-1958', '10930',
'NJ 07114', '01701-0166', '10954', '11720', '06851', '10536',
'14221-7820', '10552', '61654', '07647', '60089', '10501', '10162',
'30071', '11552', '15251', '11241', '75024-2959', '11801', '11371',
'11530', '11771', '08876', '11725', '07936', '0000', '11722',
'07094', '07605', '29603', '11714', 'NJ', '327500', '75240',
'37917', '32896', '11723', '08003', '07390', '19053', '000',
'07073', '06902', '11777', '30356', '11545', '10507', '92508',
'11010', '11901', '60181', '99999', '30005', '07052', '33634',
'07111', '07207', '20705', '07960', '55439', '14221-5783', '30188',
'84020', 11370.0, 11361.0, 11417.0, 10021.0, 10312.0, 11375.0,
10465.0, 11206.0, 11697.0, 11357.0, 10458.0, 11224.0, 10305.0,
11236.0, 11214.0, 11691.0, 11354.0, 11102.0, 10022.0, 10023.0,
11209.0, 10463.0, 11229.0, 10075.0, 11231.0, 10024.0, 11223.0,
10028.0, 11412.0, 11385.0, 11360.0, 10451.0, 11692.0, 11040.0,
10308.0, 10038.0, 11220.0, 11369.0, 11235.0, 11416.0, 11372.0,
11238.0, 10128.0, 10017.0, 11230.0, 11694.0, 10012.0, 10309.0,
11414.0, 11232.0, 10472.0, 10033.0, 11434.0, 11356.0, 10468.0,
11358.0, 11228.0, 10306.0, 10016.0, 10036.0, 10304.0, 11001.0,
10454.0, 11377.0, 11216.0, 11106.0, 10461.0, 10003.0, 10006.0,
10314.0, 10030.0, 11365.0, 11005.0, 11103.0, 11379.0, 10462.0,
11368.0, 11435.0, 10014.0, 11215.0, 10029.0, 11233.0, 11222.0,
10471.0, 11225.0, 11373.0, 10301.0, 11212.0, 11201.0, 10307.0,
10452.0, 11237.0, 11104.0, 11432.0, 11436.0, 11362.0, 11420.0,
11208.0, 10470.0, 11366.0, 11429.0, 11219.0, 11413.0, 10466.0,
11367.0, 11419.0, 10455.0, 10019.0, 11422.0, 10467.0, 11239.0,
11378.0, 10025.0, 11226.0, 10011.0, 10009.0, 11210.0, 11423.0,
11426.0, 10034.0, 11105.0, 10475.0, 10002.0, 11355.0, 11221.0,
10032.0, 11204.0, 11374.0, 11364.0, 10001.0, 11249.0, 11411.0,
11234.0, 10473.0, 10065.0, 11211.0, 11217.0, 11218.0, 11203.0,
10460.0, 10013.0, 11205.0, 10310.0, 10010.0, 11101.0, 10007.0,
11004.0, 11433.0, 10005.0, 10027.0, 10474.0, 11421.0, 10469.0,
10026.0, 10035.0, 11207.0, 11213.0, 10453.0, 11418.0, 11415.0,
10040.0, 10457.0, 10031.0, 10303.0, 11427.0, 10069.0, 11428.0,
10464.0, 10004.0, 10018.0, 10456.0, 10302.0, 11363.0, 11693.0,
10459.0, 10037.0, 10039.0, 7094.0, 10103.0, 10044.0, 10280.0,
7030.0, 11563.0, 10553.0, 11430.0, 7117.0, 10523.0, 11552.0,
11003.0, 7114.0, '19044', '11701', '11754', '54305-1654', '07664',
'10549', '95476-9005', '10577', '33137-0098', '07102', '171111',
'92019', '11797-9012', 75081.0, 10020.0, 10595.0, 10119.0, 10111.0,
11109.0, 10591.0, 83.0, 10701.0, 10129.0, 11520.0, 10122.0, 7310.0,
11733.0, 92123.0, 7632.0, 11507.0, 8776.0, 8691.0, 11042.0, 10710.0,
98206.0, 14225.0, 11577.0, '10523', '23541-1223', 11716.0, 75240.0,
11550.0, 7070.0, 10000.0, 13202.0, 8540.0, 10282.0, 11021.0,
11763.0, 7801.0, 60148.0, 11501.0, 11749.0, 7663.0, 11241.0,
10153.0, 10123.0, 10281.0, 0.0, 33486.0, 85080.0, 11704.0, 55812.0,
11.0, 10801.0, 11530.0, 11371.0, 7104.0, 23113.0, 11741.0, 11695.0,
10162.0, 85044.0, 7073.0, 10048.0, 11566.0, 11202.0, 10705.0,
11553.0, 11561.0, 8724.0, '48195-0391', '08540', 11746.0, 11797.0,
77201.0, 11542.0, 11251.0, 7410.0, 7450.0, 11793.0, 11559.0,
11768.0, 11581.0, 11359.0, 11590.0, 11703.0, 75024.0, 95476.0,
92506.0, 7307.0, 10107.0, 8520.0, 7144.0, 11011.0, 10803.0, 14644.0,
7920.0, 10984.0, 11570.0, 10965.0, 11778.0, 17032.0, 11580.0,
11569.0, 11565.0, 7901.0, 7041.0, 11756.0, 90304.0, 18466.0,
11386.0, 33304.0, 7306.0, 1123.0, 11710.0, 30339.0, 8069.0, 7024.0,
11598.0, '11788', '07086', '45274-2596', 7042.0, 10604.0, 7304.0,
8854.0, 11743.0, 11803.0, 10708.0, 6851.0, 30005.0, 60018.0, 6511.0,
91754.0, 7302.0, 8527.0, 23502.0, 11599532.0, 11735.0, 8690.0,
7430.0, 11554.0, 8003.0, 7601.0, 10112.0, 11725.0, 10041.0,
114566.0, 95762.0, '14221', 20188.0, 73126.0, '60179', '11434-420',
'07661', '14450', '07632', '07712', '61602', '10583', '29616',
'11576', '20201', '11570', '11317', 'ZIP CODE', '11730',
'35210-6928', '11251', '63042', '32255', 92018.0, 11776.0, 6854.0,
1000000.0, 3110.0, 53707.0, 80155.0, 7652.0, 11791.0, 11801.0,
11030.0, 10601.0, 11804.0, 7047.0, 7730.0, 11020.0, 19044.0,
12203.0, '10041', '11561', '75403', '43216', '06801-1459',
'6462430478', '02140', '07410', '11738', '74147', '10925', '11695',
'08854', '07458', '32750', '11215-0389', '18519', '15219',
'11756-0215', '11715', '48195-0954', '33428', '70163', '11704',
'11779', '75026-0848', '02205', '7056', '11802', '02459', '08807',
'11510', '64108', '85711', '07620', '98036', '10541', '10573',
'11590-5114', '92013-0848', '10514', '07304', '07650', '91710',
'18773-9635', '30303', '0', '11729', '78265', '07090', '11542',
'10550', '11797', '55125', '33486', '30348-5658', '60076', '30360',
'10101', '10111', '12203', '12941-0129', '91754', '11001-2024',
'10123', '91356', '44122-5340', '10602-5055', '48451-0505', '35219',
'07144', '11024', '1143', '07042', '07079', '08873', '10522',
'11756', '11762', '08827-4010', '08080', '18966', '07002', '10994',
'75086', '07093', '11797-9001', '60018-4501', '11020', '48393-1022',
'33611', '11793', '11111', '43607', '32073', '12345', '11766',
'INDIAN WEL', '03104', '10528', '20123', '07203', '12024', '11549',
'19850', '27713', '07670', '07067', '54602', '11767', '37214',
'78265-9707', '0000000', '14206', '07417', '11540', '92923',
'85251-6547', '30144-7518', '11741', '11797-9004', '91504', '43054',
'10015', '117', '11507', '11753', 11747.0, 7981.0, 11980.0, 11576.0,
7372.0, 14615.0, 1208.0, 7940.0, '07675', '28201-1115', '14108',
'10532', '14009', '10566', '34230', '14203', '06901', 'NEWARK',
'11743', '14883', '11570-4705', '11535', '94566-9049', '6201-5773',
'10008', '11533', '60523', '66207', '43218-3083', '80111', '10703',
'10107', '29659', '320896', '08086', '12590', 'ANONYMOUS', '92036',
'10952', '35476', 'NY', '45241', '11772-3518', '43614', '10150',
'18042', '48090-2036', '11795', '14692-2878', 'UNSURE', '20188',
'90036', '30345', '782659705', '11735-9000', '08830', '11760',
'10435', '11518', '33122', '07045', '67504', '92506', '55438-0902',
10177.0, 10115.0, 11242.0, 10165.0, 76406.0, 17602.0, 12814.0,
10550.0, 7646.0, 33130.0, 11111.0, 7093.0, 33480.0, 2907.0, 22172.0,
11787.0, 14626.0, 68135.0, 7017.0, 10104.0, 10589.0, '10023-0007',
11341.0, 29006.0, 10703.0, 10605.0, 11431.0, 10116.0, 1850.0,
74106.0, 11514.0, 12345.0, 84770.0, 18015.0, 10150.0, 12779.0,
100000.0, 10603.0, 10108.0], dtype=object)

It's clear that Incident Zip has some issues beyond the mixed data types:

Mixed data types, sometimes floats, sometimes strings

Some of the zip codes have 4 extra digits after a hyphen (the ZIP+4 format)

Some zip codes are nan; others are placeholders such as UNKNOWN, ANONYMOUS, and so on

In [15]: # Function that cleans the Incident Zip values and returns nan for data that cannot be cleaned
def correct_zip(zip_code):
try:
zip_code = int(float(zip_code))
except:
try:
zip_code = int(float(zip_code.split('-')[0]))
except:
return np.nan
if zip_code < 10000 or zip_code > 19999:
return np.nan
else:
return str(zip_code)
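A few spot checks illustrate the intended behaviour (these examples are mine, not cells from the notebook; expected results shown as comments):

# Illustrative checks of correct_zip
correct_zip('10163-4668')   # -> '10163'  (ZIP+4 reduced to the 5-digit code)
correct_zip(11370.0)        # -> '11370'  (float parsed, cast back to string)
correct_zip('UNKNOWN')      # -> nan      (unparseable placeholder)
correct_zip('92123')        # -> nan      (outside the 10000-19999 range kept here)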

In [16]: # Apply correct_zip function to clean Incident Zip data
df['Incident Zip'] = df['Incident Zip'].apply(correct_zip)

In [17]: # Remove rows from the data where Incident Zip is null, i.e. nan
df = df[df['Incident Zip'].notnull()]

In [18]: #Count of non-null values for every column


df_columns = list(df.columns)
df[df_columns].count()

Out[18]: Created Date 1951306


Closed Date 1910835
Agency 1951306
Complaint Type 1951306
Incident Zip 1951306
City 1951245
Status 1951303
Borough 1951306
Latitude 1887423
Longitude 1887423
dtype: int64

In [19]: # Closed Date, Latitude, and Longitude all have missing values; best to remove the rows where data
         # in those columns is missing
         df = df[(df['Latitude'].notnull()) & (df['Longitude'].notnull()) & (df['Closed Date'].notnull())]

In [20]: #Count of non-null values for every column


df_columns = list(df.columns)
df[df_columns].count()

Out[20]: Created Date 1847268


Closed Date 1847268
Agency 1847268
Complaint Type 1847268
Incident Zip 1847268
City 1847211
Status 1847265
Borough 1847268
Latitude 1847268
Longitude 1847268
dtype: int64

In [21]: #As part of the EDA process, all columns should be explored to identify columns with messy data or nan
df['Agency'].unique()

Out[21]: array(['DOT', 'DCA', 'NYPD', 'FDNY', 'DOHMH', 'TLC', 'DOE', 'DOB', 'DPR',
'EDC', 'DSNY', 'DEP', 'DOITT', 'HPD', '3-1-1', 'DFTA', 'DHS'], dtype=object)

In [22]: df['Complaint Type'].unique()

Out[22]: array(['Sidewalk Condition', 'Consumer Complaint', 'Blocked Driveway',


'EAP Inspection - F59', 'Derelict Vehicle', 'Street Condition',
'Indoor Air Quality', 'Broken Muni Meter', 'Illegal Parking',
'Animal Abuse', 'Taxi Complaint', 'Street Sign - Damaged',
'Noise - Street/Sidewalk', 'Noise - Commercial',
'School Maintenance', 'Homeless Encampment', 'Noise - Vehicle',
'Construction', 'Vending', 'Traffic', 'Food Establishment',
'Damaged Tree', 'Street Sign - Missing',
'For Hire Vehicle Complaint', 'Dead Tree', 'Standing Water',
'Illegal Tree Damage', 'Highway Condition', 'Noise - Park',
'Overgrown Tree/Branches', 'Drinking', 'Maintenance or Facility',
'Fire Safety Director - F58', 'Noise - Helicopter',
'Illegal Fireworks', 'Root/Sewer/Sidewalk Condition',
'Urinating in Public', 'Dirty Conditions',
'Noise - House of Worship', 'DPR Internal', 'Food Poisoning',
'Industrial Waste', 'Water System', 'Graffiti', 'Curb Condition',
'Building/Use', 'Fire Alarm - New System', 'Mold',
'Street Sign - Dangling', 'Rodent',
'Unsanitary Animal Pvt Property', 'Public Payphone Complaint',
'Animal in a Park', 'General Construction/Plumbing', 'Asbestos',
'Sewer', 'Street Light Condition', 'Plumbing', 'FLOORING/STAIRS',
'Harboring Bees/Wasps', 'Fire Alarm - Modification',
'Fire Alarm - Addition', 'Bike/Roller/Skate Chronic',
'Violation of Park Rules', 'City Vehicle Placard Complaint',
'Fire Alarm - Reinspection', 'Disorderly Youth',
'Bus Stop Shelter Placement', 'Open Flame Permit', 'Indoor Sewage',
'Bike Rack Condition', 'Public Toilet', 'Bridge Condition',
'Posting Advertisement', 'GENERAL', 'Senior Center Complaint',
'Panhandling', 'Snow', 'Special Projects Inspection Team (SPIT)',
'Municipal Parking Facility', 'Special Enforcement',
'Highway Sign - Damaged', 'Taxi Report', 'Broken Parking Meter',
'Beach/Pool/Sauna Complaint', 'Derelict Vehicles',
'Traffic Signal Condition', 'HEAT/HOT WATER',
'Missed Collection (All Materials)', 'Public Assembly',
'Overflowing Litter Baskets', 'Sweeping/Missed', 'Vacant Lot',
'ELEVATOR', 'Sanitation Condition', 'Other Enforcement',
'Investigations and Discipline (IAD)', 'Drinking Water',
'Found Property', 'Squeegee', 'Unsanitary Pigeon Condition',
'ELECTRIC', 'Hazardous Materials', 'UNSANITARY CONDITION',
'Bottled Water', 'Boilers', 'Electrical', 'Elevator',
'Emergency Response Team (ERT)', 'BEST/Site Safety',
'Request Xmas Tree Collection', 'Cranes and Derricks', 'Tanning',
'PLUMBING', 'APPLIANCE', 'HEATING', 'Litter Basket / Request',
'For Hire Vehicle Report', 'DOOR/WINDOW', 'WATER LEAK',
'Sweeping/Inadequate', 'Scaffold Safety',
'Fire Alarm - Replacement', 'Radioactive Material',
'Recycling Enforcement', 'Overflowing Recycling Baskets',
'Derelict Bicycle', 'Homeless Person Assistance', 'PAINT/PLASTER',
'Highway Sign - Dangling', 'OUTSIDE BUILDING', 'Water Quality',
'Public Assembly - Temporary', 'Miscellaneous Categories',
'Lifeguard', 'NONCONST', 'PAINT - PLASTER', 'GENERAL CONSTRUCTION',
'CONSTRUCTION', 'Legal Services Provider Complaint',
'Non-Residential Heat', 'Highway Sign - Missing',
'X-Ray Machine/Equipment', 'SAFETY', 'VACANT APARTMENT',
'Stalled Sites', 'Building Condition', 'AGENCY',
'Transportation Provider Complaint', 'Water Conservation', 'Noise',
'Air Quality', 'Plant', 'Lead', 'Collection Truck Noise',
'Special Natural Area District (SNAD)', 'Adopt-A-Basket',
'Literature Request', 'SG-99', 'Noise - Residential',
'Non-Emergency Police Matter', 'New Tree Request'], dtype=object)

In [23]: df['City'].unique()

Out[23]: array(['CORONA', 'NEW YORK', 'FLUSHING', 'BRONX', 'WHITESTONE',


'STATEN ISLAND', 'OZONE PARK', 'EAST ELMHURST', 'LONG ISLAND CITY',
'BROOKLYN', 'RIDGEWOOD', 'COLLEGE POINT', 'SUNNYSIDE', 'ASTORIA',
'OAKLAND GARDENS', 'FOREST HILLS', 'BAYSIDE', 'SAINT ALBANS',
'QUEENS VILLAGE', 'BELLEROSE', 'JACKSON HEIGHTS', 'HOWARD BEACH',
'RICHMOND HILL', 'WOODSIDE', 'HOLLIS', 'WOODHAVEN', 'JAMAICA',
'FRESH MEADOWS', 'ELMHURST', 'Ridgewood', 'MASPETH', 'GLEN OAKS',
'REGO PARK', 'Long Island City', 'MIDDLE VILLAGE', 'Bayside', nan,
'SOUTH RICHMOND HILL', 'ROSEDALE', 'Little Neck', 'LITTLE NECK',
'Jamaica', 'Richmond Hill', 'SPRINGFIELD GARDENS', 'Fresh Meadows',
'East Elmhurst', 'Woodhaven', 'Howard Beach', 'FLORAL PARK',
'KEW GARDENS', 'SOUTH OZONE PARK', 'CAMBRIA HEIGHTS',
'Far Rockaway', 'Flushing', 'South Ozone Park', 'Elmhurst',
'Ozone Park', 'Corona', 'South Richmond Hill', 'Jackson Heights',
'FAR ROCKAWAY', 'Queens Village', 'Springfield Gardens', 'Astoria',
'Cambria Heights', 'Glen Oaks', 'ROCKAWAY PARK', 'Rego Park',
'ARVERNE', 'Middle Village', 'NEW HYDE PARK', 'Woodside',
'Kew Gardens', 'Rockaway Park', 'Hollis', 'Maspeth', 'Rosedale',
'Saint Albans', 'Arverne', 'BREEZY POINT', 'Forest Hills',
'Oakland Gardens', 'Sunnyside', 'Bellerose', 'QUEENS', 'Whitestone',
'Floral Park', 'New Hyde Park', 'College Point', 'NEW HEMPSTEAD',
'UNKNOWN', 'BEDFORD HILLS', 'Breezy Point', 'BELLMORE', 'MANHATTAN'], dtype=object)

In [24]: df['Status'].unique()

Out[24]: array(['Closed', 'Pending', 'Assigned', nan, 'Open', 'Started'], dtype=object)

In [25]: df['Borough'].unique()

Out[25]: array(['QUEENS', 'MANHATTAN', 'BRONX', 'STATEN ISLAND', 'BROOKLYN',


'Unspecified'], dtype=object)

Columns with messy or missing data


City - contains some cities in uppercase and others in lowercase

Status - contains nan values

Borough - contains 'Unspecified' boroughs

In [26]: # Let's look at the unspecified boroughs; we want to be sure that removing this data from df won't
         # cause problems later on
         df[df['Borough']=='Unspecified'][['Agency', 'City']]

Out[26]: Agency City

Unique Key

29167930 NYPD STATEN ISLAND

28650971 DPR STATEN ISLAND

28993659 NYPD STATEN ISLAND

28337889 NYPD STATEN ISLAND

28768853 TLC NEW HEMPSTEAD

27339785 TLC BEDFORD HILLS

28911820 NYPD STATEN ISLAND

28767733 NYPD STATEN ISLAND

28867255 NYPD STATEN ISLAND

29044115 NYPD STATEN ISLAND

29183835 DCA BELLMORE

28994609 NYPD STATEN ISLAND

29503930 NYPD STATEN ISLAND

29040790 NYPD STATEN ISLAND

29211842 NYPD STATEN ISLAND

28151378 NYPD STATEN ISLAND

28356529 NYPD STATEN ISLAND

28983245 NYPD STATEN ISLAND

28763594 NYPD STATEN ISLAND

29217588 NYPD STATEN ISLAND

29209600 NYPD STATEN ISLAND

28961292 NYPD STATEN ISLAND

29159476 NYPD STATEN ISLAND

28945701 NYPD STATEN ISLAND

28305246 NYPD STATEN ISLAND

28501571 NYPD STATEN ISLAND

27739170 DPR STATEN ISLAND

28135459 TLC BAYSIDE

28944841 NYPD STATEN ISLAND

In [27]: # The majority of this data belongs to the NYPD Agency and occurs in Staten Island
         # To ensure I don't lose too much NYPD data, I need to confirm this accounts for a negligible
         # fraction of NYPD records
         nypd_total = df[df['Agency']=='NYPD']['Borough'].count()
         nypd_unspecified = df[(df['Borough']=='Unspecified') & (df['Agency']=="NYPD")]['Borough'].count()
         nypd_unspec_perct = nypd_unspecified/nypd_total*100
         print("%1.3f"%nypd_unspec_perct)

0.005

In [28]: # The unspecified boroughs are so few that they can be removed
df = df[df['Borough'] != 'Unspecified']

In [29]: # Number of rows where Status is nan

status_nan = len(df[df['Status'].isnull()].index)
print(status_nan)

3

In [30]: # The number of rows where Status is nan is 3, which is also negligible; I can remove them from
         # the dataframe as well.
         df = df[df['Status'].notnull()]

In [31]: # Since some City values appear in both uppercase and lowercase, it's better to have every city in
         # the same case.
         # Convert all City values to title case (each word capitalized)
         def camel_case(city):
             try:
                 city = city.split(' ')
                 city = ' '.join([x.lower().capitalize() for x in city])
                 if city == 'Unknown':
                     return np.nan
                 else:
                     return city
             except:
                 return np.nan
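Spot checks of camel_case (again my own illustrations, not notebook cells):

# Illustrative checks of camel_case
camel_case('NEW YORK')   # -> 'New York'
camel_case('UNKNOWN')    # -> nan  (placeholder value mapped to missing)
camel_case(np.nan)       # -> nan  (floats have no .split, caught by the except)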

In [32]: # Apply camel_case function to City column


df['City'] = df['City'].apply(camel_case)

In [33]: # Let's view the rows where City is nan, grouped by Agency


df[df['City'].isnull()].groupby('Agency')['Status'].count()

Out[33]: Agency
DOT 57
TLC 1
Name: Status, dtype: int64

In [34]: # 57 of the rows with a nan City belong to the DOT Agency.

# It's better to know whether this is significant before removing them
city_null_dot = len(df[(df['City'].isnull()) & (df['Agency']=='DOT')].index)
dot_total = len(df[df['Agency']=='DOT'].index)
city_null_dot_perct = (city_null_dot/dot_total)*100
print("%1.3f"%city_null_dot_perct)

0.024

In [35]: # 0.024% is negligible, so rows with a nan City can be removed from df
df = df[df['City'].notnull()]

In [36]: # Created Date and Closed Date aren't DateTime objects. It's more convenient to work with DateTime
         # objects, so convert Created Date and Closed Date values to DateTime.
         import datetime
         df['Created Date'] = df['Created Date'].apply(lambda x: datetime.datetime.strptime(x, '%m/%d/%Y %I:%M:%S %p'))
         df['Closed Date'] = df['Closed Date'].apply(lambda x: datetime.datetime.strptime(x, '%m/%d/%Y %I:%M:%S %p'))
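For what it's worth, pandas also offers a vectorized equivalent that is typically much faster than a row-by-row strptime on roughly two million rows; a sketch:

# Vectorized parsing (same result; the speed claim is a general expectation, not measured here)
df['Created Date'] = pd.to_datetime(df['Created Date'], format='%m/%d/%Y %I:%M:%S %p')
df['Closed Date'] = pd.to_datetime(df['Closed Date'], format='%m/%d/%Y %I:%M:%S %p')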

In [37]: # It would be useful to create a column to compute how long it takes to close a complaint
df['Processing Time'] = df['Closed Date'] - df['Created Date']

In [38]: # Viewing descriptive statistics on Processing Time can give some insight into turnaround time
df['Processing Time'].describe()

Out[38]: count                     1847178
         mean    14 days 18:28:19.930685
         std     47 days 13:22:45.772770
         min         -365 days +00:00:00
         25%             0 days 03:10:15
         50%             2 days 00:00:00
         75%      8 days 19:38:00.750000
         max           918 days 14:08:12
         Name: Processing Time, dtype: object

From the descriptive statistics, we can see the minimum processing time is negative. This means something is wrong with the date data, and it
should be explored.

In [39]: # View Processing Time data that is negative


df[df['Processing Time']<datetime.timedelta(0,0,0)].head(3)

Out[39]:
            Created Date  Closed Date  Agency  Complaint Type                  Incident Zip  City      Status   Borough    Latitude   Longitude   Processing Time
Unique Key
28581213    2014-07-31    2014-07-23   DOHMH   Unsanitary Animal Pvt Property  10456         Bronx     Pending  BRONX      40.835153  -73.912449  -8 days
28541630    2014-07-25    2014-07-07   DOHMH   Rodent                          11206         Brooklyn  Pending  BROOKLYN   40.701265  -73.929265  -18 days
28934215    2014-09-22    2014-08-25   DOHMH   Rodent                          10031         New York  Pending  MANHATTAN  40.827318  -73.946620  -28 days
There are issues with some data in df: the Closed Date in some rows precedes the Created Date, which
produces the negative processing times.
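Before dropping these rows, it's cheap to count how many are affected (a sketch of mine; the notebook doesn't show this output):

# Count rows whose Closed Date precedes their Created Date
num_negative = (df['Processing Time'] < datetime.timedelta(0)).sum()
print(num_negative)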

In [40]: # Remove all data from df that have negative Processing Time
df = df[df['Processing Time']>=datetime.timedelta(0,0,0)]

In [41]: #Count of non-null values for every column


df_columns = list(df.columns)
df[df_columns].count()

Out[41]: Created Date 1830970


Closed Date 1830970
Agency 1830970
Complaint Type 1830970
Incident Zip 1830970
City 1830970
Status 1830970
Borough 1830970
Latitude 1830970
Longitude 1830970
Processing Time 1830970
dtype: int64

Now that the data looks clean enough for further exploration, I'll create a function that incorporates the whole
data cleaning process.
This makes future work on the dataset convenient.

In [42]: def open_311_data(datafile):


import numpy as np
import pandas as pd
import datetime

#Function to clean Incident Zip


def correct_zip(zip_code):
try:
zip_code = int(float(zip_code))
except:
try:
zip_code = int(float(zip_code.split('-')[0]))
except:
return np.nan
if zip_code < 10000 or zip_code > 19999:
return np.nan
else:
return str(zip_code)

#Function to clean City values, i.e. convert City values to title case
def camel_case(city):
try:
city = city.split(' ')
city = ' '.join([x.lower().capitalize() for x in city])
if city == 'Unknown':
return np.nan
else:
return city
except:
return np.nan

#Read the file


df = pd.read_csv(datafile, index_col='Unique Key')

#Drop columns that aren't relevant to this analysis


df_cols_rmv = ['Incident Address', 'Street Name', 'Cross Street 1', 'Cross Street 2',
               'Intersection Street 1', 'Intersection Street 2', 'Landmark', 'Facility Type',
               'Due Date', 'Resolution Description', 'Community Board', 'X Coordinate (State Plane)',
               'Y Coordinate (State Plane)', 'Park Facility Name', 'Park Borough', 'School Name',
               'School Number', 'School Region', 'School Code', 'School Phone Number', 'School Address',
               'School City', 'School State', 'School Zip', 'School Not Found', 'School or Citywide Complaint',
               'Vehicle Type', 'Taxi Company Borough', 'Taxi Pick Up Location', 'Bridge Highway Name',
               'Bridge Highway Direction', 'Road Ramp', 'Bridge Highway Segment', 'Garage Lot Name',
               'Ferry Direction', 'Ferry Terminal Name', 'Location', 'Address Type', 'Agency Name',
               'Resolution Action Updated Date', 'Descriptor', 'Location Type']

df.drop(df_cols_rmv, inplace=True, axis=1)

#Clean Incident Zip


df['Incident Zip'] = df['Incident Zip'].apply(correct_zip)

#Clean City values


df['City'] = df['City'].apply(camel_case)

#Drop unspecified boroughs


df = df[df['Borough'] != 'Unspecified']

#Drop all rows with nan


df = df.dropna(how='any')

#Convert Created Date and Closed Date to datetime objects, create a Processing Time column
df['Created Date'] = df['Created Date'].apply(lambda x: datetime.datetime.strptime(x, '%m/%d/%Y %I:%M:%S %p'))
df['Closed Date'] = df['Closed Date'].apply(lambda x: datetime.datetime.strptime(x, '%m/%d/%Y %I:%M:%S %p'))
df['Processing Time'] = df['Closed Date'] - df['Created Date']

#Remove negative processing time rows from the dataframe


df = df[df['Processing Time']>=datetime.timedelta(0,0,0)]

return df

In [43]: # Open, read, and process the NYC 311 dataset using the open_311_data function
datafile = '311_Service_Requests_from_2014.csv'
df = open_311_data(datafile)
df.head(3)

C:\Users\Ikp\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py:2802: DtypeWarning: Columns (8,17,40,41,42,43,44,45,46,47,48,49) have mixed types. Specify dtype option on import or set low_memory=False.
  if self.run_code(code, result):
C:\Users\Ikp\Anaconda3\lib\site-packages\numpy\lib\arraysetops.py:463: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  mask |= (ar1 == a)
Out[43]:
            Created Date         Closed Date          Agency  Complaint Type      Incident Zip  City      Status  Borough    Latitude   Longitude   Processing Time
Unique Key
28457271    2014-07-11 15:08:58  2014-08-05 12:41:37  DOT     Sidewalk Condition  11368         Corona    Closed  QUEENS     40.751870  -73.862718  24 days 21:32:39
28644314    2014-08-08 14:06:22  2014-08-12 11:33:34  DCA     Consumer Complaint  10014         New York  Closed  MANHATTAN  40.732623  -74.001119  3 days 21:27:12
29306886    2014-11-18 00:52:40  2014-11-18 01:35:22  NYPD    Blocked Driveway    11358         Flushing  Closed  QUEENS     40.760384  -73.806826  0 days 00:42:42
Visualizations
In [44]: import matplotlib.pyplot as plt
%matplotlib inline

In [45]: # Visualizing 311 call data Incidents with a heat map


import gmaps

In [46]: import settings # Contains my Google map API key


gmaps.configure(api_key=settings.API_KEY) # Fill in with your API key
new_york_coordinates = (40.75, -74.00)
locations = df[['Latitude','Longitude']]
fig = gmaps.figure(center=new_york_coordinates, zoom_level=12)
heatmap_layer = gmaps.heatmap_layer(locations)
fig.add_layer(heatmap_layer)
fig
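One caveat: with roughly 1.8 million rows, the browser-side heatmap can be very slow. A common workaround is to downsample before plotting and tune the layer; a sketch (max_intensity and point_radius are jupyter-gmaps layer attributes, and the sample size is an arbitrary choice of mine):

# Downsample so the widget stays responsive
locations = df[['Latitude', 'Longitude']].sample(50000, random_state=0)
heatmap_layer = gmaps.heatmap_layer(locations)
heatmap_layer.max_intensity = 100   # cap the color scale so hotspots don't saturate
heatmap_layer.point_radius = 5      # smaller kernel per point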

[The gmaps heatmap widget does not render in this static view of the notebook.]

In [47]: #Exploration of incidents by Borough


borough = df.groupby('Borough')
borough.size().plot(kind='bar', figsize=(12,6), title=('Incidents by Borough'));

From the graph, we can see that Brooklyn has the most incidents while Staten Island has the least. It should also be noted that Staten
Island is the smallest of the five boroughs, which could be why it has the fewest incidents.
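To probe that hunch, one could normalize by population. A sketch using rough 2014 borough populations (the figures below are my approximations, not values from the dataset):

# Approximate 2014 populations (assumed, for illustration only)
population = pd.Series({'BROOKLYN': 2600000, 'QUEENS': 2300000, 'MANHATTAN': 1630000,
                        'BRONX': 1440000, 'STATEN ISLAND': 470000})
(borough.size() / population).sort_values(ascending=False).plot(kind='bar', figsize=(12,6),
                                                                title='Incidents per resident by Borough');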

In [48]: # Visualization of incidents by Agency

agency = df.groupby('Agency')
agency.size().plot(kind='bar', figsize=(12,6), title=('Incident calls per Agency'));

HPD has the highest number of complaints, followed by NYPD.

In [49]: # Visualization of the number of incidents in each Borough by Agency

agency_borough = df.groupby(['Agency','Borough']).size().unstack()
agency_borough.plot(kind='bar', title='Total Incidents in each Borough by Agency', figsize=(15,7));

In [50]: #Visualization of top Agencies with most incidents per borough


col_number = 2
row_number = 3

fig, axes = plt.subplots(row_number,col_number, figsize=(12,12))

for i, (label, col) in enumerate(agency_borough.iteritems()):
    ax = axes[int(i/col_number), i%col_number]
    col = col.sort_values(ascending=True)[-5:]   # keep the 5 largest agencies, plotted smallest-to-largest
    col.plot(kind='barh', ax=ax)
    ax.set_title(label)

plt.tight_layout()

In [51]: # Visualization of most Complaints per Borough


borough_comp = df.groupby(['Complaint Type','Borough']).size().unstack()

col_number = 2
row_number = 3
fig, axes = plt.subplots(row_number,col_number, figsize=(12,12))

for i, (label, col) in enumerate(borough_comp.iteritems()):
    ax = axes[int(i/col_number), i%col_number]
    col = col.sort_values(ascending=True)[-15:]   # keep the 15 most frequent complaint types
    col.plot(kind='barh', ax=ax)
    ax.set_title(label)

plt.tight_layout()

Visualization of processing time.

The Processing Time column in the dataframe is a timedelta object; it is easier to convert the processing time into floats (days) for calculation

In [52]: import numpy as np


df['Processing Time Float'] = df['Processing Time'].apply(lambda x:x/np.timedelta64(1, 'D'))
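An equivalent alternative uses pandas' timedelta accessor (a sketch):

# Same conversion via the .dt accessor: total seconds divided into days
df['Processing Time Float'] = df['Processing Time'].dt.total_seconds() / 86400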

In [53]: # Histogram of Processing Time


df['Processing Time Float'].hist(bins=30, figsize=(15,7));
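Because processing times are heavily right-skewed (most requests close within days, a few take months), a log-scaled y-axis keeps the long tail visible; a sketch:

# Log-scale the counts so the tail of slow requests remains visible
ax = df['Processing Time Float'].hist(bins=30, figsize=(15,7))
ax.set_yscale('log');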


Since the dataframe now contains datetime objects, I can build a bar graph to show incidents per month and
other interesting information. This allows easy discovery of noticeable trends and
seasonality.

I can do this by adding a column to the data that keeps track of year and month only

In [54]: import datetime


df['YYYY-MM'] = df['Created Date'].apply(lambda x: datetime.datetime.strftime(x, '%Y-%m'))
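As with the date parsing earlier, a vectorized form of the same operation exists (a sketch):

# Vectorized equivalent via the datetime accessor
df['YYYY-MM'] = df['Created Date'].dt.strftime('%Y-%m')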

In [55]: # Incidents on a monthly basis

monthly_incidents = df.groupby('YYYY-MM').size().plot(figsize=(12,5), title='Incidents on a monthly basis');

In [56]: # Processing time per Borough on a monthly basis

df.groupby(['YYYY-MM','Borough'])['Processing Time Float'].mean().unstack().plot(figsize=(15,7),
        title='Processing time per Borough on a monthly basis');

In [57]: # Processing time per Borough


df.groupby('Borough')['Processing Time Float'].mean().plot(kind='bar', figsize=(15,7),
title='Processing Time per Borough');

In [58]: # Visualization of the number of Complaints per Agency on a monthly basis

date_agency = df.groupby(['YYYY-MM', 'Agency'])
date_agency.size().unstack().plot(kind='bar', figsize=(15,7), title='Number of Complaints per Agency on a monthly basis');
plt.legend(loc='center left', bbox_to_anchor=(1.0, 0.5));

In [59]: # Visualization of Agency with their number of Complaints


df.groupby('Agency').size().sort_values(ascending=False).plot(kind='bar',figsize=(15,7),
title='Number of Complaints per Agency');


Since HPD has the most complaints, I'll explore HPD's data to learn more about the complaints it handles

In [60]: # Visualization of incidents handled by HPD per Borough on a monthly basis

df[df['Agency']=='HPD'].groupby(['YYYY-MM','Borough']).size().unstack().plot(figsize=(12,7),
        title='Incidents per Borough on a monthly basis');

In [61]: # Visualization of complaints handled by HPD

df[df['Agency']=='HPD'].groupby('Complaint Type').size().sort_values(ascending=False).plot(kind='bar',
        figsize=(12,6), title='Number of each complaint type handled by HPD');

Visualizations of Complaint Type


In [62]: # Visualization of the number of complaints by type
df.groupby('Complaint Type').size().sort_values(ascending=False)[:20].plot(kind='bar', figsize=(15,6),
        title='Bar graph of Complaint Type');

Noise - Residential has the most complaints; let's explore it further

In [63]: # Borough with the most 'Noise - Residential' complaints

df[df['Complaint Type']=='Noise - Residential'].groupby('Borough').size()[:10].sort_values(ascending=False).plot(
        kind='bar', title='Residential Noise Complaints per Borough');


Brooklyn has the most residential noise complaints; it would be interesting to know whether this noise peaked at some point within the year or
was uniform throughout the year

In [64]: brooklyn_noise = df[(df['Borough']=='BROOKLYN') & (df['Complaint Type']=='Noise - Residential')]

brooklyn_noise.groupby('YYYY-MM').size().plot(kind='bar', figsize=(12,6),
        title='Residential noise complaints in Brooklyn on a monthly basis');

In [65]: # Complaints per Borough through the year


df.groupby(['YYYY-MM','Borough']).size().unstack().plot(figsize=(15,6))
plt.legend(loc='center left', bbox_to_anchor=(1.0, 0.5));

Observations

Brooklyn has the highest number of incident calls, followed by Queens. Staten Island has the fewest incident calls.

HPD has the most incident calls, followed by NYPD.

The majority of incidents occur in January, followed by November; incident calls dip to their lowest in September, followed by April.

HPD-related incident calls follow a nearly regular pattern across all boroughs from month to month. Heat/Hot water complaints are the most
frequent.

Noise in residential areas was the most common complaint in 2014, followed by Heat/Hot water complaints.

Noise complaints peaked in September and were lowest in February.

Conclusion
Brooklyn has on average the slowest processing time from month to month, which could be associated with the fact that it has the highest number of
incident calls; but it should be noted that while Staten Island has by far the fewest incident calls, it does not have the fastest resolution time.

