Exploratory Data Analysis of the 2014 New York City 311 Call Data
The aim of this EDA is to understand the NYC 311 call data and observe any noticeable trends that we can make predictions on.

Author: Ikponmwosa Felix Ogbeide

Questions to answer through this project:

Which boroughs have the most call incidents?

Which agency gets the most incidents?

Which borough has the fastest incident resolution time?

How do the incidents vary month to month?

PS
I'm also on the lookout for interesting trends and observations beyond the questions above!

Before I begin exploring the data, it is essential to understand what type of data I am working with. This will enable me to decide what kind of data
cleaning or wrangling is necessary for this project.

In [1]: # Import Python libraries for data manipulation


import numpy as np
import pandas as pd

In [2]: # Read 311 data from CSV file into pandas dataframe
datafile = '311_Service_Requests_from_2014.csv'
df = pd.read_csv(datafile)
df.head(3)

C:\Users\Ikp\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py:2698: DtypeWarning: Columns (8,17,40,41,42,43,44,45,46,47,48,49) have mixed types. Specify dtype option on import or set low_memory=False.
  interactivity=interactivity, compiler=compiler, result=result)
Out[2]:
   Unique Key  Created Date            Closed Date             Agency  Agency Name                      Complaint Type      Descriptor         Location Type    Incident Zip  Incident Address
0  28457271    07/11/2014 03:08:58 PM  08/05/2014 12:41:37 PM  DOT     Department of Transportation     Sidewalk Condition  Defacement         Sidewalk         11368         37-73 104 STREET
1  28644314    08/08/2014 02:06:22 PM  08/12/2014 11:33:34 AM  DCA     Department of Consumer Affairs   Consumer Complaint  False Advertising  NaN              10014         113 WASHINGTON PLACE
2  29306886    11/18/2014 12:52:40 AM  11/18/2014 01:35:22 AM  NYPD    New York City Police Department  Blocked Driveway    No Access          Street/Sidewalk  11358         42-10 159 STREET

3 rows × 53 columns

In [3]: # Set the pandas maximum columns display so we can view all columns at once.
pd.set_option('display.max_columns', None)

In [4]: # Since the datafile has a Unique Key column, it's better to use that as the index
# Re-read datafile into df pandas dataframe, setting Unique Key as the index
df = pd.read_csv(datafile, index_col='Unique Key')

C:\Users\Ikp\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py:2698: DtypeWarning: Columns (8,17,40,41,42,43,44,45,46,47,48,49) have mixed types. Specify dtype option on import or set low_memory=False.
  interactivity=interactivity, compiler=compiler, result=result)
C:\Users\Ikp\Anaconda3\lib\site-packages\numpy\lib\arraysetops.py:463: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  mask |= (ar1 == a)

From the warning, we know we have mixed types in some columns. It's best to investigate what the mixed types are.
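As an aside, the warning itself points at one remedy: declaring dtypes up front when reading the file. A sketch of that approach (treating only 'Incident Zip' as a string is my assumption here; the notebook instead inspects the mixed-type columns directly):

# Declare the troublesome column's dtype at read time to silence the DtypeWarning
df = pd.read_csv(datafile, index_col='Unique Key',
                 dtype={'Incident Zip': str}, low_memory=False)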

In [5]: # I'll create a list to store the columns with mixed data types.
        # Later on in the project, I'll revisit this list of mixed data types
        df_mixed_dt = [df.columns[7], df.columns[16], df.columns[39], df.columns[40], df.columns[41],
                       df.columns[42], df.columns[43], df.columns[44], df.columns[45], df.columns[46],
                       df.columns[47], df.columns[48]]

In [6]: # View the data type of each column

df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2099056 entries, 28457271 to 29607189
Data columns (total 52 columns):
Created Date object
Closed Date object
Agency object
Agency Name object
Complaint Type object
Descriptor object
Location Type object
Incident Zip object
Incident Address object
Street Name object
Cross Street 1 object
Cross Street 2 object
Intersection Street 1 object
Intersection Street 2 object
Address Type object
City object
Landmark object
Facility Type object
Status object
Due Date object
Resolution Description object
Resolution Action Updated Date object
Community Board object
Borough object
X Coordinate (State Plane) float64
Y Coordinate (State Plane) float64
Park Facility Name object
Park Borough object
School Name object
School Number object
School Region object
School Code object
School Phone Number object
School Address object
School City object
School State object
School Zip object
School Not Found object
School or Citywide Complaint float64
Vehicle Type object
Taxi Company Borough object
Taxi Pick Up Location object
Bridge Highway Name object
Bridge Highway Direction object
Road Ramp object
Bridge Highway Segment object
Garage Lot Name object
Ferry Direction object
Ferry Terminal Name object
Latitude float64
Longitude float64
Location object

dtypes: float64(5), object(47)
memory usage: 848.8+ MB

In [7]: # Count of non-null values for every column


df_columns = list(df.columns)
df[df_columns].count()

Out[7]: Created Date 2099056


Closed Date 2053916
Agency 2099056
Agency Name 2099056
Complaint Type 2099056
Descriptor 2086658
Location Type 1546759
Incident Zip 1952035
Incident Address 1641505
Street Name 1641401
Cross Street 1 1675745
Cross Street 2 1668168
Intersection Street 1 323622
Intersection Street 2 323523
Address Type 1994598
City 1953071
Landmark 810
Facility Type 536769
Status 2099053
Due Date 870518
Resolution Description 2085477
Resolution Action Updated Date 2060578
Community Board 2099056
Borough 2099056
X Coordinate (State Plane) 1888333
Y Coordinate (State Plane) 1888333
Park Facility Name 2099056
Park Borough 2099056
School Name 2099056
School Number 2098598
School Region 2088711
School Code 2088711
School Phone Number 2099056
School Address 2099055
School City 2099056
School State 2099056
School Zip 2099054
School Not Found 857159
School or Citywide Complaint 0
Vehicle Type 481
Taxi Company Borough 1689
Taxi Pick Up Location 14906
Bridge Highway Name 9223
Bridge Highway Direction 9216
Road Ramp 9177
Bridge Highway Segment 10678
Garage Lot Name 624
Ferry Direction 479
Ferry Terminal Name 1596
Latitude 1888333
Longitude 1888333
Location 1888333
dtype: int64

Columns with a lot of missing data won't be useful to my analysis. Also, some columns in the data just aren't useful to this analysis; it's
best to remove these columns.
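Before hand-picking them, a quick programmatic pass can flag the sparse columns. A minimal sketch (the 50% completeness threshold is an arbitrary choice for illustration):

# Flag columns where more than half the values are missing (threshold is illustrative)
sparse_cols = df.columns[df.count() < 0.5 * len(df)]
print(list(sparse_cols))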

In [8]: # A list of columns to remove from the dataframe

df_cols_rmv = ['Incident Address', 'Street Name', 'Cross Street 1', 'Cross Street 2',
               'Intersection Street 1', 'Intersection Street 2', 'Landmark', 'Facility Type',
               'Due Date', 'Resolution Description', 'Community Board', 'X Coordinate (State Plane)',
               'Y Coordinate (State Plane)', 'Park Facility Name', 'Park Borough', 'School Name',
               'School Number', 'School Region', 'School Code', 'School Phone Number', 'School Address',
               'School City', 'School State', 'School Zip', 'School Not Found', 'School or Citywide Complaint',
               'Vehicle Type', 'Taxi Company Borough', 'Taxi Pick Up Location', 'Bridge Highway Name',
               'Bridge Highway Direction', 'Road Ramp', 'Bridge Highway Segment', 'Garage Lot Name',
               'Ferry Direction', 'Ferry Terminal Name', 'Location', 'Address Type', 'Agency Name',
               'Resolution Action Updated Date', 'Descriptor', 'Location Type']

In [9]: # Remove the columns added to the df_cols_rmv list from df dataframe
df.drop(df_cols_rmv, inplace=True, axis=1)

In [10]: df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2099056 entries, 28457271 to 29607189
Data columns (total 10 columns):
Created Date object
Closed Date object
Agency object
Complaint Type object
Incident Zip object
City object
Status object
Borough object
Latitude float64
Longitude float64
dtypes: float64(2), object(8)
memory usage: 176.2+ MB

In [11]: #Time to investigate df columns with mixed data types


df_mixed_dt

Out[11]: ['Incident Zip',


'Landmark',
'Vehicle Type',
'Taxi Company Borough',
'Taxi Pick Up Location',
'Bridge Highway Name',
'Bridge Highway Direction',
'Road Ramp',
'Bridge Highway Segment',
'Garage Lot Name',
'Ferry Direction',
'Ferry Terminal Name']

In [12]: # Drop columns in df_mixed_dt that aren't in df anymore


df_mixed_dt = [x for x in df_mixed_dt if x in list(df.columns)]

In [13]: df_mixed_dt

Out[13]: ['Incident Zip']

In [14]: #Explore Incident Zip


df['Incident Zip'].unique()

Out[14]: array(['11368', '10014', '11358', '10018', nan, '10466', '11357', '10309',


'10030', '10029', '11417', '11369', '10465', '10473', '10461',
'11101', '10010', '11217', '11355', '10019', '10128', '10462',
'10456', '11385', '11356', '11225', '10028', '11104', '11203',
'10024', '11249', '11103', '11211', '11364', '11432', '10023',
'10003', '11375', '11354', '11361', '10009', '10314', '11412',
'10036', '10012', '10163-4668', '11429', '10451', '11426', '10011',
'11233', '10457', '10460', '10305', '10021', '11237', '10007',
'10016', '10306', '11210', '11213', '11372', '11226', '11218',
'11221', '11414', '11205', '10307', '11204', '11418', '11234',
'11230', '11223', '10004', '11377', '11370', '11423', '10017',
'11367', '10031', '10002', '11222', '10027', '11207', '10025',
'11235', '11421', '10469', '11216', '11219', '11434', '10312',
'11238', '11411', '11214', '10005', '11365', '11373', '10452',
'11201', '10468', '10038', '11228', '10282', '10303', '11105',
'10472', '10000', '10308', '11378', '11236', '10001', '10475',
'10020', '10034', '11004', '11102', '11374', '10464', '11428',
'11416', '10304', '10453', '11208', '11209', '10022', '11379',
'11435', '11360', '11212', '10026', '10065', '11001', '11106',
'07114', '11215', '07512', '10032', '11419', '11427', '11096',
'07631', '10035', '10463', '11220', '11436', '10467', '11422',
'11229', '10301', '10013', '10040', '10033', '11366', '11433',
'11362', '10075', '11363', '11040', '10458', '10470', '11206',

'11231', '11413', '10454', '11239', '10455', '10459', '10704',
'10302', '11232', '11514', '11420', '10037', '10310', '11415',
'10474', '11224', '10710', '10006', '11691', '10039', '11692',
'11693', '10044', '11694', '11430', '29616-0759', '10471', '07017',
'10069', '11746', '10280', '94524', '11109', '00000', '10048',
'23450', '85044', '10804', 'OOOOO', 'UNKNOWN', '07601', '11749',
'10533', '11566', '11359', '11716', '11516', '00083', '32789',
'11803', '11559', '55426-1066', '11706', '11710', '10281', '10701',
'11553', '23541', '55438', '11580', '11757', '37615', '11697',
'07054', '11042', '11802-9060', '10116', '11554', '08857', '10805',
'12602-0550', '07030', '07642', '77072', '92123', '11735', '10801',
'10119', '11520', '10163', '11003', '55614', '92046-3023', '95476',
'10129', '11005', '10601', '08817', '07087', '11452', '10153',
'NEW YORK A', '11581', '33319', '12550', '11590', '10402',
'61702-3068', '08691', '11579', '07024', '07981', '07305', '11550',
'44139', '000000', '11747', '07047', '10548', '11501', '98057',
'10591', '23502', '10538', '75081', '10803', '77081', '07302',
'11776', '11021', '75244', '75201-4234', '10589', '77057', '1000',
'10112', '91752', '07306', '11598', '89703', '11242', '07652',
'63301', '32824', '11563', '07010', '66211', '11572', '10122',
'10603', '11556', '10606', '10705', '11791', '11758', '11030',
'07666', '07310', '10103', '07351', '75007-1958', '10930',
'NJ 07114', '01701-0166', '10954', '11720', '06851', '10536',
'14221-7820', '10552', '61654', '07647', '60089', '10501', '10162',
'30071', '11552', '15251', '11241', '75024-2959', '11801', '11371',
'11530', '11771', '08876', '11725', '07936', '0000', '11722',
'07094', '07605', '29603', '11714', 'NJ', '327500', '75240',
'37917', '32896', '11723', '08003', '07390', '19053', '000',
'07073', '06902', '11777', '30356', '11545', '10507', '92508',
'11010', '11901', '60181', '99999', '30005', '07052', '33634',
'07111', '07207', '20705', '07960', '55439', '14221-5783', '30188',
'84020', 11370.0, 11361.0, 11417.0, 10021.0, 10312.0, 11375.0,
10465.0, 11206.0, 11697.0, 11357.0, 10458.0, 11224.0, 10305.0,
11236.0, 11214.0, 11691.0, 11354.0, 11102.0, 10022.0, 10023.0,
11209.0, 10463.0, 11229.0, 10075.0, 11231.0, 10024.0, 11223.0,
10028.0, 11412.0, 11385.0, 11360.0, 10451.0, 11692.0, 11040.0,
10308.0, 10038.0, 11220.0, 11369.0, 11235.0, 11416.0, 11372.0,
11238.0, 10128.0, 10017.0, 11230.0, 11694.0, 10012.0, 10309.0,
11414.0, 11232.0, 10472.0, 10033.0, 11434.0, 11356.0, 10468.0,
11358.0, 11228.0, 10306.0, 10016.0, 10036.0, 10304.0, 11001.0,
10454.0, 11377.0, 11216.0, 11106.0, 10461.0, 10003.0, 10006.0,
10314.0, 10030.0, 11365.0, 11005.0, 11103.0, 11379.0, 10462.0,
11368.0, 11435.0, 10014.0, 11215.0, 10029.0, 11233.0, 11222.0,
10471.0, 11225.0, 11373.0, 10301.0, 11212.0, 11201.0, 10307.0,
10452.0, 11237.0, 11104.0, 11432.0, 11436.0, 11362.0, 11420.0,
11208.0, 10470.0, 11366.0, 11429.0, 11219.0, 11413.0, 10466.0,
11367.0, 11419.0, 10455.0, 10019.0, 11422.0, 10467.0, 11239.0,
11378.0, 10025.0, 11226.0, 10011.0, 10009.0, 11210.0, 11423.0,
11426.0, 10034.0, 11105.0, 10475.0, 10002.0, 11355.0, 11221.0,
10032.0, 11204.0, 11374.0, 11364.0, 10001.0, 11249.0, 11411.0,
11234.0, 10473.0, 10065.0, 11211.0, 11217.0, 11218.0, 11203.0,
10460.0, 10013.0, 11205.0, 10310.0, 10010.0, 11101.0, 10007.0,
11004.0, 11433.0, 10005.0, 10027.0, 10474.0, 11421.0, 10469.0,
10026.0, 10035.0, 11207.0, 11213.0, 10453.0, 11418.0, 11415.0,
10040.0, 10457.0, 10031.0, 10303.0, 11427.0, 10069.0, 11428.0,
10464.0, 10004.0, 10018.0, 10456.0, 10302.0, 11363.0, 11693.0,
10459.0, 10037.0, 10039.0, 7094.0, 10103.0, 10044.0, 10280.0,
7030.0, 11563.0, 10553.0, 11430.0, 7117.0, 10523.0, 11552.0,
11003.0, 7114.0, '19044', '11701', '11754', '54305-1654', '07664',
'10549', '95476-9005', '10577', '33137-0098', '07102', '171111',
'92019', '11797-9012', 75081.0, 10020.0, 10595.0, 10119.0, 10111.0,
11109.0, 10591.0, 83.0, 10701.0, 10129.0, 11520.0, 10122.0, 7310.0,
11733.0, 92123.0, 7632.0, 11507.0, 8776.0, 8691.0, 11042.0, 10710.0,
98206.0, 14225.0, 11577.0, '10523', '23541-1223', 11716.0, 75240.0,
11550.0, 7070.0, 10000.0, 13202.0, 8540.0, 10282.0, 11021.0,
11763.0, 7801.0, 60148.0, 11501.0, 11749.0, 7663.0, 11241.0,
10153.0, 10123.0, 10281.0, 0.0, 33486.0, 85080.0, 11704.0, 55812.0,
11.0, 10801.0, 11530.0, 11371.0, 7104.0, 23113.0, 11741.0, 11695.0,
10162.0, 85044.0, 7073.0, 10048.0, 11566.0, 11202.0, 10705.0,
11553.0, 11561.0, 8724.0, '48195-0391', '08540', 11746.0, 11797.0,
77201.0, 11542.0, 11251.0, 7410.0, 7450.0, 11793.0, 11559.0,
11768.0, 11581.0, 11359.0, 11590.0, 11703.0, 75024.0, 95476.0,
92506.0, 7307.0, 10107.0, 8520.0, 7144.0, 11011.0, 10803.0, 14644.0,
7920.0, 10984.0, 11570.0, 10965.0, 11778.0, 17032.0, 11580.0,
11569.0, 11565.0, 7901.0, 7041.0, 11756.0, 90304.0, 18466.0,
11386.0, 33304.0, 7306.0, 1123.0, 11710.0, 30339.0, 8069.0, 7024.0,
11598.0, '11788', '07086', '45274-2596', 7042.0, 10604.0, 7304.0,
8854.0, 11743.0, 11803.0, 10708.0, 6851.0, 30005.0, 60018.0, 6511.0,
91754.0, 7302.0, 8527.0, 23502.0, 11599532.0, 11735.0, 8690.0,
7430.0, 11554.0, 8003.0, 7601.0, 10112.0, 11725.0, 10041.0,
114566.0, 95762.0, '14221', 20188.0, 73126.0, '60179', '11434-420',
'07661', '14450', '07632', '07712', '61602', '10583', '29616',
'11576', '20201', '11570', '11317', 'ZIP CODE', '11730',
'35210-6928', '11251', '63042', '32255', 92018.0, 11776.0, 6854.0,
1000000.0, 3110.0, 53707.0, 80155.0, 7652.0, 11791.0, 11801.0,
11030.0, 10601.0, 11804.0, 7047.0, 7730.0, 11020.0, 19044.0,
12203.0, '10041', '11561', '75403', '43216', '06801-1459',
'6462430478', '02140', '07410', '11738', '74147', '10925', '11695',
'08854', '07458', '32750', '11215-0389', '18519', '15219',
'11756-0215', '11715', '48195-0954', '33428', '70163', '11704',
'11779', '75026-0848', '02205', '7056', '11802', '02459', '08807',
'11510', '64108', '85711', '07620', '98036', '10541', '10573',
'11590-5114', '92013-0848', '10514', '07304', '07650', '91710',
'18773-9635', '30303', '0', '11729', '78265', '07090', '11542',
'10550', '11797', '55125', '33486', '30348-5658', '60076', '30360',
'10101', '10111', '12203', '12941-0129', '91754', '11001-2024',
'10123', '91356', '44122-5340', '10602-5055', '48451-0505', '35219',
'07144', '11024', '1143', '07042', '07079', '08873', '10522',
'11756', '11762', '08827-4010', '08080', '18966', '07002', '10994',
'75086', '07093', '11797-9001', '60018-4501', '11020', '48393-1022',
'33611', '11793', '11111', '43607', '32073', '12345', '11766',
'INDIAN WEL', '03104', '10528', '20123', '07203', '12024', '11549',
'19850', '27713', '07670', '07067', '54602', '11767', '37214',
'78265-9707', '0000000', '14206', '07417', '11540', '92923',
'85251-6547', '30144-7518', '11741', '11797-9004', '91504', '43054',
'10015', '117', '11507', '11753', 11747.0, 7981.0, 11980.0, 11576.0,
7372.0, 14615.0, 1208.0, 7940.0, '07675', '28201-1115', '14108',
'10532', '14009', '10566', '34230', '14203', '06901', 'NEWARK',
'11743', '14883', '11570-4705', '11535', '94566-9049', '6201-5773',
'10008', '11533', '60523', '66207', '43218-3083', '80111', '10703',
'10107', '29659', '320896', '08086', '12590', 'ANONYMOUS', '92036',
'10952', '35476', 'NY', '45241', '11772-3518', '43614', '10150',
'18042', '48090-2036', '11795', '14692-2878', 'UNSURE', '20188',
'90036', '30345', '782659705', '11735-9000', '08830', '11760',
'10435', '11518', '33122', '07045', '67504', '92506', '55438-0902',
10177.0, 10115.0, 11242.0, 10165.0, 76406.0, 17602.0, 12814.0,
10550.0, 7646.0, 33130.0, 11111.0, 7093.0, 33480.0, 2907.0, 22172.0,
11787.0, 14626.0, 68135.0, 7017.0, 10104.0, 10589.0, '10023-0007',
11341.0, 29006.0, 10703.0, 10605.0, 11431.0, 10116.0, 1850.0,
74106.0, 11514.0, 12345.0, 84770.0, 18015.0, 10150.0, 12779.0,
100000.0, 10603.0, 10108.0], dtype=object)

It's clear that Incident Zip has some issues beyond the mixed data types:

Mixed data types, sometimes floats, sometimes strings

Some of the zip codes have 4 extra digits after a hyphen (the ZIP+4 format)

Some zip codes are nan; others are placeholders such as UNKNOWN, ANONYMOUS, and so on

In [15]: # Function that cleans the Incident Zip values and returns nan for data that cannot be cleaned
def correct_zip(zip_code):
try:
zip_code = int(float(zip_code))
except:
try:
zip_code = int(float(zip_code.split('-')[0]))
except:
return np.nan
if zip_code < 10000 or zip_code > 19999:
return np.nan
else:
return str(zip_code)
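A few spot checks illustrate the intended behaviour (these examples are mine, not cells from the notebook; expected results shown as comments):

# Illustrative checks of correct_zip
correct_zip('10163-4668')   # -> '10163'  (ZIP+4 reduced to the 5-digit code)
correct_zip(11370.0)        # -> '11370'  (float parsed, cast back to string)
correct_zip('UNKNOWN')      # -> nan      (unparseable placeholder)
correct_zip('92123')        # -> nan      (outside the 10000-19999 range kept here)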

In [16]: # Apply correct_zip function to clean Incident Zip data
df['Incident Zip'] = df['Incident Zip'].apply(correct_zip)

In [17]: # Remove rows from the data where Incident Zip is null, i.e. nan
df = df[df['Incident Zip'].notnull()]

In [18]: #Count of non-null values for every column


df_columns = list(df.columns)
df[df_columns].count()

Out[18]: Created Date 1951306


Closed Date 1910835
Agency 1951306
Complaint Type 1951306
Incident Zip 1951306
City 1951245
Status 1951303
Borough 1951306
Latitude 1887423
Longitude 1887423
dtype: int64

In [19]: # Closed Date, Latitude, and Longitude all have missing values; best to remove the rows where data
         # in those columns is missing
         df = df[(df['Latitude'].notnull()) & (df['Longitude'].notnull()) & (df['Closed Date'].notnull())]

In [20]: #Count of non-null values for every column


df_columns = list(df.columns)
df[df_columns].count()

Out[20]: Created Date 1847268


Closed Date 1847268
Agency 1847268
Complaint Type 1847268
Incident Zip 1847268
City 1847211
Status 1847265
Borough 1847268
Latitude 1847268
Longitude 1847268
dtype: int64

In [21]: #As part of the EDA process, all columns should be explored to identify columns with messy data or nan
df['Agency'].unique()

Out[21]: array(['DOT', 'DCA', 'NYPD', 'FDNY', 'DOHMH', 'TLC', 'DOE', 'DOB', 'DPR',
'EDC', 'DSNY', 'DEP', 'DOITT', 'HPD', '3-1-1', 'DFTA', 'DHS'], dtype=object)

In [22]: df['Complaint Type'].unique()

Out[22]: array(['Sidewalk Condition', 'Consumer Complaint', 'Blocked Driveway',


'EAP Inspection - F59', 'Derelict Vehicle', 'Street Condition',
'Indoor Air Quality', 'Broken Muni Meter', 'Illegal Parking',
'Animal Abuse', 'Taxi Complaint', 'Street Sign - Damaged',
'Noise - Street/Sidewalk', 'Noise - Commercial',
'School Maintenance', 'Homeless Encampment', 'Noise - Vehicle',
'Construction', 'Vending', 'Traffic', 'Food Establishment',
'Damaged Tree', 'Street Sign - Missing',
'For Hire Vehicle Complaint', 'Dead Tree', 'Standing Water',
'Illegal Tree Damage', 'Highway Condition', 'Noise - Park',
'Overgrown Tree/Branches', 'Drinking', 'Maintenance or Facility',
'Fire Safety Director - F58', 'Noise - Helicopter',
'Illegal Fireworks', 'Root/Sewer/Sidewalk Condition',
'Urinating in Public', 'Dirty Conditions',
'Noise - House of Worship', 'DPR Internal', 'Food Poisoning',
'Industrial Waste', 'Water System', 'Graffiti', 'Curb Condition',
'Building/Use', 'Fire Alarm - New System', 'Mold',
'Street Sign - Dangling', 'Rodent',
'Unsanitary Animal Pvt Property', 'Public Payphone Complaint',
'Animal in a Park', 'General Construction/Plumbing', 'Asbestos',
'Sewer', 'Street Light Condition', 'Plumbing', 'FLOORING/STAIRS',
'Harboring Bees/Wasps', 'Fire Alarm - Modification',
'Fire Alarm - Addition', 'Bike/Roller/Skate Chronic',
'Violation of Park Rules', 'City Vehicle Placard Complaint',
'Fire Alarm - Reinspection', 'Disorderly Youth',
'Bus Stop Shelter Placement', 'Open Flame Permit', 'Indoor Sewage',
'Bike Rack Condition', 'Public Toilet', 'Bridge Condition',
'Posting Advertisement', 'GENERAL', 'Senior Center Complaint',
'Panhandling', 'Snow', 'Special Projects Inspection Team (SPIT)',
'Municipal Parking Facility', 'Special Enforcement',
'Highway Sign - Damaged', 'Taxi Report', 'Broken Parking Meter',
'Beach/Pool/Sauna Complaint', 'Derelict Vehicles',
'Traffic Signal Condition', 'HEAT/HOT WATER',
'Missed Collection (All Materials)', 'Public Assembly',
'Overflowing Litter Baskets', 'Sweeping/Missed', 'Vacant Lot',
'ELEVATOR', 'Sanitation Condition', 'Other Enforcement',
'Investigations and Discipline (IAD)', 'Drinking Water',
'Found Property', 'Squeegee', 'Unsanitary Pigeon Condition',
'ELECTRIC', 'Hazardous Materials', 'UNSANITARY CONDITION',
'Bottled Water', 'Boilers', 'Electrical', 'Elevator',
'Emergency Response Team (ERT)', 'BEST/Site Safety',
'Request Xmas Tree Collection', 'Cranes and Derricks', 'Tanning',
'PLUMBING', 'APPLIANCE', 'HEATING', 'Litter Basket / Request',
'For Hire Vehicle Report', 'DOOR/WINDOW', 'WATER LEAK',
'Sweeping/Inadequate', 'Scaffold Safety',
'Fire Alarm - Replacement', 'Radioactive Material',
'Recycling Enforcement', 'Overflowing Recycling Baskets',
'Derelict Bicycle', 'Homeless Person Assistance', 'PAINT/PLASTER',
'Highway Sign - Dangling', 'OUTSIDE BUILDING', 'Water Quality',
'Public Assembly - Temporary', 'Miscellaneous Categories',
'Lifeguard', 'NONCONST', 'PAINT - PLASTER', 'GENERAL CONSTRUCTION',
'CONSTRUCTION', 'Legal Services Provider Complaint',
'Non-Residential Heat', 'Highway Sign - Missing',
'X-Ray Machine/Equipment', 'SAFETY', 'VACANT APARTMENT',
'Stalled Sites', 'Building Condition', 'AGENCY',
'Transportation Provider Complaint', 'Water Conservation', 'Noise',
'Air Quality', 'Plant', 'Lead', 'Collection Truck Noise',
'Special Natural Area District (SNAD)', 'Adopt-A-Basket',
'Literature Request', 'SG-99', 'Noise - Residential',
'Non-Emergency Police Matter', 'New Tree Request'], dtype=object)

In [23]: df['City'].unique()

Out[23]: array(['CORONA', 'NEW YORK', 'FLUSHING', 'BRONX', 'WHITESTONE',


'STATEN ISLAND', 'OZONE PARK', 'EAST ELMHURST', 'LONG ISLAND CITY',
'BROOKLYN', 'RIDGEWOOD', 'COLLEGE POINT', 'SUNNYSIDE', 'ASTORIA',
'OAKLAND GARDENS', 'FOREST HILLS', 'BAYSIDE', 'SAINT ALBANS',
'QUEENS VILLAGE', 'BELLEROSE', 'JACKSON HEIGHTS', 'HOWARD BEACH',
'RICHMOND HILL', 'WOODSIDE', 'HOLLIS', 'WOODHAVEN', 'JAMAICA',
'FRESH MEADOWS', 'ELMHURST', 'Ridgewood', 'MASPETH', 'GLEN OAKS',
'REGO PARK', 'Long Island City', 'MIDDLE VILLAGE', 'Bayside', nan,
'SOUTH RICHMOND HILL', 'ROSEDALE', 'Little Neck', 'LITTLE NECK',
'Jamaica', 'Richmond Hill', 'SPRINGFIELD GARDENS', 'Fresh Meadows',
'East Elmhurst', 'Woodhaven', 'Howard Beach', 'FLORAL PARK',
'KEW GARDENS', 'SOUTH OZONE PARK', 'CAMBRIA HEIGHTS',
'Far Rockaway', 'Flushing', 'South Ozone Park', 'Elmhurst',
'Ozone Park', 'Corona', 'South Richmond Hill', 'Jackson Heights',
'FAR ROCKAWAY', 'Queens Village', 'Springfield Gardens', 'Astoria',
'Cambria Heights', 'Glen Oaks', 'ROCKAWAY PARK', 'Rego Park',
'ARVERNE', 'Middle Village', 'NEW HYDE PARK', 'Woodside',
'Kew Gardens', 'Rockaway Park', 'Hollis', 'Maspeth', 'Rosedale',
'Saint Albans', 'Arverne', 'BREEZY POINT', 'Forest Hills',
'Oakland Gardens', 'Sunnyside', 'Bellerose', 'QUEENS', 'Whitestone',
'Floral Park', 'New Hyde Park', 'College Point', 'NEW HEMPSTEAD',
'UNKNOWN', 'BEDFORD HILLS', 'Breezy Point', 'BELLMORE', 'MANHATTAN'], dtype=object)

In [24]: df['Status'].unique()

Out[24]: array(['Closed', 'Pending', 'Assigned', nan, 'Open', 'Started'], dtype=object)

In [25]: df['Borough'].unique()

Out[25]: array(['QUEENS', 'MANHATTAN', 'BRONX', 'STATEN ISLAND', 'BROOKLYN',


'Unspecified'], dtype=object)

Columns with messy or missing data


City - contains some cities in uppercase and others in lowercase

Status - contains nan values

Borough - contains 'Unspecified' boroughs

In [26]: # Let's look at the unspecified boroughs; we want to be sure that removing this data from df won't
         # cause problems later on
         df[df['Borough']=='Unspecified'][['Agency', 'City']]

Out[26]: Agency City

Unique Key

29167930 NYPD STATEN ISLAND

28650971 DPR STATEN ISLAND

28993659 NYPD STATEN ISLAND

28337889 NYPD STATEN ISLAND

28768853 TLC NEW HEMPSTEAD

27339785 TLC BEDFORD HILLS

28911820 NYPD STATEN ISLAND

28767733 NYPD STATEN ISLAND

28867255 NYPD STATEN ISLAND

29044115 NYPD STATEN ISLAND

29183835 DCA BELLMORE

28994609 NYPD STATEN ISLAND

29503930 NYPD STATEN ISLAND

29040790 NYPD STATEN ISLAND

29211842 NYPD STATEN ISLAND

28151378 NYPD STATEN ISLAND

28356529 NYPD STATEN ISLAND

28983245 NYPD STATEN ISLAND

28763594 NYPD STATEN ISLAND

29217588 NYPD STATEN ISLAND

29209600 NYPD STATEN ISLAND

28961292 NYPD STATEN ISLAND

29159476 NYPD STATEN ISLAND

28945701 NYPD STATEN ISLAND

28305246 NYPD STATEN ISLAND

28501571 NYPD STATEN ISLAND

27739170 DPR STATEN ISLAND

28135459 TLC BAYSIDE

28944841 NYPD STATEN ISLAND

In [27]: # The majority of this data belongs to the NYPD Agency and occurs in Staten Island
         # To ensure I don't lose too much NYPD data, I need to confirm this accounts for a negligible
         # fraction of NYPD records
         nypd_total = df[df['Agency']=='NYPD']['Borough'].count()
         nypd_unspecified = df[(df['Borough']=='Unspecified') & (df['Agency']=="NYPD")]['Borough'].count()
         nypd_unspec_perct = nypd_unspecified/nypd_total*100
         print("%1.3f"%nypd_unspec_perct)

0.005

In [28]: # The unspecified boroughs are so few that they can be removed
df = df[df['Borough'] != 'Unspecified']

In [29]: # Number of rows where Status is nan

status_nan = len(df[df['Status'].isnull()].index)
print(status_nan)

3

In [30]: # The number of rows where Status is nan is 3, which is also negligible; I can remove them from
         # the dataframe as well.
         df = df[df['Status'].notnull()]

In [31]: # Since some City values appear in both uppercase and lowercase, it's better to have every city in
         # the same case.
         # Convert all City values to title case (each word capitalized)
         def camel_case(city):
             try:
                 city = city.split(' ')
                 city = ' '.join([x.lower().capitalize() for x in city])
                 if city == 'Unknown':
                     return np.nan
                 else:
                     return city
             except:
                 return np.nan
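Spot checks of camel_case (again my own illustrations, not notebook cells):

# Illustrative checks of camel_case
camel_case('NEW YORK')   # -> 'New York'
camel_case('UNKNOWN')    # -> nan  (placeholder value mapped to missing)
camel_case(np.nan)       # -> nan  (floats have no .split, caught by the except)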

In [32]: # Apply camel_case function to City column


df['City'] = df['City'].apply(camel_case)

In [33]: # Let's view the rows where City is nan, grouped by Agency


df[df['City'].isnull()].groupby('Agency')['Status'].count()

Out[33]: Agency
DOT 57
TLC 1
Name: Status, dtype: int64

In [34]: # 57 of the rows with a nan City belong to the DOT Agency.

# It's better to know whether this is significant before removing them
city_null_dot = len(df[(df['City'].isnull()) & (df['Agency']=='DOT')].index)
dot_total = len(df[df['Agency']=='DOT'].index)
city_null_dot_perct = (city_null_dot/dot_total)*100
print("%1.3f"%city_null_dot_perct)

0.024

In [35]: # 0.024% is negligible, so rows with a nan City can be removed from df
df = df[df['City'].notnull()]

In [36]: # Created Date and Closed Date aren't DateTime objects. It's more convenient to work with DateTime
         # objects, so convert Created Date and Closed Date values to DateTime.
         import datetime
         df['Created Date'] = df['Created Date'].apply(lambda x: datetime.datetime.strptime(x, '%m/%d/%Y %I:%M:%S %p'))
         df['Closed Date'] = df['Closed Date'].apply(lambda x: datetime.datetime.strptime(x, '%m/%d/%Y %I:%M:%S %p'))
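For what it's worth, pandas also offers a vectorized equivalent that is typically much faster than a row-by-row strptime on roughly two million rows; a sketch:

# Vectorized parsing (same result; the speed claim is a general expectation, not measured here)
df['Created Date'] = pd.to_datetime(df['Created Date'], format='%m/%d/%Y %I:%M:%S %p')
df['Closed Date'] = pd.to_datetime(df['Closed Date'], format='%m/%d/%Y %I:%M:%S %p')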

In [37]: # It would be useful to create a column to compute how long it takes to close a complaint
df['Processing Time'] = df['Closed Date'] - df['Created Date']

In [38]: # Viewing descriptive statistics on Processing Time can give some insight into turnaround time
df['Processing Time'].describe()

Out[38]: count                     1847178
         mean    14 days 18:28:19.930685
         std     47 days 13:22:45.772770
         min         -365 days +00:00:00
         25%             0 days 03:10:15
         50%             2 days 00:00:00
         75%      8 days 19:38:00.750000
         max           918 days 14:08:12
         Name: Processing Time, dtype: object

From the descriptive statistics, we can see the minimum processing time is negative. This means something is wrong with the date data, and it
should be explored.

In [39]: # View Processing Time data that is negative


df[df['Processing Time']<datetime.timedelta(0,0,0)].head(3)

Out[39]:
            Created Date  Closed Date  Agency  Complaint Type                  Incident Zip  City      Status   Borough    Latitude   Longitude   Processing Time
Unique Key
28581213    2014-07-31    2014-07-23   DOHMH   Unsanitary Animal Pvt Property  10456         Bronx     Pending  BRONX      40.835153  -73.912449  -8 days
28541630    2014-07-25    2014-07-07   DOHMH   Rodent                          11206         Brooklyn  Pending  BROOKLYN   40.701265  -73.929265  -18 days
28934215    2014-09-22    2014-08-25   DOHMH   Rodent                          10031         New York  Pending  MANHATTAN  40.827318  -73.946620  -28 days
There are issues with some data in df: the Closed Date in some rows precedes the Created Date, which
produces the negative processing times.
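Before dropping these rows, it's cheap to count how many are affected (a sketch of mine; the notebook doesn't show this output):

# Count rows whose Closed Date precedes their Created Date
num_negative = (df['Processing Time'] < datetime.timedelta(0)).sum()
print(num_negative)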

In [40]: # Remove all data from df that have negative Processing Time
df = df[df['Processing Time']>=datetime.timedelta(0,0,0)]

In [41]: #Count of non-null values for every column


df_columns = list(df.columns)
df[df_columns].count()

Out[41]: Created Date 1830970


Closed Date 1830970
Agency 1830970
Complaint Type 1830970
Incident Zip 1830970
City 1830970
Status 1830970
Borough 1830970
Latitude 1830970
Longitude 1830970
Processing Time 1830970
dtype: int64

Now that the data looks clean enough for further exploration, I'll create a function that incorporates the whole
data cleaning process.
This makes future work on the dataset convenient.

In [42]: def open_311_data(datafile):


import numpy as np
import pandas as pd
import datetime

#Function to clean Incident Zip


def correct_zip(zip_code):
try:
zip_code = int(float(zip_code))
except:
try:
zip_code = int(float(zip_code.split('-')[0]))
except:
return np.nan
if zip_code < 10000 or zip_code > 19999:
return np.nan
else:
return str(zip_code)

#Function to clean City values, i.e. convert City values to title case
def camel_case(city):
try:
city = city.split(' ')
city = ' '.join([x.lower().capitalize() for x in city])
if city == 'Unknown':
return np.nan
else:
return city
except:
return np.nan

#Read the file


df = pd.read_csv(datafile, index_col='Unique Key')

#Drop columns that aren't relevant to this analysis


df_cols_rmv = ['Incident Address', 'Street Name', 'Cross Street 1', 'Cross Street 2',
               'Intersection Street 1', 'Intersection Street 2', 'Landmark', 'Facility Type',
               'Due Date', 'Resolution Description', 'Community Board', 'X Coordinate (State Plane)',
               'Y Coordinate (State Plane)', 'Park Facility Name', 'Park Borough', 'School Name',
               'School Number', 'School Region', 'School Code', 'School Phone Number', 'School Address',
               'School City', 'School State', 'School Zip', 'School Not Found', 'School or Citywide Complaint',
               'Vehicle Type', 'Taxi Company Borough', 'Taxi Pick Up Location', 'Bridge Highway Name',
               'Bridge Highway Direction', 'Road Ramp', 'Bridge Highway Segment', 'Garage Lot Name',
               'Ferry Direction', 'Ferry Terminal Name', 'Location', 'Address Type', 'Agency Name',
               'Resolution Action Updated Date', 'Descriptor', 'Location Type']

df.drop(df_cols_rmv, inplace=True, axis=1)

#Clean Incident Zip


df['Incident Zip'] = df['Incident Zip'].apply(correct_zip)

#Clean City values


df['City'] = df['City'].apply(camel_case)

#Drop unspecified boroughs


df = df[df['Borough'] != 'Unspecified']

#Drop all rows with nan


df = df.dropna(how='any')

#Convert Created Date and Closed Date to datetime objects, create a Processing Time column
df['Created Date'] = df['Created Date'].apply(lambda x: datetime.datetime.strptime(x, '%m/%d/%Y %I:%M:%S %p'))
df['Closed Date'] = df['Closed Date'].apply(lambda x: datetime.datetime.strptime(x, '%m/%d/%Y %I:%M:%S %p'))
df['Processing Time'] = df['Closed Date'] - df['Created Date']

#Remove negative processing time rows from the dataframe


df = df[df['Processing Time']>=datetime.timedelta(0,0,0)]

return df

In [43]: # Open, read, and process the NYC 311 dataset using the open_311_data function
datafile = '311_Service_Requests_from_2014.csv'
df = open_311_data(datafile)
df.head(3)

C:\Users\Ikp\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py:2802: DtypeWarning: Columns (8,17,40,41,42,43,44,45,46,47,48,49) have mixed types. Specify dtype option on import or set low_memory=False.
  if self.run_code(code, result):
C:\Users\Ikp\Anaconda3\lib\site-packages\numpy\lib\arraysetops.py:463: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  mask |= (ar1 == a)
Out[43]:
            Created Date         Closed Date          Agency  Complaint Type      Incident Zip  City      Status  Borough    Latitude   Longitude   Processing Time
Unique Key
28457271    2014-07-11 15:08:58  2014-08-05 12:41:37  DOT     Sidewalk Condition  11368         Corona    Closed  QUEENS     40.751870  -73.862718  24 days 21:32:39
28644314    2014-08-08 14:06:22  2014-08-12 11:33:34  DCA     Consumer Complaint  10014         New York  Closed  MANHATTAN  40.732623  -74.001119  3 days 21:27:12
29306886    2014-11-18 00:52:40  2014-11-18 01:35:22  NYPD    Blocked Driveway    11358         Flushing  Closed  QUEENS     40.760384  -73.806826  0 days 00:42:42
Visualizations
In [44]: import matplotlib.pyplot as plt
%matplotlib inline

In [45]: # Visualizing 311 call data Incidents with a heat map


import gmaps

In [46]: import settings # Contains my Google map API key


gmaps.configure(api_key=settings.API_KEY) # Fill in with your API key
new_york_coordinates = (40.75, -74.00)
locations = df[['Latitude','Longitude']]
fig = gmaps.figure(center=new_york_coordinates, zoom_level=12)
heatmap_layer = gmaps.heatmap_layer(locations)
fig.add_layer(heatmap_layer)
fig
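One caveat: with roughly 1.8 million rows, the browser-side heatmap can be very slow. A common workaround is to downsample before plotting and tune the layer; a sketch (max_intensity and point_radius are jupyter-gmaps layer attributes, and the sample size is an arbitrary choice of mine):

# Downsample so the widget stays responsive
locations = df[['Latitude', 'Longitude']].sample(50000, random_state=0)
heatmap_layer = gmaps.heatmap_layer(locations)
heatmap_layer.max_intensity = 100   # cap the color scale so hotspots don't saturate
heatmap_layer.point_radius = 5      # smaller kernel per point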

[The gmaps heatmap widget does not render in this static view of the notebook.]

In [47]: #Exploration of incidents by Borough


borough = df.groupby('Borough')
borough.size().plot(kind='bar', figsize=(12,6), title=('Incidents by Borough'));

From the graph, we can see that Brooklyn has the most incidents while Staten Island has the least. It should also be noted that Staten
Island is the smallest of the five boroughs, which could be why it has the fewest incidents.
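To probe that hunch, one could normalize by population. A sketch using rough 2014 borough populations (the figures below are my approximations, not values from the dataset):

# Approximate 2014 populations (assumed, for illustration only)
population = pd.Series({'BROOKLYN': 2600000, 'QUEENS': 2300000, 'MANHATTAN': 1630000,
                        'BRONX': 1440000, 'STATEN ISLAND': 470000})
(borough.size() / population).sort_values(ascending=False).plot(kind='bar', figsize=(12,6),
                                                                title='Incidents per resident by Borough');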

In [48]: # Visualization of incidents by Agency

agency = df.groupby('Agency')
agency.size().plot(kind='bar', figsize=(12,6), title=('Incident calls per Agency'));

HPD has the highest number of complaints, followed by NYPD.

In [49]: # Visualization of the number of incidents in each Borough by Agency

agency_borough = df.groupby(['Agency','Borough']).size().unstack()
agency_borough.plot(kind='bar', title='Total Incidents in each Borough by Agency', figsize=(15,7));

In [50]: #Visualization of top Agencies with most incidents per borough


col_number = 2
row_number = 3

fig, axes = plt.subplots(row_number,col_number, figsize=(12,12))

for i, (label, col) in enumerate(agency_borough.iteritems()):
    ax = axes[int(i/col_number), i%col_number]
    col = col.sort_values(ascending=True)[-5:]   # keep the 5 largest agencies, plotted smallest-to-largest
    col.plot(kind='barh', ax=ax)
    ax.set_title(label)

plt.tight_layout()

In [51]: # Visualization of most Complaints per Borough


borough_comp = df.groupby(['Complaint Type','Borough']).size().unstack()

col_number = 2
row_number = 3
fig, axes = plt.subplots(row_number,col_number, figsize=(12,12))

for i, (label, col) in enumerate(borough_comp.iteritems()):
    ax = axes[int(i/col_number), i%col_number]
    col = col.sort_values(ascending=True)[-15:]   # keep the 15 most frequent complaint types
    col.plot(kind='barh', ax=ax)
    ax.set_title(label)

plt.tight_layout()

Visualization of processing time.

The Processing Time column in the dataframe is a timedelta object; it is easier to convert the processing time into floats (days) for calculation

In [52]: import numpy as np


df['Processing Time Float'] = df['Processing Time'].apply(lambda x:x/np.timedelta64(1, 'D'))
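An equivalent alternative uses pandas' timedelta accessor (a sketch):

# Same conversion via the .dt accessor: total seconds divided into days
df['Processing Time Float'] = df['Processing Time'].dt.total_seconds() / 86400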

In [53]: # Histogram of Processing Time


df['Processing Time Float'].hist(bins=30, figsize=(15,7));
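Because processing times are heavily right-skewed (most requests close within days, a few take months), a log-scaled y-axis keeps the long tail visible; a sketch:

# Log-scale the counts so the tail of slow requests remains visible
ax = df['Processing Time Float'].hist(bins=30, figsize=(15,7))
ax.set_yscale('log');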


Since the dataframe now contains datetime objects, I can build a bar graph to show incidents per month and
other interesting information. This allows easy discovery of noticeable trends and
seasonality.

I can do this by adding a column to the data that keeps track of year and month only

In [54]: import datetime


df['YYYY-MM'] = df['Created Date'].apply(lambda x: datetime.datetime.strftime(x, '%Y-%m'))
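As with the date parsing earlier, a vectorized form of the same operation exists (a sketch):

# Vectorized equivalent via the datetime accessor
df['YYYY-MM'] = df['Created Date'].dt.strftime('%Y-%m')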

In [55]: # Incidents on a monthly basis

monthly_incidents = df.groupby('YYYY-MM').size().plot(figsize=(12,5), title='Incidents on a monthly basis');

In [56]: # Processing time per Borough on a monthly basis

df.groupby(['YYYY-MM','Borough'])['Processing Time Float'].mean().unstack().plot(figsize=(15,7),
        title='Processing time per Borough on a monthly basis');

In [57]: # Processing time per Borough


df.groupby('Borough')['Processing Time Float'].mean().plot(kind='bar', figsize=(15,7),
title='Processing Time per Borough');

In [58]: # Visualization of the number of Complaints per Agency on a monthly basis

date_agency = df.groupby(['YYYY-MM', 'Agency'])
date_agency.size().unstack().plot(kind='bar', figsize=(15,7), title='Number of Complaints per Agency on a monthly basis');
plt.legend(loc='center left', bbox_to_anchor=(1.0, 0.5));

In [59]: # Visualization of Agency with their number of Complaints


df.groupby('Agency').size().sort_values(ascending=False).plot(kind='bar',figsize=(15,7),
title='Number of Complaints per Agency');


Since HPD has the most complaints, I'll explore HPD's data to learn more about the complaints it handles

In [60]: # Visualization of incidents handled by HPD per Borough on a monthly basis

df[df['Agency']=='HPD'].groupby(['YYYY-MM','Borough']).size().unstack().plot(figsize=(12,7),
        title='Incidents per Borough on a monthly basis');

In [61]: # Visualization of complaints handled by HPD

df[df['Agency']=='HPD'].groupby('Complaint Type').size().sort_values(ascending=False).plot(kind='bar',
        figsize=(12,6), title='Number of each complaint type handled by HPD');

Visualizations of Complaint Type


In [62]: # Visualization of the number of complaints by type
df.groupby('Complaint Type').size().sort_values(ascending=False)[:20].plot(kind='bar', figsize=(15,6),
        title='Bar graph of Complaint Type');

Noise - Residential has the most complaints; let's explore it further

In [63]: # Borough with the most 'Noise - Residential' complaints

df[df['Complaint Type']=='Noise - Residential'].groupby('Borough').size()[:10].sort_values(ascending=False).plot(
        kind='bar', title='Residential Noise Complaints per Borough');


Brooklyn has the most residential noise complaints; it would be interesting to know whether this noise peaked at some point within the year or
was uniform throughout the year

In [64]: brooklyn_noise = df[(df['Borough']=='BROOKLYN') & (df['Complaint Type']=='Noise - Residential')]

brooklyn_noise.groupby('YYYY-MM').size().plot(kind='bar', figsize=(12,6),
        title='Residential noise complaints in Brooklyn on a monthly basis');

In [65]: # Complaints per Borough through the year


df.groupby(['YYYY-MM','Borough']).size().unstack().plot(figsize=(15,6))
plt.legend(loc='center left', bbox_to_anchor=(1.0, 0.5));

Observations

Brooklyn has the highest number of incident calls, followed by Queens. Staten Island has the fewest incident calls.

HPD has the most incident calls, followed by NYPD.

The majority of incidents occur in January, followed by November; incident calls dip to their lowest in September, followed by April.

HPD-related incident calls follow a nearly regular pattern across all boroughs from month to month. Heat/Hot water complaints are the most
frequent.

Noise in residential areas was the most common complaint in 2014, followed by Heat/Hot water complaints.

Noise complaints peaked in September and were lowest in February.

Conclusion
Brooklyn has on average the slowest processing time from month to month, which could be associated with the fact that it has the highest number of
incident calls; but it should be noted that while Staten Island has by far the fewest incident calls, it does not have the fastest resolution time.

