Inspiration

We compiled a bunch of brainstorming ideas and this happened to be the one we collectively were interested in purusing.

What it does

It shows compiled open New York City data dating from 2013-2017. It relates Housing & Development property sales to open parking and camera violations and parking tickets to try to discover if a relationship exists between property values and the number of tickets issued in that area.

How we built it

We cleaned our data through R and Python.

Challenges we ran into

There were numerous zero sale prices in a neighborhood. We developed an equation to combat this by finding the mean of property sales to estimate those sale prices.

Accomplishments that we're proud of

We managed to clean out data effectively!

What we learned

This experience has offered the opportunity to experiment with different ways of approaching code. It was fun to be connected within the CUNY community to take on a project together with people of similar interests.

What's next for The Correlation of Ticket Violations & Property Sales in NYC

We went through various options:

Mean sales price in borough
Mean sales price within neighborhood
If time allowed: a. Would have wanted to determine mean pricing within a neighborhood going through slicing methods. As we iterate through each row, take average of pervious rows and replace zero with mean value. b. compare neighborhood with residential/commericial building type and replace with avg.
Went with: Determined to check normal distribution of sales pricing to either replace with mean or median depending on skewness

CODE (R):

2013 2014

Keeping necessary columns

keeps <- c("Summons.Number", "Plate.ID", "Registration.State", "Plate.Type", "Issue.Date", "Violation.Code", "Violation.County", "Violation.Precinct")

Aug_2013_June_2014 <- CUNY.Hackathon.Aug2013June2014

cleaning neighborhood names.

Aug_2013_June_2014 <- Aug_2013_June_2014 %>% mutate(Violation.County= case_when(Violation.County == "BX" ~ "BRONX", Violation.County == "K" ~ "KINGS", Violation.County == "KING" ~ "KINGS", Violation.County == "Q" ~ "QUEENS", Violation.County == "QUEEN" ~ "QUEENS", Violation.County == "RC" ~ "RICHMOND", Violation.County == "RICH" ~ "RICHMOND", Violation.County == "R" ~ "RICHMOND", Violation.County == "NEWY" ~ "MANHATTAN", Violation.County == "MAN" ~ "MANHATTAN", Violation.County == "NEW Y" ~ "MANHATTAN", Violation.County == "QU" ~ "QUEENS", Violation.County == "MH" ~ "MANHATTAN", Violation.County == "QNS" ~ "QUEENS", Violation.County == "BK" ~ "KINGS", Violation.County == "QN" ~ "QUEENS", Violation.County == "MN" ~ "MANHATTAN", Violation.County == "ST" ~ "STATEN_ISLAND", Violation.County == "NYC" ~ "MANHATTAN", Violation.County == "NY" ~ "MANHATTAN", TRUE ~ Violation.County))

Aug_2013_June_2014 = Aug_2013_June_2014[keeps]

changing name for violation.precinct column for later join

names(Aug_2013_June_2014)[8] <- "pct"

writing csv file

write.csv(Aug_2013_June_2014, file = "CUNY Hackathon Aug_2013_June_2014.csv")

2015 cleaning

Keeping necessary columns

keeps <- c("Summons.Number", "Plate.ID", "Registration.State", "Plate.Type", "Issue.Date", "Violation.Code", "Violation.County", "Violation.Precinct")

renaming file

fiscal_2015 <- CUNY.Hackathon.Year_2015

cleaning violation.county

fiscal_2015 <- fiscal_2015 %>% mutate(Violation.County= case_when(Violation.County == "BX" ~ "BRONX", Violation.County == "K" ~ "KINGS", Violation.County == "KING" ~ "KINGS", Violation.County == "Q" ~ "QUEENS", Violation.County == "QUEEN" ~ "QUEENS", Violation.County == "RC" ~ "RICHMOND", Violation.County == "RICH" ~ "RICHMOND", Violation.County == "R" ~ "RICHMOND", Violation.County == "NEWY" ~ "MANHATTAN", Violation.County == "MAN" ~ "MANHATTAN", Violation.County == "NEW Y" ~ "MANHATTAN", Violation.County == "QU" ~ "QUEENS", Violation.County == "MH" ~ "MANHATTAN", Violation.County == "QNS" ~ "QUEENS", Violation.County == "BK" ~ "KINGS", Violation.County == "QN" ~ "QUEENS", Violation.County == "MN" ~ "MANHATTAN", Violation.County == "ST" ~ "STATEN ISLAND", Violation.County == "NYC" ~ "MANHATTAN", Violation.County == "NY" ~ "MANHATTAN", TRUE ~ Violation.County))

CODE (PYTHON):

getting rid of unnecessary columns

fiscal_2015 = fiscal_2015[keeps]

changing name of violation. Precinct

names(fiscal_2015)[8] <- "pct"

writing csv

write.csv(fiscal_2015, file = "CUNY Hackathon Year_2015.csv")

def meanMedianImputation(df): # import pdb; pdb.set_trace();

# cleaning 

# remove \n from columns
df.columns = df.columns.str.replace('\n','')
# remove whitespace from both sides
df["NEIGHBORHOOD"] = df_list["NEIGHBORHOOD"].str.strip()

# get unique neighborhood names
neighborhoods = df["NEIGHBORHOOD"].unique()

# loop over neighborhoods
for x in range(len(neighborhoods)):

    # get neighborhood sale price
    neighborhood_sales = df.loc[df["NEIGHBORHOOD"]==neighborhoods[x], "SALE PRICE"]

    # get median for each neighborhood
    median = neighborhood_sales.median()
    # get mean for each neighborhood
    mean = neighborhood_sales.mean()



    # get skewness, if greater than 1, replace with median
    if ((neighborhood_sales.skew() > 1) | (neighborhood_sales.skew() < -1)) :
        # neighborhood_sales.replace(0.0, median, inplace=True)
        df.loc[df["NEIGHBORHOOD"]==neighborhoods[x], "SALE PRICE"] = df.loc[df["NEIGHBORHOOD"]==neighborhoods[x], "SALE PRICE"].replace(0, median)

    # else replace with mean
    elif neighborhood_sales.skew()==0:
        df.loc[df["NEIGHBORHOOD"]==neighborhoods[x], "SALE PRICE"] = df.loc[df["NEIGHBORHOOD"]==neighborhoods[x], "SALE PRICE"].replace(0, mean)
    else:
        df.loc[df["NEIGHBORHOOD"]==neighborhoods[x], "SALE PRICE"] = df.loc[df["NEIGHBORHOOD"]==neighborhoods[x], "SALE PRICE"].replace(0, mean)

**We will have a video prepared by 4:30 P.M. the video posted now at 2PM is a placeholder*