Willis Danielson

I'm a quantitative sociologist and analyst with an interest in demography, data visualization, and deep learning.


Green Space and Income in NYC

Using data from the 2015 NYC Tree Census and the services of geocod.io, I was able to plot the number of living street trees in each census tract against the tract's median income.
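A minimal sketch of the aggregation step behind that plot, assuming each tree record has already been tagged with a census tract ID. The field names (`status`, `tract`), the `"Alive"` label, and the sample rows are illustrative assumptions, not taken from the project's code.

```python
from collections import Counter

# Hypothetical records: one row per street tree, with an assumed tract ID
# attached by a geocoding step. Values are made up for illustration.
trees = [
    {"tree_id": 1, "status": "Alive", "tract": "36061000100"},
    {"tree_id": 2, "status": "Dead", "tract": "36061000100"},
    {"tree_id": 3, "status": "Alive", "tract": "36061000100"},
    {"tree_id": 4, "status": "Alive", "tract": "36047000200"},
    {"tree_id": 5, "status": "Stump", "tract": "36047000200"},
]


def living_trees_per_tract(rows):
    """Count rows whose status is 'Alive', grouped by tract ID."""
    return Counter(row["tract"] for row in rows if row["status"] == "Alive")


counts = living_trees_per_tract(trees)
for tract, n_alive in sorted(counts.items()):
    print(tract, n_alive)
```

The resulting per-tract counts can then be joined to tract-level median income for plotting.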


With OLS, the bivariate relationship between a tract's median income and its number of living street trees shows a clear positive trend. Yet there are glaring signs of heteroskedasticity in the data, and the dependent variable is a count, so a simple OLS regression is not the right choice for modeling this scenario. I added a group of sociodemographic predictors as controls, modeled the dependent variable using a negative binomial regression, used robust standard errors to account for the heteroskedasticity, and excluded an exposure/offset variable because one is not applicable here. It is far from academically rigorous, but it is still better than a basic linear bivariate model.

Results from this model show that percent white and percent owner-occupied emerge as significant predictors, while the effect of income category shrinks below significance. This possibly reveals deeper socioeconomic disparities than income inequality alone. That said, I think it is important to keep in mind the lingering effects of the Federal Housing Administration and its redlining, the fact that whites still have an outsized effect on the political economies they inhabit, and the benefits of generational wealth accrued under a racial police state. I checked the variance inflation factors, and none of my variables were worrisome (all VIFs were < 5). Even so, percent white and home ownership may still be somewhat collinear, considering that more than 73% of homeowners in the United States are non-Hispanic white.

The results of the model I specified are summarized below.

Count of Street Trees Negative Binomial Regression

==============================================================================
Dep. Variable:                num_tree   No. Observations:                2027
Model:                             GLM   Df Residuals:                    2018
Model Family:         NegativeBinomial   Df Model:                           8
Link Function:                     Log   Scale:                         1.0000
Method:                           IRLS   Log-Likelihood:               -13375.
Date:                 Wed, 30 Jul 2025   Deviance:                      623.18
Time:                         14:39:35   Pearson chi2:                    464.
No. Iterations:                     30   Pseudo R-squ. (CS):            0.2132
Covariance Type:                   HC0
======================================================================================
                           coef    std err          z      P>|z|     [0.025     0.975]
--------------------------------------------------------------------------------------
Intercept                4.8999      0.054     90.698      0.000      4.794      5.006
C(medIncCat)[T.1.0]     -0.0093      0.028     -0.327      0.744     -0.065      0.047
C(medIncCat)[T.2.0]      0.0416      0.035      1.196      0.232     -0.027      0.110
C(edu)[T.1.0]            0.0694      0.045      1.542      0.123     -0.019      0.158
C(edu)[T.2.0]            0.0793      0.049      1.621      0.105     -0.017      0.175
pctWhite                 0.0015      0.000      3.957      0.000      0.001      0.002
tract_pop             1.833e-05   1.93e-05      0.951      0.342  -1.95e-05   5.61e-05
shape_area            8.152e-08   1.12e-08      7.258      0.000   5.95e-08   1.04e-07
pct_own_occ              0.0060      0.001      8.451      0.000      0.005      0.007
======================================================================================
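Because the model uses a log link, each coefficient is a change in the log of the expected tree count, and exponentiating it gives a multiplicative rate ratio. The sketch below converts the two significant percentage predictors from the table, assuming (it is not stated on this page) that `pctWhite` and `pct_own_occ` are measured on a 0 to 100 scale.

```python
import math

# Coefficients copied from the regression table (log link).
coefs = {"pctWhite": 0.0015, "pct_own_occ": 0.0060}


def rate_ratio(coef, delta=1.0):
    """Multiplicative change in the expected count for a delta-unit increase."""
    return math.exp(coef * delta)


for name, b in coefs.items():
    # Assumed scale: a 10-unit change is 10 percentage points.
    pct_change = (rate_ratio(b, 10) - 1) * 100
    print(f"{name}: +10 points -> {pct_change:.1f}% change in expected trees")
```

Under that scale assumption, a 10-point increase corresponds to roughly 1.5% more expected trees for `pctWhite` and roughly 6.2% more for `pct_own_occ`, holding the other predictors fixed.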


SOGIE Symbol Set for AAC Users

Using a list of LGBTQIA vocabulary collected from several advocacy groups, researchers from the Rocky Mountain University of Health Professions and I are working to understand the vernacular of the LGBTQIA community. We hope to isolate a core vocabulary that could be easily incorporated into AAC systems, making it easier for interested users to engage in social discourse on SOGIE issues.

We are using data gathered from the Wikipedia and Twitter APIs to validate the current importance of these terms and to identify novel terms that might be added.

Below is a node-and-edge graph that summarizes the hyperlink connections between terms on Wikipedia. Edges are undirected: an edge between two terms represents a link into or out of a term's Wikipedia article.

[Figure: node-and-edge graph of hyperlink connections between terms on Wikipedia]
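The edge rule described above (a link in either direction produces a single edge) can be sketched as follows. The `outlinks` mapping and its article titles are hypothetical placeholders, not data from the project, and links to articles outside the term list are simply ignored here.

```python
from collections import defaultdict

# Hypothetical outgoing links between term articles (illustrative only).
outlinks = {
    "Gender identity": ["Transgender", "Non-binary gender", "Psychology"],
    "Transgender": ["Gender identity"],
    "Non-binary gender": ["Gender identity", "Transgender"],
}


def build_undirected_graph(links):
    """Collapse directed article links into an undirected adjacency map."""
    terms = set(links)
    adjacency = defaultdict(set)
    for source, targets in links.items():
        for target in targets:
            # Keep only links between terms on the list; skip self-links.
            if target in terms and target != source:
                adjacency[source].add(target)
                adjacency[target].add(source)
    return adjacency


graph = build_undirected_graph(outlinks)
# Each undirected edge appears once, regardless of link direction.
edges = {frozenset((a, b)) for a, nbrs in graph.items() for b in nbrs}
print(len(graph), "nodes,", len(edges), "edges")
```

Storing edges as `frozenset` pairs means that two articles linking to each other still count as one edge.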

US Housing Starts

Using data from the U.S. Census Bureau's Building Permits Survey and linearly interpolated population estimates, I was able to plot new residential building permits across a span of 16 years for all 50 states and the District of Columbia. I am always amazed that permitting appears to begin declining, almost across the board, years before the housing crisis came to a head in 2008. What is more, it is striking how closely the trend in housing starts for North Dakota tracks the growth of fracking and oil drilling in the United States.
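A minimal sketch of linear interpolation between two known population counts, followed by a per-capita permit rate. The population and permit figures are made up, and normalizing permits per 1,000 residents is an assumption about how the interpolated estimates were used.

```python
def interpolate_population(known, year):
    """Linearly interpolate a population for `year` from bracketing years."""
    if year in known:
        return float(known[year])
    years = sorted(known)
    for lo, hi in zip(years, years[1:]):
        if lo < year < hi:
            frac = (year - lo) / (hi - lo)
            return known[lo] + frac * (known[hi] - known[lo])
    raise ValueError(f"{year} is outside the range of known years")


# Hypothetical counts for one state in two reference years.
known_population = {2000: 600_000, 2010: 700_000}
# Hypothetical number of permits issued in 2005.
permits_2005 = 1_300

pop_2005 = interpolate_population(known_population, 2005)
permits_per_1000 = permits_2005 / pop_2005 * 1000
print(pop_2005, permits_per_1000)
```

Expressing permits relative to population makes states of very different sizes comparable on one chart.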