The Green Space Data Challenge
To Registrants: The data challenge has officially begun! Please see important communications for registered participants, including registration instructions and next steps, on this page. To ensure that you have the full month to plan and conduct your analysis, we highly recommend completing this process as soon as possible.
How does access to green space impact public health? How can green space data provide actionable information about our communities and where people’s needs are or aren’t being met?
The Massive Data Institute at Georgetown’s McCourt School of Public Policy welcomes your participation in the virtual Green Space Data Challenge. Participants will have the entire month of February 2023 to demonstrate the value of green space data by creating analyses, visualizations, and new community indicators.
Access to green space is an essential need. It matters for quality of life and community well-being, and it has been found to encourage outdoor recreation, human connection, and positive mental and physical health. Yet access to green space is not equitable across communities nationwide.
The Challenge
In this data challenge, you will analyze the impact of inequitable green space access on communities, working in individual or team notebooks with the data sources on our Redivis platform. Your goal is to transform these green space datasets into actionable community indicators that illustrate the effects of green space across one of four dimensions: public health, public safety, effects on a specific population (e.g., by age, race, or location), or physical environment (e.g., pollution levels).
We provide six datasets with information on green space directly for the challenge. We also link to a variety of datasets on green space and on subject-area indicators (e.g., public health, public safety) from the Environmental Impact Data Collaborative; you are free to use any of these in the challenge. In addition, you may ask to use other publicly available data in your submission by submitting a request by Wednesday, February 8th at 5:00 PM EST. This data will be made available to all participants through the Redivis site. See the FAQ below for more information.
You will be evaluated by judges based on the relevance, completeness, and quality of your submission. Judges include:
Stephen T. Dickinson, PhD, Urban Health Collaborative, Drexel University
Alexander Din, United States Department of Housing and Urban Development (HUD)
Michelle Kondo, PhD, USDA-Forest Service
Jarlath O’Neil-Dunne, Spatial Analysis Lab, University of Vermont
Example Topics:
- Are communities in close proximity to green space more likely to have better air quality?
- Is there a relationship between green space and social vulnerability?
- Do communities with better green space access experience lower rates of violent crime or gun deaths?
- How strong is the relationship between community green space and obesity?
- What are the effects of good park systems on mortality?
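Questions like these often begin with a simple check of association. The sketch below is a minimal, hypothetical illustration in Python: the tract records, the field names (`green_m2_per_person`, `obesity_rate`), and all values are invented for demonstration and do not come from any challenge dataset. It computes a Pearson correlation between a green space measure and a health outcome.

```python
from math import sqrt

# Hypothetical tract-level records; names and values are invented for illustration.
tracts = [
    {"tract": "A", "green_m2_per_person": 5.0, "obesity_rate": 0.38},
    {"tract": "B", "green_m2_per_person": 12.0, "obesity_rate": 0.34},
    {"tract": "C", "green_m2_per_person": 20.0, "obesity_rate": 0.31},
    {"tract": "D", "green_m2_per_person": 35.0, "obesity_rate": 0.29},
    {"tract": "E", "green_m2_per_person": 50.0, "obesity_rate": 0.25},
]


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)


green = [t["green_m2_per_person"] for t in tracts]
obesity = [t["obesity_rate"] for t in tracts]
r = pearson(green, obesity)
print(f"Pearson r = {r:.2f}")
```

A correlation like this only describes association. A full submission would also need to account for confounders such as income or population density before drawing conclusions about effects.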
Lit Review
For more ideas, see our lit review with examples of research involving green space in each of the four challenge subject areas.
Guidelines
When: Feb 1-28, 2023
Where: The data challenge is entirely virtual, with information and rules of the challenge available on this website. Updates will be posted on this website as well as emailed to registered participants.
Who you are: We welcome any participants over age 18, especially undergraduate students, graduate students, and early professional data scientists. Analysts based in academic institutions, government statistical offices, think tanks and policy labs, and community organizations are encouraged to participate.
What topics you can explore: Participants can analyze green space data in tandem with various other datasets that include health, environment, and/or public safety data.
Submissions and evaluation: Participants will conduct their analyses and submit a short project narrative that describes the research question, analytic approach, and key findings. We encourage participants to find creative ways to incorporate visualizations and other aspects of data storytelling to create a compelling narrative. In addition to their completed analysis with the indicators they used, participants will be asked to submit documentation describing each step of their process. The documentation should be detailed enough to make the project fully replicable.
The narrative, indicator and methods, and organization and documentation of each participant’s project will be evaluated for relevance, completeness, and quality. More information on evaluation can be found in the FAQs at the bottom of this page.
Accommodation requests related to a disability should be made by 1/27/23 to pbidatachallenge@georgetown.edu.
Prize Information
There will be separate prize categories for submissions examining the effects of green space on the following subject areas:
Prize Categories | First | Second | Third |
---|---|---|---|
Community Health | $5000 | $2000 | $1000 |
Community Safety | $5000 | $2000 | $1000 |
Specific Populations | $5000 | $2000 | $1000 |
Physical Environment | $5000 | $2000 | $1000 |
Best Graduate Student Submission | one $1000 prize | | |
Best Undergraduate Submission | one $1000 prize | | |
Participating teams will be listed on MDI’s Place-Based Indicators project webpage. Winners will be invited to present their project both at a webinar hosted by the Association of Public Data Users (APDU) and at APDU’s annual conference in July 2023.
The Data
We will provide access to the datasets listed below. Challenge participants are welcome to bring their own data sources to be added or to use sources from our companion site, the Environmental Impact Data Collaborative (EIDC); see our FAQs for more details.
A variety of datasets on community health, public safety, physical environment, and specific populations are available for use in your analysis through the EIDC and from other sources.
Dataset | Description | Geographic Coverage | Data type(s) | Estimated difficulty |
---|---|---|---|---|
EnviroAtlas (US EPA) | This data shows land cover for 30 US urban areas at 1-meter spatial resolution. Land cover data present a “bird’s-eye” view that can help identify important features, patterns, and relationships in the landscape. | 30 U.S. communities. See full list here. | ESRI FileGDB, CSV | 🌳🌳🌳 |
Provision and Access to Open Spaces in Cities | Shows average share of built-up area in nine urban areas that is open public space, as well as percent of population living within 400 meters walking distance of open public space. | 9 US cities and urban areas. | CSV (various) | 🌳 |
ParkServe Data | Comprehensive database of local parks in census-defined urban areas. | Census-defined urban areas. | ESRI Shapefile, CSV | 🌳🌳🌳 |
Green space data by census block | Illustrates the square meters of total land per person within each census block group that is covered by green space. | National | ESRI FileGDB, CSV | 🌳🌳 |
Climate and Economic Justice Screening Tool | Highlights disadvantaged census tracts across all 50 states, the District of Columbia, and the U.S. territories. | National (census tract) | Varies | 🌳🌳 |
Tree Equity Score | Metric intended to help communities assess how well they are delivering equitable tree canopy cover to all residents. | National (by state) | ESRI Shapefile | 🌳 |
PAD-US-AR | Curated dataset based on the Protected Areas Database-United States (PAD-US) from the USGS. | National | Varies | 🌳🌳 |
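
As one illustration of how datasets like these might be combined into a community indicator, the sketch below computes square meters of green space per person and compares the median between disadvantaged and other areas. It is a minimal sketch under stated assumptions, not a prescribed method: the records, field names (`green_m2`, `population`, `disadvantaged`), and values are all hypothetical, and it assumes a disadvantaged flag has already been attached to each block group (in practice, tract-level flags would first need to be joined to block groups).

```python
from statistics import median

# Hypothetical block-group records; field names and values are invented for illustration.
block_groups = [
    {"geoid": "bg-001", "green_m2": 48000.0, "population": 1200, "disadvantaged": True},
    {"geoid": "bg-002", "green_m2": 150000.0, "population": 1500, "disadvantaged": False},
    {"geoid": "bg-003", "green_m2": 9000.0, "population": 900, "disadvantaged": True},
    {"geoid": "bg-004", "green_m2": 64000.0, "population": 800, "disadvantaged": False},
    {"geoid": "bg-005", "green_m2": 0.0, "population": 0, "disadvantaged": False},
]


def green_per_person(record):
    """Square meters of green space per resident, or None for unpopulated areas."""
    if record["population"] <= 0:
        return None
    return record["green_m2"] / record["population"]


def median_by_group(records):
    """Median green space per person, keyed by the disadvantaged flag."""
    groups = {True: [], False: []}
    for rec in records:
        value = green_per_person(rec)
        if value is not None:  # skip unpopulated block groups
            groups[rec["disadvantaged"]].append(value)
    return {flag: median(vals) for flag, vals in groups.items() if vals}


medians = median_by_group(block_groups)
gap = medians[False] - medians[True]
print(f"Disadvantaged median: {medians[True]:.1f} m2 per person")
print(f"Other median: {medians[False]:.1f} m2 per person")
print(f"Gap: {gap:.1f} m2 per person")
```

The median is used here instead of the mean so that a few very large parks do not dominate the comparison. Unpopulated block groups are skipped to avoid division by zero.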