InternAt — California Consortium for Public Health Informatics and Technology (CCPHIT)
Project Description:
Participants will design and implement a reproducible pipeline to discover, collect, and standardize location-based community resources (e.g., heat shelters, community health clinics, immunization centers, food banks, crisis lines). Learners will use ethical web-scraping and open data techniques, transform unstructured web content into structured datasets, geocode locations, and apply quality checks so the data can power PHapp’s community resource features. The project emphasizes privacy, data ethics, accessibility, and reproducibility.
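As a rough illustration of the "standardize" and "quality check" steps, the sketch below defines a small record schema and removes duplicate listings by comparing normalized names and addresses. The `Resource` fields, the `normalize` and `dedupe` helpers, and the sample records are hypothetical and not a PHapp specification; a real pipeline would likely need fuzzier matching and geocoded coordinates.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical minimal schema for one community resource."""
    name: str
    category: str      # e.g. "food_bank", "heat_shelter"
    address: str
    source_url: str    # kept for source attribution


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def dedupe(records):
    """Keep the first record seen for each (name, address) pair."""
    seen = set()
    unique = []
    for record in records:
        key = (normalize(record.name), normalize(record.address))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Made-up sample data: the first two rows describe the same place.
sample = [
    Resource("Eastside Food Bank", "food_bank", "123 Main St.", "https://example.org/a"),
    Resource("EASTSIDE FOOD BANK ", "food_bank", "123 Main St", "https://example.org/b"),
    Resource("Riverside Clinic", "clinic", "9 River Rd", "https://example.org/c"),
]
cleaned = dedupe(sample)
```

Keeping `source_url` on every record means that attribution survives deduplication for whichever copy is retained.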
- Apply academic knowledge to real-world tasks and challenges.
- Strengthen time management and organizational abilities.
- Acquire hands-on experience with industry-specific tools, software, or procedures.
- Learn how to conduct research, analyze data, or support projects in a practical context.
- Identify personal strengths and areas for improvement.
This project is designed to:
- Build technical skills in Python-based web scraping (LLM/Requests/BeautifulSoup or Scrapy), data cleaning, and geocoding.
- Strengthen applied data management (schema design, deduplication, source attribution, versioning).
- Practice public-interest technology principles: consent-aware scraping, robots.txt, rate limiting, and bias mitigation.
- Deliver a high-quality, reusable open data asset aligned with agency needs.
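The robots.txt and rate-limiting practices named above could look roughly like the following sketch, which uses only the Python standard library. The `build_robot_rules` helper, the `RateLimiter` class, the `phapp-resource-bot` user agent, and the example rules are assumptions made for illustration; a real scraper would first download each site's actual robots.txt and choose its own delay.

```python
import time
from urllib.robotparser import RobotFileParser


def build_robot_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class RateLimiter:
    """Enforce a minimum interval between consecutive requests."""

    def __init__(self, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        self._clock = clock    # injectable so the limiter can be tested
        self._sleep = sleep
        self._last = None      # time of the previous request, if any

    def wait(self):
        """Block until at least min_interval_s has passed since the last call."""
        if self._last is not None:
            remaining = self.min_interval_s - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()


# Hypothetical rules: everything is allowed except /private/.
rules = build_robot_rules("User-agent: *\nDisallow: /private/\n")
USER_AGENT = "phapp-resource-bot"  # assumed name, not an official agent

allowed = rules.can_fetch(USER_AGENT, "https://example.org/clinics")
blocked = rules.can_fetch(USER_AGENT, "https://example.org/private/list")
```

In a scraping loop, calling `limiter.wait()` before each request and skipping any URL for which `can_fetch` returns `False` keeps the crawl within the limits a site publishes.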
| Hours | Duration |
|---|---|
| 60 | hours per week |