InternAt — California Consortium for Public Health Informatics and Technology (CCPHIT)

Opportunity Summary 

Project Description:
Participants will design and implement a reproducible pipeline to discover, collect, and standardize location-based community resources (e.g., heat shelters, community health clinics, immunization centers, food banks, and crisis lines). Using ethical web-scraping and open-data techniques, participants will transform unstructured web content into structured datasets, geocode locations, and apply quality checks so the data can power PHapp’s community resource features. The project emphasizes privacy, data ethics, accessibility, and reproducibility.
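As a rough illustration of "structured datasets" with "quality checks", the sketch below shows one way a standardized resource record, a validation step, and deduplication could look in Python. It is a minimal sketch under stated assumptions: the `ResourceRecord` fields, the `validate` and `dedupe` helpers, and the sample records are all hypothetical and do not reflect PHapp's actual data model.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Hypothetical schema for one community resource; field names are illustrative
# assumptions, not PHapp's actual data model.
@dataclass(frozen=True)
class ResourceRecord:
    name: str
    category: str      # e.g. "heat shelter", "food bank"
    address: str
    latitude: float
    longitude: float
    source_url: str    # source attribution: where the record was collected

def validate(rec: ResourceRecord) -> List[str]:
    """Return a list of quality-check failures; an empty list means the record passes."""
    problems = []
    if not rec.name.strip():
        problems.append("missing name")
    if not rec.source_url.startswith(("http://", "https://")):
        problems.append("missing or invalid source_url")
    if not (-90.0 <= rec.latitude <= 90.0 and -180.0 <= rec.longitude <= 180.0):
        problems.append("coordinates out of range")
    return problems

def _norm(text: str) -> str:
    # Lowercase and collapse punctuation/whitespace so trivial spelling differences match.
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()

def _key(rec: ResourceRecord) -> Tuple[str, str]:
    return (_norm(rec.name), _norm(rec.address))

def dedupe(records: Iterable[ResourceRecord]) -> List[ResourceRecord]:
    """Keep the first record seen for each normalized (name, address) key."""
    seen = set()
    unique = []
    for rec in records:
        key = _key(rec)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

# Illustrative records (made-up values) scraped from two different pages.
records = [
    ResourceRecord("Eastside Food Bank", "food bank", "123 Main St.", 34.05, -118.25, "https://example.org/a"),
    ResourceRecord("EASTSIDE FOOD BANK", "food bank", "123 Main St", 34.05, -118.25, "https://example.org/b"),
]
clean = [r for r in dedupe(records) if not validate(r)]
print(len(clean))  # → 1
```

Keying on a normalized name and address is only one simple deduplication choice; a real pipeline might also compare geocoded coordinates or use fuzzy matching.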

Opportunity Learning Outcomes 
  • Apply academic knowledge to real-world tasks and challenges.
  • Strengthen time management and organizational abilities.
  • Acquire hands-on experience with industry-specific tools, software, or procedures.
  • Learn how to conduct research, analyze data, or support projects in a practical context.
  • Identify personal strengths and areas for improvement.

Opportunity Training 

This project is designed to:

  • Build technical skills in Python-based web scraping (Requests with BeautifulSoup, or Scrapy, optionally with LLM-assisted extraction), data cleaning, and geocoding.
  • Strengthen applied data management skills (schema design, deduplication, source attribution, and versioning).
  • Practice public-interest technology principles: consent-aware scraping, respecting robots.txt, rate limiting, and bias mitigation.
  • Deliver a high-quality, reusable open data asset aligned with agency needs.
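To make "respecting robots.txt" and "rate limiting" concrete, the sketch below uses Python's standard-library `urllib.robotparser` to check whether a URL may be fetched and to read a site's `Crawl-delay`, then spaces requests with a simple rate limiter. It is a hedged illustration, not a prescribed project design: the robots.txt text, the `USER_AGENT` string, the `RateLimiter` class, the example.org URLs, and the 1-second fallback delay are all assumptions, and no network requests are made.

```python
import time
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt content (assumption); in practice it would be fetched
# from the site's /robots.txt, e.g. with RobotFileParser.set_url() and .read().
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

USER_AGENT = "ccphit-resource-bot"  # hypothetical user-agent string

def build_parser(robots_text: str) -> RobotFileParser:
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser

class RateLimiter:
    """Enforce a minimum interval (in seconds) between successive requests."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._last = None

    def wait(self) -> None:
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()

parser = build_parser(ROBOTS_TXT)
# Honor the site's Crawl-delay if declared; otherwise fall back to 1 second (assumed default).
delay = parser.crawl_delay(USER_AGENT) or 1.0
limiter = RateLimiter(delay)

for url in ["https://example.org/shelters", "https://example.org/private/staff"]:
    if not parser.can_fetch(USER_AGENT, url):
        print("skip (disallowed by robots.txt):", url)
        continue
    limiter.wait()
    # A real scraper would issue the HTTP request here, sending USER_AGENT in its headers.
    print("would fetch:", url)
```

Checking permissions before every request and identifying the scraper with a stable user-agent makes its behavior auditable, which supports the consent-aware practices listed above.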

Program 
Center for Community Engagement
Location Type 
Remote
Expected Hours 
60 hours per week
Students required to have a personal vehicle 
No
Fees students may incur with this opportunity 
No fees will be incurred by students
This opportunity provides some form of compensation 
No
Opportunity Availability 
Ongoing