Legal Review of the Collection and Use of Personal Health Data via Web Scraping: A Comparative Analysis of South Korea, the United States, and the European Union
Choi Ho-Young (Southern Gyeonggi Branch Office, Health Insurance Review)
Vol. 5, No. 2, pp. 110-129
Abstract
The rapid spread of tools such as web scraping and automated macros has made it technically easy—but legally complex—to collect large volumes of health-related data from websites and online services. This review compares the principal frameworks in South Korea, the United States, and the European Union to identify conditions for lawful and ethical research use. Baseline privacy statutes (Korea’s PIPA, U.S. HIPAA, EU GDPR), sectoral instruments, and enforcement trends reveal convergent requirements: (1) robust de-identification or pseudonymization; (2) a valid legal basis (explicit consent or statutory alternatives for scientific research in the public interest); (3) strict respect for access controls and anti-circumvention rules (no bypassing logins, CAPTCHAs, paywalls, or technical protection measures); (4) transparency and independent oversight (e.g., notices, data-subject rights handling, IRB/ethics review); and (5) safeguards for cross-border transfers, including emerging national-security limits on bulk health datasets. In South Korea, PIPA treats health information as sensitive; pseudonymized data may be used without consent for statistics, scientific research, or archiving under defined safeguards, while cross-controller combinations are confined to designated institutions and API-based sharing is preferred. In the U.S., HIPAA governs research uses by covered entities (authorization or IRB waiver), while non-HIPAA actors face FTC oversight; scraping of publicly accessible pages may avoid CFAA liability but still implicates DMCA and contract/tort claims. In the EU, GDPR requires both an Article 6 basis and an Article 9 condition, with Article 14 transparency even for indirectly collected data; database rights and text-and-data-mining (TDM) rules shape permissible extraction, and the EHDS will expand controlled research access via secure environments. 
Together, these regimes point to a risk-managed pathway for research that centers lawful sourcing, technical safeguards, and accountable governance.
- Publisher: Health Insurance Review & Assessment Service (건강보험심사평가원)
- Classification: Medicine / Welfare / Social Policy