CSET

Building a Data Collection Pipeline: Insights from a CSET Internship

Jordan Monts

September 12, 2024

This blog post recounts the development of a Python-based data collection pipeline, a project completed in the summer of 2024 by CSET's inaugural intern, Jordan Monts. Using the Requests and BeautifulSoup libraries, he built a two-part system that gathers and processes web data. The project supported ongoing research initiatives and strengthened his skills in data processing and application programming interface (API) management.
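
To make the two-part idea concrete, here is a minimal sketch of how a collection step and a processing step might be separated using Requests and BeautifulSoup. It is an illustration only, not the pipeline built during the internship: the function names `fetch_html` and `extract_links`, the choice to extract links, and the record layout are all assumptions made for this example.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Part 1 (collection): download a page and return its HTML text.

    Hypothetical helper for illustration; not taken from the original project.
    """
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
    return response.text


def extract_links(html: str) -> list[dict[str, str]]:
    """Part 2 (processing): return the text and target of every link.

    Hypothetical helper; the fields kept here are an assumed example schema.
    """
    soup = BeautifulSoup(html, "html.parser")
    records = []
    # href=True skips <a> tags that have no href attribute
    for anchor in soup.find_all("a", href=True):
        records.append(
            {"text": anchor.get_text(strip=True), "href": anchor["href"]}
        )
    return records
```

Keeping the two steps in separate functions means the processing step can be run on saved HTML without touching the network. In use, the output of `fetch_html(some_url)` would be passed to `extract_links`, where `some_url` is whatever page the researcher is permitted to collect.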
