What is the Cyber Jobs dataset?
In the past year, several reports have emphasized a persistent shortage of cybersecurity professionals around the world (ISC2, Cyberseek). The International Information System Security Certification Consortium (ISC2), a leading professional organization of cybersecurity workers, estimated that there are up to 700,000 unfilled cybersecurity roles in the United States. Many organizations, such as the Cybersecurity and Infrastructure Security Agency (CISA), have suggested that part of the issue lies not in a lack of talent but in a lack of credentials. Namely, traditional recruitment may exclude large pools of potential candidates who do not have the narrow educational and technical credentials specified in a job posting, even though many of these candidates have skills and experience that could transfer into a cybersecurity career. As a result, the issue can be reframed as one of identifying talent rather than a deficit of talent.
Previous research and data sources have approached this issue by studying the demand for workers. However, little data is available on the supply of workers themselves. CSET’s cybersecurity jobs data is one of the first resources that provides detailed and comprehensive information on the supply side of the cybersecurity workforce. Our work is attuned to the growing number of initiatives focused on leveraging non-traditional talent for both cybersecurity and STEM fields at large, such as the Biden administration’s National Cyber Workforce and Education Strategy (NCWES), the National Initiative for Cybersecurity Education (NICE), and a National Science Foundation (NSF) pilot program on boosting entrepreneurship in underrepresented communities.
We created our dataset by running a machine learning classifier over an existing Revelio Labs dataset of 1.3 billion job roles self-reported across 513 million LinkedIn user profiles (Revelio Labs). To train the model, members of our project team hand-labeled a portion of the jobs as a cybersecurity position or otherwise. Additional training data were labeled using Snorkel Flow, a platform that programmatically generates labels based on heuristics provided by the project team.
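To make the idea of programmatic labeling concrete, here is a minimal sketch in plain Python. It is not the Snorkel Flow API and not the project team's actual heuristics: the keyword lists, the function names, and the majority-vote rule are all hypothetical, chosen only to show how several simple heuristics can vote on a label for a job title and abstain when they have nothing to say.

```python
from collections import Counter

# Label values (hypothetical encoding for this sketch).
CYBER, NOT_CYBER, ABSTAIN = 1, 0, -1


def lf_security_keyword(title: str) -> int:
    """Vote CYBER when the title contains an assumed cybersecurity keyword."""
    keywords = ("cybersecurity", "information security", "penetration tester")
    return CYBER if any(k in title.lower() for k in keywords) else ABSTAIN


def lf_physical_security(title: str) -> int:
    """Vote NOT_CYBER for assumed physical-security titles."""
    keywords = ("security guard", "loss prevention")
    return NOT_CYBER if any(k in title.lower() for k in keywords) else ABSTAIN


def lf_unrelated_title(title: str) -> int:
    """Vote NOT_CYBER for a few assumed clearly unrelated titles."""
    keywords = ("cashier", "barista", "truck driver")
    return NOT_CYBER if any(k in title.lower() for k in keywords) else ABSTAIN


LABELING_FUNCTIONS = [lf_security_keyword, lf_physical_security, lf_unrelated_title]


def weak_label(title: str) -> int:
    """Combine heuristic votes by simple majority; abstain on no votes or a tie."""
    votes = [lf(title) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


for title in ("Senior Information Security Analyst", "Security Guard", "Marketing Manager"):
    print(title, weak_label(title))
```

Titles that receive a label this way could supplement a hand-labeled set when training a classifier, while titles on which every heuristic abstains would simply be left out of the programmatically labeled data.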
We determined whether a position qualifies as a cybersecurity role based on the National Initiative for Cybersecurity Education (NICE) Cybersecurity Workforce Framework. This standardized lexicon of cybersecurity positions consists of 52 distinct work roles, each detailed with the relevant knowledge, skills, and abilities (KSAs) required to succeed in the role. The framework includes information security analysts, information technology workers, and systems development professionals. Notably, this is broader than the Bureau of Labor Statistics’ (BLS) conception of cybersecurity jobs as strictly information security analysts, of which the BLS estimated the United States had 163,690 in 2023. Applying this wider definition will be useful for examining workers’ diverse backgrounds and trajectories as they move between cybersecurity roles.
Our model produced a dataset of ~1.3 billion jobs labeled as either a cybersecurity role or otherwise.[1] Out of the 1,398,946,692 classified positions, 19,623,302 positions (1.4% of all classified positions) worldwide were found to be cybersecurity roles. This number includes past and current positions, as users often list their entire employment history on their LinkedIn profiles. When we restricted the data to current U.S. positions, we found 1,456,764 current cybersecurity positions in the U.S. workforce, held by 1,377,816 workers. The discrepancy between the number of current positions and workers may be due to some workers listing more than one cyber position without an end date on their profile. Figure 1 orients the reader to our data by visualizing these numbers.
Figure 1: A Representation of Our Data
Source: CSET.
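The headline figures above can be checked with a few lines of arithmetic. The sketch below uses only the counts stated in the text; the variable names are our own, and the positions-per-worker ratio is an illustrative derived quantity rather than a statistic reported by the dataset.

```python
# Counts as reported in the text above.
TOTAL_POSITIONS = 1_398_946_692      # all classified positions worldwide
CYBER_POSITIONS = 19_623_302         # positions labeled as cybersecurity roles
US_CURRENT_POSITIONS = 1_456_764     # current U.S. cybersecurity positions
US_CURRENT_WORKERS = 1_377_816       # workers holding those positions

# Share of all classified positions that are cybersecurity roles.
cyber_share = CYBER_POSITIONS / TOTAL_POSITIONS

# Gap between current positions and workers, e.g. from profiles listing
# more than one cyber position without an end date.
extra_positions = US_CURRENT_POSITIONS - US_CURRENT_WORKERS
positions_per_worker = US_CURRENT_POSITIONS / US_CURRENT_WORKERS

print(f"Cybersecurity share of positions: {cyber_share:.1%}")   # 1.4%
print(f"Positions in excess of workers: {extra_positions:,}")   # 78,948
print(f"Current positions per worker: {positions_per_worker:.3f}")
```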
Our results are consistent with past estimates of the current number of professionals in the U.S. cyber workforce by ISC2 (2.26% difference) and by Cyberseek, a cybersecurity jobs data collection program by the National Institute of Standards and Technology (NIST) (10.45% difference).[2] Still, several constraints are inherent to using self-reported LinkedIn data. First, the population represented on the site may be limited by selection bias. A variety of personal and professional reasons may deter individuals from using LinkedIn, leaving certain demographics underrepresented in our data. For instance, professionals in sensitive national security positions may have a more limited digital footprint. Other professionals may be on the site but update their profiles infrequently or have private profiles. As a result, our dataset contains a subset of incomplete profiles. Figure 2 illustrates the percentage of missing values for some of the variables in our dataset. Note that 2,851,121 cybersecurity workers in our final dataset have no education information reported at all.
What can we do with the Cyber Jobs data?
Despite these caveats, a wide variety of information on cybersecurity workers is readily available within the dataset. Factors we can consider include employment history, skills, certifications, company, location, job title, educational background, and time of employment. These fields are self-reported by the user and standardized by Revelio Labs. We additionally have information on the salary and seniority associated with each position, both predicted by Revelio Labs’ proprietary models trained on prior data sources. However, without a way to properly validate these predictions, we chose to omit from our analysis any variables that were not strictly user-entered data.
Figure 2: A Representation of the Missing Values in our Dataset
Source: CSET.
In the coming weeks, we hope to illuminate the current landscape of the U.S. cyber workforce through several avenues of analysis. These first forays with our dataset will lay the foundation for larger projects on strengthening the cybersecurity talent ecosystem in the United States. Our next data snapshots in this series will highlight our efforts and discuss the following topics:
- NCAE-C Graduates: In May 2023, CSET found that schools in the National Centers of Academic Excellence in Cybersecurity (NCAE-C) consortium awarded almost 50% of all bachelor’s degrees in cybersecurity-specific disciplines in 2020 (CSET), yet accounted for only 40% of bachelor’s degree graduates in all disciplines. Does the disproportionate number of NCAE-C-affiliated cyber graduates translate into the same degree of representation in the cybersecurity workforce? If so, are cyber workers with NCAE-C school degrees obtaining jobs at higher rates? At what rates are they pursuing cyber careers, and what roles are they landing compared to non-NCAE-C graduates?
- PhD Cyber Workers: How are cyber workers with PhDs performing in the cyber workforce? What career trajectories are they following and what roles are they landing?
- Community College versus Bachelor’s Degree Cyber Workers: How are community college degree holders performing in the cyber jobs market? How do their job outcomes compare with those of bachelor’s degree holders?
- Geographical Hubs: Where are cyber workers located in the United States? Are there any previously unidentified tech hubs in smaller metropolitan areas? How has talent flowed and migrated geographically over time?
- Online Schools: How do people in the cyber jobs landscape utilize online programs? Have online degrees provided a point of entry into the cyber workforce? Do many seasoned professionals seek additional online education?
- Apprenticeships: From July to November 2022, the Biden administration conducted a cybersecurity apprenticeship sprint, creating 194 new apprenticeship programs and hiring 7,000 new apprentices (White House). A future post will explore the demographics, locations, skills, and job outcomes of cybersecurity apprentices, both from the sprint and beyond.
As of February 6, 2024, the last cyber jobs data update was on January 18, 2024, and the last Revelio data update was on January 17, 2024.
[1] To measure the performance of our model, we calculated its precision and recall. Our precision score was 0.79, indicating that 79% of the positions that the model labeled as “cybersecurity” were true cybersecurity positions. Our recall score was 0.84, indicating that the model classified 84% of true cybersecurity positions correctly.
[2] These calculations are based on the October 2023 update of the cyber jobs dataset, the 2023 ISC2 Workforce study, and the 2023 Cyberseek estimate. They do not reflect estimates in the current year.
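For readers less familiar with the precision and recall scores reported in the note on model performance, the sketch below shows how the two metrics are computed from counts of true positives, false positives, and false negatives. The counts are hypothetical, chosen only because they reproduce the reported scores of 0.79 and 0.84; they are not the project's actual evaluation data.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Share of positions labeled 'cybersecurity' that truly are cybersecurity."""
    return true_pos / (true_pos + false_pos)


def recall(true_pos: int, false_neg: int) -> float:
    """Share of true cybersecurity positions that the model labeled as such."""
    return true_pos / (true_pos + false_neg)


# Hypothetical evaluation counts (assumption for illustration only).
TRUE_POS = 1659    # cybersecurity positions the model labeled correctly
FALSE_POS = 441    # non-cybersecurity positions labeled "cybersecurity"
FALSE_NEG = 316    # cybersecurity positions the model missed

p = precision(TRUE_POS, FALSE_POS)
r = recall(TRUE_POS, FALSE_NEG)

print(f"precision = {p:.2f}")  # 0.79
print(f"recall    = {r:.2f}")  # 0.84
```

A precision below recall, as here, means the model errs slightly more toward labeling non-cybersecurity positions as cybersecurity than toward missing true cybersecurity positions.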