We’re hiring! If you’re interested in joining our team, check out the positions in the “Job Openings” section below or consult our careers page.
A Language Model Trained to Mimic 4chan Might Portend AI’s Grim Future: A machine learning researcher trained a language model on three and a half years’ worth of 4chan posts to create what he dubbed “the most horrible model on the Internet,” raising concerns about the public availability of language models and sparking debate about their ethical use. Yannic Kilcher, a Swiss ML expert who covers AI and ML advances on his popular YouTube channel, fine-tuned an existing open-source language model, EleutherAI’s GPT-J-6B, using a dataset of more than 130 million posts from 4chan’s “Politically Incorrect” board, an online forum with a longstanding reputation for toxicity and offensiveness. As Kilcher described in a video documenting the process, he then programmed a team of bots to post on the board as often as they could. According to Kilcher, the bots posted approximately 30,000 times during two separate 24-hour periods. While 4chan users were able to identify some of the bots for what they were, this appeared to be due less to the model’s shortcomings and more to the bots’ superhuman indefatigability: they posted round the clock, as frequently as the site allowed. Kilcher’s experiment was criticized by a number of experts and observers, who called it irresponsible and unethical. While Kilcher made it possible for anyone to use his “GPT-4chan” by uploading it to Hugging Face, an online repository for AI and ML code, the site quickly restricted access. But the cat could be out of the bag: as Kilcher’s experiment shows, currently available open-source models and datasets can be used to create surprisingly effective language models with relative ease.
Pumping the Brakes on Sentient Chatbots and AI’s Secret Language: Is Google’s conversational AI, LaMDA, sentient? Does OpenAI’s DALL-E 2 have its own language? Both claims went viral recently, but there is reason to doubt each:
In a viral Twitter thread, Giannis Daras, a computer science PhD student at UT Austin, argued that DALL-E 2, OpenAI’s new text-to-image generator, has “a secret language.” It’s no secret that DALL-E 2 struggles with generating coherent images of text, but Daras said that the seeming gibberish is actually part of a “DALLE-2 language.” Re-inputting DALL-E 2-generated text (“Vicootes,” for example) will generate images of whatever the word means in DALL-E 2’s “language,” in this case vegetables. Daras’s findings were also published in an arXiv paper. But many observers were skeptical. As Benjamin Hilton pointed out, it was difficult to recreate the effect with many DALL-E 2-generated words or phrases, and the tool frequently produced seemingly unrelated images in response.
NAIRR Task Force Releases Report: In its interim report released late last month, the National AI Research Resource task force described its initial findings and recommendations, which included:
The NAIRR should be set up to meet four primary goals: “Spur Innovation,” “Increase Diversity of Talent,” “Improve Capacity,” and “Advance Trustworthy AI.”
The NAIRR should operate as a “federated cyberinfrastructure ecosystem run by a single management entity, with governance and external advisory bodies.” The report warns that assigning a single agency to manage the NAIRR could risk “narrowing its focus to that agency’s specific mission” and recommends the resource be run by a non-governmental entity with appropriate federal oversight.
There should be an “integrated access portal through which all resources are made available.”
The NAIRR should be accessible to “researchers and students from diverse backgrounds,” including both expert-level AI researchers and students beginning to experiment with AI development.
The NAIRR’s managers should explore making statistical, administrative, and federally funded research data available to researchers.
The NAIRR should offer a “federated mix of on-premise and commercial computational resources, including conventional servers, computing clusters, HPC, and cloud computing” to researchers.
DOD Planning Autonomous Weapons Policy Update: The Pentagon will be updating its guidance on autonomous weapons systems by the end of the year, Emerging Capabilities Policy Director Michael Horowitz told Breaking Defense. The current guidance, DOD Directive 3000.09, was signed nearly a decade ago, in November 2012. While both AI research and the DOD’s AI-development efforts have advanced dramatically in that time, Breaking Defense noted that the update is coming as part of a DOD-standard 10-year update process, not in response to any specific technological advances. Horowitz didn’t disclose any planned changes and expressed broad satisfaction with the “very responsible approach” of the current guidance, but said that there could be “some updates and clarifications that would be helpful.” One such update could be a clearer distinction between AI-enabled and autonomous weapons systems, which Horowitz noted are not the same. The original guidance made no mention of AI, but with Horowitz’s office getting input from many of the DOD’s new emerging tech offices, including the office of the CDAO, that omission seems unlikely in the new guidance.
In Translation: CSET’s translations of significant foreign-language documents on AI
PRC Think Tank White Paper: Artificial Intelligence White Paper (2022). This white paper from a Chinese think tank describes the state of AI in China and the world. It divides its focus among AI innovation and breakthroughs, engineering and other practical uses of AI, and AI governance initiatives in the areas of trustworthiness and safety.
PRC Education Budget: Ministry of Education 2022 Budget. This document is the 2022 budget for China’s Ministry of Education. The ministry devotes the vast majority of its budget to fully funding 75 of China’s top universities, including all scientific research undertaken by them. This year, the ministry is also funding the launch of a long-term project to incorporate more Xi Jinping-related content into mandatory Marxist ideology courses for Chinese college students.
PRC Public Security Budget: Ministry of Public Security 2022 Budget. This document is the 2022 budget for the PRC Ministry of Public Security, which is responsible for Chinese police departments, border security, counterterrorism, counter-narcotics, top leaders’ security details, maintaining social order, and monitoring the Chinese internet for dissent. In 2022, the ministry is funding projects to renovate police academy campus facilities, train air marshals, and enforce the ban on fishing on the Yangtze River, among others.
If you have a foreign-language document related to security and emerging technologies that you’d like translated into English, CSET may be able to help! Click here for details.
We’re hiring! Please apply or share the roles below with candidates in your network:
Executive Coordinator: The Executive Coordinator will provide critical executive, logistical, and project management support to the CSET Operations and Leadership Teams with limited supervision and high levels of autonomy. Apply by July 1.
Research Fellow — Standards and Testing: The Research Fellow will focus on standards, testing, evaluation, safety, and national security issues associated with AI systems. To do this, they will examine how the limitations, risks, and societal and security impacts of AI can be understood and managed. Apply by July 15.
Research Fellow — AI Applications: The Research Fellow will focus on helping decision makers evaluate and translate new and emerging technologies, particularly in the field of AI, into novel capabilities by separating real trends and strategic opportunities from technological hope and hype. Apply by July 15.
UI/UX Designer: The UI/UX Designer will perform user interviews, write user stories, create user interface mockups, and conduct usability testing for public-facing support tools. Rolling application — apply today.
Please bookmark our careers page to stay up to date on all active job postings. You can also subscribe to receive job announcements by updating your subscription preferences in the footer of this email.