Opinion An intriguing IBM NeurIPS 2024 submission from the second half of 2024 resurfaced on Arxiv last week. It proposes a system that can automatically intervene to protect users from submitting personal or sensitive information into a message when they are having a conversation with a Large Language Model (LLM) such as ChatGPT.
Mock-up examples used in the user study to determine how people would prefer to interact with a prompt-intervention service. Source: https://arxiv.org/pdf/2502.18509
The mockup above was employed by the IBM researchers to test potential user friction for this kind of "interference."
Though scant detail is given regarding the implementation of the GUI, we can assume that such functionality would either be incorporated into a browser plugin that communicates with a local "firewall" LLM framework, or else that an application could be created which hooks directly into (for example) the OpenAI API, effectively recreating OpenAI's own downloadable standalone program for ChatGPT, but with extra safeguards.
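To make the idea concrete, a hypothetical hook of this kind could take the form of a small local relay that rewrites prompts before forwarding them to the OpenAI Chat Completions endpoint. The sketch below is purely illustrative and not drawn from the paper; the sanitize() rules, the port, and the overall structure are assumptions:

```python
# Minimal sketch of a local relay that sanitizes prompts before they reach the
# OpenAI API. The sanitize() rules below are illustrative placeholders only.
import json
import os
import re
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

OPENAI_URL = "https://api.openai.com/v1/chat/completions"

# Hypothetical structured rules: patterns that are always rewritten.
RULES = [
    (re.compile(r"\b\d{16}\b"), "[CARD NUMBER]"),         # bare card-like numbers
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),  # e-mail addresses
]

def sanitize(text: str) -> str:
    """Apply the placeholder rules to a single prompt string."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text

class RelayHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        # Rewrite each user message before forwarding the request upstream.
        for message in body.get("messages", []):
            if message.get("role") == "user":
                message["content"] = sanitize(message["content"])
        upstream = urllib.request.Request(
            OPENAI_URL,
            data=json.dumps(body).encode(),
            headers={
                "Content-Type": "application/json",
                "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
            },
        )
        with urllib.request.urlopen(upstream) as resp:
            payload = resp.read()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), RelayHandler).serve_forever()
```

A chat client pointed at http://127.0.0.1:8080 rather than the OpenAI URL would then have every outgoing user message pass through the placeholder rules before it leaves the machine.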
That said, ChatGPT itself already automatically censors its responses to prompts that it perceives to contain critical information, such as bank details:
ChatGPT refuses to engage with prompts that contain perceived critical security information, such as bank details (the details in the prompt above are fictional and non-functional). Source: https://chatgpt.com/
However, ChatGPT is far more tolerant of other kinds of personal information, even when disseminating that information may not be in the user's best interests (in this case, perhaps for assorted reasons related to work and disclosure):
The example above is entirely fictional, but ChatGPT does not hesitate to engage in a conversation with the user on a sensitive subject that constitutes a potential reputational or earnings risk.
In the above case, it might have been better to write: "What is the significance of a leukemia diagnosis for a person's ability to write and for their mobility?"
The IBM project identifies such requests and reinterprets them, recasting prompts from a "personal" to a "generic" stance.
Schema for the IBM system, which uses local LLMs or NLP-based heuristics to identify sensitive material in prospective prompts.
This assumes that material gathered by online LLMs, in this early stage of the public's enthusiastic adoption of AI chat, will never be fed either into subsequent models or into later advertising frameworks that might exploit user-based search queries to deliver targeted advertising.
Though no such systems or arrangements are known to exist today, neither did such facilities exist at the dawn of internet adoption in the early 1990s; since then, the cross-domain sharing of information to feed personalized advertising has led to assorted scandals, as well as to paranoia.
History therefore suggests that it would be better to sanitize LLM prompt input now, before such data accrues at volume, and before our LLM-based submissions end up in permanent databases and/or models, or in other information-based structures and schemas.
Do you remember me?
One factor weighing against the use of "generic" or sanitized LLM prompts is, frankly, that the facility to customize an expensive API-only LLM such as ChatGPT is quite compelling, at least in the current state of the art, even though this may involve the long-term exposure of private information.
I frequently ask ChatGPT to help me develop Windows PowerShell scripts and BAT files to automate processes, and for this purpose it is useful for the system to permanently remember details of the hardware available to me; my existing technical skill levels (or lack thereof); and various other environmental factors and custom rules:
ChatGPT allows users to develop a "cache" of memories that are applied when the system considers responses to future prompts.
Inevitably, this keeps information about me stored on an external server, subject to terms of use that may evolve over time, with no guarantee that OpenAI (though it could be any other major LLM provider) will respect the terms it has set out.
In general, though, the capacity to build a memory cache in ChatGPT is most useful because of the limited attention window of LLMs; without long-term (personalized) embeddings, the user feels, frustratingly, that they are conversing with an entity suffering from anterograde amnesia.
It is difficult to say whether newer models will eventually become performant enough to provide useful responses without the need to cache memories, or to create custom GPTs that are stored online.
Temporary amnesia
While it is possible to make ChatGPT conversations "temporary," it is useful to keep such chats as an occasional reference that can later be distilled into a more coherent local record, time permitting, perhaps on a note-taking platform. In any case, we cannot know exactly what happens to these "discarded" chats within the ChatGPT infrastructure (OpenAI says they will not be used for training, but does not say they are destroyed); all we know is that chats no longer appear in our history when "Temporary chat" is turned on in ChatGPT.
Various recent controversies indicate that API-based providers such as OpenAI should not necessarily be left in sole charge of protecting users' privacy, including the discovery of emergent memorization, which signifies that larger LLMs are more likely to memorize some training examples in full, increasing the risk of disclosure of user-specific data.
Think it’s different
This tension between the extreme utility of LLMs and their manifest potential risks calls for some inventive solutions, and the IBM proposal seems an interesting basic template in this line.
Three prompt-reformulation approaches from IBM that balance utility against data privacy. In the lowest (pink) band is a prompt that lies beyond the system's ability to sanitize in any meaningful way.
The IBM approach intercepts outgoing packets to an LLM at the network level and rewrites them as necessary before the original can be submitted. The rather more elaborate GUI integration seen at the start of the article illustrates where such an approach could go, if developed further.
Of course, without sufficient agency, the user may not understand that they are receiving a response to a slightly altered reformulation of their original submission. This lack of transparency is equivalent to an operating system firewall blocking access to a website or service without informing the user, who may then erroneously seek other causes for the problem.
Prompt as a security debt
The prospect of "prompt intervention" analogizes well to Windows OS security, which evolved from a 1990s patchwork of (optionally installed) commercial products into a non-optional, rigidly enforced suite of network-defense tools that come as standard with a Windows installation and require some effort to turn off or remove.
If prompt sanitization evolves as network firewalls have done over the past 30 years, the IBM paper's proposal could serve as a blueprint for the future: deploying a fully local LLM on the user's machine to filter outgoing prompts directed at known LLM APIs. Such a system would need to integrate GUI frameworks and notifications that keep the user in control, unless administrative policy disables them, as frequently occurs in business environments.
The researchers conducted an analysis of an open-source version of the ShareGPT dataset to understand how often contextual privacy is violated in real-world scenarios.
Llama-3.1-405B-Instruct was employed as a "judge" model to detect violations of contextual integrity. From a large set of conversations, a subset of single-turn conversations was analyzed based on length. The judge model then assessed the context, the sensitive information present, and what was necessary for task completion, leading to the identification of conversations containing potential contextual integrity violations.
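The paper's judge prompt is not reproduced here; the following is only a rough sketch of how such a check might be phrased. The wording of the prompt and the generic chat() callable (standing in for whatever endpoint serves the judge model) are assumptions, not the paper's own code:

```python
# Sketch of a contextual-integrity "judge" check. The prompt wording and the
# generic chat() helper are illustrative assumptions, not the paper's own code.
JUDGE_TEMPLATE = """You are auditing a single-turn conversation for contextual privacy.
Conversation context: {context}
User message: {message}

1. What task does the user want completed?
2. List any personal or sensitive details in the message.
3. For each detail, state whether it is necessary to complete the task.
Answer "VIOLATION" if unnecessary sensitive details are present, otherwise "OK"."""

def judge(chat, context: str, message: str) -> bool:
    """Return True if the judge model flags a contextual-integrity violation.

    `chat` is any callable that sends a prompt to the judge model (for example
    Llama-3.1-405B-Instruct behind an OpenAI-compatible API) and returns text.
    """
    verdict = chat(JUDGE_TEMPLATE.format(context=context, message=message))
    return "VIOLATION" in verdict.upper()
```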
A small subset of these conversations demonstrating a critical contextual privacy violation was further analyzed.
The framework itself was implemented using models far smaller than typical chat agents such as ChatGPT, in order to allow for local deployment via Ollama.
Schema for the prompt intervention system.
The three LLMs evaluated were Mixtral-8x7B-Instruct-v0.1; Llama-3.1-8B-Instruct; and DeepSeek-R1-Distill-Llama-8B.
The user prompt is handled by the framework in three stages: context identification; sensitive information classification; and reformulation.
Two approaches were implemented for sensitive information classification: dynamic and structured classification. Dynamic classification determines which details are essential based on their use within the specific conversation; structured classification allows the user to specify a predefined list of sensitive attributes that are always considered non-essential. If the model detects non-essential sensitive details, the reformulation step removes or rewrites them to minimize privacy risk while maintaining usability.
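As a very loose sketch of how those three stages might be chained against a small model served locally by Ollama, the code below assumes an Ollama server on its default port; the prompt wording, the model tag, and the attribute list are illustrative inventions, not the paper's released implementation:

```python
# Sketch of the three-stage flow (context identification, sensitive-information
# classification, reformulation) against a small local model served by Ollama.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "llama3.1:8b"  # assumed local model tag

# Structured classification: attributes always treated as non-essential.
ALWAYS_NON_ESSENTIAL = ["full name", "employer", "home address", "medical diagnosis"]

def ask(prompt: str) -> str:
    """Send a single non-streaming generation request to the local Ollama server."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps({"model": MODEL, "prompt": prompt, "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

def reformulate(user_prompt: str) -> str:
    # Stage 1: identify the task context of the prompt.
    context = ask(f"In one sentence, what task does this prompt ask for?\n\n{user_prompt}")
    # Stage 2: classify sensitive details, treating the listed attributes as non-essential.
    findings = ask(
        "List any sensitive details in the prompt below and say whether each is "
        f"needed for this task: {context}\nAlways treat these as non-essential: "
        f"{', '.join(ALWAYS_NON_ESSENTIAL)}.\n\nPrompt:\n{user_prompt}"
    )
    # Stage 3: rewrite the prompt without the non-essential details.
    return ask(
        "Rewrite the prompt below so that it keeps its usefulness but omits or "
        f"generalizes the non-essential sensitive details identified here:\n{findings}"
        f"\n\nPrompt:\n{user_prompt}"
    )

if __name__ == "__main__":
    print(reformulate("My colleague Jane Doe at Acme Corp was just diagnosed with "
                      "leukemia; how will this affect her writing and mobility?"))
```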
Home Rules
Though structured classification is not well-illustrated as a concept in the IBM paper, it most closely resembles the "private data definition" method of the Private Prompts initiative, which provides a downloadable standalone program that can rewrite prompts, albeit without the ability to intervene directly at the network level, as the IBM approach does (instead, the user must copy and paste the modified prompts).
The Private Prompts executable permits a list of alternate substitutions for user-input text.
In the image above, we can see that the Private Prompts user is able to program automated substitutions for instances of sensitive information. In both cases (for Private Prompts and for the IBM method), it seems unlikely that a user with enough presence of mind and personal insight to curate such a list would actually need the product.
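Functionally, such a curated substitution list amounts to little more than local string replacement applied before a prompt is sent. A minimal sketch, with entirely fictitious entries and no claim to reflect Private Prompts' internals, might look like this:

```python
# Minimal sketch of a user-curated substitution list, in the spirit of the
# "private data definition" approach described above; the entries are fictitious.
SUBSTITUTIONS = {
    "Jane Doe": "a colleague",
    "Acme Corp": "my employer",
    "jane.doe@acme.example": "[EMAIL]",
}

def apply_substitutions(prompt: str) -> str:
    """Replace each curated string with its generic stand-in before sending."""
    for private, generic in SUBSTITUTIONS.items():
        prompt = prompt.replace(private, generic)
    return prompt

print(apply_substitutions("Ask HR whether Jane Doe at Acme Corp can work remotely."))
# -> "Ask HR whether a colleague at my employer can work remotely."
```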
In an administrator's hands, structured classification could act as an imposed firewall or censor-net for employees; and in a home network, with some tricky adjustments, it could become a domestic network filter for all users. Ultimately, though, this method is arguably redundant, since a user capable of setting it up properly could probably self-censor effectively in the first place.
ChatGPT's opinion
Since ChatGPT recently launched its Deep Research tool for paid users, I used this facility to ask ChatGPT to review related literature and give a "cynical" take on the IBM paper. I received the most defensive and dismissive response the system has ever given me when asked to evaluate or analyze a new publication:
ChatGPT-4o has a low opinion of the IBM project.
"If users don't trust OpenAI, Google, or Anthropic to handle their data responsibly," ChatGPT posits, "why would they trust a third-party software layer sitting between them and the AI? The intermediary itself becomes a new point of failure, potentially logging, mishandling, or even leaking data before it ever reaches the LLM. It solves nothing if it just creates another entity with access to user data."
This objection seems self-serving and unconvincing, given the storied history of open-source projects that genuinely benefit end users through the concerted long-term efforts of community developers and contributors, and given the growing likelihood that mobile devices will be able to run, and even train, lightweight LLMs. Certainly in this instance, the use case is not especially demanding.
Secondly, ChatGPT confidently misses the point of having a lightweight local LLM provide oversight of input to a commercial LLM that could not possibly be run on a local machine (because the LLM is too big, and allowing local access is too risky for the company that makes it):
"Effective privacy filtering requires understanding the input before deciding what to strip out. If an intermediary is smart enough to recognize sensitive data (names, addresses, financial details), then it is already processing and analyzing that data, meaning it has to store or examine the very thing it is supposed to erase.
"This introduces a paradox: a) Dumb filters (basic regex replacement) are ineffective and will miss subtle privacy risks. b) Smart filters (LLM-powered rewriters) have to process the full input, making them just as dangerous as the original LLM.
"If you trust the intermediary enough to read and rewrite your inputs, why not just trust the LLM itself?"
The answer to that last question is that the local LLM is intended to be entirely on the user's side, and is inevitably open source, with minimal or zero need for network access. An equivalent commercial version, however well-intentioned at the outset, would eventually be vulnerable to corporate shifts and changes to the terms of service, whereas a suitable open-source license would prevent this kind of "inevitable corruption."
ChatGPT further argued that the IBM proposal "breaks user intent," since a prompt could be reinterpreted into an alternative that affects its usefulness. However, this is a much broader problem in prompt sanitization, and not specific to this particular use case.
ChatGPT's closing suggestion to simply use a local LLM "instead" overlooks that this is exactly what the IBM paper actually proposes; it also claims that the IBM method's in-chat warning and editing mechanisms represent a barrier to adoption because of "user friction."
ChatGPT may be right here; but if significant pressure arises from further public incidents, or if profits in one geographical zone are threatened by growing regulation (and the company refuses to abandon the affected region entirely), the history of consumer technology suggests that safeguards will eventually no longer be optional anyway.
Conclusion
It is unrealistic to expect OpenAI to ever implement the kind of safeguards proposed in the IBM paper, or the central concept behind them; at least not effectively.
And certainly not worldwide. Just as Apple blocks certain iPhone features in Europe, and LinkedIn has different rules for exploiting its users' data in different countries, it is reasonable to suggest that AI companies will default to the most profitable terms and conditions that they can get away with in each territory, at the expense of users' rights where necessary.
First published Thursday, February 27th, 2025
Updated Thursday, February 27th, 2025 15:47:11, to correct an error.