The emergence of Generative AI (Gen AI) on the world stage represents much more than a divisive challenge to human dominance in the intelligence stakes. Its computational power, and the pace at which it is evolving, pose serious questions about whether and how our species will harness its capabilities for good or ill. Now is not the time to be messing around. Humans need to be firmly in the equation to demonstrate who’s in charge - or pay the price!
To some this may come across as hyperbole, an invitation to descend into a tailspin of paranoia. But that reading misunderstands the relationship between the risks associated with Gen AI and its potential rewards. Its appeal lies in its near-universal application, yet in that very universality lies its potential to do serious collective harm to our species if it is not properly aligned with human interests. As such, we are confronted with the challenge of managing both the promise and the peril of this technology, whether we like it or not.
So, with this fundamental understanding in place, let’s dive into the precise reasons why human-in-the-loop interaction is essential to regulate Gen AI, both in its use and in its output. We’ll focus primarily on the demands faced by organizations, which carry additional responsibilities to their immediate and wider stakeholders, but we’ll also scale out to issues that reach further afield.
For this purpose, I have settled upon a handy mnemonic known as QUAIL: Quality, Underwriting, Accuracy, Intention and Learning. Although these ideas are certainly intertwined, let’s review each in the context of a human-first approach:
Quality
At first glance, Quality may seem an obvious concept, but it is surprisingly difficult to pinpoint. There’s a lot of “I know it when I see/hear/feel/taste it” with Quality, and we seem intuitively to know when it’s not in the offing. Given the familiar, indefinably robotic language of Gen AI output (go figure), why would you rely on it unsupervised for internal communications or, even worse, to represent you in the market? Of course you wouldn’t. Not unless you wish to tarnish your brand.
But poor Quality can also descend into legal infringement, with possible violations including, but not limited to, privacy breaches, intellectual property concerns and biased decision-making. These are serious issues that need to be managed and mitigated. They also call for a contextual understanding that invariably requires good judgment and, very often, a collaborative approach. Algorithmic assessments can help improve Quality at the model level, but they are no substitute for consistent output review. Overlook this step and reputational damage, among other problems, can ensue.
There is also the fundamental issue of ensuring that whatever leaves your hands properly reflects your corporate culture. This is the mark of your specific Quality. Given the prodigious output of Gen AI, we will all soon be drowning in far more content than we would like. If you think it’s noisy now...! Only those who focus on Quality over Quantity can reasonably hope to stand out from the crowd.
Creativity is essential to this process. It comes in many forms and flavors, but how will you direct Gen AI to use imagination beyond the purely rational? How will you ensure that emotion and aesthetic sensibility infuse your content? Gen AI can be an amazingly powerful tool, but it should serve the tone of your message. It should not dictate Quality.
Underwriting
This leads us to the idea of Underwriting. Of course, there is an automatic association with insurance here. Indeed, as unsupervised Gen AI output generates lawsuits, the insurance industry is destined to ask questions about governance procedures and to increase premiums accordingly. Soon, the expectation will shift to routine prevention mechanisms.
But there is also a wider definition of Underwriting at work here - that of accountability and ownership. Who takes responsibility for decisions made on the basis of content and recommendations from Gen AI? To underwrite output, human eyes and minds need to be integrated seamlessly into the process.
Increasingly, consumers are demanding badges of good AI governance from business, and Gen AI Explainability Statements are starting to appear on deployer websites. Indeed, many governance frameworks and AI laws around the world call for explainability from AI systems themselves. Monitoring and audit trails demonstrating Gen AI decision-making with human input will move from recommended to required.
With open-source Gen AI models gaining a head of steam, it will also become increasingly important to put verification procedures in place to minimize the potential for data breaches. Basic misuse of these systems without data anonymization is an ever-present risk with all Gen AI models, and certain open-source models may be more vulnerable to attacks from bad actors. In addition to market and ethical risks, each of the fifty U.S. states has its own data breach notification law, with onerous requirements. With Gen AI, the stakes have been raised overnight.
Accuracy
Let’s be honest: we all make mistakes. That is what a review process is for - to catch errors before they do any harm. This supports both Quality and Underwriting.
Various processes and techniques operate at the model level of Gen AI, including fine-tuning with Reinforcement Learning from Human Feedback (RLHF), Retrieval-Augmented Generation (RAG) and others, whose cumulative effect is designed to ensure output is both appropriate and accurate.
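For the technically curious, here is a minimal sketch of the RAG idea in Python: retrieve trusted source material first, then ground the model’s answer in it. The embed function below is a random placeholder standing in for a real embedding model, and the final LLM call is omitted entirely; this is an illustration of the pattern, not a production implementation.

```python
# Minimal sketch of Retrieval-Augmented Generation (RAG).
# `embed` is a deterministic random placeholder, NOT a real embedding model.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder: map text to a vector (a real system uses a trained model)."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(384)

def retrieve(query: str, documents: list[str], k: int = 3) -> list[str]:
    """Rank trusted source documents by cosine similarity to the query."""
    q = embed(query)
    def score(doc: str) -> float:
        d = embed(doc)
        return float(np.dot(q, d) / (np.linalg.norm(q) * np.linalg.norm(d)))
    return sorted(documents, key=score, reverse=True)[:k]

def build_prompt(query: str, documents: list[str]) -> str:
    """Ground the eventual answer in retrieved context before generation."""
    context = "\n".join(retrieve(query, documents))
    return (
        "Answer using ONLY the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    # In practice, this prompt would be passed to your LLM client of choice.
```

The design point is simple: the model is steered toward vetted sources rather than left to improvise from its training data alone, which narrows (but does not eliminate) the space for error.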
Despite these techniques, errors inevitably creep in. Left uncorrected, they can compound - for example, through the unchecked adoption of harmful or malicious computer code. In analysis projects, incorrect baseline assumptions can likewise send the whole effort in a misleading direction, a problem that is particularly pronounced in research initiatives. Layered errors are notoriously difficult to unwind because they build on predicate assumptions; they need to be caught early in the review process.
There is also the issue of algorithmic hallucinations to contend with. These fabrications of knowledge may be minimized with robust testing protocols but, ultimately, there is no replacement for human eyes. Gen AI does not deal in facts; it deals in probabilities. There have even been rare cases of deliberate “lying” by Gen AI. It must be managed to keep it aligned with human concerns.
When things do go wrong, even with all the procedures and human intervention in place, the review component takes on additional significance with regard to post-incident transparency and explainability. Regulatory authorities increasingly demand both as part of incident response. In the context of black-box Large Language Models, it is particularly important that human input and review procedures are properly documented with a retrievable audit trail. Process is key.
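What might such an audit trail look like in practice? Below is one minimal sketch: each human review decision is logged as an append-only JSON record. The field names and the simple file-based store are assumptions for illustration; a real deployment would more likely use a database or a dedicated governance platform.

```python
# Illustrative append-only audit trail for human review of Gen AI output.
# Record fields and the JSON-lines file store are assumptions for this sketch.
import json
from datetime import datetime, timezone

AUDIT_LOG = "genai_review_audit.jsonl"

def log_review(output_id: str, reviewer: str, decision: str, notes: str = "") -> dict:
    """Record who reviewed which output, what they decided, and when."""
    record = {
        "output_id": output_id,
        "reviewer": reviewer,
        "decision": decision,          # e.g. "approved", "rejected", "edited"
        "notes": notes,
        "reviewed_at": datetime.now(timezone.utc).isoformat(),
    }
    with open(AUDIT_LOG, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record

# Usage: every piece of Gen AI output passes through a named human gate.
log_review("draft-2024-001", "j.smith", "edited", "Softened claims in paragraph 2.")
```

Because the log is append-only and timestamped, it can be retrieved after an incident to show exactly where human judgment entered the process.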
Accuracy is of paramount importance in certain areas; no one would argue that marketing copy and medical diagnosis carry equivalent weight. Given its extensive computational ability and expansive working memory, Gen AI can be particularly useful in healthcare - when subject to review. Once again, we must not suffer the consequences of neglect.
The U.S. NIST AI Risk Management Framework, recommended for organizations of all kinds, enunciates the principle of TEVV: Test, Evaluation, Validation and Verification. This standard of model assessment applies equally to the ongoing management of work projects. It makes sense to have a level of confidence in Gen AI that has been optimized for performance and previously tested - but we should always verify, not simply at initial assessment but throughout ongoing operations. Again, humans must be in the loop.
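As one hedged illustration of “trust, but verify” in ongoing operations, the sketch below routes all high-stakes outputs, plus a random sample of the rest, to a human verification queue. The 10% sampling rate and the in-memory queue are illustrative assumptions, not part of the NIST framework itself.

```python
# Sketch: route a random sample of production Gen AI outputs to human
# verification, in the spirit of ongoing TEVV. The 10% rate and the
# in-memory queue are illustrative assumptions.
import random

REVIEW_RATE = 0.10                    # fraction of outputs spot-checked by a human
verification_queue: list[str] = []

def route_output(output_id: str, high_stakes: bool = False) -> str:
    """Send high-stakes or randomly sampled outputs to the human queue."""
    if high_stakes or random.random() < REVIEW_RATE:
        verification_queue.append(output_id)
        return "queued_for_human_verification"
    return "released_with_standard_monitoring"
```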
A further dimension to this problem is the trade-off in Gen AI systems between Accuracy and explainability. Due to the sheer quantity of training data and the volume of parameters governing their operation, the more accurate models are, the less explainable their decision-making may be. The value of post-output cross-checking speaks for itself.
Intention
Language is far from straightforward, and the concept of meaning is trickier still. Meaning is underpinned by Intention - and vice versa. Yet Gen AI models run on nondeterministic algorithms. It is therefore vital to ensure that your unique Intention for a project is injected into the work product of any model, no matter how reliable it may seem.
Is the language of the output precisely what you are after? Who is the intended audience? How will you guard against misunderstanding? My mind always strays to one telling illustration: the difference between “rule of law” and “rule by law” - a world of difference in one preposition. The point is that alignment with your Intention is critical. Failure to exercise oversight here only stores up problems for the future.
In fact, the story starts earlier in the interaction process with Gen AI. The Enlightenment writer and philosopher Voltaire famously advocated, “Judge a man by his questions rather than his answers”, and a few centuries later this timeless idea is particularly relevant in the context of Gen AI. The Quality we can expect in output is intimately tied to the nature and methodology of our prompts, i.e. the questions we ask. Structuring questions clearly and precisely is imperative - garbage in, garbage out, as they say. Refer to our piece on prompting techniques INSERT CRAFTS LINK for more on this, and see the sketch below for a simple illustration.
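As a simple illustration of structured prompting, the template below states role, audience, constraints and task explicitly rather than leaving the model to guess. The field labels and the example values are conventions invented for this sketch, not requirements of any particular model.

```python
# Sketch of a structured prompt: role, audience, constraints and task are
# stated explicitly. Labels and values are illustrative conventions only.
PROMPT_TEMPLATE = """\
Role: You are an editor for {organization}.
Audience: {audience}
Constraints:
- Match our house style: plain, direct, no jargon.
- Flag any claim you cannot verify rather than inventing support.
Task: {task}
"""

prompt = PROMPT_TEMPLATE.format(
    organization="a mid-sized law firm",           # hypothetical values
    audience="prospective commercial clients",
    task="Draft a 150-word summary of our new AI governance service.",
)
print(prompt)
```

The discipline matters more than the format: by forcing the question to be explicit about who is speaking, to whom, and within what limits, the prompt itself becomes a record of your Intention.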
There is also a wider point at work here. Given Gen AI’s incredible power in computation, context memory and synthesis, no individual human can compete with an LLM in this regard (quite apart from the fact that humans require sleep). Moreover, autonomous AI agents are increasingly common. Writ large, one of the major societal challenges presented by Gen AI is to ensure that humans retain full control over both the questions and the agenda.
This is not only to preserve and augment our own powers of imagination and discovery, but also to ensure that personal information and sensitive data are handled with care. That requires organization-wide understanding, reinforced by regular training. We cannot maintain alignment with our Intention by handing the reins over to detached convenience. Being lazy will bite us hard in the proverbial.
Learning
This brings us to the last letter in our handy QUAIL mnemonic: Learning. With so much data now at our cyber fingertips, attention is shifting to how effectively that data is used. Simply holding data, even properly secured data, is not enough to create an advantage for an organization. Data needs to be leveraged effectively.
This is true not just from a market standpoint, but also in how information is deployed to sustain individual and institutional Learning. Gen AI is not just another nifty tech gizmo. It is a complete game-changer with a significant capacity to “think” with limited interaction from us. That means it also carries the risk of depleting human knowledge and collective morale if it is not properly harnessed. Much of the personal and collective Learning within an organization (and within society itself) will be lost without continued human engagement in the process. Ask yourself: how will this affect employee retention and knowledge within your organization? How will you meet this challenge?
In this vein, human review of Gen AI output in business will need to become standard practice, both to capitalize on new insights for competitive advantage and to curb the harms that will be amplified if Gen AI is left unchecked. Properly considered interaction with this technology offers the possibility of great advances. For these to be achieved, however, the multi-disciplinary, synthesized Learning must be taken back into the organization in an ever-evolving loop of development.
So, there we have it. QUAIL puts forward the case for human-first interaction with Gen AI. With that, I’ll leave you with another quote from Voltaire:
About R. Scott Jones
I am a Partner in Generative Consulting, an attorney and CEO of Veritai. I write frequently on matters relating to Generative AI and its successful deployment, from both a user’s perspective and that of the wider community.