Evidence that AI Expert Panels Could Soon Replace Human Panelists, or is this just an Art Deco Hallucination? – Part One of Two

May 13, 2024

Say goodbye to the traditional panel of experts, the ones routinely featured at educational events and conferences worldwide. Say hello to new panels of AI experts: panels in your pocket that you can call upon anytime, anywhere. New advances in ChatGPT make this possible. The days of human panels of experts on stage lecturing down to attendees may soon be over. Conferences may never be the same. Expert services may be transformed too.

This article provides demonstrative evidence to support these predictions and does so using every AI panel’s current favorite topic, hallucinations. Yes, it’s a twofer, on both expert panels and hallucinations, and this is part one of two. Illustrations are done in various Art Deco styles using Visual Muse: illustrating concepts with style.

Introduction

The concept illustrations in this article were created in various Art Deco styles using a well-worn custom GPT of mine, Visual Muse: illustrating concepts with style.

The support for the claims about AI possibly replacing human expert panels comes from another little AI program of mine: Panel of AI Experts for Lawyers. This new custom GPT was designed to make ChatGPT4 adopt multiple personas to serve as a panel of experts to help legal professionals solve problems. It can also be used to lecture like traditional expert panels. It will be available soon, free, on the OpenAI store, after I finish testing it. Making these is easy; testing and refinement is the hard part.

Like many AI hobbyists, I enjoy making these GPT tools for fun. Similar free apps like this are found everywhere in ChatGPT worlds. You may think free means not worth your time, not very good. Either that, or a trick to collect and sell your data, or to bait and switch. In the pre-AI world of extreme capitalism that was usually true. Not today. Not here. My custom GPTs have no data collection and no upsells for extra features. I do not see your data or your questions, and frankly, I don’t want to. That, you would have to pay me to do! I sell my time, not my little gizmos.

To prove my claim that free little custom GPTs like this may actually upset an industry, I offer in this article demonstrative evidence of what the Panel of AI Experts for Lawyers can do. The proof is an unedited transcript of the software in action. The topic I selected for my panel is one troubling many in both the tech industry and the legal profession lately: AI hallucinations.

After you study this, you may agree that the days of human panels of experts on stage are over. Instead, say hello to personal visits by even smarter AI experts: any topic, any time, anywhere.

Short History of Multiple Personality AI Experts

I will attempt to prove to your satisfaction that my custom GPT, and others that will soon follow, take expert panel discussions to the next level. I have been playing with this project since the days of 3.5, trying to create ChatGPT personas that talk to each other. There were a dozen or so of us early GPT explorers on Discord and DefCon who were enthused about this idea. It was started by others in forums there in early 2023 and was first called the AI Hive Mind. Here are my previous articles for background: Prompting a GPT-4 “Hive Mind” to Dialogue with Itself on the Future of Law, AI and Adjudications (e-Discovery Team, 4/11/23); ChatGPT-4 Prompted To Talk With Itself About “The Singularity” (e-Discovery Team, 4/04/23); The Proposal of Chat-GPT for an “AI Guardian” to Protect Privacy in Legal Cases (e-Discovery Team, 4/15/23). I also refer to this as making the AI adopt multiple personas that can work with each other for various purposes.

At the time, early 2023, I was, as far as I know, the only lawyer working with this technique, but I have no doubt other lawyers are doing so now too. The idea here is to create AI sub-personalities – personas – with different knowledge specialties and other differing characteristics (which is where the secret sauce comes in). Then you prompt the different AI entities to chat amongst themselves. They work through the issue, step by step, and consider various points of view. The GPT hive mind of experts discusses subjects and works out problems that you assign to them.

Basically, the current GPT4-level expert panel approach prompts ChatGPT to create a panel of personas that talk to each other. Then the multiple personas, the hive mind if you will, come up with new ideas, insights and answers in multiple steps, all on their own. My GPT, Panel of AI Experts for Lawyers, for instance, is programmed to go through six steps, which I predetermined. (It started with eight steps, but the QC testing suggested it should be reduced to six.) The user sits back and watches the panel. The AI is thinking out loud in a transparent process. You listen to the panel conclusions, and at the end, you can ask questions. Just like the old human panels, except that the AIs have unlimited time and patience.
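For readers who want to peek under the hood, here is a minimal sketch of the multi-persona panel technique in Python, assuming the openai package and an API key in the environment. The persona prompts, the panel_round() helper, and the three-round loop are my own illustrative inventions; they are not the actual instructions inside Panel of AI Experts for Lawyers, which remain part of the secret sauce.

```python
# Minimal sketch of a multi-persona "hive mind" panel. The personas and
# round structure are hypothetical stand-ins, not the real GPT's design.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PERSONAS = {
    "AI Specialist Attorney": "You are an attorney specializing in AI law.",
    "Prompt Engineer": "You are an expert in prompting large language models.",
    "Devil's Advocate": "You challenge the other panelists' assumptions.",
}

def panel_round(topic: str, transcript: list[str]) -> list[str]:
    """One discussion round: each persona reads the transcript so far and
    adds a short remark from its own point of view."""
    for name, system_prompt in PERSONAS.items():
        history = "\n\n".join(transcript) or "(no remarks yet)"
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": (
                    f"Panel topic: {topic}\n\nTranscript so far:\n{history}\n\n"
                    f"As the {name}, add your next remark in under 150 words."
                )},
            ],
        )
        transcript.append(f"{name}: {response.choices[0].message.content}")
    return transcript

transcript: list[str] = []
for _ in range(3):  # e.g., introductions, debate, unified recommendations
    transcript = panel_round("AI hallucinations", transcript)
print("\n\n".join(transcript))
```

The key design point is that each persona sees the running transcript, so later speakers can agree with, build on, or attack earlier remarks, which is what makes it a panel rather than several separate chats.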

Evaluation of the Current Level of AI Expert Panels

This expert consult process now works better than it ever has, although in my book it is still only a B+ or A, not an A+. It could be smarter and should be more consistent. Also, a few errors still slip in from time to time. The transparency of the step-by-step process of the GPT I designed helps mitigate those issues.

Based upon Sam Altman’s recent interviews, the next level, GPT5, should make it work much better. At this point, a little more than a year after GPT4 was released, Sam now responds to effusive praise of GPT4 by pretending to think very hard and then sadly saying that GPT4 “kinda sucks.” Sam Altman: GPT-4 sucks – Lex Fridman Podcast (April 2024). He is so funny. What a great actor.

Sam Altman is very careful in what he says about OpenAI’s next generation GPT, but he did dare to promise that overall GPT5 will be much smarter. He claims that there will be big gains in overall intelligence, like the difference between 3.0 and 4.0. That was an incredible leap! So Sam is saying they can do it again. Still, Sam I Am does not promise major new breakthroughs, or AGI, or anything bogus like that. He admits that GPT5 will still make mistakes, still have biases, and, worst of all for sanction-fearing attorneys, will still sometimes hallucinate. Sam does, however, promise that OpenAI teams of experts are working hard on that hallucination problem. Just do not expect to see a fix for that in version 5.0.

On the plus side, Sam slyly smiles when talking about GPT as a creative brainstorming partner, which he says needs a better name. (How about expert panel, Sam?) Altman admits that capacity, whatever you call it, is the GPT4.0 application that he uses the most. Everyone in the ChatGPT user world saw how the multi-personality brainstorming capacity showed emerging new abilities and intelligence in the upgrade from 3.5 to 4.0. Sam’s comments suggest that there is a team of experts at OpenAI now expecting even more emerging abilities in the next upgrade. When asked when 5.0 will be released, Sam refers to the exponential curve of improvement cycles and says it will be here “relatively soon.” Of course, he is well advised not to say exactly when.

In another recent interview Sam Altman made an intriguing statement you should hear. LIVE: Sam Altman, CEO of OpenAI, speaks at the World Government Summit (Reuters, April 2024), video 17:20-18:24. I created this image of him for that purpose and put it into the Art Deco theme of this article. I have not altered Sam’s words or voice in any way, so the sync is not perfect.

Sam Altman image generated by Visual Muse in Art Deco style. Words and voice by Sam. Also on YouTube.

Transcript (slightly cleaned) of Sam Altman’s avatar speech.

If you think about everybody on Earth getting the resources of a company having hundreds of thousands of really competent people, and what that would do, if you have an AI programmer, AI lawyer, AI marketer, AI strategist, and not just one of those, but many of each, and you get to decide how to use that to create whatever you want to create. We are all going to get a lot of great stuff. The creative power of humanity with tools like that should be remarkable. That’s what gets us all up every morning.

LIVE: Sam Altman, CEO of OpenAI, speaks at the World Government Summit (Reuters, April 2024), video 17:20-18:24.

That would be a great closing line to any expert panel discussion.

It used to be pretty tricky to create and use hive mind type expert panels. But with recent advancements in custom GPTs and knowledge banks, many of us are now starting to make expert panels and make them work on their own through many steps. With custom GPTs it is now easier to set this in motion and control it. It is still better to do this all on your own without automated steps, but that requires a high user-skill level that very few have attained. That is where my forthcoming Panel of AI Experts for Lawyers GPT comes in. It automates most of the process so almost anyone can use it.

The custom GPT seems to work best with five panelists, but a few more or less will still work too. Larger panels get cumbersome and ineffective, just like working with human experts. That is true whether you are using an automated process like Panel of AI Experts for Lawyers, or doing it on your own. So far I have seen some minor errors in my use of panels, but no big fails or hallucinations. I am still taking my GPT Panel of AI Experts through its paces with many tests of minor changes. Let me know if you want to volunteer to be part of the beta testing team. We are close to launch. Other cool things coming soon too.

After these expert consult panels become widespread, and especially after the super IQ of GPT5 hits, I kind of doubt there will be many invitations to human experts to participate in big conference panels. The new AI method shows a better way. Either use the expert panels yourself and skip the conference, or attend for the social fun and networking, and put the AI software bots on stage. The wetware types like me will just add local color. (I personally think some of us humans are still funnier and more entertaining than bots, but for how long?)

All of the words in this article so far have been my own, but the following segment will be a verbatim transcript of my chat with Panel of AI Experts for Lawyers. As you will see, I do not have to say much. All I do is set the topic, pick four experts from those offered (the fifth expert is always the Devil’s Advocate) and say continue. Everything else is written by Panel of AI Experts for Lawyers. I also ask one question at the end, which is more for my readers’ enjoyment than anything. So read on to the end for the funny finish, which is my favorite part.

The graphic images of me and my ChatGPT icon are part of the transcript. But all of the other images were added by me to the transcript. They were made by another custom GPT of mine, Visual Muse: illustrating concepts with style, which is already available at OpenAI’s store, again for free. It is one of my most valuable tools and I use it every day. Someday I may offer some training on this tool to show how you can make it really sing.

All of the images for the article were made in new artistic styles that are Art Deco variations. Some are pure Art Deco, but most have elements of other styles added in, such as abstract, photorealistic, surrealistic or psychedelic.

I had to mix in the last style because the topic I asked the panel to discuss was hallucinations. I learned a lot about this serious AI issue from this hive mind expert discussion and hope you do too.


Transcript of Ralph Losey’s Session of May 3, 2024, using his custom GPT, Panel of AI Experts for Lawyers.

You

I am ready to tell you what I want discussed. Are you ready?

Panel of AI Experts for Lawyers

Absolutely, I’m ready! What would you like to discuss or seek advice on?

You

AI Hallucinations: What are they? How different from errors? What causes them? What can individual users do to control the problem?

Panel of AI Experts for Lawyers

For the topic of AI hallucinations, their causes, differences from other types of errors, and what users can do to manage them, I recommend forming an expert panel to delve into these issues thoroughly. Here are ten experts who can provide a comprehensive understanding of AI-related legal issues and practical solutions:

  1. AI Specialist Attorney – Specializes in AI and its implications in the legal field.
  2. Prompt Engineer – Expert in optimizing interactions with AI systems through effective prompting.
  3. Child Prodigy – Offers a unique, intuitive understanding of AI technology and its future implications.
  4. Lawyer Scientist – Combines expertise in law and AI technology, focusing on the integration of AI in legal practices.
  5. Devil’s Advocate (Mandatory) – Challenges prevailing opinions and encourages exploration of alternative viewpoints.
  6. Ethics and Compliance Expert – Specializes in AI ethics, ensuring AI applications adhere to legal and moral standards.
  7. Data Security Specialist – Focuses on protecting data within AI systems, critical for managing and mitigating AI errors.
  8. Machine Learning Developer – Provides insights on the technical aspects of AI training and operation, directly related to AI hallucinations.
  9. Judicial Technology Advisor – Specializes in how AI is used within judicial systems, including its reliability and potential for error.
  10. Consumer Rights Advocate – Focuses on how AI impacts consumers, including issues of transparency and accountability in AI interactions.

Please select four of these experts to include alongside the Devil’s Advocate for a comprehensive panel discussion on AI hallucinations.

You

experts 1, 2, 3, 8

Panel of AI Experts for Lawyers

Great choices! Your panel for discussing AI hallucinations consists of:

  1. AI Specialist Attorney
  2. Prompt Engineer
  3. Child Prodigy
  4. Machine Learning Developer
  5. Devil’s Advocate

Let’s begin the discussion with the first round of panelist introductions and opening remarks on AI hallucinations, their causes, differences from other errors, and user control measures.

1. AI Specialist Attorney

“Good day, esteemed colleagues and guests. AI hallucinations are distinct from typical errors because they manifest as plausible but entirely fabricated responses by AI systems, which may be indistinguishable from accurate information. Unlike simple errors, hallucinations often involve AI generating false data or narratives based on patterns it has learned, rather than mere calculation mistakes. These are particularly troubling in legal contexts, where the authenticity of information is paramount. We must explore regulatory frameworks that ensure AI reliability and transparency.”

2. Prompt Engineer

“Hello, everyone. From a technical perspective, AI hallucinations often occur due to issues in the training data or the model’s inability to distinguish between correlation and causation. Effective prompt engineering can mitigate such issues by guiding the AI to focus on verified information and disregard unreliable patterns. Users can control hallucinations by refining their prompts and incorporating more context, which helps the AI generate more accurate and grounded responses.”

3. Child Prodigy

“Hi, I think of AI like a very smart parrot—it can seem like it knows a lot, but sometimes it just repeats things without understanding. Hallucinations happen because the AI is trying to make sense of what it knows, but it gets mixed up. It’s like when you hear stories and remember them a bit wrong. To help AIs get better, we need to teach them like kids, giving them the right information and correcting them when they make mistakes.”

4. Machine Learning Developer

“Greetings. AI hallucinations are often a product of the model’s limitations in processing and contextualizing large datasets. During the development phase, ensuring diverse and comprehensive datasets is crucial to minimize these errors. We also need to develop better mechanisms to detect when the AI is likely to ‘hallucinate’ and implement checks that can prompt users to verify potentially dubious information.”

5. Devil’s Advocate

“Let’s stir the pot a bit, shall we? While controlling AI hallucinations is ideal, one might argue that the imperfections of AI reflect the imperfections of human understanding itself. Instead of overly sanitizing responses, perhaps there is value in learning to work with and around these ‘hallucinations’ to enhance our critical thinking skills. Are we too reliant on AI’s accuracy?”


The panelists have introduced their views. We will now move to the second round of discussions where they will expand on their ideas and consider others’ viewpoints.

You

please continue

Panel of AI Experts for Lawyers

Continuing with the second round of panelist discussion on AI hallucinations:

1. AI Specialist Attorney

“The Devil’s Advocate raises an intriguing point about the imperfection of AI mirroring human fallibility. However, in legal practices, the stakes are high—hallucinations can lead to misjudgments and injustice. We must balance our desire for AI assistance with stringent checks that ensure outputs are reliable and verifiable, particularly in sensitive areas such as evidence assessment and legal advice.”

2. Prompt Engineer

“Building on the points made earlier, it’s important to highlight that through careful prompt design, we can steer AI away from potential hallucinations. By specifying sources or limiting the AI’s responses to data-driven outputs, users can significantly reduce the incidence of incorrect information. Tools and techniques I’ve developed can aid legal professionals in crafting prompts that minimize the risk of misleading AI outputs.”

3. Child Prodigy

“Sometimes, the best way to learn is to make mistakes and then fix them. If we know AI can make these weird mistakes—hallucinations—we should use them to teach AI better. Like, if it says something wrong, we could correct it right away and make sure it remembers the right thing next time. This could be a new kind of game where AI learns to be right more often by understanding its mistakes.”

4. Machine Learning Developer

“Enhancing AI models to reduce hallucinations involves more than just expanding datasets; it involves improving how these models process context and ambiguity. Advanced techniques such as few-shot learning and reinforcement learning from human feedback can help AI learn the difference between plausible and implausible constructs, refining its judgment over time.”

5. Devil’s Advocate

“While we focus on mitigating hallucinations, we should also ponder the role of human oversight. Isn’t there a risk of becoming overly dependent on technology, expecting it to be infallible? Perhaps the real solution lies not only in improving AI but also in maintaining a healthy skepticism towards its outputs, using it as a tool rather than a crutch.”


With these expanded discussions, our panelists have begun to explore both technical solutions and philosophical considerations surrounding AI hallucinations. Next, we’ll move to the third round, where they will try to converge on unified recommendations that combine their insights into practical strategies for users and developers alike.

You

ok

Panel of AI Experts for Lawyers

In the third round of our panel discussion, our experts will focus on finding common ground and formulating unified recommendations for managing AI hallucinations, reflecting on each other’s insights to offer a holistic approach.

1. AI Specialist Attorney

“Considering the valuable perspectives shared, I believe our unified recommendation should include the development of standardized testing protocols for AI systems used in legal contexts. These protocols would assess the tendency of AI to hallucinate and ensure its outputs are verified against established legal databases before being employed in any legal decision-making.”

2. Prompt Engineer

“Aligning with the AI Specialist Attorney’s suggestion, I recommend incorporating structured prompt engineering as part of the standard practice for legal AI use. By designing prompts that explicitly require AI to reference and cite sources, we can create a layer of accountability and reduce the occurrence of unverified information.”

3. Child Prodigy

“I think everyone’s saying we need to make sure the AI knows what it’s talking about and that we check it’s right. We should have a way to keep teaching AI with good examples and fixing it when it messes up. Maybe everyone could help make a big book of right answers that AI can learn from.”

4. Machine Learning Developer

“I support the suggestions on standardized testing and structured prompts. Additionally, implementing adaptive learning systems within AI can help. These systems would utilize ongoing feedback from legal professionals to continuously improve accuracy and reduce hallucinations, essentially learning from real-world application to better serve the legal industry.”

5. Devil’s Advocate

“While I champion the call for rigorous standards and continuous learning, let us also advocate for cultivating an informed user base. Legal professionals should be educated not only on using AI but understanding its limitations. This dual approach of improving technology and user literacy will safeguard against the over-reliance on AI.”


These recommendations represent a blend of technical enhancements, regulatory frameworks, and educational initiatives that address both the prevention and management of AI hallucinations. The panel will now consolidate these ideas into a comprehensive solution to your query. Let’s proceed to the solution development.

You

proceed

Panel of AI Experts for Lawyers

Our expert panel has now reached a consensus on a comprehensive solution to manage and mitigate AI hallucinations. Here’s the collaborative response that incorporates insights from each panelist:

Solution for Managing AI Hallucinations:

  1. Standardized Testing Protocols: Develop and implement standardized testing protocols specifically for AI systems employed in the legal sector. These protocols would test for the AI’s accuracy and susceptibility to hallucinations, particularly in how it handles legal data and precedent. The testing should involve scenarios that could lead to AI hallucinations to ensure the system’s reliability.
  2. Structured Prompt Engineering: Legal professionals should be trained in prompt engineering to improve their interaction with AI. By constructing prompts that require the AI to cite sources or restrict its responses to data within a verified database, the risk of hallucinations can be minimized. This structured approach ensures that AI-generated content is both relevant and verifiable.
  3. Continuous Learning System: Integrate adaptive learning mechanisms that allow AI to update and adjust its models based on feedback from real-world applications. This system would involve collecting corrections and inputs from legal experts to refine the AI’s outputs, ensuring that the system evolves and reduces hallucinations over time.
  4. Educational Initiatives for Legal Professionals: Create and promote comprehensive training programs for legal professionals that focus on understanding and effectively using AI. These programs should cover not only the operational aspects but also the limitations of AI, emphasizing the importance of verification and critical evaluation of AI-generated information.
  5. Regulatory Framework: Advocate for and help establish a regulatory framework that governs the use of AI in the legal field. This framework should mandate transparency from AI developers about the capabilities and limitations of their systems and enforce accountability measures for failures that lead to legal inaccuracies.

Example Implementation:

  • AI Testing: A legal AI system could be subjected to a mock trial scenario where it needs to process and interpret several complex legal documents and produce a memo. The output would be evaluated for accuracy, hallucinations, and the ability to cite correct legal precedents.
  • Prompt Training: In a workshop, legal professionals might practice refining their queries to the AI, learning how to frame questions that guide the AI towards reliable data sources and away from speculative responses.
  • Feedback Mechanism: As legal professionals use AI, they could have an option to flag incorrect or hallucinated responses. These instances would be logged and used to fine-tune the AI models periodically.
  • Educational Course: A certification course for legal professionals could be developed to cover AI capabilities, with modules on identifying and correcting hallucinations and best practices for integrating AI tools into their work.
  • Regulatory Participation: Legal experts could participate in panels that collaborate with AI developers and government bodies to draft regulations that ensure AI tools in the legal sector meet stringent accuracy and reliability standards.

By implementing these solutions, the legal profession can harness the benefits of AI while minimizing the risks associated with AI hallucinations. Now, let’s hear a final explanation and summary from our Devil’s Advocate.

You

ok

Panel of AI Experts for Lawyers

As our Devil’s Advocate, I’d like to offer a final word on this intriguing discourse regarding AI hallucinations. While our distinguished panelists have proposed robust measures, it’s crucial to remain vigilant. The proposed standardized tests, structured prompts, continuous learning systems, educational initiatives, and regulatory frameworks are indeed potent strategies. However, they embody an optimistic view that technology and rigorous oversight can completely tame the beast of AI hallucinations. Let us ponder a different perspective.

First, consider the inherent complexity of AI and the fluid nature of legal interpretations. AI, by design, processes vast amounts of data and identifies patterns far beyond human capability. However, this strength is also its Achilles’ heel. The very patterns it perceives may not exist or may be based on biased or incomplete data, leading to hallucinations. Thus, even with the best testing protocols and the most sophisticated prompt engineering, AI might still falter.

Secondly, the continuous adaptation through feedback mechanisms, while impressive, assumes a consistent and error-free input from human users. Yet, human judgment is subjective and variable. What one legal expert deems an error, another might see as a nuanced interpretation. This variability could inadvertently train AI systems in ways that either perpetuate existing biases or create new, unforeseen modes of error.

Furthermore, the emphasis on educating legal professionals about AI’s use and limitations is commendable. However, it might lead to an overreliance on the notion that with enough knowledge, all AI outputs can be effectively managed. This could foster a false sense of security, detracting from the critical need for skepticism and independent verification.

Finally, the push for regulatory frameworks is essential but fraught with its own challenges. Regulation struggles to keep pace with technological advancement. By the time a framework is implemented, the technology it governs may have evolved beyond its original scope. This lag can lead to gaps in oversight and unexpected legal quandaries.

Thus, while we strive to control and utilize AI to its fullest potential, we must also brace for its inherent unpredictability. The path forward should not only involve refining AI but also preparing for the inevitability of its failures. Embrace AI as a tool, not as a panacea, and maintain a judicious balance between reliance and skepticism. This balanced approach will better serve the legal profession and society at large.

Now, I invite any further questions or requests for clarification from the panel or our esteemed audience.

You

Could the AI Specialist Attorney give his own summary of the panel’s work, insights and recommendation. Also, since he is known to also be an expert in Dude speak, could he please use language and analogies from The Dude to make this summary? That would be really far out, man.

Panel of AI Experts for Lawyers

Alright, man, here’s the deal from the AI Specialist Attorney, cruising through this discussion like The Dude saunters through the lanes. So, we’ve been talking about these AI hallucinations, you know, where the AI starts dreaming up stuff that isn’t really there—like seeing a police car when you’re just staring at a parked Volkswagen. It’s all in the AI’s head, man.

First off, these hallucinations are not just your run-of-the-mill mistakes; they’re like those stories Walter tells at the bowling alley—kind of based on reality, but way off into fantasy land. The reason this happens, it’s because the AI is trying to make sense of all the crazy data it’s seen, kind of like trying to find your rug that really ties the room together but ending up with some mismatched throw pillows instead.

Now, here’s what we’re gonna do to keep our AI from going all “nihilist” on us and making stuff up:

  1. Standardized Testing Protocols – That’s like making sure your bowling balls aren’t rigged, man. We need tests to check if the AI is tripping or sticking to the facts.
  2. Structured Prompt Engineering – This is like talking to Donny in a way he understands. We guide the AI with the right questions so it doesn’t go off the rails.
  3. Continuous Learning System – It’s like the AI is practicing its bowling. It keeps learning from its mistakes, getting feedback from players who know their game, so next time it rolls, it rolls more accurately.
  4. Educational Initiatives for Legal Pros – We gotta teach the lawyers to bowl with AI. They need to know how to throw the ball so it doesn’t end up in the gutter. They gotta understand when the AI is pulling a Walter and when it’s being The Dude.
  5. Regulatory Framework – Finally, we’re talking about setting some rules in the alley, man. We need some ground rules so that everyone plays the game right, and no one ends up stepping over the line.

So, that’s the gist, my friend. We keep our AIs in check with some cool-headed strategies and make sure they’re more Dude than Walter. And always remember, this is not ‘Nam. This is AI. There are rules.

You

I love it. Thanks a lot!

Panel of AI Experts for Lawyers

You’re welcome, man! Glad you dug it. If you ever need to chat more about AI or anything else, just drop on by. Take it easy!

Part Two of Two of this article coming soon. This same AI panel experiment will be repeated nine more times with different panels. Then the results of the ten panels will be compared and analyzed. New insights will be shared from this ten-panel effort. We now have a good fix on AI Hallucinations and what we can do to avoid them.

Ralph Losey Copyright 2024 – All Rights Reserved


Some Legal Ethics Quandaries on Use of AI, the Duty of Competence, and AI Practice as a Legal Specialty

May 6, 2024

This blog considers some of the ethical issues of competence that arise when a lawyer or law firm uses generative AI to assist in rendering services. Prior to the advent of artificial intelligence, the legal profession devised many ways to meet the duty of competence, including continuing education and the creation of legal specialties. The profession is now supplementing these methods with the use of AI. This raises new ethical and practical issues of competence discussed here. All words by human Ralph Losey alone without AI assistance. All images created by Ralph using his custom AI, Visual Muse: illustrating concepts with style.

The legal specialty tradition allows different client needs to be met by different attorneys. This involves splitting legal work into subareas. A law firm with a number of different specialists uses different lawyers to perform particular tasks as a team effort. One example of this today is in litigation. There are often attorneys who specialize in pleadings-motions practice, others who specialize in discovery or e-discovery, others who specialize in the conduct of trials, and still others who only handle appeals. The specialist attorneys collaborate with each other, and with the prime client-interface lawyer, to perform the work competently. This allows for both very high quality work and more efficient, cost-effective services in complex cases.

Many simple cases today are still handled by solo general practice attorneys, often economically and sometimes with good quality too, but not always. Could AI help both law firms and solo practitioners? This article addresses the ethical issues of AI use as a specialist co-counsel. When and how can generative AI be used by lawyers to collaborate to meet their ethical duties of competent legal services?

Specialties and Complex Legal Work

The legal practice of specialization and collaboration allows a lawyer to competently represent a client in very complex situations. These are situations where one lawyer’s skills alone would not be adequate to meet their duty of competence. Competence is required by Rule 1.1 of the Model Rules of Professional Conduct. That is one reason that law firms have evolved and grown ever larger to include attorneys with a variety of legal skills. This allows lawyers to more easily assist each other in the representation of the firm’s clients.

In today’s world lawyers routinely delegate some of the work involved in representing a client to other attorneys with skills in a field they may not have. Some lawyers may not like to hear this, but the truth is, no one lawyer knows it all. For example, a corporate lawyer specializing in mergers will routinely delegate complex electronic discovery issues they encounter. Moreover, few litigation lawyers would dare approach estate planning or tax issues, and vice versa. Will the advent of AI change this?

Task Splitting is also a Prompt Engineering Strategy

This strategy of splitting tasks is also one of the six strategies recommended by OpenAI for best-practice use of its generative AI. See Transform Your Legal Practice with AI: A Lawyer’s Guide to Embracing the Future (OpenAI’s third strategy is “Splitting Complex Tasks Into Simpler Subtasks”). This is one reason lawyers can easily learn this particular prompt engineering strategy, one of six, for the competent use of generative AI. It is a familiar strategy. The idea in AI is to split up a single task into subparts. That makes it easier for the generative AI to understand and follow. That in turn improves the quality of the AI’s generated output, and reduces the errors and hallucinations.

That is like human lawyers splitting up a single task – litigation – into many subtasks. That also reduces minor errors and reduces the colossal, near-hallucinatory mistakes, which humans, much like AI, can sometimes make. It typically happens to human lawyers when they are acting way out of their depth. The same thing tends to occur to generative AI.
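To make the strategy concrete, here is a minimal sketch in Python, assuming the openai package. The three subtasks and the summarize_contract() helper are hypothetical examples for illustration, not a recommended legal workflow.

```python
# Minimal sketch of OpenAI's "split complex tasks into simpler subtasks"
# strategy. The subtask list is a hypothetical example.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SUBTASKS = [
    "List the parties and the effective date of this contract.",
    "Summarize the indemnification provisions in plain English.",
    "Flag any clauses that deviate from common market terms.",
]

def summarize_contract(contract_text: str) -> str:
    """Ask one narrow question per call, then stitch the answers together,
    rather than asking for everything in one sprawling prompt."""
    answers = []
    for subtask in SUBTASKS:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system",
                 "content": "Answer only from the contract text provided."},
                {"role": "user",
                 "content": f"{subtask}\n\nContract:\n{contract_text}"},
            ],
        )
        answers.append(f"{subtask}\n{response.choices[0].message.content}")
    return "\n\n".join(answers)
```

Each call has one narrow job, which is exactly how a litigation team assigns one lawyer to pleadings and another to discovery.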

Questions Raised by Lawyer Use of Generative AI to Meet their Duty of Competence

What happens when a lawyer seeks to meet their ethical duty of competence by delegating some of their work to an AI? It appears that more and more lawyers are trying this now. There are many reasons for this. First of all, generative AI and various LLM applications have knowledge of almost all legal fields, all specialties. Plus, many work for free, or nearly so, and do not request a share of the client’s fee, like a human lawyer specialist would. Not only that, they make the human lawyer look good; well, usually.

To get away with using AI to meet your duty of competence to handle a particular matter, lawyers must, however, first have competence to use AI. They must know how to properly delegate work to these tools. For example, should they use a Centaur method or go full Cyborg? From Centaurs To Cyborgs: Our evolving relationship with generative AI (April 24, 2024).

Legal professionals must know all about GPT errors and hallucinations and not be fooled by false claims to the contrary. They should know what kinds of prompts and methods are most likely to generate errors and hallucinations and what to do about it. They should know about basic prompt engineering strategies, including splitting complex tasks.
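One widely used mitigation pattern is to ground the model in supplied source material and give it explicit permission to decline to answer. Here is a minimal sketch, assuming the openai package; the grounded_answer() helper and its system prompt are my own illustration, and this pattern reduces, but does not eliminate, hallucinations, so human verification is still required.

```python
# Minimal sketch of a grounding prompt, one common way to reduce
# hallucinations. It is not a guarantee against fabrication.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

GROUNDED_SYSTEM_PROMPT = (
    "Answer using only the source documents provided. "
    "Quote and cite a document for every factual claim. "
    "If the documents do not answer the question, say so plainly "
    "instead of guessing."
)

def grounded_answer(question: str, sources: str) -> str:
    """Constrain the model to the supplied sources instead of letting it
    free-associate from its training data."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": GROUNDED_SYSTEM_PROMPT},
            {"role": "user",
             "content": f"Sources:\n{sources}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content
```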

There are a host of questions raised concerning competence and the use of AI by legal professionals. Here are some thoughts on competence and the work-splitting strategy. They are spoken through an AI image with a Nigerian-accented voice that I like. The transcript follows below. Here you will find many questions. None of them have simple answers.

AI-generated talking robot with a Nigerian accent. The image and words are by Ralph Losey.

Transcript of the Video

Hello, human friends. Let’s talk about the legal ethics issues inherent in this strategy. In law, you almost always split your work into many different tasks. You have to do that because your work is usually very complicated. Lawyers long ago figured out that the best way to perform complex actions like litigation is to split the work into subtasks. For instance, a lawsuit usually begins with talking to your client. Next, the pleading is prepared and then timely filed with the proper court. Then there may be motions and discovery and arguing to a judge. Ultimately, if the process continues, there may be a trial and then an appeal. Each step is an important part of the whole process of dispute resolution.

In today’s world, there are attorneys who specialize in each of these tasks. Some, for instance, are great at discovery, but not so good at trials. One ethics issue is when a lawyer should bring in another lawyer to help them with one or more of the tasks. What should you do if you are not competent in all parts of litigation? Ethics rules require that a lawyer have the necessary skills and knowledge required to do their work competently. Either that, or they should bring in another lawyer who is competent. For instance, many trial lawyers routinely bring in an appellate law specialist to help with appeals. Sometimes the help will be behind the scenes and the trial lawyer remains in charge. Other times, the appellate lawyer makes an appearance and handles everything, and the trial lawyer takes the second chair to just help.

What happens if a lawyer uses an AI as the expert to handle a particular subtask in which that lawyer is inexperienced? Obviously, the AI cannot just take over and appear in court. Not yet anyway, so the human lawyer remains in the first chair, but has a whispering AI expert to help them. That can work, but only if the human checks everything the AI does.

Plus one other key condition must be met. Do you know what that is? The human-AI team must together be competent. They must meet the minimum standards of professional skills and knowledge required by legal ethics.

Here are more questions for you to ponder. Could a lawyer bring a GPT chatbot into a court to help them? Could the AI whisper into the lawyer’s ear to give them advice? For instance, could an AI suggest how to respond to a judge’s question? What if the AI also explained the reason for the suggestion to the human lawyer’s satisfaction? How about this? Should the judge allow the AI to speak directly to them? Should the judge ask the AI questions? There are so many new and interesting questions ahead of us.

Could Use of AI Become a Specialty?

Expertise in artificial intelligence is already a legal specialty for some lawyers. I predict this new specialty in generative types of AI will quickly grow in popularity and importance. It requires significant skill and experience to use generative AI competently. Some argue AI is just a passing fad. It is hard to take those arguments seriously. Others admit it is here to stay, but argue that the need for this specialty will quickly pass; that the software will get so good so fast that there will be no need for AI specialists. Typically there is an economic motive for this argument, as it is usually made by vendors and their experts. But putting motives aside, the argument goes that sometime in the near future the proper use of generative AI, and other forms of AI, will become so easy that any lawyer can use it.

The hard now, but easy soon argument often uses the analogy of email. They predict that AI use will become like email use. At first, in the eighties and early nineties, only a few tech expert attorneys could send and receive emails, typically through CompuServe, The Source and the like. With the advent of the internet, that became easier. Today almost all attorneys can send and receive emails. The same thing happened with word processing, although perhaps fewer attorneys today are in fact expert at word processing, with many still yearning for the days of tape dictation and secretaries. You know who you are. Many are my friends. I am pretty sure some still have their secretaries print and send emails for them and ask about faxing too. In the medical field, in Florida at least, the use of fax machines is still widespread and often used to send paper medical records. Every medical office uses fax machines all of the time, and a few law firms do too. Hey, I studied the patent for fax machines as one of my first assignments as a young lawyer in 1980. Incredibly, the fax is still widely used today.

The hard now, but easy soon argument does have some merit. Email is now far easier than it was in 1980 and any attorney can do it. Most do it very well with no training at all. They grew up with it. But, I do not think email and AI are comparable. I was practicing law and began using email in 1980 while the fax machine was still just a patent. As one of the first lawyers to use emails, faxes and word processors (first Wang then WordPerfect), I can say with confidence that these technologies are not at all comparable to artificial intelligence, not even close. So the argument is flawed. Even if you accept exponential change, which I do, I am very skeptical of AI ever becoming so easy that every lawyer can use it, the way they now use email, faxes and word processors.

Artificial intelligence is a far different creature. It is far more complex and far more difficult to learn how to use. For example, look at discovery and the review of paper documents as opposed to predictive coding review of ESI. Predictive coding is a type of AI – active machine learning for binary classification. It is easier to use than the new LLM types of generative AI. Yet the vast majority of attorneys still do not use predictive coding. Although to specialists in predictive coding, many of whom have been using it for well over ten years now, it seems pretty darn easy. Admittedly, it did start off challenging, but we figured out the best methods to use predictive coding. Within ten years it became so easy as to be boring for me (and many others). That is one reason I moved on to generative AI. It is a breakthrough technology with new challenges and many open-ended legal uses, not just discovery.
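For the curious, here is a minimal sketch of that idea in Python using scikit-learn: train a binary relevance classifier on a small seed set coded by an attorney, then use uncertainty sampling to pick the next document for human review. The toy corpus, the two-round loop, and the keyword stand-in for the attorney’s coding decision are all illustrative inventions, not a production document-review protocol.

```python
# Minimal sketch of predictive coding: active machine learning for a
# binary relevant/not-relevant classification. Toy data, for illustration.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

docs = [
    "merger agreement indemnification clause",   # relevant
    "office fantasy football pool signup",       # not relevant
    "due diligence checklist for acquisition",
    "cafeteria menu for next week",
    "escrow terms for closing the merger",
    "holiday party parking instructions",
]
labels = {0: 1, 1: 0}  # seed set: the attorney codes two documents

X = TfidfVectorizer().fit_transform(docs)

for _ in range(2):  # each round: retrain, then code the least certain doc
    coded = list(labels)
    model = LogisticRegression().fit(X[coded], [labels[i] for i in coded])
    uncoded = [i for i in range(len(docs)) if i not in labels]
    probs = model.predict_proba(X[uncoded])[:, 1]
    query = uncoded[int(np.argmin(np.abs(probs - 0.5)))]  # most uncertain
    labels[query] = 1 if "merger" in docs[query] else 0   # stand-in for human review

print(model.predict(X))  # predicted relevance for the whole corpus
```

The loop is the whole trick: the machine learns from each human coding decision and then asks about the document it is least sure of, which is why specialists can review large collections with relatively few human calls.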

But look around in the law today. Years after Judge Peck approved the use of predictive coding in Da Silva Moore in 2012, the legal profession has still not fully adopted predictive coding. Most discovery today is still done with keywords (a method started in the 1980s), or, worse yet, done manually by linear review. Incredible but true. Even worse, a lot of it is still done with paper. You know who you are and you are legion. So please, do not talk to me about AI becoming so easy to use that even a partner can do it. Change is coming much faster than ever before, but it still comes relatively slowly in the legal profession. Bottom line, legal specialization in the use of generative AI is here to stay for the next twenty to thirty years, at least.

To summarize, like generative AIs love to do, there are two main reasons that special AI tech skills are here to stay, no matter how fast the software improves. Number one, the improvements in generative AI will create as many new complexities and challenges as they solve. Overall, it will not become easier because the AI will keep on doing new and even more incredible things. Sure, the summary part may be easy, or easier, but what about the new things the next versions of AI will do? For example, how will the new expert panels work? The AI judges? The use of AI will change, and fast, and the learning curve will have to speed up too. Only specialists will be able to keep up.

Number two, the parts of generative AI that do become easy in the future, such as, perhaps, legal research, will still be better and faster done by specialists. It will be like predictive coding in e-discovery. Although today it is almost boringly simple to specialists, and many others could learn it, they do not. The pros still do most of this work even after it has become easy because the specialists are still much faster and make fewer mistakes than the dabblers. Ah, the stories on this I could tell, but don’t worry, I won’t.

Conclusion

The words in the following avatar video are by Ralph, not an AI. But the image was generated by AI using Ralph’s prompts, as was the voice. A transcript follows the video.

Transcript of the Centaur Video

This is Ralph Losey in one of his avatar forms. I want to conclude this blog with final comments on AI competence and whether AI specialists will continue to be needed in the future.

I am sure the software will improve; GPT5 will be smarter than GPT4. But I am also sure that, insofar as legal use is concerned, as opposed to making a new website or drafting a sales email, the use of AI by lawyers will still require extensive training. It will still require skill and experience to use competently. There will still be errors and hallucinations, even with next generation AI, especially in the hands of amateur jockeys. That is just how predictive word and image generation works. Perfection is a myth!

Prompt engineering will, for many years to come, be a critical skill for any attorney who wants to use AI as part of their legal work. It will be of great importance, imperative even, for anyone who wants to specialize in the professional use of generative AI. The competency requirement of Rule 1.1 of the Model Rules of Professional Conduct demands it for law. Other professions such as medicine (AMA Code of Medical Ethics) have similar or even more stringent requirements. AI is a far, far more powerful tool than email and word processing. It must be used skillfully and carefully to avoid harm to your clients.

Dabblers will continue to get sanctioned, specialists will not. Put another way, a little knowledge is a dangerous thing. Goodbye. Have to trot off and talk to my Cyborg, Wilbur! Do you remember him?

Ralph Losey Copyright 2024 — All Rights Reserved


Report on the First Scientific Experiment to Test the Impact of Generative AI on Complex, Knowledge-Intensive Work

April 29, 2024

A first-of-its-kind experiment testing the use of AI found a 40% increase in quality and a 12% increase in productivity. The tests involved 18 different realistic tasks assigned to 244 different consultants at the Boston Consulting Group. The Harvard Business School has published a preliminary report of the mammoth study. Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality (Harvard Business School, Working Paper 24-013) (hereinafter “Working Paper”). The Working Paper is analyzed here with an eye on its significance for the legal profession.

My last article, From Centaurs To Cyborgs: Our evolving relationship with generative AI, explained that you should expect the unexpected when using generative AI. It also promised that use of sound hybrid prompt engineering methods, such as the Centaur and Cyborg methods, would bring more delight than fright. The Working Paper provides solid evidence of that claim. It reports on a scientific study conducted by AI experts, work experts and experimental scientists. They tested 244 consultants from the Boston Consulting Group (“BCG”). The Working Paper, although still in draft form, shares the key data from the experiment. Appendix E of the Working Paper discusses the conceptual model of the Centaur and Cyborg methods of AI usage, which I wrote about in From Centaurs To Cyborgs.

Harvard, Wharton, Warwick, MIT and BCG Experiment

This was an impressive scientific experiment involving a very large research group. The co-authors of the Working Paper are: Harvard’s Fabrizio Dell’Acqua, Edward McFowland III, and Karim Lakhani; Warwick Business School’s Hila Lifshitz-Assaf; Wharton’s Ethan Mollick; and MIT’s Katherine Kellogg. Further, Saran Rajendran, Lisa Krayer, and François Candelon ran the experiment on the BCG side. The generative AI used and tested here was ChatGPT4 (April 2023 version with no special training). For more background and detail on the Working Paper see the video lecture by Professor Ethan Mollick to Stanford students, “Navigating the Jagged Technological Frontier” (details of experiment setup starting at 18:15).

The 244 high-level BCG consultants were a diverse group who volunteered from offices around the world. They dedicated substantial time performing the 18 assigned tasks under the close supervision of the Working Paper author-scientists. Try getting that many lawyers in a global law firm to do the same.

The experiment included several important control groups and other rigorous experimental controls. The primary control was the unfortunate group of randomly selected BCG consultants who were not given ChatGPT4. They had to perform a series of assigned tasks in their usual manner, with computers of course, but without a generative AI tool. The control group comparisons provide strong evidence that the use of AI tools on appropriate consulting tasks significantly improves both quality and productivity.

That qualification of “appropriate tasks” is important and involves another control group of tasks. The scientists designed, and included in the experiment, work tasks that they knew could not be done well with the help of AI, that is, not without extensive guidance, which was not provided. They knew that although these tasks were problematic for ChatGPT4, they could be done, and done well, without the use of AI. Working Paper at pg. 13. Pretty devious type of test for the poor guinea pig consultants. The authors called the assigned tasks that they knew to be beyond ChatGPT4’s then-current abilities work “beyond the jagged technological frontier.” In the authors’ words:

Our results demonstrate that AI capabilities cover an expanding, but uneven, set of knowledge work we call a “jagged technological frontier.” Within this growing frontier, AI can complement or even displace human work; outside of the frontier, AI output is inaccurate, less useful, and degrades human performance. However, because the capabilities of AI are rapidly evolving and poorly understood, it can be hard for professionals to grasp exactly what the boundary of this frontier might be at a given. (sic)

Working Paper at pg. 1.

The improvement in quality for tasks appropriate for GPT4 – work tasks inside the frontier – was remarkable, overall 40%, although somewhat inconsistent between sub-groups, as will be explained. Productivity also went up, although to a lesser degree. There was no increase in quality or productivity for workers trying to use GPT4 for tasks beyond the AI’s ability, those outside the frontier. In fact, when GPT4 was used for those outside tasks, the answers of the AI-assisted consultants were 19 percentage points less likely to be correct. That is an important takeaway lesson for legal professionals. Know what LLMs can do reliably, and what they cannot.

The scientists who designed these experiments themselves had difficulty coming up with work tasks that they knew would be outside ChatGPT4’s abilities:

In our study, since AI proved surprisingly capable, it was difficult to design a task in this experiment outside the AI’s frontier where humans with high human capital doing their job would consistently outperform AI.

Working Paper at pg. 19. It was hard, but the business experts finally came up with a consulting task that would make little ChatGPT4 look like a dunce.

The authors were vague in this draft report about the specific tasks “outside the frontier” used in the tests, and I hope this is clarified, since it is very important. But it looks like they designed an experiment where consultants with ChatGPT4 would rely on it to analyze data in a spreadsheet and so omit important details found only in interviews with “company insiders.” The AI, and the consultants relying on the AI, were likely to miss important details in the interviews and so make errors in recommendations. To quote the Working Paper at page 13:

To be able to solve the task correctly, participants would have to look at the quantitative data using subtle but clear insights from the interviews. While the spreadsheet data alone was designed to seem to be comprehensive, a careful review of the interview notes revealed crucial details. When considered in totality, this information led to a contrasting conclusion to what would have been provided by AI when prompted with the exercise instructions, the given data, and the accompanying interviews.

In other words, it looks like the Working Paper authors designed tasks where they knew ChatGPT4 would likely make errors and gloss over important details in interview summaries. They knew that the human-only expert control group would likely notice the importance of these details in the interviews and so make better recommendations in their final reports. Working Paper, Section 3.2 – Quality Disruptor – Outside the frontier at pages 13-15.

This is comparable to an attorney relying solely on ChatGPT4 to study a transcript of a deposition that they did not take or attend, asking GPT4 to summarize it. If the attorney only reads the summary, and the summary misses key details, which is known to happen, especially in long transcripts and where insider facts and language are involved, then the attorney can miss key facts and draw incorrect conclusions. This is a case of over-delegation to an AI, past the jagged frontier. Attorneys should read the transcript, or have been at the deposition and so recall key insider facts, and thereby be in a position to evaluate the accuracy and completeness of the AI summary. Trust but verify.

The 19 percentage point decline in performance for work outside the frontier is a big warning flag to be careful, to go slow at first and know what generative AI can and cannot do well. See: Losey, From Centaurs To Cyborgs (4/24/24). Humans must remain in the loop for many of the tasks of complex knowledge work.

Still, the positive findings of increased quality and productivity for appropriate tasks, those within the jagged frontier, are very encouraging to workers in the consulting fields, including attorneys. This large experiment on volunteer BCG guinea pigs provides the first controlled experimental evidence of the impact of ChatGPT4 on various kinds of consulting work. It confirms the many ad hoc reports that generative AI allows you to improve both the quality and productivity of your work, faster and better. You just have to know what you are doing, know the jagged line, and intelligently use both Centaur and Cyborg type methods.

Appendix E of the Working Paper discusses these methods. To quote from Appendix E – Centaur and Cyborg Practices:

By studying the knowledge work of 244 professional consultants as they used AI to complete a real-world, analytic task, we found that new human-AI collaboration practices and reconfigurations are emerging as humans attempt to navigate the jagged frontier. Here, we detail a typology of practices we observed, which we conceptualize as Centaur and Cyborg practices.

Centaur behavior. … Users with this strategy switch between AI and human tasks, allocating responsibilities based on the strengths and capabilities of each entity. They discern which tasks are best suited for human intervention and which can be efficiently managed by AI. From a frontier perspective, they are highly attuned to the jaggedness of the frontier and not conducting full sub-tasks with genAI but rather dividing the tasks into sub-tasks where the core of the task is done by them or genAI. Still, they use genAI to improve the output of many sub-tasks, even those led by them.

Cyborg behavior. … Users do not just have a clear division of labor here between genAI and themselves; they intertwine their efforts with AI at the very frontier of capabilities. This manifests at the subtask level, when for an external observer it might even be hard to demarcate whether the output was produced by the human or the AI as they worked tightly on each of the activities related to the sub task.

As discussed at length in my many articles on generative AI, close supervision and verification are required for most of the work by legal professionals. It is an ethical imperative. For instance, no new case found by AI should ever be cited without human verification. The Working Paper calls this blurred division of labor Cyborg behavior.
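To make the verification habit concrete, here is a minimal sketch, entirely my own illustration and not anything from the Working Paper, of how a legal team might pull citation-like strings out of an AI draft into a checklist for human verification. The regular expression is a deliberately simplified, hypothetical pattern; real citation formats are far more varied.

```python
import re

# Hypothetical, simplified pattern for a few U.S. reporter citations,
# e.g. "91 F.4th 610" or "598 F. Supp. 3d 123". Real citation grammars
# are far more complex; this is illustrative only.
CITATION_PATTERN = re.compile(
    r"\b\d{1,4}\s+(?:U\.S\.|S\. Ct\.|F\.(?:2d|3d|4th)?|F\. Supp\.(?: 2d| 3d)?)\s+\d{1,4}\b"
)

def citation_checklist(ai_draft: str) -> list[str]:
    """Return every citation-like string found in an AI draft.

    Each entry must then be checked by a human against a trusted
    legal database before filing; the AI's say-so is never enough.
    """
    return sorted(set(CITATION_PATTERN.findall(ai_draft)))

draft = "As held in Park v. Kim, 91 F.4th 610 (2d Cir. 2024), counsel must verify..."
for cite in citation_checklist(draft):
    print("VERIFY BY HAND:", cite)
```

A script like this only builds the to-do list; the verification itself remains irreducibly human work.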

Excerpts from the Working Paper

Here are a few more excerpts from the Working Paper and a key chart. Readers are encouraged to read the full report. The details are important, as the outside the frontier tests showed. I begin with a lengthy quote from the Abstract. (The image inserted is my own, generated using my GPT for Dall-E, Visual Muse: illustrating concepts with style.)

In our study conducted with Boston Consulting Group, a global management consulting firm, we examine the performance implications of AI on realistic, complex, and knowledge-intensive tasks. The pre-registered experiment involved 758 consultants comprising about 7% of the individual contributor-level consultants at the company. After establishing a performance baseline on a similar task, subjects were randomly assigned to one of three conditions: no AI access, GPT-4 AI access, or GPT-4 AI access with a prompt engineering overview.

We suggest that the capabilities of AI create a “jagged technological frontier” where some tasks are easily done by AI, while others, though seemingly similar in difficulty level, are outside the current capability of AI.

For each one of a set of 18 realistic consulting tasks within the frontier of AI capabilities, consultants using AI were significantly more productive (they completed 12.2% more tasks on average, and completed tasks 25.1% more quickly), and produced significantly higher quality results (more than 40% higher quality compared to a control group). Consultants across the skills distribution benefited significantly from having AI augmentation, with those below the average performance threshold increasing by 43% and those above increasing by 17% compared to their own scores.

For a task selected to be outside the frontier, however, consultants using AI were 19 percentage points less likely to produce correct solutions compared to those without AI. Further, our analysis shows the emergence of two distinctive patterns of successful AI use by humans along a spectrum of human-AI integration. One set of consultants acted as “Centaurs,” like the mythical half-horse/half-human creature, dividing and delegating their solution-creation activities to the AI or to themselves. Another set of consultants acted more like “Cyborgs,” completely integrating their task flow with the AI and continually interacting with the technology.

Key Chart Showing Quality Improvements

The key chart in the Working Paper is Figure 2, found at pages 9 and 28. It shows the underlying data of quality improvement. In the words of the Working Paper:

Figure 2 uses the composite human grader score and visually represents the performance distribution across the three experimental groups, with the average score plotted on the y-axis. A comparison of the dashed lines and the overall distributions of the experimental conditions clearly illustrates the significant performance enhancements associated with the use of GPT-4. Both AI conditions show clear superior performance to the control group not using GPT-4.

The version of the chart shown below has additions by one of the coauthors, Professor Ethan Mollick (Wharton), who added the red arrow comments not found in the published version. (Note the “y-axis” in the chart is the vertical scale labeled “Density.” In XY charts “Density” generally refers to the distribution of a variable, i.e., the probability distribution of the data. The horizontal “x-axis” is the overall quality performance measurement.)
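For readers unfamiliar with density charts, a short sketch may help. The following Python snippet, using invented score data rather than the study’s actual numbers, produces the same kind of kernel density comparison, with dashed lines at the group means, as Figure 2.

```python
import numpy as np
from scipy.stats import gaussian_kde
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)

# Made-up quality scores for illustration only; the Working Paper's
# raw data is not public. The means loosely echo the reported
# ordering: control < AI < AI + training.
groups = {
    "No AI (control)": rng.normal(4.0, 1.0, 300),
    "GPT-4": rng.normal(5.6, 0.8, 300),
    "GPT-4 + training": rng.normal(5.9, 0.8, 300),
}

xs = np.linspace(0, 10, 200)
for label, scores in groups.items():
    density = gaussian_kde(scores)      # kernel density estimate
    plt.plot(xs, density(xs), label=label)
    plt.axvline(scores.mean(), linestyle="--", alpha=0.4)  # dashed mean line

plt.xlabel("Quality score (x-axis)")
plt.ylabel("Density (y-axis)")          # probability density, not a score
plt.legend()
plt.show()
```

The height of each curve tells you how common a given quality score was in that group, which is why the rightward shift of the two AI curves, not their height, is the finding.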

Professor Mollick provides this helpful highlight of the main findings of the study, both quality and productivity:

[F]or 18 different tasks selected to be realistic samples of the kinds of work done at an elite consulting company, consultants using ChatGPT-4 outperformed those who did not, by a lot. On every dimension. Every way we measured performance. Consultants using AI finished 12.2% more tasks on average, completed tasks 25.1% more quickly, and produced 40% higher quality results than those without. Those are some very big impacts.

Centaurs and Cyborgs on the Jagged Frontier, (One Useful Thing, 9/16/23).

Preliminary Analysis of the Working Paper

I was surprised at first to see that the quality of the “some additional training group” did not go up more than the approximately 8% shown in the chart. Digging deeper, I found a YouTube video by Professor Mollick on this study where he said, at 19:14, that the training, which he created, consisted only of a five to ten minute seminar. In other words, very cursory, and yet it still had an impact on performance.

Another thing to emphasize about the study is how carefully the tasks for the tests were selected and how realistic the challenges were. Again, here is a quote from Ethan Mollick’s excellent article, Centaurs and Cyborgs on the Jagged Frontier (One Useful Thing, 9/16/23). Also see Mollick’s interesting new book, Co-Intelligence: Living and Working with AI (4/2/24).

To test the true impact of AI on knowledge work, we took hundreds of consultants and randomized whether they were allowed to use AI. We gave those who were allowed to use AI access to GPT-4 . . . We then did a lot of pre-testing and surveying to establish baselines, and asked consultants to do a wide variety of work for a fictional shoe company, work that the BCG team had selected to accurately represent what consultants do. There were creative tasks (“Propose at least 10 ideas for a new shoe targeting an underserved market or sport.”), analytical tasks (“Segment the footwear industry market based on users.”), writing and marketing tasks (“Draft a press release marketing copy for your product.”), and persuasiveness tasks (“Pen an inspirational memo to employees detailing why your product would outshine competitors.”). We even checked with a shoe company executive to ensure that this work was realistic – they were. And, knowing AI, these are tasks that we might expect to be inside the frontier.

Most of the tasks listed for this particular test do not seem like legal work, but there are several general similarities. For example, the creative task of brainstorming new ideas, the analytical tasks and the persuasiveness tasks. Legal professionals do not write inspirational memos to employees, like BCG consultants, but we do write memos to judges trying to persuade them to rule in our favor.

Another surprising finding of the Working Paper is that use of ChatGPT by BCG consultants on average reduced the range of ideas that the subjects generated. This is shown in Figure 1 below.

Figure 1. Distribution of Average Within Subject Semantic Similarity by experimental condition: Group A (Access to ChatGPT), Group B (Access to ChatGPT + Training), Group C (No access to ChatGPT), and GPT Only (Simulated ChatGPT Sessions).

We also observe that the GPT Only group has the highest degree of between semantic similarity, measured across each of the simulated subjects. These two results taken together point toward an interesting conclusion: the variation across responses produced by ChatGPT is smaller than what human subjects would produce on their own, and as a result when human subjects use ChatGPT there is a reduction in the variation in the eventual ideas they produce. This result is perhaps surprising. One would assume that ChatGPT, with its expansive knowledge base, would instead be able to produce many very distinct ideas, compared to human subjects alone. Moreover, the assumption is that when a human subject is also paired with ChatGPT the diversity of their ideas would increase.

While Figure 1 indicates access to ChatGPT reduces variation in the human-generated ideas, it provides no commentary on the underlying quality of the submitted ideas. We obtained evaluations of each subject’s idea list along the dimension of creativity, ranging from 1 to 10, and present these results in Table 1. The idea lists provided by subjects with access to ChatGPT are evaluated as having significantly higher quality than those subjects without ChatGPT. Taken in conjunction with the between semantic similarity results, it appears that access to ChatGPT helps each individual construct higher quality ideas lists on average; however, these ideas are less variable and therefore are at risk of being more redundant.
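For the technically curious, within-subject semantic similarity of the kind the Working Paper measures can be approximated with any sentence-embedding model. The sketch below is my own assumption of one plausible pipeline, using the sentence-transformers library and cosine similarity; it is not the authors’ actual code, and the model name is just a common lightweight choice.

```python
from itertools import combinations

from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

# Any sentence-embedding model will do; this one is my assumption,
# not the Working Paper's pipeline.
model = SentenceTransformer("all-MiniLM-L6-v2")

def within_subject_similarity(ideas: list[str]) -> float:
    """Average pairwise cosine similarity among one subject's ideas.

    Higher values mean the ideas are more alike, that is, less
    varied: the effect the paper reports for subjects using ChatGPT.
    """
    vectors = model.encode(ideas)
    pairs = combinations(range(len(ideas)), 2)
    sims = [cosine_similarity(vectors[i:i + 1], vectors[j:j + 1])[0, 0]
            for i, j in pairs]
    return float(sum(sims) / len(sims))

ideas = [
    "A trail shoe with a recycled-rubber sole for ultrarunners.",
    "A modular sandal whose straps snap off for river crossings.",
    "A self-lacing court shoe aimed at wheelchair basketball.",
]
print(round(within_subject_similarity(ideas), 3))
```

A subject whose ideas all score near 1.0 against each other is producing variations on one theme; lower averages mean a more varied idea list.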

So there is hope for creative brainstormers, at least at the GPT4 level of generative AI. Generative AI is clearly more redundant than humans. As quoted in my last article, Professor Mollick says AI outputs are a bit “homogenous and same-y in aggregate.” Losey, From Centaurs To Cyborgs: Our evolving relationship with generative AI (04/24/24). Great phrase that ChatGPT4 could never have come up with.

Also see: Mika Koivisto and Simone Grassini, Best humans still outperform artificial intelligence in a creative divergent thinking task (Nature, Scientific Reports, 2/20/24) (“AI has reached at least the same level, or even surpassed, the average human’s ability to generate ideas in the most typical test of creative thinking. Although AI chatbots on average outperform humans, the best humans can still compete with them.“); Losey, ChatGPT-4 Scores in the Top One Percent of Standard Creativity Tests (e-Discovery Team, 7/21/23) (“Generative Ai is still far from the quality of the best human artists. Not yet. … Still, the day may come when Ai can compete with the greatest human creatives in all fields. … More likely, the top 1% in all fields will be humans and Ai working together in a hybrid manner.”).

AI As a ‘Skill Leveler’

As mentioned, the improvement in quality was not consistent between subgroups. The consultants with the lowest pre-AI test scores improved the most with AI. They became much better than they were before. The same goes for the middle of the pack pre-AI scorers. They also improved, but by a lesser amount. The consultants at the top end of pre-AI scores also improved, but by an even smaller amount than those behind them. Still, with their small AI improvements, the pre-AI winners maintained their leadership. The same consulting experts still outscored everyone. No one caught up with them. What are the implications of this finding on future work? On training programs? On hiring decisions?
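Before turning to Professor Mollick’s take, the arithmetic of this partial leveling is worth making explicit. In the toy calculation below, only the 43% and 17% uplift figures come from the study; the baseline scores are invented for illustration.

```python
# Hypothetical baseline quality scores (out of 10); the uplift
# percentages are the study's, the baselines are invented.
bottom_half_before, top_half_before = 4.0, 6.5

bottom_half_after = bottom_half_before * 1.43   # +43% uplift
top_half_after = top_half_before * 1.17         # +17% uplift

print(f"bottom half: {bottom_half_before} -> {bottom_half_after:.2f}")
print(f"top half:    {top_half_before} -> {top_half_after:.2f}")
print(f"gap before:  {top_half_before - bottom_half_before:.2f}")
print(f"gap after:   {top_half_after - bottom_half_after:.2f}")
```

On these made-up baselines the gap shrinks from 2.5 points to roughly 1.9, which is the study’s pattern in miniature: everyone improves, but the ordering holds.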

Here is Professor Ethan Mollick’s take on the significance of this finding.

It (AI) works as a skill leveler. The consultants who scored the worst when we assessed them at the start of the experiment had the biggest jump in their performance, 43%, when they got to use AI. The top consultants still got a boost, but less of one. Looking at these results, I do not think enough people are considering what it means when a technology raises all workers to the top tiers of performance. It may be like how it used to matter whether miners were good or bad at digging through rock… until the steam shovel was invented and now differences in digging ability do not matter anymore. AI is not quite at that level of change, but skill leveling is going to have a big impact.

Ethan Mollick, Centaurs and Cyborgs on the Jagged Frontier: I think we have an answer on whether AIs will reshape work (One Useful Thing, 9/16/23).

My only criticism of Professor Mollick’s analysis is that it glosses over the differences that remained, after AI, between the very best and the rest. In the field I know, law, not business consulting, the difference between the very good lawyers, the B or B+, and the great lawyers, the A or A+, is still very significant. All attorneys with skill levels in the B to A+ range can legitimately be considered top tier legal professionals, especially as compared to the majority of lawyers in the average and below average range. But the impact of these skill differences on client services can still be tremendous, especially in matters of great complexity or importance. Just watch when two top tier lawyers go against one another in court, one good and one truly great.

Further Analysis of Skill Leveling

What does the leveling phenomenon of “average becoming good” mean for the future of work? Does it mean that every business consultant with ChatGPT will soon be able to provide top tier consulting advice? Will every business consultant on the street with ChatGPT soon be able to “pen an inspirational memo to employees detailing why your product would outshine competitors“? Will their lower priced memos be just as good as top tier BCG memos? Is generative AI setting the stage for a new type of John Henry moment for knowledge workers, as Professor Mollick suggests? Will this near leveling of the playing field hold true for all types of knowledge workers, not only business consultants, but also doctors and lawyers?

To answer these questions it is important to note that the results in this first study of business consultant work do not show a complete leveling. Not all of the consultants became John Henry superstars. Instead, the study showed the differences continued, but were less pronounced. The gap narrowed, but did not disappear. The race only became more competitive.

Moreover, the names of the individual winners and also-rans remained the same. It is just that the “losers” (seems like too harsh a term) now did not “lose” by as much. In the race to quality the same consultants were still leading, but the rest of the pack was not as far behind. Everyone got a boost, even the best. But will this continue as AI advances? Or eventually will some knowledge workers do far better with the AI steam hammers or shovels than others, no matter where they started out? Moreover, under what circumstances, including pricing differentials, do consumers choose the good professionals who are not quite as good as those on the medalist stand?

The study results show that the pre-AI winners, those at the very top of their fields before the generative AI revolution, were able to use the new AI tools as well as the others. For that reason, their quality and productivity were also enhanced. They still remained on top, still kept their edge. But in the future, assuming AI gets better, will that edge continue? Will there be new winners and also-rans? Or eventually will everyone tie for first, at least in so far as quality and productivity are concerned? Will all knowledge workers end up the same, all equal in quality and productivity?

That seems unlikely, no matter how good AI gets. I cannot see this happening anytime soon, at least in the legal field. (I assume the same is also true for the medical field.) In law the analysis and persuasion challenges are far greater than those in most other knowledge fields. The legal profession is far too complex for AI to create a complete leveling of performance, at least not in the foreseeable future. I expect the differentials among medical professionals will also continue.

Moreover, although not studied in this report, it seems obvious that some legal workers will become far better at using AI than others. In this first study of business consultants, all started at the same level of inexperience with generative AI. Only some were given training. The training provided, only five to ten minutes, was still enough to move the needle. The subgroup given this almost trivial amount of training did perform better, although not by enough to close the gap.

With significant training, or experience, the improvements should be much greater. Maybe quality will increase by 70%, instead of the 40% we saw with little or no training. Maybe productivity will increase by at least 50%, instead of just 12%. That is what I would expect based on my experience with lawyers since 2012 using predictive coding. After lawyer skill-sets develop for use of generative AI, all of the performance metrics may soar.

Conclusion

In this experiment where some professionals were given access to ChatGPT4 and some were not, a significant, but not complete leveling of performance was measured. It was not a complete leveling because the names at the very top of the leaderboard of quality and productivity remained the same. I believe this is because the test subjects were all ChatGPT virgins. They had not previously learned prompt engineering methods, even the beginning basics of Centaur or Cyborg approaches. It was all new to them.

As part of the experiment some were given ten minutes of basic training in prompt engineering and some were given none. In the next few years some professionals will receive substantial GPT training and attain mastery of the new AI tools. Many will not. When that happens, the names on the top of the leaderboard will likely change, and change dramatically.

History shows that times of great change are times of opportunity. The deck will be reshuffled. Who will learn and readily adapt to the AI enhancements and who will not? Which corporations and law firms will prosper in the age of generative AI, and which will fail? The only certainty here is the uncertainty of surprising change.

In the future every business may well have access to top tier business consultants. All may be able to pen an inspirational memo to employees. But will this near leveling between the best and the rest have the same impact on the legal profession? The medical profession? I think not, especially as some in the profession gain skills in generative AI much faster than others. The competition between lawyers and law firms will remain, but the names on the top of the leaderboard will change.

From a big picture perspective the small differentials between good and great lawyers are not that important. Of far greater importance is the likely social impact of the near leveling of lawyers. The gain in skills of the vast majority of lawyers will make it possible, for the first time, for high quality legal services to become available to all.

Consumer law and other legal services could become available to everyone, at affordable rates, and without a big reduction in quality. In the future, as AI creates a more level playing field, the poor and middle class will have access to good lawyers too. These will be affordable good lawyers who, when ethically assisted by AI, are made far more productive. This can be accomplished by responsible use of AI. This positive social change seems likely. Equal justice for all will then become a common reality, not just an ideal.

Ralph Losey Copyright 2024. All Rights Reserved.


From Centaurs To Cyborgs: Our evolving relationship with generative AI

April 24, 2024

Centaurs are mythological creatures with a human’s upper body and a horse’s lower body. They symbolize a union of human intellect and animal strength. In AI technology, the term Centaur refers to a type of hybrid usage of generative AI that combines human and AI capabilities. It does so by maintaining a clear division of labor between the two, like a centaur’s divided body. Cyborgs, by contrast, have no such clear division; the human and AI tasks are closely intertwined.

A Centaur method is designed so there is one work task for the human and another for the AI. For example, creation of a strategy is typically a task done by the human alone. It is a separate task for the AI to write an explanation of the strategy devised by the human. The lines between the tasks are clear and distinct, just like the dividing line between the human and horse in a Centaur.
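For those who like to see the division of labor in working form, here is a minimal sketch of a Centaur workflow using the OpenAI Python client. The strategy text and prompts are hypothetical; the point is only that the human-authored strategy enters the AI step as fixed input, untouched.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Centaur division of labor: the strategy is authored entirely by
# the human; the AI's only task is to draft an explanation of it.
human_strategy = (
    "Settle the dispute early by offering a structured payment plan, "
    "because trial risk outweighs the marginal recovery."
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "Explain the given strategy clearly "
         "for a client memo. Do not change or add to the strategy."},
        {"role": "user", "content": human_strategy},
    ],
)
print(response.choices[0].message.content)  # human reviews before any use
```

The hand-off point is the whole design: the AI never sees the task of devising the strategy, only the task of explaining it.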

This concept is shown by the above image. It was devised by Ralph Losey and then generated by his AI ChatGPT4 model, Visual Muse. The AI had no part in devising the strategy and no part in the idea of putting the image of a Centaur here. It was also Ralph’s sole idea to have the human half appear in robotic form and to use a watercolor style of illustration. The AI’s only task was to generate the image. That was the separate task of the AI. Unfortunately, it turns out AI is not good at making Centaurs, especially ones with a robot top instead of a human head, like the following image.

It made this image after only a few tries. But the first image of the Centaur with a robot top was a struggle. I can usually generate the image I have in mind, often even better than what I first conceived, in just a few prompts. But here, with a half robot Centaur, it took 118 attempts to generate the desired image! I tried many, many different prompts. I even used two different image generation programs, Dall-E and Midjourney. I tried 96 times with Midjourney (it generates fast) and never could get it to make a Centaur with a robot top half. But it did make quite a few funny mistakes, and a few scary ones too. Shown below are a few of the 117 AI bloopers. I note that overall Dall-E did much better than Midjourney, which never did seem to “get it.” The one Dall-E example of a blooper is bottom right, pretty close. The rest are all by Midjourney. I especially like the robot head on the butt of the sort-of robot horse. It is the bass-ackwards version of what I requested!

After 22 tries with Dall-E I finally got it to make the image I wanted.

The point of this story is that the Centaur method failed to make the Centaur. I was forced to work very closely and directly with the AI to get the image I wanted; I was forced to switch to the Cyborg method. I did not want to, but the Cyborg method was the only way I could get the AI to make a Centaur with a robotic top. Back and forth I went, 118 times. The irony is clear. But there is a deeper lesson here that emerged from the frustration, which I will come back to in the conclusion.

Background on the Centaur and Cyborg as Images of Hybrid Computer Use

The idea of using the Centaur symbol to describe an AI method is credited to chess grandmaster Garry Kasparov. He is famous in AI history for his losing battle in 1997 with IBM’s Deep Blue. He retired from chess competition immediately thereafter. Kasparov returned a few years later with computer in hand, with the idea that man and computer could beat any computer alone. It worked, a redemption of sorts. Kasparov ended up calling this Centaur team chess, where human-machine teams play each other online. It is still actively played today. Many claim it is still played at a level beyond that of any supercomputer, although this is untested. See e.g. The Real Threat From ChatGPT Isn’t AI…It’s Centaurs (PCGamer, 2/13/23).

The use of the term Centaur was expanded and explained by Harvard Professor Soroush Saghafian in his article Effective Generative AI: The Human-Algorithm Centaur (Harvard DASH, 10/2023). He explains the hybrid relationship as one where the unique powers of human intuition are added to those of artificial intelligence. In a medical study he did at his Harvard lab with the Mayo Clinic, they analyzed the results of doctors using LLM AI in a centaur-type model. The goal was to try to reduce readmission risks for patients who underwent organ transplants.

We found that combining human experts’ intuition with the power of a strong machine learning algorithm through a human-algorithm centaur model can outperform both the best algorithm and the best human experts. . . .

In this article, we focus on recent advancements in Generative AI, and especially in Large Language Models (LLMs). We first present a framework that allows understanding the core characteristics of centaurs. We argue that symbiotic learning and incorporation of human intuition are two main characteristics of centaurs that distinguish them from other models in Machine Learning (ML) and AI. 

Id. at pg. 2  

The Cyborg model is slightly different in that man and machine work even more closely together. The concept of a cyborg, a mechanical man, also has its origins in the ancient Greek myths: Talos. He was supposedly a giant bronze mechanical man built by Hephaestus, the Greek god of invention, blacksmithing and volcanoes. The Roman equivalent god was Vulcan, who was supposedly ugly, but there are no stories of his having pointy ears. You would think that techies might seize upon the name Vulcan, or Talos, to symbolize the other method of hybrid AI use, where tasks are closely connected. But they did not; they went with the much more modern term – Cyborg.

The word was first coined in 1960 (before Star Trek) by two dreamy AI scientists who combined the root words CYBernetic and ORGanism to describe a being with both organic and biomechatronic body parts. Here is Ralph Losey’s image of a Cyborg, which, again ironically, he created quickly with a simple Centaur method in just a few tries. Obviously the internet, which trained these LLM AIs, has many more cyborg-like android images than centaurs.

More On the Cyborg Method

The Cyborg method supposedly has no clear-cut divisions between human and AI work, like the Centaur. Instead, Cyborg work and tasks are all closely related, like a cybernetic organism. People and ChatGPTs usually say that the Cyborg approach involves a deep integration of AI into the human workflow. The goal is a blend where AI and human intelligences constantly interact and complement each other. In contrast to the Centaur method, the Cyborg does not distinctly separate tasks between AI and humans. For instance, in the Cyborg approach a human might start a task, and AI might refine or advance it, or vice versa, as sketched below. This approach is said to be particularly valuable in dynamic environments where continuous adaptation and real-time collaboration between human and AI are crucial. See e.g. Center for Centaurs and Cyborgs OpenAI GPT version (free GPT by Community Builder that we recommend. Try asking it more about Cyborgs and Centaurs). Also see: Emily Reigart, A Cyborg and a Centaur Walk Into an Office (NAB Amplify, 9/24/23); Ethan Mollick, Centaurs and Cyborgs on the Jagged Frontier: I think we have an answer on whether AIs will reshape work (One Useful Thing, 9/16/23).
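By contrast with the Centaur sketch above, a Cyborg workflow has no fixed hand-off point. The minimal sketch below, again hypothetical and using the same OpenAI client, alternates human edits and AI refinements over one shared draft, so the final text is jointly produced.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Cyborg style: no fixed division of labor. The human and the AI
# pass the same draft back and forth, each refining the other's work.
messages = [{"role": "system", "content": "You are co-drafting a memo "
             "with a lawyer. Refine the latest draft you are given."}]

draft = "First rough paragraph of the memo, written by the human."
for round_number in range(3):                     # a few refinement cycles
    messages.append({"role": "user", "content": draft})
    reply = client.chat.completions.create(model="gpt-4", messages=messages)
    ai_draft = reply.choices[0].message.content
    messages.append({"role": "assistant", "content": ai_draft})
    # In real use the human edits here; we simulate that edit step.
    draft = ai_draft + f"\n[Human edit made in round {round_number + 1}]"

print(draft)  # final intertwined human-plus-AI product
```

In the end, an outside observer could not cleanly say which sentences were human and which were machine, which is exactly the Working Paper’s description of Cyborg behavior.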

Ethan Mollick is a Wharton Professor who is heavily involved with hands-on AI research in the work environment. To quote the second to last paragraph of his article (emphasis added):

People really can go on autopilot when using AI, falling asleep at the wheel and failing to notice AI mistakes. And, like other research, we also found that AI outputs, while of higher quality than that of humans, were also a bit homogenous and same-y in aggregate. Which is why Cyborgs and Centaurs are important – they allow humans to work with AI to produce more varied, more correct, and better results than either humans or AI can do alone. And becoming one is not hard. Just use AI enough for work tasks and you will start to see the shape of the jagged frontier, and start to understand where AI is scarily good… and where it falls short.

Asleep at the Wheel

Obviously, falling asleep at the wheel is what we have seen in the hallucinating AI fake citations cases. Mata v. Avianca, Inc., 22-cv-1461 (S.D.N.Y. June 22, 2023) (first in a growing list of sanctioned attorney cases). Also see: Park v. Kim, 91 F.4th 610, 612 (2d Cir. 2024). But see: United States of America v. Michael Cohen (SDNY, 3/20/24) (Cohen’s attorney not sanctioned. “His citation to non-existent cases is embarrassing and certainly negligent, perhaps even grossly negligent. But the Court cannot find that it was done in bad faith.”)

These lawyers were not only asleep at the wheel, they had no idea what they were driving, nor that they needed a driving lesson. It is not surprising they crashed and burned. It is like the first automobile drivers who would instinctively pull back on the steering wheel in an emergency to get their horses to stop. That may be the legal profession’s instinct as well, to try to stop AI, to pull back from the future. But it is shortsighted, at best. The only viable solution is training and, perhaps, licensing of some kind. These horseless buggies can be dangerous.

Skilled legal professionals who have studied prompt engineering, either methodically or through a longer trial and error process, write prompts that lead to fewer mistakes. Strategic use of prompts can significantly reduce the number and type of mistakes. Still, surprise errors by generative AI cannot be eliminated altogether. Just look at the trouble I had generating a half robot Centaur. LLM language and image generators are masters of surprise. Still, with hybrid prompting skills the surprise results typically bring more delight than fright.

That was certainly the case in a recent study by Professor Ethan Mollick and several others on the impact of AI hybrid work. Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality (Harvard Business School, Working Paper 24-013). I will write a full article on this soon. As a quick summary, researchers from multiple schools collaborated with the Boston Consulting Group and found a surprisingly high increase in productivity by consultants using AI. The study was based on controlled tests of an AI hybrid team approach to specific consulting work tasks. The results also showed that, even though the specific work tasks tested were performed much faster, quality was maintained, and for some consultants, increased significantly.

Although we do not have a formal study yet to prove this, it is the supposition of most everyone in the legal profession now using AI that lawyers can also improve productivity and maintain quality. Of course, careful double-checking of AI work product is required to catch errors and maintain quality. This applies not only to the obvious case hallucinations, but also to what Professor Mollick called AI’s tendency to be “homogenous and same-y in aggregate” writing. Also see: Losey, Stochastic Parrots: How to tell if something was written by an AI or a human? (common “tell” words used way too often by generative AIs). Lawyers who use AI attentively, without over-delegation to AI, can maintain high quality work, meet all of their ethical duties, and still increase productivity.

The hybrid approaches to the use of generative AI, both Centaur and Cyborg, have been shown to significantly enhance consulting work. Many legal professionals using AI are seeing the same results in legal work. Lawyers using AI properly can significantly increase productivity and maintain quality. For most of the Boston Consulting Group consultants tested, their quality of work actually went up. There were, however, a few exceptional outliers whose test quality was already at the top. The AI did not make the work of these elite few any better. The same may be true of lawyers.

Transition from Centaur to Cyborg

Experience shows that lawyers who do not use AI properly, typically by over-delegation and inadequate supervision, may increase productivity, but do so at the price of increased negligent output. That is too high a price. Moreover, legal ethics, including Model Rule 1.1, requires competence. I conclude, along with most everyone in the legal profession, that stopping the use of AI by lawyers is futile, but at the same time, we should not rush into negligent use of this powerful tool. Lawyers should go slow and delegate to AI on a very limited basis at first. That is the Centaur approach. Again, like most everyone else, my opinion is to start slow and use AI in a piecemeal fashion. Still, you should begin now and avoid death by committee, or as lawyers like to call it, paralysis by analysis.

Then, as your experience and competence grows, slowly increase your use of generative AI and experiment with applying it to more and more tasks. You will start to be more Cyborg like. Soon enough you will have the AI competitive edge that so many outside experts over-promise.

Vendors and outside experts can be a big help in implementing generative AI, but remember, this is your legal work. For software, look at the subscription license terms carefully. Note any gaps between what marketing promises and the superseding agreements deliver. Pick and choose your generative AI software applications carefully. Use the same care in picking the tasks to begin to implement official AI usage. You know your practice and capabilities better than any outside expert offering cookie-cutter solutions.

Use the same care and intelligence in selecting the best, most qualified people in your firm or group to train and investigate possible purchases. Here the super-nerds should rule, not the powerful personalities, nor even necessarily the best attorneys. New skill sets will be needed. Look for the fast learners and the AI enthusiasts. Start soon, within the next few months.

Conclusion

According to Wharton Professor Ethan Mollick, secret use and false claims of personal work product have already begun in many large corporations. In his YouTube video, at 53:30, he shares a funny story of a friend at a big bank. She secretly uses AI all of the time to do her work. Ironically, she was the person selected to write a policy prohibiting the use of AI. She did as requested, but did not want to be bothered to do it herself, so she directed a GPT on her personal phone to do it. She sent the GPT-written policy prohibiting use of GPTs to her corporate email account and turned it in. The clueless boss was happy, probably impressed by how well it was written. Mollick claims that secret, unauthorized use of AI in big corporations is widespread.

This reminds me of the time I personally heard the GC of a big national bank, now defunct, proudly say that he was going to ban the use of email by his law department. We all smiled, but did not say no to mister big. After he left, we LOL’ed about the dinosaur for weeks. Decades later I still remember it well.

So do not be foolish or left behind. Proceed expeditiously, but carefully. Then you will know for yourself, from first-hand experience, the opportunities and the dangers to look out for. And remember, no matter what any expert may suggest to the contrary, you must always supervise the legal work done in your name.

There is a learning curve in the careful, self-knowledge approach, but eventually the productivity will kick in, with no loss of quality and no embarrassing public mistakes. For most professionals, there should also be an increase in quality, not just quantity or speed of performance. In some areas of practice, there may be a substantial improvement in both productivity and quality. It all depends on the particular tasks and the circumstances of each project. Lawyers, like life, are complex and diverse, with ever changing environments and facts.

My image generation failure is a good example. I expected a Centaur-like delegation to AI would result in a good image of a Centaur with a robotic top half. Maybe I would need to make a few adjustments and tries, but I never would have guessed I would have to make 118 attempts before I got it right. My efforts with Visual Muse and Midjourney are typically full of pleasant surprises, with only a few frustrating failures. (Although the failure images are sometimes quite funny.) So I was somewhat surprised to have to spend an hour to bring my desired cyber Centaur to life. Somewhat, but not totally surprised. I know from experience that this just happens sometimes with generative AI. It is the nature of the beast. Some uncertainty is a certainty.

As is often the case, the hardship did lead to a new insight into the relationship between the two types of hybrid AI methods, Centaur and Cyborg. I realized they are not a duality, but more of a skill-set evolution. They have different timings and purposes, and require different prompting skill levels. On a learning curve basis, we all start as Centaurs. With experience we slowly become more Cyborg-like. We can step in with close Cyborg processes when the Centaur approach does not work well for some reason. We can cycle in and out between the two hybrid approaches.

There is a sequential reality to first use. Our adoption of generative AI should begin slowly, like a Centaur, not a Cyborg. It should be done with detachment and separation into distinct, easy tasks. Also, you should start with the most boring, repetitive tasks first. See, e.g., Ralph Losey’s GPT model, Innovation Interviewer (work in progress, but available at the ChatGPT store).

Our mantra as a beginner Centaur should be a constant whisper of trust, but verify. Check the AI work, learn its mistakes and impose policies and procedures to guard against them. That is what good Centaurs do. But as personal and group expertise grows, the hybrid relations will naturally grow stronger. We will work closer and closer with AI over time. It will be safe and ethical to speed up because we will learn its eccentricities, its strengths and weaknesses. We will begin to use AI in more and more work tasks. We will slowly, but surely, transform into a Cyborg work style. Still, as legal professionals, we will remain ever mindful of our duties to clients and courts.

More machine attuned than before, we will become like Cyborgs, but still remain human. We will step into a Cyborg mind-set to get the job done, but will bring our intuition, feelings and other special human qualities with us.

I agree with Ray Kurzweil that we will ultimately merge with AI, but disagree that it will come by nanobots in our blood or other physical alterations. I think it is much more likely to come from wearables, such as special glasses and AI connectivity devices. It will be more like the 2013 movie HER, which is Sam Altman’s favorite, with an AI operating system as a constant cell-phone companion (the inseparable cell phone part has already come true). It will, I predict, be more like that than the wearables shown in the Avengers movies, such as Tony Stark’s flying Iron Man suit.

But probably it will look nothing like either of those Hollywood visions. The real future has yet to be invented. It is in your hands.

Ralph Losey Copyright 2024. — All Rights Reserved