I was recently asked why Unicon doesn’t specifically market a Generative AI-focused offering. Surely, especially given the endless stream of breathless headlines, there must be something we could package into an appealing GenAI consulting service. Of course, like many things, the answer to that question starts with “Yes, but…”. And in this case I think the parts that follow the “but” are both particularly important and particularly interesting.
That’s because, on the one hand, GenAI, unlike the preceding tech hype wave, is obviously and amazingly useful. Clearly this is a genuinely new, broadly relevant capability that anyone responsible for building or applying technology solutions simply cannot ignore. On the other hand, I’d be hard pressed to argue that “ignoring” is what’s going on. Certainly, if you operate within or around technology circles, you’ve observed substantial amounts of what can only be described as FOMO-based decision-making and pressure to do something, anything “with AI.” Goldman Sachs has noticed as well, with Jim Covello, Head of Global Equity Research, arguing that current and projected massive AI-centric capital outlays just don’t make economic sense.
So what is a technology decision maker to do, especially in a sector like education, where institutions face uniquely challenging resource constraints? Sure, GenAI may (eventually) offer huge efficiency and/or efficacy gains for both instructors and learners that leaders would be irresponsible to ignore. But at what cost, and with what risk? After all, even the GenAI industry leaders don’t seem to be succeeding in a traditional economic sense; and that’s before getting into any of the myriad near- and long-term ethical considerations.
With GenAI, as with most things, the middle path is the correct path. And I like the way we expressed this in a simple message we started broadcasting to our service teams back in early 2023: Be Curious, Be Careful. That is: take GenAI seriously, try it out, don’t let professional anxiety turn into Luddism, find incremental ways to let GenAI make you and your customers better at what you and they do; but also balance that enthusiasm and inquisitiveness with healthy skepticism, recognition of the risks of adventure, and relentless pragmatic insistence on solutions that work.
So, in lieu of burdening you with either another breathless “if you’re not using AI in these 10 ways, you’re falling behind” FUD (fear, uncertainty, doubt) thread or another earnest but wholly non-actionable AI ethics think piece, I wanted to provide some on-the-ground insight into how our “Be Curious, Be Careful” perspective can play out across a variety of dimensions, including GenAI tool/service selection, product management decisions, pedagogy, strategy discovery, and even a little bit of pure fun.
How could GenAI amplify what you’re already good at?
On customer projects, one of the areas where we’ve been happiest with GenAI-based solutions is content authoring and media creation. The GenAI fit has been perfect in our case because our teams already know what quality output looks like and how long it should take to produce, so it’s very easy to determine, even just intuitively, whether GenAI output will be acceptable, and whether the license fees and training time are a net accelerant or net drag. Two important expectation-setting qualifications, though:
We’ve observed similar GenAI-based amplification opportunities on other customer projects where extremely high-value, proprietary content already exists and the goal is to extend the reach of that content and offer more flexible means of access, e.g. domain-aware integrations with other content sources. In these cases, the engineering effort is much larger than on the content authoring projects described above, but the key point remains that GenAI works well in this scenario because the team already knows what “good and correct” looks like.
How could GenAI make it simpler for people to understand each other…
… via more learner-empathetic knowledge sharing? - Some significant fraction of professional communication between knowledge workers involves sharing links to reference and editorial materials, i.e. articles, books, podcasts, studies, etc. Certainly within Unicon we have several chat channels that consist primarily of link sharing. Receiving a link, though, creates a readership burden on the receiver, and what we’ve found is that people are much more engaged with linked content when a summary is provided along with the link. LLMs, of course, tend to be rather good at text summarization, and a few of our staff have set up LLM-based tool chains to help simplify linked content summary generation, which in some cases has become the de facto standard expectation when sharing links. We can’t claim to have invented this approach, though. In fact, I recently received an invite to an online AI “book club” event that suggested a similar tactic, i.e. “Don’t have time to read the book? Have AI read it for you, then feel free to attend, armed with that summary!” Obviously, LLM-authored summaries are imperfect, so this type of knowledge sharing and consumption is unsuitable for high-stakes analysis. But most scenarios don’t call for that level of accuracy.
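For illustration, here is a minimal sketch of the kind of summarization tool chain described above, assuming the OpenAI Python client; the model name, prompt wording, and bare-bones page fetch are placeholders rather than anything our staff actually run:

```python
# Minimal sketch of an LLM-based link-summarization helper. Assumes the
# `openai` (v1.x) and `requests` packages and an OPENAI_API_KEY in the
# environment. The model name and prompt are illustrative placeholders;
# a real tool chain would also strip HTML and handle paywalls and errors.
import requests
from openai import OpenAI

client = OpenAI()

def summarize_link(url: str, max_chars: int = 20000) -> str:
    """Fetch a page and ask an LLM for a short, shareable summary."""
    page_text = requests.get(url, timeout=30).text[:max_chars]
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {
                "role": "system",
                "content": (
                    "Summarize the following web content in 3-5 sentences "
                    "for busy colleagues deciding whether to read it."
                ),
            },
            {"role": "user", "content": page_text},
        ],
    )
    return response.choices[0].message.content

print(summarize_link("https://example.com/some-article"))
```

Posting the resulting summary alongside the link is then mostly a chat-integration detail.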
… via audience-aware editing? - In our advisory projects, staff are continually being asked to “pull up” their detailed analyses for executive-level consumption or otherwise tailor their messaging to specific audiences. Historically, this process tended to depend on editorial feedback from a deliverables review team. But recently one of our architects configured a custom, project-specific GPT that enabled a conversational solution whereby team members could input low-level analytic observations and end up with summary distillations in a consistent voice and format suitable for inclusion in an executive readout. I consider this a genuinely novel productivity tool, one that is really only possible because of consumer-grade LLMs: the output can be tailored to a wide variety of audiences and voices that a human editorial team may be ill-equipped to assess, let alone generate content for. Of course, this steps beyond the careful confidence boundaries drawn by the talent amplification use case discussed above, but in practice we believe the risks are minimal.
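To make that concrete, here is a minimal sketch of the same idea expressed in code rather than as a custom GPT, again assuming the OpenAI Python client; the model name and the voice/format instructions are placeholders, not the project-specific configuration:

```python
# Minimal sketch of an "audience-aware editing" assistant, loosely analogous
# to the project-specific GPT described above. Assumes the `openai` (v1.x)
# package and an OPENAI_API_KEY; the model name and the editorial guidance
# below are illustrative placeholders.
from openai import OpenAI

client = OpenAI()

EDITOR_INSTRUCTIONS = (
    "You turn detailed, low-level technical observations into bullet points "
    "for an executive readout. Lead each bullet with the business implication, "
    "keep it to one sentence, avoid jargon, and use a neutral advisory voice."
)

def distill_for_executives(raw_observations: str) -> str:
    """Rewrite analyst-level notes for an executive audience."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": EDITOR_INSTRUCTIONS},
            {"role": "user", "content": raw_observations},
        ],
    )
    return response.choices[0].message.content

notes = (
    "Nightly ETL job exceeds its batch window twice a week; retry logic masks "
    "the failures, so downstream dashboards silently show stale enrollment data."
)
print(distill_for_executives(notes))
```

The design point is the fixed system instruction: it encodes the target audience and voice once, so individual contributors don’t each have to be skilled executive editors.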
Are there tasks you wrote off as impossible or artificially low-priority because of scarce resources? Or tasks that don’t feel like they should require quite so much human intervention, but somehow still do?
One of the non-project GenAI impacts we’ve observed recently is an increased inclination to take on small- to medium-sized tasks that have real value, but which might have tended to be de-prioritized or avoided because, in the absence of LLM-automatable creativity and analytics, they simply took too much time and coordination. For example:
Are there ways to improve learning that don’t require GenAI to solve truly hard problems?
I’m lifting this line of inquiry from a piece Glenda Morgan published in Phil Hill’s On EdTech newsletter in early 2023, specifically her recommendation to “not dismiss the non-learning applications of generative AI because that is exactly where the best uses of it for learning are likely to spring.” That is, not all GenAI applications need to attempt to finally crack the tutoring automation problem, or otherwise function as pedagogical lightning bolts from the sky. Those are Hard Problems and as such should be grouped with exactly the kind of challenges that Goldman’s Covello is so skeptical GenAI will actually be able to work out while still making economic sense. But if you follow learning engineering communities, you’d be forgiven for thinking not only that GenAI is the only game in town, but that most research and interest is focused on AI-guided instruction and interactive tutoring, a la Khanmigo. It could very well be, though, as Morgan suggests, that even incrementally improved efficiency in less glitzy but must-have teaching tools like syllabi or short-answer grading could have much more substantial (and reliable) impacts, by freeing up instructors rather than computers to focus on the really hard stuff.
Could AI invent a whiskey perfectly suited to your tastes?
I included this perspective as a reminder that not everything is about task efficiency and pedagogical efficacy. Yes, ThoughtWorks used AI to generate a whiskey recipe because its customer believed such a whiskey would sell well. But the larger point is that GenAI can and should be seen as a source of fun and creative abundance that, while not without disruptive impacts and risks, doesn’t obviously replace human creative primacy. As Angela D’Orazi, Master Blender & Chief Nose Officer at Mackmyra, states in the ThoughtWorks case study: “The work of a Master Blender is not at risk. While the whisky recipe is created by AI, we still benefit from a person’s expertise and knowledge. We believe that the whisky is AI-generated, but human-curated. Ultimately, the decision is made by a person.”
Will your users tolerate unpredictable, inaccurate behavior?
GenAI solutions (as typically configured and used today) are inherently non-deterministic, i.e. there is no guarantee, in general, that given a particular set of inputs, the GenAI will produce the same result. And after all, that’s partly the point: abundant generation, not repetition. But while this type of behavior is ideal in some contexts, e.g. ideating a cover image for your department’s annual Christmas card or interacting with a uniquely capable Clippy to help jump-start a creative writing task, it’s important to remember that users are accustomed to, and indeed need, deterministic software in many (many) contexts. A few examples to help drive home this point:
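First, it takes very little code to observe the underlying behavior directly. A minimal sketch, assuming the OpenAI Python client (model name and prompt are placeholders): the same request issued twice at default settings will frequently return different wording, and even temperature=0 narrows, but does not formally guarantee, identical output.

```python
# Minimal sketch illustrating LLM non-determinism. Assumes the `openai`
# (v1.x) package and an OPENAI_API_KEY; the model name and prompt are
# placeholders. Even temperature=0 reduces, but does not formally
# guarantee, variation across identical requests.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str, temperature: float = 1.0) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        temperature=temperature,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

prompt = "Suggest three themes for a department holiday card."
first, second = ask(prompt), ask(prompt)
print("Identical outputs?", first == second)  # frequently False at default settings
```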
Will your users perceive GenAI as a positive attribute at all?
The GenAI (in)accuracy issue is a specific case of a more general problem around whether users perceive GenAI to be a desirable characteristic of a technical solution. That is, by default, does consumer culture view AI-based tech as likely being somehow better than non-AI-based tech? Anecdotally, I’ll say “no.” A few examples:
What you thought you knew about software development might not work.
Software development project success rates are notoriously poor; GenAI project success rates might be even worse. The latter finding is not surprising given how recently consumer-ready LLM capabilities emerged, which necessarily means a scarcity of deeply experienced technical and product design and management talent. But it’s also important to understand that, between the “newness” of LLMs and their inherently unpredictable properties, leaders should absolutely expect exacerbated uncertainty around GenAI project delivery timelines, budgets, and impacts. One of the best pieces I’ve read on this is a blog entry from the LinkedIn engineering team. I think it’s fair to say they consider their project successful, but how they got there isn’t going to work consistently in more budget-constrained environments. Here you have a team working in one of the best-resourced engineering environments imaginable reduced to essentially trial and error. Of course, managing a project as a sequence of experiments rather than tasks requires a significant mindset shift for conventionally Agile engineering teams and managers. And some of the classic issues that often strain engineering-management relationships, including misunderstanding of early-stage velocity sustainability, become even worse in the context of applied GenAI development. From the LinkedIn engineering blog: “[B]uilding with generative AI … creates unattainable expectations, the initial pace created a false sense of ‘almost there,’ which became discouraging as the rate of improvement slowed significantly for each subsequent 1% gain.” And even a well-funded shop like LinkedIn faced challenges with the costs of the compute-intensive infrastructure associated with GenAI: “At the beginning we even had to set timetables for when it was ok to test the product or not, as it’d consume too many tokens and lock out developers from working.”
If you’re trying to solve a high-stakes problem, are GenAI risks acceptable?
I promised this wasn’t going to be another non-actionable AI responsible-use think piece, but the big-picture human risks can’t be ignored. Even setting aside the inherent unpredictability of GenAI as it exists today, we have to keep in mind that any purely algorithmic “solution” to problems that directly impact human quality of life is inherently fraught. This doesn’t mean GenAI should be ruled out for all “risky” scenarios, but human risk definitely needs to be at the forefront of technical decision making. For example, I would argue that AI-assisted admissions resource availability notifications and chatbot features (which have been shown to have beneficial practical impacts) are simply less risky than anything having to do with teaching and learning, or, of course, with job recruiting and higher ed admissions decisions.
If you find yourself in a technology decision-making role, especially in education, this is a uniquely challenging time. It’s probably been a long time since you had relatively easy Postgres-vs-MongoDB debates. Instead you may be faced with newly high-stakes questions like whether strong GenAI tooling adoption rates among teachers and students of color are in fact the good thing they appear to be at first glance, or a uniquely subtle reinforcement of structural discrimination that may actually inhibit knowledge gap closure (registration required for Google Groups links). But difficult-to-navigate scenarios like that, while potentially overwhelming, are really just another reason why it’s important to both embrace and resist the GenAI hype wave. This is by far the most exciting and most dangerous thing going. And navigating a responsible path between those poles will require a level head. Part of the secret to that level-headedness is not forgetting about some very basic questions, like “What exactly are we trying to accomplish?” and “Is AI the most pragmatic way to achieve that goal?” I think most people agree that failing to answer those questions, especially the first one, was a major ingredient in Los Angeles Unified’s recent AI fiasco. So again: Be Curious, Be Careful.
Have questions about whether you’ve found the right balance? Unicon has exactly the type of level-headed people you need to help you think through it.