Stages of LLM Product Development
Where does it make sense to use LLMs and how will those LLM products mature over time?
We have been building a Chrome extension to learn how to develop and deploy LLM products. The goal for our extension is simple: help people whose native language is not English express themselves clearly in English. As trilingual people ourselves, we often find ourselves thinking in our native tongue and then translating it to English on the fly as we speak or write.
The three features that would help us most are translating text from another language into English, making text sound more formal, and summarizing or compressing a long piece of text. You can check out our extension here.
In this blog, we highlight the engineering and product management challenges of building an LLM product. The main takeaways are:
GPT products work best for generative tasks where either the cost of an incorrect generation is minimal, or it is easy to verify the generated text (ideally by the end-user)
GPT makes the data science, model building and R&D aspects of product development trivial
However, that ‘trivialness’ is balanced out by the engineering complexity, costs, and reliability challenges of your product
As your LLM product matures, its complexity increases and its reliability drops, until it makes more sense to move to a custom model
We also talk about the current backend architecture for our application and our vision of the progression of an LLM product.
What kind of products will GPT be good at?
LLMs excel in generative and creative applications where users provide a short context and some information, and the model can expand, compress, or tailor the output accordingly. This is why many LLM-based products cater to creators like writers and artists. Building such applications is relatively straightforward, requiring only well-crafted prompts, a user-friendly interface, and an API call to OpenAI.
However, not many applications require you to generate text or be creative. Most AI workloads involve classification or information extraction, and LLMs are not particularly adept at either. Some startups attempt to build LLM-based tools for querying data or answering questions from lengthy documents, resulting in long, expensive prompts and inconsistent outputs.
Another concern when developing an application that relies on LLM-generated outputs is verifying the accuracy of the generated data. Generative applications can produce large volumes of text, some of which may be incorrect. In applications that help users query extensive documents, the answers generated by LLMs might be far from accurate, and human verification could prove challenging.
Considering the above three constraints, LLMs are perfect for generative applications where incorrect outputs have minimal consequences or can be quickly checked for errors. With this in mind, we created a product that generates small, easily verifiable text snippets, leaving the verification responsibility to our users rather than us. Examples of such applications include creating SEO content, writing sales messages, and building chatbots or assistants designed to answer general questions.
The Progression of an LLM Product
In this section, we explore how we think LLM products will develop and grow as they add more features and functionality.
Stage 1: Wrapper over prompts (prompt tuning)
In the first stage, many LLM products will serve as wrappers over prompts, providing users an easy way to use LLMs to solve problems. These applications often focus on content generation and management. The simplicity of this stage means that deployment is straightforward, but it also makes the product easy to replicate. Our application is in this stage, as are many others.
Stage 1.5: Integrating Personal Data and Documents
To stay competitive, it's essential to quickly progress to this stage, which involves integrating user data into your product for a more personalized experience. This integration requires setting up databases to store user data, a vector database to store embeddings, managing more complex prompts, and extending your backend.
Our extension aims to make the output sound more like the user's responses and enable users to query longer documents for better understanding.
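To make Stage 1.5 concrete, here is a rough sketch of the retrieval step, assuming the 0.x-era openai Python package and an in-memory list standing in for a real vector database; the function names are illustrative, not our actual backend:

```python
import numpy as np
import openai  # assumes the 0.x-era openai package

def embed(text: str) -> np.ndarray:
    """Embed a piece of text with OpenAI's embedding endpoint."""
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(resp["data"][0]["embedding"])

# In production this would live in a vector database; a list works for a sketch.
document_store: list[tuple[str, np.ndarray]] = []

def add_document(chunk: str) -> None:
    """Store a document chunk alongside its embedding."""
    document_store.append((chunk, embed(chunk)))

def most_relevant(query: str, k: int = 3) -> list[str]:
    """Return the k chunks whose embeddings are most similar to the query's."""
    q = embed(query)
    scored = sorted(
        document_store,
        key=lambda item: float(
            np.dot(q, item[1]) / (np.linalg.norm(q) * np.linalg.norm(item[1]))
        ),
        reverse=True,
    )
    return [chunk for chunk, _ in scored[:k]]
```

The retrieved chunks are then prepended to the prompt, so the model can ground its answer in the user's own documents rather than in generic knowledge.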
Stage 2: Fine-tuned Models for Reliability, Scalability, and Trust
As API costs increase and prompts become more complex and less reliable, you'll need to fine-tune your models or train custom models for specific tasks or clients. We plan to integrate open-source LLM models and deploy our fine-tuned models for our Chrome extension.
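As a sketch of what preparing for that fine-tuning can look like, here is the prompt/completion JSONL format that OpenAI's classic fine-tuning tooling (and much open-source tooling) expects; the training pair and file name are made up for illustration:

```python
import json

# Hypothetical training pairs for our "make it formal" task; real training
# data would come from curated user examples.
examples = [
    {
        "prompt": "Make this formal: gonna need that report asap\n\n###\n\n",
        "completion": " I will need that report as soon as possible.\n",
    },
]

with open("formalize_train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# The resulting file can then be handed to a fine-tuning job, e.g. with the
# 0.x-era openai CLI:
#   openai api fine_tunes.create -t formalize_train.jsonl -m davinci
```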
Stage 3: Workflows and Chains - Toward Autonomy
In the next stage, you want to be able to automate client tasks and workflows. Here, products will start to use chains, where one LLM's output serves as the input for another. For example, our Chrome extension could take user input, create an article, and then generate social media posts and images from that article. The output of the article-generating LLM is fed to the LLMs that generate the social media posts.
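A chain like this is just function composition over LLM calls. Here is a minimal sketch, again assuming the 0.x-era openai package; the prompts are illustrative:

```python
import openai  # assumes the 0.x-era openai package

def complete(prompt: str) -> str:
    """One call to the chat completion endpoint; model choice is illustrative."""
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp["choices"][0]["message"]["content"]

def article_to_social_posts(topic: str) -> list[str]:
    # Step 1: the first LLM call drafts the article.
    article = complete(f"Write a short blog article about: {topic}")
    # Step 2: the article becomes the *input* of the next call in the chain.
    posts = complete(
        "Write three short social media posts promoting this article, "
        f"one per line:\n\n{article}"
    )
    return posts.splitlines()
```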
Stage 4: Agentic Workflows - Integrating External APIs and Tools
Once you've established workflows, the next step is to integrate external APIs and tools. For instance, our product could automatically post generated social media content and monitor comments, crafting replies as needed. However, due to reliability concerns, companies may still want humans to verify outputs and LLM-performed tasks.
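Here is a sketch of that human checkpoint, reusing the complete() helper from the chain sketch above; post_reply() stands in for a hypothetical social platform client, not a real API:

```python
def post_reply(comment_id: str, text: str) -> None:
    """Hypothetical stand-in for a social platform's reply API."""
    ...

def handle_comment(comment_id: str, comment_text: str) -> None:
    # Draft the reply with the LLM...
    draft = complete(
        f"Draft a friendly, professional reply to this comment:\n\n{comment_text}"
    )
    # ...but keep a human in the loop before anything goes live.
    print(f"Proposed reply to comment {comment_id}:\n{draft}")
    if input("Post this reply? [y/N] ").strip().lower() == "y":
        post_reply(comment_id, draft)
```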
Stage 5: Truly Autonomous - Human Supervision, Not Creation
In the final stage, all reliability and trust concerns are resolved, allowing LLMs to perform tasks autonomously with only human supervision. An LLM could research trending topics, aggregate articles, create new content, and manage social media interactions, with humans providing only high-level directions. This marks the dawn of the age of LTMs (Large Thinking Models).
Chrome Extension Backend
As we embark on our journey to develop a cutting-edge LLM product, it's important to acknowledge the simplicity of our current backend infrastructure. In this section, we'll discuss how our Stage 1 product utilizes a minimalistic approach, avoiding complex packages and focusing on effective prompts.
The Power of Simplicity
While many LLM products rely on sophisticated packages such as LangChain, LlamaIndex, Guardrails, Chroma, or Pinecone, our backend takes a different approach. We've chosen to keep our system simple, focusing on a few carefully tested prompts rather than incorporating a myriad of external tools. This decision enables us to maintain a lean, easily manageable backend that efficiently serves our users' needs.
The Prompt-Centric Workflow
The heart of our Stage 1 product lies in the prompts that we have painstakingly crafted and tested. When a user submits a request, our system wraps the appropriate prompt around their input and sends the combined data to OpenAI. The LLM then generates the desired output, which is returned to the user.
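In code, the whole request path is roughly the following. This is a minimal sketch assuming the 0.x-era openai package; the task names and template wording are illustrative rather than our production prompts:

```python
import openai  # assumes the 0.x-era openai package

# Illustrative templates for our three features; not our production prompts.
PROMPTS = {
    "translate": "Translate the following text into English:\n\n{text}",
    "formalize": "Rewrite the following text in a formal tone:\n\n{text}",
    "summarize": "Summarize the following text in a few sentences:\n\n{text}",
}

def handle_request(task: str, user_text: str) -> str:
    """Wrap the task's prompt around the user's input and call OpenAI."""
    prompt = PROMPTS[task].format(text=user_text)
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp["choices"][0]["message"]["content"]
```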
Additionally, at this stage, it is important that our user experience does not hinder the creative workflow. This streamlined backend keeps our application highly responsive, delivering results quickly while providing a reliable and consistent user experience.
Navigating Stage 1 and Beyond
As we progress through the stages of LLM product development, however, our backend will inevitably evolve to accommodate more advanced features and capabilities. We have already started work on our Stage 1.5 features: integrating document search and learning from a user’s past messages.
If you want to join us on this journey, we would love your contributions! Our application was built in less than two weeks, and our front end was built entirely using ChatGPT. Imagine what we can do with more help! If you want to know more about how to contribute, check it out here!