Prompt engineering is the process of iteratively improving your prompt. Why? To get an AI to produce what you want more effectively.
- Trial and error
- Many ways to improve your prompt to get better output
- LLMs can't reliably infer **what** we want. We have to externalize the context that's normally implicit between two humans.
- e.g. if you paste a blob of text as context, instead of a vague instruction like "improve it" you'll get better results with something specific like "make the tone more professional" or "correct any typos".
# Components of a good prompt
Prompts usually have six key components (though not all six are required in every prompt); a sketch combining them follows the list.
- **Context**
- data needed to follow the instruction
- **Instruction**
- what the model should do in its completion
- **Role**
- the identity the AI should assume when generating completions
- **Formatting**
- how the output should be rendered. You can think of output formatting as a special type of instruction (or a modifier on the instruction), e.g. output the examples as a table, CSV, Markdown, or JSON
- **Tone**
- Similar to formatting, but focused on style rather than the structural format of the output, e.g. make it humorous or sarcastic.
- **Examples**
- Useful when it's hard to explicitly describe the format or tone (e.g. "follow the writing style in the example")
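As a concrete illustration, here's a minimal Python sketch that assembles a prompt from the six components. The customer-review scenario and all of the wording are made-up placeholders, not a prescribed template.

```python
# Assembling a prompt from the six components. The review scenario and
# wording are illustrative placeholders only.
role = "You are an experienced customer-support copywriter."
instruction = "Rewrite the customer review below as a short public reply."
tone = "Keep the tone friendly and apologetic."
formatting = "Return the reply as a single Markdown blockquote."
context = (
    "Review: The blender arrived with a cracked lid and "
    "support never answered my email."
)
examples = (
    "Example reply (for a different review):\n"
    "> Thanks for letting us know -- we're sorry the charger failed; "
    "a replacement is on its way."
)

prompt = "\n\n".join([role, instruction, tone, formatting, context, examples])
print(prompt)
```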
# Understanding LLMs
Understanding the underlying model architecture is key to developing better prompting strategies.
- If you're prompting a transformer LLM, the model can see the entire prompt at once, i.e. it can "remember" everything in the prompt (vs. a human reading tokens sequentially and being limited by short-term memory).
- Modern LLMs use rotary positional embeddings (RoPE) or learned positional embeddings and can focus on any part of the input (as learned to be appropriate during training). This makes attention in modern LLMs dynamic.
- Multi-head attention means different heads learn to focus on different parts of the input (e.g. heads specializing in local vs. global context); mixture of experts (MoE) is a separate architectural technique that routes each token through specialized feed-forward sub-networks ("experts").
## Implications on prompt design
- Front-loading important context still works well. Earlier tokens can likely influence global context attention heads.
- mid-sequence content is **not** neglected. Newer models are good at incorporating mid-sequence content and reasoning over long spans.
- **Prompt structure** often matters more than token position alone. Mixing in JSON-structured context can give better results in some settings (see the sketch below): https://chatgpt.com/share/6832814f-bb34-8008-b984-66db79774746
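A minimal sketch of mixing JSON-structured context into a prompt. It assumes the OpenAI Python SDK (v1+) and a placeholder model name; the lead record and its field names are invented for illustration and aren't taken from the linked chat.

```python
import json

from openai import OpenAI  # assumes the OpenAI Python SDK (v1+) is installed

client = OpenAI()

# Hypothetical lead record used as structured context; field names are made up.
lead = {
    "company": "Acme Corp",
    "team_size": 10,
    "budget_usd": 5000,
    "timeline": "next month",
}

prompt = (
    "Classify the lead described in the JSON below as Hot, Warm, or Cold.\n\n"
    + json.dumps(lead, indent=2)
)

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": prompt}],
)
print(resp.choices[0].message.content)
```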
## Hallucinations
Since models will always respond with *something*, it's key to fact-check the results.
# Prompt Strategies for Reasoning
- Zero shot chain of thought prompting
- Few shot prompting
- Few shot chain of thought prompting
## Zero shot chain of thought
- Nothing more than ending the prompt with "Let's think step by step" or some similar instruction.
- Originally motivated by the observation that LLMs are bad at math, and by evaluating prompting techniques that let the LLM perform better "natively" (i.e. within its own auto-regressive generation) vs. needing to make a tool call.
- Created before ChatGPT existed but can still be useful on modern models (though they're getting better).
- Instead of the LLM just emitting the answer as its next completion tokens, the "think step by step" instruction causes the LLM to "show its work" and use the output context as a sort of mental-math scratch-pad.
e.g. "what's 312189 * 32194094. Think step by step."
## Few shot prompting
- Monkey see, monkey do. The LLM will continue the context in a way that best matches the examples in the prompt.
- A more formal way of providing examples of the desired input/output pattern as context; a sketch of building such a prompt follows the example below.
e.g.:
"""
REVIEW: Great product, 10/10
LABEL: positive
REVIEW: Didn't work very well
LABEL: negative
REVIEW: Super helpful, worth it
LABEL: positive
REVIEW: It doesn't work!
LABEL:
"""
## Few shot chain of thought prompting
- Combines CoT and few-shot prompting
- Each example provided also includes its reasoning
- e.g. business lead classification
```
**Prompt:**
You are a helpful assistant that classifies business leads into one of three categories: **Hot**, **Warm**, or **Cold**.
- **Hot**: The lead shows strong intent to buy soon, has a clear budget, and decision-making authority.
- **Warm**: The lead is interested and may have a budget, but isn't ready to purchase immediately.
- **Cold**: The lead shows vague interest or is just gathering information.
Analyze the information step-by-step before making a classification.
---
**Example 1**
**Lead Info**:
"Hi, I'm looking to purchase software for my 10-person marketing team within the next month. I have a budget of around $5,000, and I’m the CMO."
**Reasoning**:
- The lead has a clear timeline (“within the next month”).
- The budget is defined ($5,000).
- They are the decision-maker (CMO).
→ **Classification: Hot**
---
**Example 2**
**Lead Info**:
"We're considering upgrading our project management tool sometime next year. Just exploring options right now."
**Reasoning**:
- Timeline is vague (“sometime next year”).
- No budget mentioned.
- Just researching.
→ **Classification: Cold**
---
**Example 3**
**Lead Info**:
"Our IT team is looking into potential vendors for cloud storage. We’re not in a rush, but we do have a rough budget of $10,000. Final decisions are made by the CIO."
**Reasoning**:
- Some budget is defined.
- No urgency (“not in a rush”).
- The lead may not be the final decision-maker.
→ **Classification: Warm**
---
**Now classify this lead:**
**Lead Info**:
"I'm researching CRM platforms for my sales department. We plan to implement something by Q4. I have a shortlist and will present it to our VP of Sales next month."
```
# Prompting strategies for complex problems
- Generated Knowledge
- Least to Most
- Emotional Prompting
These are useful when plain few-shot and CoT prompting aren't successful.
## Generated Knowledge Prompting
- A mechanism to load relevant info into context
- In this case, since we're not doing RAG, it's serving the purpose of activating portions of pre-trained memory.
- Enter the question/instruction in the prompt but instruct the model *not* to answer it yet (i.e. the question is context rather than the instruction); instead, instruct it to list some relevant facts. Then, with the generated facts in context, ask it the question.
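A minimal two-step sketch of generated knowledge prompting, assuming the OpenAI Python SDK (v1+) and a placeholder model name; the example question is invented for illustration.

```python
from openai import OpenAI  # assumes the OpenAI Python SDK (v1+) is installed

client = OpenAI()
MODEL = "gpt-4o-mini"  # placeholder model name


def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model=MODEL, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content


question = "Did Iceland or Malta have faster population growth in the 20th century?"

# Step 1: the question is only context; ask for relevant facts, not an answer.
facts = ask(
    f"Question (do not answer it yet): {question}\n"
    "List the key facts you know that are relevant to this question."
)

# Step 2: feed the generated knowledge back in and ask the actual question.
answer = ask(f"Relevant facts:\n{facts}\n\nUsing these facts, answer: {question}")
print(answer)
```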
## Least to Most Prompting
- Break a problem into sub-problems, solve them one by one, then combine them into a final answer.
- CoT just prompts the model to think step by step, leaving the LLM to complete tokens however it sees fit, whereas least-to-most specifically instructs the LLM to break down and solve sub-problems (see the sketch below).
- Both techniques can be combined.
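A minimal sketch of least-to-most prompting as a loop of API calls, assuming the OpenAI Python SDK (v1+) and a placeholder model name; the word problem is invented for illustration.

```python
from openai import OpenAI  # assumes the OpenAI Python SDK (v1+) is installed

client = OpenAI()
MODEL = "gpt-4o-mini"  # placeholder model name


def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model=MODEL, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content


problem = (
    "A store sells notebooks at $3 each and pens at $1.50 each. "
    "If I buy 4 notebooks and 6 pens and pay with a $20 bill, how much change do I get?"
)

# Step 1: ask the model to decompose the problem into sub-problems.
subproblems = ask(
    f"Problem: {problem}\n"
    "Do not solve it yet. List the sub-problems to solve, one per line."
)

# Step 2: solve each sub-problem in order, carrying earlier answers forward.
work = ""
for sub in (s for s in subproblems.splitlines() if s.strip()):
    answer = ask(
        f"Problem: {problem}\n"
        f"Sub-problems solved so far:\n{work or '(none yet)'}\n"
        f"Now solve only this sub-problem: {sub}"
    )
    work += f"{sub}\n{answer}\n"

# Step 3: combine the sub-answers into a final answer.
print(ask(f"Problem: {problem}\nWork so far:\n{work}\nGive the final answer."))
```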
## Emotional prompting
- Add emotional encouragement in the prompt e.g. "this is important to my career" or "believe in your abilities"
# Conflicting responses
Sometimes the model gives conflicting answers across runs.
## Self-consistency prompting
- Ask the same question multiple times
- Then ask to reconcile the differences and summarize a final answer
- Sort of like generated knowledge, where the previous attempts at an answer serve as the facts fed back into the context (and activate pre-trained memory).
- This approach is definitely more time / resource intensive and there isn't always a clear right answer among all the responses.
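A minimal sketch of self-consistency, assuming the OpenAI Python SDK (v1+) and a placeholder model name: sample several answers with temperature, then either take a majority vote or ask the model to reconcile them.

```python
from collections import Counter

from openai import OpenAI  # assumes the OpenAI Python SDK (v1+) is installed

client = OpenAI()
MODEL = "gpt-4o-mini"  # placeholder model name

question = (
    "What's 312189 * 32194094? Think step by step, "
    "then give only the final number on the last line."
)

# Ask the same question several times with sampling enabled.
resp = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": question}],
    temperature=0.8,  # sampling diversity so the attempts actually differ
    n=5,              # five independent completions in one request
)
attempts = [choice.message.content for choice in resp.choices]

# Option A: majority vote over the last line of each attempt.
finals = [a.strip().splitlines()[-1] for a in attempts]
print(Counter(finals).most_common(1))

# Option B: feed all attempts back and ask the model to reconcile them.
reconcile = (
    "Here are several independent attempts at the same question:\n\n"
    + "\n\n---\n\n".join(attempts)
    + "\n\nReconcile any differences and give a single final answer."
)
merged = client.chat.completions.create(
    model=MODEL, messages=[{"role": "user", "content": reconcile}]
)
print(merged.choices[0].message.content)
```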
## Role prompting
- Roles can be used to evoke style and tone but can also be used to activate specific pre-trained knowledge as part of the context e.g. "You're an expert historian and geographer."
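A minimal sketch of role prompting via the system message, assuming the OpenAI Python SDK (v1+) and a placeholder model name; the user question is invented for illustration.

```python
from openai import OpenAI  # assumes the OpenAI Python SDK (v1+) is installed

client = OpenAI()

# The role goes in the system message; the question stays in the user message.
resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {"role": "system", "content": "You are an expert historian and geographer."},
        {"role": "user", "content": "Why did Constantinople's location matter for trade?"},
    ],
)
print(resp.choices[0].message.content)
```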
## Multi-role prompting
- Combines role prompting with self-consistency: the model provides answers from different points of view, which can then be reconciled.
# Dealing with hallucinations
- RAG
- Self-evaluation prompting
- Ask the question, get the response, then ask "Are you sure?".
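A minimal sketch of self-evaluation prompting as a two-turn conversation, assuming the OpenAI Python SDK (v1+) and a placeholder model name; the question is invented for illustration.

```python
from openai import OpenAI  # assumes the OpenAI Python SDK (v1+) is installed

client = OpenAI()
MODEL = "gpt-4o-mini"  # placeholder model name

messages = [
    {"role": "user", "content": "In which year and month did the Berlin Wall fall?"}
]

# First pass: get an answer.
first = client.chat.completions.create(model=MODEL, messages=messages)
messages.append({"role": "assistant", "content": first.choices[0].message.content})

# Second pass: ask the model to check its own answer.
messages.append(
    {"role": "user", "content": "Are you sure? Re-check your answer and correct it if needed."}
)
second = client.chat.completions.create(model=MODEL, messages=messages)
print(second.choices[0].message.content)
```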
# Resources
- learnprompting.org [Prompt Engineering Guide: The Ultimate Guide to Generative AI](https://learnprompting.org/docs/introduction)
- https://learnprompting.thinkific.com/
- DAIR.ai - promptingguide.ai [Prompt Engineering Guide | Prompt Engineering Guide](https://www.promptingguide.ai/)