Wed, Mar 5, 2025. Reliably creating interactive tutorials is hard today. Claude 3.7 Sonnet ran out of tokens when I tried creating an interactive tutorial on diffraction. Cursor got past the token limit but still failed to get the application right after 3 attempts. This is not yet reliable, and when it does become reliable, education will change a fair bit. #tts #impossible
Tue, Oct 22, 2024. Gemini sort of supports diarization. Ref. I tried it and it's OK but not perfect. #impossible
LLMs cannot diarize reliably yet. (Gemini just guesses the speaker differences.)
Human-in-the-loop is about humans evaluating model outputs. That's different from AI-in-the-loop, human-in-the-center, where AI accelerates human output (like GitHub Copilot).
Operations
CHECK EMBEDDING DRIFT over time. Users might be entering different things than before (sketch below).
LOG AND REVIEW everything.
Instructor coaxes structured output from LLM APIs (sketch below).
IMPLICIT FEEDBACK collection is easy. Just let users edit stuff.
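A minimal sketch of the embedding-drift check above: compare each week's centroid of logged query embeddings against a baseline centroid. The weekly grouping, the 0.9 threshold, and the stand-in data are assumptions, not part of the note.

```python
# Sketch: flag embedding drift by comparing weekly centroids of user-query
# embeddings to a baseline centroid (assumes embeddings are already logged,
# per "LOG AND REVIEW everything"). Threshold and data are illustrative.
import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def drift_report(baseline: np.ndarray, weekly: dict[str, np.ndarray], threshold: float = 0.9) -> None:
    """baseline: (n, d) embeddings from launch week. weekly: {week: (m, d) embeddings}."""
    base_centroid = baseline.mean(axis=0)
    for week, embeddings in sorted(weekly.items()):
        similarity = cosine(base_centroid, embeddings.mean(axis=0))
        flag = "DRIFT?" if similarity < threshold else "ok"
        print(f"{week}: centroid similarity to baseline {similarity:.3f} {flag}")


# Usage with stand-in data: week 41 simulates users asking very different things.
rng = np.random.default_rng(0)
topic = rng.normal(size=384)                                  # stand-in "typical query" direction
baseline = topic + 0.5 * rng.normal(size=(500, 384))
weekly = {
    "2024-W40": topic + 0.5 * rng.normal(size=(200, 384)),    # same behaviour as launch
    "2024-W41": -topic + 0.5 * rng.normal(size=(200, 384)),   # queries drifted
}
drift_report(baseline, weekly)
```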
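And a sketch of the Instructor note: it validates the LLM's reply against a Pydantic model and retries on failure. Assumes instructor >= 1.x, the openai package, and an API key; the Ticket schema and model name are made up for illustration.

```python
# Instructor coaxing typed, validated output from an LLM API.
# The Ticket schema and "gpt-4o-mini" are illustrative assumptions.
import instructor
from openai import OpenAI
from pydantic import BaseModel, Field


class Ticket(BaseModel):
    title: str
    severity: int = Field(ge=1, le=5, description="1 = cosmetic, 5 = outage")
    component: str


client = instructor.from_openai(OpenAI())

ticket = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=Ticket,   # Instructor validates (and retries) against this schema
    messages=[{"role": "user", "content": "Login page 500s for all users since 9am."}],
)
print(ticket.model_dump())   # e.g. {'title': ..., 'severity': 5, 'component': ...}
```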
Tactical
Try n-shot prompting (n = 5-12) before reaching for bigger models (sketch after this list).
Always ask for structured output: Markdown, XML/HTML tags.
Combine RAG with keyword search. It reduces user frustration in edge cases (sketch after this list).
Prefer multiple small prompts to one big prompt. Do X. Then Y. Then Z. (Sketch after this list.)
Jitter prompts for diversity beyond temperature (sketch after this list).
LLM-as-judge works better when comparing two outputs (not rating one output). Keep lengths similar (LLMs prefer wordiness). Swap order and compare to cancel position bias. Allow ties. Ask for the reason FIRST. (Sketch after this list.)
"Hermes performed significantly better for charters with well-defined metadata and a relatively smaller number of tables."
"We collect feedback on the accuracy of the returned query from stakeholders directly within the Slack bot."
How I use AI and "Replacing my right hand with AI"
EMBED in every app/workflow. E.g. auto-fix spellings. Auto-review code. Auto-ask LLM on errors and apply a patch! Auto-search for an answer, assess, continue. (Sketch at the end of this section.)
PERSIST. Stick with the LLM to the end. Don't fix it yourself. It's faster.
INTERVENE FAST. If an LLM can't solve it by itself in 2 tries, it needs in-depth help.
APP-IFY one-off tasks. Disposable tools. "Write web-app to convert JSON to tab-delimited." "Extract fields as a table." "Diff JSON."
BEST languages/frameworks preferred: CUDA in Python. Rust. C. Raspberry Pi. Arduino. Bluetooth. Modern ESM/JS.
TEACH with examples. "Here's the LLM Foundry API." "Here's how to use gramex.data."
DUMP the entire codebase. Models can handle it. Refactoring to SQLAlchemy 2, Pandas 2. API documentation. Test case generation.
ASK for features & packages. Docker without root access. GPU access inside Docker. Windows CLI-only C++ compiler.
TEST CASE writing.
SPEC IN DETAIL. Use these libraries. Write like this: code example.
SPEC USAGE in detail.
"I will just pipe it into sqlite", or "I will just run ffmpeg -i filename [YOUR OPTIONS].
Describe the UI, API input/output, data structure, and internal data structure.
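A minimal sketch of the "auto-ask LLM on errors" idea from the EMBED note above: wrap a command, and if it fails, send the command plus its stderr to an LLM and print the suggested fix. Assumes the openai package and an API key; the model name and wrapper script name are invented, and actually applying the patch automatically is left out of this sketch.

```python
# Run a command; on failure, ask an LLM to explain the error and suggest a fix.
import subprocess
import sys

from openai import OpenAI


def run_with_llm_help(cmd: list[str]) -> int:
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode == 0:
        print(result.stdout, end="")
        return 0
    prompt = (
        f"This command failed:\n$ {' '.join(cmd)}\n\nstderr:\n{result.stderr}\n\n"
        "Explain the likely cause and give a corrected command or patch."
    )
    client = OpenAI()
    reply = client.chat.completions.create(
        model="gpt-4o-mini", messages=[{"role": "user", "content": prompt}]
    )
    print(reply.choices[0].message.content)
    return result.returncode


if __name__ == "__main__":
    sys.exit(run_with_llm_help(sys.argv[1:]))   # e.g. python llm_wrap.py pytest -x
```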