#embeddings #gpu #models #future #ai-coding #code-agents #github #dev #markdown #image-generation
TensorFlow Projector or Mantis (Demo) for exploring embeddings visually. #embeddings

…like #graph2vec from Graph Kernels (i.e. spread each document towards its neighbors), then calculate similarities and the embedding similarity matrix. #ai-coding #embeddings #future #optimization #prompt-engineering #image-generation
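The "spread each document towards its neighbors" step can be sketched roughly in NumPy like this; the choice of k nearest neighbors and the alpha blending weight are my own assumptions, not from the note:

```python
import numpy as np

def spread_and_similarity(embs: np.ndarray, k: int = 5, alpha: float = 0.5) -> np.ndarray:
    """Blend each document embedding with the mean of its k nearest neighbors,
    then return the full cosine-similarity matrix of the spread embeddings."""
    normed = embs / np.linalg.norm(embs, axis=1, keepdims=True)
    sims = normed @ normed.T                      # cosine similarity between docs
    nn = np.argsort(-sims, axis=1)[:, 1:k + 1]    # k nearest neighbors, self excluded
    spread = (1 - alpha) * embs + alpha * embs[nn].mean(axis=1)
    spread /= np.linalg.norm(spread, axis=1, keepdims=True)
    return spread @ spread.T                      # the embedding similarity matrix

# sim_matrix = spread_and_similarity(doc_embeddings)  # doc_embeddings: (n_docs, dim)
```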
Interleaved storytelling,
Memes,
Surrealism.

gemini-embedding-exp-03-07 leads the MTEB and is currently the top embedding model by a big margin. #embeddings

With SentenceTransformer.encode(docs), it's best to pass smaller sets of docs and call it multiple times, rather than embedding more at once. On a Colab T4, for gte-base-en-v1.5, when embedding 1,000 docs of up to 8K chars each, here is the TOTAL time it took, by batch size (lower is better). #cloud #embeddings #gpu #image-generation
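A minimal sketch of that chunked-encoding setup (not the measured timings); the Hugging Face repo id and the chunk/batch sizes here are illustrative assumptions:

```python
from sentence_transformers import SentenceTransformer

# gte-base-en-v1.5 requires trust_remote_code=True; the repo id is assumed here.
model = SentenceTransformer("Alibaba-NLP/gte-base-en-v1.5", trust_remote_code=True)

def embed_in_chunks(docs, chunk_size=100, batch_size=32):
    """Encode smaller slices of docs across several encode() calls
    instead of passing all 1,000 docs at once."""
    embeddings = []
    for i in range(0, len(docs), chunk_size):
        embeddings.extend(model.encode(docs[i:i + chunk_size], batch_size=batch_size))
    return embeddings
```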
#embeddings #markdown
It also supports text-to-image models like flux.dev and speech recognition models like Whisper. #embeddings
#embeddings #future #gpu #markdown #models #optimization #todo #chatgpt #image-generation #speech-to-text #voice-cloning
unoti/voice-embeddings,
retkowsky/audio_embeddings,
pyannote/embedding (for speaker similarity),
and more. #embeddings #gpu

text-embedding-3-large can be truncated: the embedding values have descending importance, so picking the first n is a good approximation (see the sketch below). Also, gpt-3.5-turbo-0125 is 50% cheaper. #embeddings #gpu

Consider replacing text-embedding-3-large with voyage-3-lite. There's a 200 MTok free tier currently. #embeddings #future #speech-to-text #tts #voice-cloning #search
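A minimal sketch of the truncation idea for text-embedding-3-large above, using the official openai Python client; the 256-dimension cut is just an example, and the re-normalization step is the usual practice when shortening these vectors by hand:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def truncated_embedding(text: str, n: int = 256) -> np.ndarray:
    """Keep only the first n values of a text-embedding-3-large vector, then re-normalize."""
    full = client.embeddings.create(model="text-embedding-3-large", input=text).data[0].embedding
    cut = np.array(full[:n])
    return cut / np.linalg.norm(cut)  # re-normalize so cosine similarity still behaves

# The API can also shorten the vector server-side via the dimensions parameter:
# client.embeddings.create(model="text-embedding-3-large", input=text, dimensions=256)
```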