Turn text into understanding. Now.

Both platforms offer state-of-the-art embeddings. OpenAI launches text-embedding-3-small and text-embedding-3-large. Google responds with gemini-embedding-exp-03-07. The race is on. Quality soars. Prices drop. Developers win.

Start Here: OpenAI’s Power Move

text-embedding-3-small processes 62,500 pages per dollar.

Fast. Cheap. Accurate enough. This model handles 8,191 input tokens, delivers 1,536 dimensions, and scores 62.3% on the MTEB benchmark. Most projects start here.

Want more precision? text-embedding-3-large doubles the dimensions to 3,072. Performance jumps to 64.6%. Cost increases to $0.13 per million tokens. Still processes 9,615 pages per dollar. The choice becomes clear: volume or accuracy.
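The page math is simple, assuming OpenAI's convention of roughly 800 tokens per page:

# Back-of-envelope: $0.02 per million tokens, ~800 tokens per page
tokens_per_dollar = 1_000_000 / 0.02        # 50,000,000 tokens
pages_per_dollar = tokens_per_dollar / 800  # 62,500 pages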

from openai import OpenAI

# Reads OPENAI_API_KEY from the environment
client = OpenAI()

# One line. One embedding. Done.
embedding = client.embeddings.create(
    input="Your text here",
    model="text-embedding-3-small"
).data[0].embedding

But here’s what changes everything. Dimensions are adjustable. Need smaller vectors? Add one parameter. Storage costs plummet. Speed increases. Accuracy holds surprisingly well. No other provider offers this flexibility at scale.

# Compress to 512 dimensions
compressed = client.embeddings.create(
    input="Your text here",
    model="text-embedding-3-small",
    dimensions=512  # Request a shorter, still-normalized vector
).data[0].embedding
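One caveat if you ever shorten vectors yourself instead of using the parameter: truncated embeddings lose unit length, so renormalize before computing cosine similarity. A minimal sketch:

import numpy as np

# Shorten an existing full-size embedding locally, then renormalize
vec = np.array(embedding)[:512]
vec = vec / np.linalg.norm(vec)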

Gemini’s Counter-Strike

Google built something different.

gemini-embedding-exp-03-07 isn’t just another embedding model. It’s multilingual mastery. It’s code comprehension. It’s retrieval perfection. The model accepts 8,192 tokens and offers three dimension options: 768, 1,536, or 3,072. You choose upfront. No compression later.

Rate limits bite hard. Experimental status shows. But performance? State-of-the-art across every benchmark that matters. When accuracy is everything, when languages mix freely, when code meets prose, Gemini delivers.

For production? Use text-embedding-004. Stable. Proven. Limited to 2,048 tokens and 768 dimensions, but reliable at 1,500 requests per minute. It outperforms comparable models consistently. The workhorse for Google Cloud natives.

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # or set GOOGLE_API_KEY

# Flexible dimensions with the experimental model
result = genai.embed_content(
    model="models/gemini-embedding-exp-03-07",
    content="Your multilingual text here",
    task_type="RETRIEVAL_DOCUMENT",
    output_dimensionality=1536
)
embedding = result["embedding"]
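For the stable production path, the call shape stays the same; a minimal sketch with text-embedding-004, which fixes dimensions at 768:

# Production model: 768 dimensions, 2,048-token input limit
result = genai.embed_content(
    model="models/text-embedding-004",
    content="Your production text here",
    task_type="RETRIEVAL_DOCUMENT"
)
embedding = result["embedding"]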

Real Implementation Patterns

Batch processing saves money. Always.

Don’t embed one document. Embed hundreds. OpenAI accepts arrays up to 2,048 elements per request. Gemini handles batches through Vertex AI. Time saved compounds. Costs drop dramatically. Latency becomes manageable. Your pipeline scales.

def intelligent_batch_embed(texts, model="text-embedding-3-small"):
    """Embed a list of texts in batches to cut per-request overhead."""
    embeddings = []
    batch_size = 100 if model.startswith("text-embedding") else 50

    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        response = client.embeddings.create(input=batch, model=model)
        embeddings.extend([d.embedding for d in response.data])

    return embeddings

Semantic search requires strategy.

Store embeddings in purpose-built databases. Pinecone. Weaviate. Qdrant. ChromaDB. Each optimizes differently. Pinecone excels at scale. Weaviate integrates everything. Qdrant runs anywhere. ChromaDB stays simple. Choose based on your constraints, not their marketing.

The search itself? Cosine similarity usually wins. Sometimes dot product suffices. Rarely, Euclidean distance helps. But the real optimization happens before similarity calculation. Index properly. Partition intelligently. Cache aggressively.
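Before reaching for any of those engines, it helps to see what they optimize. A brute-force sketch, with cosine_search as a hypothetical helper and doc_vectors holding one embedding per row:

import numpy as np

def cosine_search(query_vec, doc_vectors, top_k=5):
    # Normalize rows so a plain dot product equals cosine similarity
    q = np.asarray(query_vec)
    q = q / np.linalg.norm(q)
    docs = np.asarray(doc_vectors)
    docs = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    scores = docs @ q
    top = np.argsort(scores)[::-1][:top_k]
    return [(int(i), float(scores[i])) for i in top]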

Advanced Patterns That Actually Work

Hybrid strategies multiply power.

Use OpenAI for general text. Switch to Gemini for code blocks. Merge the ranked results for consensus. One caveat: vectors from different models live in different spaces, so combine at the result level, never the vector level. This isn't theoretical. Production systems do this daily. The overhead pays off when accuracy matters.
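A minimal routing sketch, with embed_routed and is_code as hypothetical names, assuming the client and genai setup from earlier; each model writes to its own index:

def embed_routed(text, is_code):
    # Different models produce incompatible vector spaces: one index each
    if is_code:
        vec = genai.embed_content(
            model="models/gemini-embedding-exp-03-07",
            content=text,
            task_type="RETRIEVAL_DOCUMENT"
        )["embedding"]
        return ("gemini_index", vec)
    vec = client.embeddings.create(
        input=text, model="text-embedding-3-small"
    ).data[0].embedding
    return ("openai_index", vec)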

Dynamic dimensions solve real problems.

Start with high dimensions during development. Measure actual performance. Reduce dimensions gradually. Find the sweet spot. OpenAI makes this trivial with their dimensions parameter. Gemini requires choosing upfront, so prototype carefully.
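One cheap way to find that sweet spot, sketched with a hypothetical embed_at helper and a few stand-in sample texts: watch how much the pairwise similarity structure drifts as dimensions shrink.

import numpy as np

texts = ["refund policy", "how do I return an item?", "gpu kernel tuning"]

def embed_at(dims):
    resp = client.embeddings.create(
        input=texts, model="text-embedding-3-large", dimensions=dims
    )
    return np.array([d.embedding for d in resp.data])

full = embed_at(3072)
for dims in (1536, 768, 256):
    small = embed_at(dims)
    # API vectors are unit length, so X @ X.T is a cosine similarity matrix
    drift = np.abs(full @ full.T - small @ small.T).mean()
    print(dims, round(float(drift), 4))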

Cross-language search breaks barriers.

Gemini’s multilingual strength shines here. Embed English queries. Search Spanish documents. Find Chinese results. The semantic space transcends language. One model, many tongues, unified understanding.
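A minimal sketch of the idea: an English query scored against a Spanish document, both embedded by the same model:

import numpy as np

query = genai.embed_content(
    model="models/gemini-embedding-exp-03-07",
    content="return policy for damaged goods",
    task_type="RETRIEVAL_QUERY"
)["embedding"]

doc = genai.embed_content(
    model="models/gemini-embedding-exp-03-07",
    content="Política de devoluciones para productos dañados",
    task_type="RETRIEVAL_DOCUMENT"
)["embedding"]

q, d = np.array(query), np.array(doc)
print(q @ d / (np.linalg.norm(q) * np.linalg.norm(d)))  # cosine similarity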

Cost Reality Check

OpenAI pricing favors volume:

  • text-embedding-3-small: $0.02 per million tokens
  • text-embedding-3-large: $0.13 per million tokens
  • Adjust dimensions to reduce storage costs
  • Batch everything possible

Gemini pricing rewards commitment:

  • First 1 billion characters free monthly (experimental)
  • Production models follow Google Cloud pricing
  • Rate limits enforce reasonable usage
  • Enterprise agreements unlock scale

Performance Truth

MTEB benchmarks tell part of the story. Real applications tell the rest. text-embedding-3-large achieves 64.6%. Impressive. But gemini-embedding-exp-03-07 claims state-of-the-art results across multilingual, code, and retrieval benchmarks. More impressive? Maybe.

Your data matters more than benchmarks. Your use case trumps evaluations. Your constraints define success. Test both. Measure everything. Let results guide decisions.

Implementation Wisdom

Token limits shape architecture. 8,192 tokens seems generous until you process legal documents. Chunk thoughtfully. Maintain context. Overlap slightly. Never break mid-sentence. Consider sliding windows for long texts.
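A minimal sliding-window chunker, with chunk_words as a hypothetical helper. It splits on words for brevity; production code should count tokens (tiktoken for OpenAI models) and respect sentence boundaries:

def chunk_words(text, chunk_size=500, overlap=50):
    # Overlapping windows preserve context across chunk boundaries
    words = text.split()
    step = chunk_size - overlap
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, max(len(words) - overlap, 1), step)
    ]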

Error handling prevents disasters. APIs fail. Networks hiccup. Rate limits hit. Implement exponential backoff. Add jitter to prevent thundering herds. Log everything. Monitor patterns. Alert on anomalies.
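A minimal retry wrapper in that spirit, with embed_with_retry as a hypothetical helper around the client from earlier:

import random
import time

def embed_with_retry(texts, model="text-embedding-3-small", max_retries=5):
    for attempt in range(max_retries):
        try:
            resp = client.embeddings.create(input=texts, model=model)
            return [d.embedding for d in resp.data]
        except Exception:  # narrow to rate-limit and network errors in production
            if attempt == max_retries - 1:
                raise
            # Exponential backoff plus jitter to avoid thundering herds
            time.sleep(2 ** attempt + random.random())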

Embedding drift is real. Models update. Vectors shift. Similarities change. Document your model versions. Timestamp your embeddings. Plan re-embedding cycles. Version your vector databases.
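The habit is cheap if you start on day one. A hypothetical record schema that captures enough to re-embed later:

from datetime import datetime, timezone

record = {
    "id": "doc-123",                    # your document key
    "vector": embedding,
    "model": "text-embedding-3-small",  # exact model string used
    "dimensions": 1536,
    "embedded_at": datetime.now(timezone.utc).isoformat(),
}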

Task parameters matter. Gemini’s RETRIEVAL_DOCUMENT vs RETRIEVAL_QUERY changes results. OpenAI optimizes automatically but understanding helps. Match task types to use cases. Consistency beats optimization.
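In Gemini terms the asymmetry looks like this: documents and queries get different task types but land in the same space:

# Index side: documents
doc_vec = genai.embed_content(
    model="models/text-embedding-004",
    content="Shipping takes 3 to 5 business days.",
    task_type="RETRIEVAL_DOCUMENT"
)["embedding"]

# Query side: user questions
query_vec = genai.embed_content(
    model="models/text-embedding-004",
    content="how long does shipping take?",
    task_type="RETRIEVAL_QUERY"
)["embedding"]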

The Decision Framework

Choose OpenAI when: Speed matters most. English dominates. Costs need control. Dimensions must flex. Stability trumps features. You’re already using GPT models.

Choose Gemini when: Languages vary. Code understanding is critical. You’re in Google’s ecosystem. Experimental features excite you. State-of-the-art performance justifies restrictions.

Choose both when: Accuracy is paramount. Budget allows. Different content types exist. Redundancy matters. You need validation.

Start Now

Pick one. Build something. Measure results. Iterate quickly. The best embedding model is the one you ship with.

# Your journey starts here
def get_embedding(text, model="text-embedding-3-small"):
    return client.embeddings.create(input=text, model=model).data[0].embedding

text = "What matters most to your users?"
embedding = get_embedding(text)
# Now find what's similar

Simple beginning. Powerful results. Semantic understanding awaits. Try my version of semantic analysis using embeddings.

Stop reading. Start building. Text becomes vectors. Vectors become insights. Insights become value.

This is how modern search works. This is how AI understands. This is how you compete.

Make it happen.
