If you’ve ever wanted to extract structured data from unstructured text (think resumes, invoices, or customer support emails), LangExtract is a Python library that makes it dead simple.
Backed by Google’s Gemini models, LangExtract lets you define what you want to extract (a schema or prompt), and it handles the rest. This post focuses on how to use LangExtract effectively with two real examples: a job application email and a service invoice.
Project Overview (Very Brief)
Before we dive in, here’s the basic layout of the demo project:
langextract-demo/
├── langextract_demo/
│   ├── __main__.py
│   ├── extractor_job.py
│   └── extractor_invoice.py
├── data/
│   ├── job_application_email.txt
│   └── invoice.txt
├── .env
Now, let’s explore how to use the library itself.
How LangExtract Works
The core idea is simple: LangExtract takes a prompt_description, a few examples, and your input text, then sends them to Gemini via a structured API.
You get back typed, structured Extraction results with class labels and attributes. Think of it like “few-shot prompting as code.”
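To make the "few-shot prompting as code" idea concrete, here is a rough sketch of how a description, annotated examples, and new input could be stitched into a single prompt. This is not LangExtract's internal logic: the `Labeled` class, the `build_few_shot_prompt` function, the layout of the prompt, and the second invoice number are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Labeled:
    """Hypothetical stand-in for one annotated span (not a LangExtract class)."""
    label: str
    text: str
    attributes: dict = field(default_factory=dict)


def build_few_shot_prompt(description, examples, input_text):
    """Join the instruction, worked examples, and the new input into one string."""
    parts = [description.strip(), ""]
    for source, spans in examples:
        parts.append(f"Text: {source}")
        for span in spans:
            # Only show attributes when the example actually has some.
            suffix = f" {span.attributes}" if span.attributes else ""
            parts.append(f"  {span.label}: {span.text}{suffix}")
        parts.append("")
    parts.append(f"Text: {input_text}")
    parts.append("Extractions:")
    return "\n".join(parts)


prompt = build_few_shot_prompt(
    "Extract the invoice number.",
    [("Invoice #INV-10123", [Labeled("invoice_number", "INV-10123")])],
    "Invoice #INV-20456",  # invented input, for illustration only
)
print(prompt)
```

The point is only that the examples are ordinary data, so they can be versioned, tested, and reused like any other code.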
Define the Prompt
invoice_prompt = """Extract invoice number, date, recipient, itemized services,
total amount due, and due date.
Use exact text spans. Do not infer or reword."""
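Since the prompt asks for exact text spans, it can be worth checking that requirement yourself after a run. The sketch below is a plain string search, independent of LangExtract; the helper name `find_exact_span` is an assumption for this example, and the sample text reuses the invoice snippet from this post.

```python
def find_exact_span(source, extracted):
    """Return (start, end) if `extracted` occurs verbatim in `source`, else None."""
    start = source.find(extracted)
    if start == -1:
        return None
    return start, start + len(extracted)


invoice_text = "Invoice #INV-10123\nDate: August 5, 2025"

# A verbatim span is found; a reworded date is not.
print(find_exact_span(invoice_text, "INV-10123"))
print(find_exact_span(invoice_text, "5 August 2025"))
```

A `None` result is a cheap signal that the model reworded or inferred a value instead of copying it from the source.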
Provide Few-Shot Examples
Few-shot examples help steer the model. You give it one or more annotated examples like this:
import langextract as lx

# Bound to a name so it can be passed to lx.extract() later.
invoice_example = lx.data.ExampleData(
    text="Invoice #INV-10123\nDate: August 5, 2025\n...",
    extractions=[
        lx.data.Extraction("invoice_number", "INV-10123"),
        lx.data.Extraction("date", "August 5, 2025", attributes={"type": "issue"}),
        ...
    ]
)
Load Your Input Text
with open("data/invoice.txt") as f:
    invoice_text = f.read()
Run the Extraction
This is where the magic happens:
import os  # needed for os.getenv below

result = lx.extract(
    text_or_documents=invoice_text,
    prompt_description=invoice_prompt,
    examples=[invoice_example],
    model_id="gemini-2.5-pro",
    api_key=os.getenv("GOOGLE_API_KEY")
)
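One thing to watch: os.getenv returns None when the variable is missing, so a forgotten key only shows up once the request is made. A small guard like the one below fails earlier with a clearer message. The `require_env` helper and the `DEMO_KEY` variable are hypothetical, and how the demo project actually loads its .env file is not shown in this post.

```python
import os


def require_env(name):
    """Return the value of environment variable `name`, or raise if unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Demo with a throwaway variable so the snippet runs without a real key.
os.environ["DEMO_KEY"] = "abc123"
print(require_env("DEMO_KEY"))
```

In the call above you would then write api_key=require_env("GOOGLE_API_KEY") instead of calling os.getenv directly.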
And now you can iterate over structured results:
for item in result.extractions:
    print(f"- Class: {item.extraction_class}")
    print(f"  Text: {item.extraction_text}")
    print(f"  Attributes: {item.attributes}")
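If you would rather hand the results to other code than print them, one option is to group them by class into a plain dictionary. The sketch below relies only on the three attribute names used in the loop above; the `group_by_class` helper is an assumption, and SimpleNamespace objects stand in for real results so the snippet runs without an API call.

```python
import json
from collections import defaultdict
from types import SimpleNamespace


def group_by_class(extractions):
    """Map each extraction_class to a list of {"text", "attributes"} dicts."""
    grouped = defaultdict(list)
    for item in extractions:
        grouped[item.extraction_class].append(
            {"text": item.extraction_text, "attributes": item.attributes or {}}
        )
    return dict(grouped)


# Stand-ins mimicking the attributes printed above (not real LangExtract objects).
sample = [
    SimpleNamespace(extraction_class="invoice_number",
                    extraction_text="INV-10123", attributes=None),
    SimpleNamespace(extraction_class="date",
                    extraction_text="August 5, 2025", attributes={"type": "issue"}),
]

print(json.dumps(group_by_class(sample), indent=2))
```

With a real run you would pass result.extractions instead of sample.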
Job Application Example
You can do the same for emails or resumes:
email_schema_prompt = """Extract candidate name, email, phone,
experience, skills, expected salary, and availability."""
Just swap the prompt, example, and input text — the rest stays the same.
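One way to make that swap explicit is to bundle the three changing pieces into a small task object and keep the call itself generic. This is a sketch, not the demo project's actual structure: `ExtractionTask`, `run_task`, and `fake_extract` are hypothetical names, and the fake function and placeholder email text exist only so the snippet runs without a model.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ExtractionTask:
    """The pieces that differ between the invoice and job-application runs."""
    prompt: str
    examples: tuple
    input_path: str


def run_task(task, extract_fn,
             read_text=lambda p: Path(p).read_text(encoding="utf-8")):
    """Read the task's input and pass the task's pieces to `extract_fn`."""
    return extract_fn(
        text_or_documents=read_text(task.input_path),
        prompt_description=task.prompt,
        examples=list(task.examples),
    )


def fake_extract(text_or_documents, prompt_description, examples):
    """Stand-in for the real extraction call; just reports what it received."""
    return {
        "chars": len(text_or_documents),
        "prompt": prompt_description,
        "n_examples": len(examples),
    }


job_task = ExtractionTask(
    prompt="Extract candidate name, email, phone.",
    examples=(),
    input_path="data/job_application_email.txt",
)

# Placeholder text instead of reading the real file.
outcome = run_task(job_task, fake_extract, read_text=lambda p: "Hello, my name is ...")
print(outcome)
```

For a real run, extract_fn could be functools.partial(lx.extract, model_id="gemini-2.5-pro", api_key=...), using the same keyword names as the call shown earlier in this post.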
When to Use LangExtract
LangExtract is ideal for:
- Parsing job applications or CVs
- Extracting line items from invoices
- Pulling structured details out of customer support emails
- Metadata extraction from legal or medical text
Get the Full Example Project
The full demo project with Docker and Makefile support is on GitHub:
github.com/sprusov/langextract-demo
What Will You Build?
LangExtract makes structured extraction feel like cheating. What domains would you apply this to — HR, finance, legal, healthcare?
Share your ideas or fork the repo and experiment!