docs: update README with historical context, semantic search, gemma tools, current stats
@@ -1,6 +1,6 @@
# Simon

-**Simon** is the Freiberg family's AI historian: a conversational interface to a genealogical database of 1,296 people spanning three centuries, from 18th-century Palatinate Germany to present-day America.
+**Simon** is the Freiberg family's AI historian: a conversational interface to a genealogical database of 1,311 people spanning three centuries, from 18th-century Palatinate Germany to present-day America.

He's named after **Simon Freiberg I** (1782–1864), the patriarch who was born Sedrel Moses in Steinbach am Donnersberg and adopted the surname Freiberg during the Napoleonic decrees. Ask Simon who he is, and he'll tell you about the man, not the chatbot.
@@ -26,12 +26,41 @@ The backend was designed for AI from the start. After building it, Seth realized

Simon runs on [Gemma 4 (26B)](https://ai.google.dev/gemma), Google's open-weight language model, hosted locally. When you ask a question, he searches the family database, looks up relationships, pulls life events and dates, and composes a response. All of this happens on private infrastructure; no data leaves the family's network.

He has six tools:

- **find_by_name**: looks up a person by name, handling nicknames, typos, and partial matches
- **search_by_topic**: hybrid BM25 + semantic search across the tree for thematic queries (places, occupations, eras)
- **lookup_person**: retrieves full details for a specific person (facts, citations, family)
- **find_relationship**: traces the path between two people in the tree
- **get_stats**: tree-wide statistics
- **get_historical_context**: retrieves sourced historical background entries relevant to a person, place, or era. Returns the context along with the list of family members it applies to.

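To make the idea behind **find_relationship** concrete, here is a minimal sketch of tracing a path between two people with a breadth-first search. It is an illustration only: the `EDGES` layout, the relation labels, and the `trace_path` function are hypothetical, and this README does not describe how the real tool stores or walks the tree.

```python
from collections import deque

# Hypothetical toy data: person -> list of (relation label, other person).
# The real schema behind find_relationship is not shown in this README.
EDGES = {
    "A": [("parent of", "B")],
    "B": [("child of", "A"), ("parent of", "C")],
    "C": [("child of", "B")],
}


def trace_path(edges, start, goal):
    """Return the shortest list of (person, relation, person) steps, or None."""
    if start == goal:
        return []
    queue = deque([start])
    came_from = {start: None}  # person -> (previous person, relation used)
    while queue:
        current = queue.popleft()
        for relation, neighbor in edges.get(current, []):
            if neighbor in came_from:
                continue  # already reached by a path at least as short
            came_from[neighbor] = (current, relation)
            if neighbor == goal:
                # Walk back from the goal to the start to rebuild the chain.
                steps = []
                node = goal
                while came_from[node] is not None:
                    previous, rel = came_from[node]
                    steps.append((previous, rel, node))
                    node = previous
                return steps[::-1]
            queue.append(neighbor)
    return None  # the two people are not connected
```

Breadth-first search returns a path with the fewest hops, which is usually the relationship a reader expects (grandparent before third cousin by marriage).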

The search tools use a hybrid ranking system that combines traditional keyword matching (BM25) with semantic similarity (cosine over 1024-dimensional embeddings from bge-large-en-v1.5). This means Simon finds relevant results even when the question uses different words than the database: asking about "the liquor business" finds people tagged with "distillery" and "wholesale liquor."

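The hybrid ranking can be sketched roughly as follows. This is a simplified illustration, not Simon's actual code: the min-max normalization, the `alpha` weight, and all function names are assumptions, and the vectors passed in would be the 1024-dimensional bge-large-en-v1.5 embeddings in practice (any equal-length vectors work for the sketch).

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized document against the query."""
    n_docs = len(docs_tokens)
    avg_len = sum(len(doc) for doc in docs_tokens) / n_docs
    doc_freq = Counter(term for doc in docs_tokens for term in set(doc))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def min_max(values):
    """Rescale scores to [0, 1] so the two signals are comparable."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]


def hybrid_rank(query_tokens, query_vec, docs_tokens, doc_vecs, alpha=0.5):
    """Return document indices, best first. alpha is an assumed fusion weight."""
    keyword = min_max(bm25_scores(query_tokens, docs_tokens))
    semantic = min_max([cosine(query_vec, vec) for vec in doc_vecs])
    combined = [alpha * k + (1 - alpha) * s for k, s in zip(keyword, semantic)]
    return sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)
```

With this kind of fusion, a record tagged only "distillery" can still rank above an unrelated record for the query "liquor business": it scores zero on keywords but high on embedding similarity.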

He has two modes:

**Historian**: the default. Ask about anyone in the tree and Simon looks them up. Direct, factual, no filler. He knows the difference between what the records say and what's uncertain.

**Interview**: when a family member identifies themselves, Simon offers to switch into interview mode. In this mode, he becomes an oral history collector. He asks follow-up questions, prompts your memory using what's in the database, and captures everything. These conversations are logged so they can be reviewed and, where corroborated, added to the family record. Living family members are the richest source we have.

## Historical Context

Simon draws on a growing library of historical context entries: sourced background articles about the eras, places, and events that shaped the family. When you ask "why did the Freibergs come to Cincinnati?" or "what was the whiskey industry like in the 1860s?", Simon retrieves relevant context entries with citations and shows which family members they apply to.

These entries are generated autonomously by **gemma-context**, a local LLM tool that clusters persons by geography, era, and thematic connections (occupations, immigration patterns, religious communities), then searches Wikipedia and other public sources to synthesize grounded historical summaries. Every entry is tagged for human review before it's considered authoritative.

Topics covered include German-Jewish immigration patterns, the Cincinnati whiskey trade, Napoleonic civil registration decrees in the Palatinate, Jewish religious communities, and more, each linked to the specific people in the tree it applies to.

## Data Quality

Two local LLM tools run continuously to maintain data quality:

**gemma-audit** scans the entire database for issues: facts without source citations, internal contradictions (date math errors, timeline impossibilities), research tasks that are already answered by existing data, and potential duplicate person records. It produces a findings report that a verification agent then reviews and acts on: linking existing sources, contesting bad facts, or filing research tasks for anything that needs external investigation.

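As an illustration of the "date math" and "timeline impossibility" checks, the sketch below flags a few obvious contradictions in a single record. Everything in it is assumed for the example: the `PersonRecord` fields, the 120-year lifespan threshold, and the wording of the findings are hypothetical, and the real gemma-audit is an LLM tool whose internals are not described here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

MAX_LIFESPAN = 120  # assumed plausibility threshold, in years


@dataclass
class PersonRecord:
    # Hypothetical record shape; the real database schema is not shown here.
    name: str
    birth_year: Optional[int] = None
    death_year: Optional[int] = None
    event_years: Dict[str, int] = field(default_factory=dict)


def timeline_findings(person: PersonRecord) -> List[str]:
    """Return human-readable descriptions of timeline contradictions."""
    findings = []
    born, died = person.birth_year, person.death_year
    if born is not None and died is not None:
        if died < born:
            findings.append("death precedes birth")
        elif died - born > MAX_LIFESPAN:
            findings.append(f"lifespan exceeds {MAX_LIFESPAN} years")
    for label, year in person.event_years.items():
        if born is not None and year < born:
            findings.append(f"{label} ({year}) precedes birth ({born})")
        if died is not None and year > died:
            findings.append(f"{label} ({year}) follows death ({died})")
    return findings
```

Rule-based checks like these are cheap to run over every record; findings that need judgment can then be passed on for review.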

**gemma-context** populates the historical context library described above. It clusters persons by surname, geography, era, and thematic signals (occupations, migration patterns, religious affiliations, census records, even the Napoleonic surname decrees), generates targeted web search queries, and synthesizes the results into sourced context entries.

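The clustering step might look something like the sketch below, which groups people by place and a 50-year era bucket and then derives a search query per cluster. The field names, the bucket width, the placeholder person names, and the query template are all assumptions made for illustration; the actual gemma-context logic is not documented in this README.

```python
from collections import defaultdict


def era_bucket(year, width=50):
    """Map a year to a fixed-width era label, e.g. 1782 -> '1750-1799'."""
    start = (year // width) * width
    return f"{start}-{start + width - 1}"


def cluster_persons(persons, width=50):
    """Group person names by (place, era). Input dict keys are hypothetical."""
    clusters = defaultdict(list)
    for person in persons:
        key = (person["place"], era_bucket(person["birth_year"], width))
        clusters[key].append(person["name"])
    return dict(clusters)


def search_query(cluster_key):
    """Build a web search query for one cluster (assumed template)."""
    place, era = cluster_key
    return f"{place} history {era}"


# Placeholder records; names are made up, places come from this README.
PERSONS = [
    {"name": "Person 1", "place": "Steinbach am Donnersberg", "birth_year": 1782},
    {"name": "Person 2", "place": "Steinbach am Donnersberg", "birth_year": 1790},
    {"name": "Person 3", "place": "Cincinnati", "birth_year": 1850},
]
```

Keeping the cluster key alongside each generated entry is what lets a context entry be linked back to the specific people it applies to.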

Both tools run on local GPUs (a 3090 Ti and a V100) using Google's Gemma 4 model. No data leaves the network. No API calls to external AI services.

## The Family

The Freibergs arrived in Cincinnati in the 1840s from the Palatinate region of what is now southwestern Germany. What followed is a distinctly American story:
@@ -43,7 +72,7 @@ The Freibergs arrived in Cincinnati in the 1840s from the Palatinate region of w
- **Stella Freiberg** was a founding leader of the National Federation of Temple Sisterhoods and a pillar of the Cincinnati Symphony Orchestra.
- **David Shire**, connected through the Scheuer/Shire branch, won an Academy Award for the song *"It Goes Like It Goes"* from the film *Norma Rae*.

-The tree today includes 1,296 people, 2,195 relationships, 222 sources, and 1,016 citations across more than a dozen interconnected families.
+The tree today includes 1,311 people, 2,198 relationships, 222 sources, and 1,016 citations across more than a dozen interconnected families.

## Origins