Research Spike

Chronicle

A digital historical archive platform that tracks international incidents by combining factual institutional sources with oral history from social media. First case study: Iran's Woman, Life, Freedom movement.

March 2026 | Research Phase | v0.1 Spike
4 Source Tiers | 10 Case Events | 15+ APIs Evaluated | 5 Pipeline Stages

What This Spike Proves

Chronicle's methodology is portable, technically feasible, and addresses a real gap in digital archiving. The Iran case study demonstrates that systematic capture and comparison of divergent narratives produces historical understanding that no single source type achieves alone. An MVP is buildable in 8-12 weeks at under $200/month infrastructure cost.

The Core Problem

Historical truth is not singular. Every international incident generates multiple, often contradictory narratives across institutional media, local reporting, government statements, and citizen testimony. Traditional archives privilege institutional sources. Social media captures what institutional media cannot: the lived experience of events, the immediate emotional register, the details that editorial processes filter out.

Chronicle treats social media and citizen journalism as first-class historical data, subject to rigorous but distinct verification protocols. A shaky mobile phone video from a Tehran protest carries evidentiary weight that no wire service summary can replicate.

Source Classification and Verification

Chronicle uses a four-tier source taxonomy that ranks sources not by inherent trustworthiness, but by institutional accountability and editorial oversight. A Tier 4 source (citizen video) may capture ground truth that Tier 1 sources miss entirely. The tier system determines verification protocols, not value.

Tier 1: Institutional Record

Wire services (Reuters, AP, AFP), official government statements, UN reports and resolutions, ICJ rulings, treaty texts.

Verification: cross-reference across 2+ independent sources.

Tier 2: Established Journalism

Major international outlets (BBC, Al Jazeera, NYT), established investigative organizations (Bellingcat, OCCRP), specialist publications.

Verification: document outlet, byline, sourcing methodology.

Tier 3: Regional / Independent

Local news outlets, independent journalist platforms, specialist blogs, diaspora media (Iran International, Manoto TV).

Verification: verify track record, funding model, independence.

Tier 4: Citizen Testimony

Social media posts (Twitter/X, Telegram, TikTok), citizen journalism, oral history interviews, community forum discussions.

Verification: geolocation, metadata extraction, identity protection.
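
The taxonomy above can be expressed as a small lookup that maps each tier to its verification steps rather than to a trust score. This is a minimal sketch: the names `SourceTier`, `VERIFICATION_PROTOCOL`, and `protocol_for` are illustrative assumptions, not part of any existing Chronicle codebase.

```python
from enum import IntEnum


class SourceTier(IntEnum):
    """Four-tier taxonomy; the number orders oversight, not value."""
    INSTITUTIONAL_RECORD = 1
    ESTABLISHED_JOURNALISM = 2
    REGIONAL_INDEPENDENT = 3
    CITIZEN_TESTIMONY = 4


# Hypothetical mapping: each tier selects a verification protocol.
VERIFICATION_PROTOCOL = {
    SourceTier.INSTITUTIONAL_RECORD: [
        "cross-reference across 2+ independent sources",
    ],
    SourceTier.ESTABLISHED_JOURNALISM: [
        "document outlet",
        "document byline",
        "document sourcing methodology",
    ],
    SourceTier.REGIONAL_INDEPENDENT: [
        "verify track record",
        "verify funding model",
        "verify independence",
    ],
    SourceTier.CITIZEN_TESTIMONY: [
        "geolocation",
        "metadata extraction",
        "identity protection",
    ],
}


def protocol_for(tier):
    """Return the verification steps for a tier given as int or SourceTier."""
    return VERIFICATION_PROTOCOL[SourceTier(tier)]
```

Keeping the mapping free of any numeric weight mirrors the point that the tier determines the protocol, not the value of the source.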

Confidence Levels

Each archived claim receives a corroboration score based on source independence, diversity, temporal consistency, and methodological transparency.

Level | Criteria | Meaning
Confirmed | 3+ independent sources across 2+ tiers | High confidence, multiple corroboration
Probable | 2 independent sources, or multiple within one tier with transparency | Likely accurate, further corroboration welcome
Reported | Single-source or same-tier-only corroboration | Documented but flagged for verification
Contested | Sources actively contradict each other | Contradictions become part of the record
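
One possible reading of these criteria as a decision rule is sketched below. It assumes the caller has already reduced the input to independent sources and has flagged contradictions separately; it also assumes that two sources in different tiers count as Probable, and that same-tier sources count as Probable only when every one documents its methodology. The `Source` type and `confidence_level` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    name: str
    tier: int                        # 1-4, per the source taxonomy
    transparent_method: bool = False  # sourcing methodology is documented


def confidence_level(sources, contradicted=False):
    """Assign a confidence level to a claim backed by independent sources."""
    if not sources:
        raise ValueError("a claim needs at least one source")
    if contradicted:
        return "Contested"
    tiers = {s.tier for s in sources}
    count = len(sources)
    if count >= 3 and len(tiers) >= 2:
        return "Confirmed"
    if count >= 2 and len(tiers) >= 2:
        return "Probable"
    # Same-tier corroboration is upgraded only with transparent methodology.
    if count >= 2 and all(s.transparent_method for s in sources):
        return "Probable"
    return "Reported"
```

Checking `contradicted` first means a contested claim never gets silently promoted by volume of sources.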

Data Pipeline and Infrastructure

Chronicle's architecture follows five stages: Ingest, Classify, Verify, Store, Visualise. Each stage is independently scalable and failure-tolerant.

01 Ingest: Kafka/Redis message queue, batch + real-time, raw data lake
02 Classify: NER, topic classification, geolocation, translation
03 Verify: Cross-reference scoring, temporal checks, deduplication
04 Store: PostgreSQL, full-text search, S3 object storage
05 Visualise: Static HTML, interactive timeline, map views
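
To illustrate how the five stages compose, here is a toy in-memory sketch. It deliberately replaces every real component (message queue, NER, PostgreSQL, HTML rendering) with a trivial stand-in, and the record fields `date` and `text` are assumptions made for the example.

```python
def ingest(raw_items):
    """Stage 1: copy raw items so later stages never mutate the input."""
    return [dict(item) for item in raw_items]


def classify(records):
    """Stage 2: toy keyword topic tag standing in for NER and classification."""
    for record in records:
        is_protest = "protest" in record["text"].lower()
        record["topic"] = "protest" if is_protest else "other"
    return records


def verify(records):
    """Stage 3: drop duplicates that share a date and normalised text."""
    seen, unique = set(), []
    for record in records:
        key = (record["date"], record["text"].strip().lower())
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def store(records):
    """Stage 4: a dict keyed by integer id standing in for a database."""
    return {i: record for i, record in enumerate(records, start=1)}


def visualise(db):
    """Stage 5: render a plain-text timeline ordered by date."""
    rows = sorted(db.values(), key=lambda r: r["date"])
    return [f'{r["date"]} [{r["topic"]}] {r["text"]}' for r in rows]


def run_pipeline(raw_items):
    """Chain the five stages; each consumes the previous stage's output."""
    data = raw_items
    for stage in (ingest, classify, verify, store, visualise):
        data = stage(data)
    return data
```

Because each stage is a plain function with one input and one output, any of them could be swapped for a real service without touching the others.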

API Landscape

API | Coverage | Cost | Use for Chronicle
GDELT | 300M+ events since 1979, 100+ languages, 15-min updates | Free (BigQuery free tier) | Primary event data source
MediaCloud | 50,000+ sources, attention tracking | Free for research | Media attention analysis
Event Registry | 300,000+ sources, auto event clustering | $499-999/mo | Deduplication support
NewsAPI | 150,000+ sources | Free (limited) / $449/mo | Supplementary ingest
ACLED | Real-time geo-referenced conflict data | Free for academic use | Verification layer

Platform | Access | Cost | Iran Relevance
Twitter/X | Pro tier: 1M tweets/month, full archive | $5,000/mo | Critical: primary diaspora platform, Mahsa Amini movement
Telegram | No official API; Telethon (MTProto) | Free (engineering cost) | Critical: primary platform inside Iran
Reddit | 100 req/min with OAuth | Free for research | Secondary: English-language discussion
TikTok | Research API (institutional access) | Free (approval required) | Growing: diaspora content
Meta (FB/IG) | Content Library API (CrowdTangle sunset) | Free (approval required) | Secondary: users shifting to Telegram/X

Source | Access | Content
UN Digital Library | Free, structured | Resolutions, OHCHR reports, Special Rapporteur on Iran
ICC | Web scraping (no API) | Case documents, related investigations
Government feeds | RSS / scraping | US State Dept, UK FCO, EU, IRNA

Infrastructure Cost Comparison

Component | Self-Hosted | Cloud (AWS)
Compute | $50-100/mo (Hetzner) | $150-300/mo (EC2)
Database | Included (PostgreSQL) | $50-100/mo (RDS)
Object Storage | $5/mo (MinIO) | $15-23/mo per TB
Search | Included (Meilisearch) | $80+/mo (OpenSearch)
Message Queue | Included (Redis) | $50+/mo (ElastiCache)
Total | $50-150/mo | $350-600/mo
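
As a sanity check on the totals, the line items can be summed directly. The sketch below assumes one TB of object storage, treats "Included" as $0, and takes open-ended figures such as "$80+" at their floor for both bounds. Under those assumptions the self-hosted items sum to $55-105 and the AWS items to $345-553, which sits within or close to the rounded totals in the table.

```python
# (low, high) monthly cost in USD per component, read from the comparison table.
SELF_HOSTED = {
    "compute": (50, 100),
    "database": (0, 0),        # included on the same host
    "object_storage": (5, 5),
    "search": (0, 0),          # included
    "message_queue": (0, 0),   # included
}

CLOUD_AWS = {
    "compute": (150, 300),
    "database": (50, 100),
    "object_storage": (15, 23),  # assumes 1 TB
    "search": (80, 80),          # "$80+" taken at its floor
    "message_queue": (50, 50),   # "$50+" taken at its floor
}


def monthly_range(components):
    """Sum the low and high bounds across all components."""
    low = sum(lo for lo, _ in components.values())
    high = sum(hi for _, hi in components.values())
    return low, high
```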

Information Architecture and Reader Journeys

Chronicle serves four distinct personas, each with different expertise and tolerance for complexity. The archive must serve all four without forcing any into a workflow designed for another.

Academic Researcher

Needs: comprehensive source access, citation-ready exports (BibTeX, RIS, Chicago), provenance metadata, side-by-side source comparison.

Entry: Google Scholar link, institutional database, colleague's syllabus.

Pain points: Archives that bury primary sources behind editorial summaries. Interfaces that hide metadata. Citation formats requiring manual reformatting.

Investigative Journalist

Needs: fast timeline navigation, source verification indicators, narrative evolution tracking, downloadable source material.

Entry: Direct search for a specific incident, or timeline browsing around a known date.

Pain points: Archives without clear chronological ordering. No way to see what changed between early and late reporting.

General Public / Students

Needs: context, guided narrative, accessible language. Visual timeline exploration without needing to understand source taxonomy.

Entry: Social media link, news article embed, search engine result.

Pain points: Archives that assume domain expertise. No clear starting point for unfamiliar users.

Human Rights Investigator

Needs: evidence chains, geolocation data, chain-of-custody documentation, structured exports (JSON, CSV) for integration with investigation tools.

Entry: Direct navigation or API query from existing investigation tools.

Pain points: Archives without original metadata. No chain-of-custody. Export formats incompatible with investigative toolchains.
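
A structured export that carries provenance alongside each record could look roughly like the following. The field names in `FIELDS` are hypothetical placeholders, and the SHA-256 digest is included only as one common way to support chain-of-custody checks, not as a confirmed Chronicle design decision.

```python
import csv
import hashlib
import io
import json

# Hypothetical export schema; the real field set is not specified in this spike.
FIELDS = ["record_id", "source_tier", "captured_at", "sha256", "custody_note"]


def make_record(record_id, source_tier, captured_at, media_bytes, custody_note):
    """Build one export row, hashing the archived bytes for integrity checks."""
    return {
        "record_id": record_id,
        "source_tier": source_tier,
        "captured_at": captured_at,
        "sha256": hashlib.sha256(media_bytes).hexdigest(),
        "custody_note": custody_note,
    }


def to_json(records):
    """Serialise records as JSON, keeping non-ASCII text (e.g. Persian) intact."""
    return json.dumps(records, ensure_ascii=False, indent=2)


def to_csv(records):
    """Serialise records as CSV with a fixed header order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

A downstream tool can recompute the digest of the archived file and compare it with the exported `sha256` value to confirm the material was not altered after capture.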

Four Navigation Modes

Mode | Mental Model | Best For
By Timeline | "What happened when?" | Journalists, general readers
By Event | "Tell me about this specific incident" | Researchers, investigators
By Source Type | "What did [source category] report?" | Academics, media analysts
By Narrative Theme | "How did this story evolve?" | Long-form readers, students

Design System and Components

The visual language is built on three references: Linear (clean surfaces, subtle borders, focused interactions), Vercel (precise typography, generous whitespace, monospace accents), and NYT interactive features (layered information reveal, scroll-driven narrative).

Source Tier Visual Language

The credibility gradient is the most important visual system. Colors follow a traffic light metaphor: green (verified) to red (scrutinize). This is a methodological transparency tool, not a value judgment on oral history.

Tier 1: Wire / Official
Tier 2: Major Outlet
Tier 3: Local / Indep.
Tier 4: Social / Oral

Component Samples

Below are live rendered components from the Chronicle design system, demonstrating how event cards, source cards, and oral history elements appear in context.

Event Card

2022-09-16
Death of Mahsa Amini in Morality Police Custody
Tehran, Iran
47 sources
3 narratives

Oral History Element

"They are shooting at us from the rooftops. My neighbor was hit. We cannot leave."
Shared by @anonymous, Sep 21 2022 via Twitter (now deleted)
Geolocation confirmed (Sanandaj) | Corroborated by 2 independent sources | Archived Sep 21 2022 via Internet Archive

Iran: Why This Context

Iran serves as Chronicle's first case study because it presents every challenge a digital historical archive must solve, concentrated in a single context. The country generates a dense, multilingual, multi-platform information ecosystem where state media, international press, diaspora outlets, and citizen journalists produce fundamentally divergent accounts of the same events.

If the Chronicle methodology works for Iran, it works anywhere.

Government internet shutdowns erase digital evidence in real time. Platform algorithms amplify certain narratives while burying others. State-sponsored disinformation campaigns operate alongside genuine grassroots testimony.

Three Intersecting Crises (2022-2025)

Domestic Legitimacy Crisis

The Woman, Life, Freedom movement: 500+ killed (Iran Human Rights), 22,000+ arrested (Amnesty International). The most sustained anti-government protests since 1979. By mid-2023, street protests diminished under repression, but underlying grievances continued in labor strikes and civil disobedience.

Nuclear Standoff

Uranium enrichment reached 84% purity (near weapons-grade) by early 2023, confirmed by IAEA. JCPOA revival talks stalled. Monitoring cameras decommissioned June 2022. The nuclear question became inseparable from regional security.

Regional Proxy Conflicts

Iran's support for Hamas, Hezbollah, Houthis, and Iraqi militias placed Tehran at the center of multiple conflicts. Culminated in Iran's unprecedented direct missile and drone attack on Israel in April 2024, the first open military strike between the two states.

Source Density: September 2022 Peak

Sep 2022: 312 sources
Oct 2022: 248 sources
Nov 2022: 189 sources
Dec 2022: 134 sources
Jan 2023: 98 sources
Feb 2023: 76 sources
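
The decay in source volume can be quantified from these six monthly counts. The short sketch below computes month-over-month and overall percentage change; the function names are illustrative.

```python
# Monthly archived-source counts from the source density chart.
SOURCES_PER_MONTH = [
    ("Sep 2022", 312),
    ("Oct 2022", 248),
    ("Nov 2022", 189),
    ("Dec 2022", 134),
    ("Jan 2023", 98),
    ("Feb 2023", 76),
]


def month_over_month(series):
    """Percentage change of each month relative to the previous one."""
    changes = []
    for (_, previous), (label, current) in zip(series, series[1:]):
        changes.append((label, round((current - previous) / previous * 100, 1)))
    return changes


def total_change(series):
    """Percentage change from the first month to the last."""
    first, last = series[0][1], series[-1][1]
    return round((last - first) / first * 100, 1)
```

On these figures each month falls by roughly 20-29% relative to the one before, and February 2023 sits about 75.6% below the September 2022 peak.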

Key Events

Ten documented events demonstrating how Chronicle maps incidents across source tiers, identifies narrative divergences, and documents engagement distortions.

2022-09-16
Death of Mahsa (Jina) Amini
22-year-old Kurdish-Iranian woman died in Kasra Hospital after detention by morality police. Her death triggered the Woman, Life, Freedom movement, the largest anti-government protests since 1979.
humanitarian | 47 sources | Confirmed

2022-09-30
Bloody Friday in Zahedan
Security forces opened fire on protesters at Makki Mosque. At least 96 killed (Iran Human Rights), making it the single deadliest day of the 2022-2023 protests. Received dramatically less international coverage than Tehran protests despite higher death toll.
military_action | protest | Confirmed

2022-09-21
Internet Shutdowns Begin
Near-total internet shutdowns documented by NetBlocks, Cloudflare Radar, and OONI. Mobile data cut nationwide, fixed-line throttled. The most direct threat to Chronicle's methodology: when the state eliminates Tier 4 sources in real time.
policy | humanitarian | Confirmed

2022-12-08
Execution of Mohsen Shekari
First known execution of a protester. Convicted of "moharebeh" (enmity against God) in proceedings Amnesty International documented as a "sham trial": no legal counsel, forced confessions, under one hour.
legal | humanitarian | Contested

2023-2024
Expansion of Morality Police Enforcement
Return of morality police patrols (July 2023), AI-powered surveillance cameras for hijab detection, more restrictive legislation (September 2023). Iranian women created a citizen counter-surveillance network via Telegram.
policy | ongoing | Confirmed

2024-03-01
Contested Parliamentary Elections
41% turnout, the lowest in the Islamic Republic's history. Guardian Council disqualified the majority of reformist candidates. Social media documented both crowded and empty polling stations, often filming the same stations at different times of day.
election | Contested

2024-04-13
Iran's Missile and Drone Attack on Israel
"Operation True Promise": 300+ drones, cruise missiles, and ballistic missiles. First open military strike between the two states. Satellite imagery showed some impact craters at Nevatim airbase. Neither "total success" nor "total failure" is accurate.
military_action | diplomatic | Contested

2024-05-19
Death of President Raisi in Helicopter Crash
President Raisi's helicopter crashed near the Azerbaijan border in fog. Three competing narratives: mechanical failure (most supported by Tier 1-2), deliberate sabotage, or pilot error compounded by weather. Public reaction was genuinely split: state media showed massive funerals, social media documented celebrations.
event | diplomatic | Probable

2023-2024
Labor Protests and Teacher Strikes
Recurring strikes across Tehran, Isfahan, Ahvaz, Tabriz. ILO cited persistent violations. Teachers' union leaders imprisoned. Received a fraction of social media engagement compared to Mahsa Amini protests despite affecting more people over longer periods.
protest | humanitarian | Confirmed

2023-2024
Houthi Attacks on Red Sea Shipping
Iran-linked Houthi attacks on commercial shipping. UN Panel of Experts documented Iranian weapons transfers. The Iran connection illustrates interpretive contradiction: all agree Houthis conduct attacks; sources diverge on Iran's operational role.
military_action | geopolitical | Confirmed
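
The ten events above can be held in a simple record type for timeline ordering and for a quick tally of confidence levels. This is a sketch: the `Event` type and its fields are assumptions, and dates are kept as strings because several entries are year ranges rather than single days.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    date: str    # ISO date, or a year range such as "2023-2024"
    title: str
    status: str  # Confirmed, Probable, Reported, or Contested


EVENTS = [
    Event("2022-09-16", "Death of Mahsa (Jina) Amini", "Confirmed"),
    Event("2022-09-30", "Bloody Friday in Zahedan", "Confirmed"),
    Event("2022-09-21", "Internet Shutdowns Begin", "Confirmed"),
    Event("2022-12-08", "Execution of Mohsen Shekari", "Contested"),
    Event("2023-2024", "Expansion of Morality Police Enforcement", "Confirmed"),
    Event("2024-03-01", "Contested Parliamentary Elections", "Contested"),
    Event("2024-04-13", "Iran's Missile and Drone Attack on Israel", "Contested"),
    Event("2024-05-19", "Death of President Raisi in Helicopter Crash", "Probable"),
    Event("2023-2024", "Labor Protests and Teacher Strikes", "Confirmed"),
    Event("2023-2024", "Houthi Attacks on Red Sea Shipping", "Confirmed"),
]


def timeline(events):
    """Order events by date string; year ranges sort by their starting year."""
    return sorted(events, key=lambda e: e.date)


def status_counts(events):
    """Count how many events fall under each confidence level."""
    return Counter(e.status for e in events)
```

Sorting by the date string works here only because every value starts with a four-digit year; a production schema would likely store explicit start and end dates.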

Cross-Event Engagement Distortion

Comparing engagement patterns across all ten events reveals systematic distortions that any Iran-focused archive must account for.

Narrative Comparison: Mahsa Amini

This demonstrates how Chronicle renders divergent accounts of the same event, applying the Rashomon Protocol.

● Official Account (Tier 1-2)

State TV (IRIB): "Ms. Amini suffered a heart attack at the guidance patrol station. She had pre-existing heart conditions."

IRNA, Fars News confirm official statement. Medical report released by coroner's office (2022-10-07) concludes "heart failure."

Key claim

Cause of death: pre-existing heart condition. No evidence of physical abuse during custody.

Assessment: State produced no independent medical evidence. No access granted for independent investigation.

● Citizen Accounts (Tier 3-4)

Family testimony (BBC Persian): "She had no pre-existing conditions." Amini's cousin posted a video from outside Kasra Hospital; the post initially drew 342 likes and later reached 45,200.

1500tasvir (Telegram) circulated photos showing bruising. CT scan images shared by anonymous hospital staff showed skull fracture.

Key claim

Cause of death: blunt force trauma. Beaten in custody van. Delayed medical attention (2-hour gap documented).

Assessment: Probable. The leaked CT scan is consistent with head trauma, and the account is supported by multiple independent witness accounts, family testimony, and a hospital admission timeline confirmed by Tier 1 sources.
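
A side-by-side comparison like the one above could be backed by a small structure that records each account's claims and surfaces the fields on which accounts disagree. The sketch below is illustrative only and is not a specification of the Rashomon Protocol, which this document names but does not define; the claim keys and the `divergent_fields` function are assumptions.

```python
ACCOUNTS = [
    {
        "label": "Official Account (Tier 1-2)",
        "claims": {
            "cause_of_death": "pre-existing heart condition",
            "physical_abuse": "no evidence",
            "place_of_death": "Kasra Hospital",
        },
    },
    {
        "label": "Citizen Accounts (Tier 3-4)",
        "claims": {
            "cause_of_death": "blunt force trauma",
            "physical_abuse": "beaten in custody van",
            "place_of_death": "Kasra Hospital",
        },
    },
]


def divergent_fields(accounts):
    """Return claim fields whose values differ between accounts."""
    fields = set()
    for account in accounts:
        fields.update(account["claims"])
    divergences = {}
    for field in sorted(fields):
        values = {a["label"]: a["claims"].get(field) for a in accounts}
        if len(set(values.values())) > 1:
            divergences[field] = values
    return divergences
```

Fields that agree across accounts drop out of the result, so the output isolates exactly the points where contradictions need to be preserved in the record.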

Systematic Distortion Patterns

Events generating English-language viral content (Amini's death, Israel strike, Raisi's crash) received 10-100x more global engagement than events documented primarily in Farsi (labor strikes, Zahedan massacre, morality police enforcement). Chronicle's correction: weight source diversity and language diversity equally.
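
One way to read "weight source diversity and language diversity equally" is as an unweighted mean of two coverage ratios that ignores engagement counts entirely. The sketch below assumes tier coverage is measured against the four source tiers and language coverage against a caller-supplied set of tracked languages; both measures and the function name are hypothetical.

```python
TOTAL_TIERS = 4  # size of the source taxonomy


def diversity_score(tiers, languages, tracked_languages):
    """Equal-weight mean of tier coverage and language coverage (0.0 to 1.0)."""
    if not tracked_languages:
        raise ValueError("tracked_languages must not be empty")
    tier_coverage = len(set(tiers)) / TOTAL_TIERS
    language_coverage = (
        len(set(languages) & set(tracked_languages)) / len(set(tracked_languages))
    )
    return 0.5 * tier_coverage + 0.5 * language_coverage
```

With English and Farsi as the tracked languages, an event covered by all four tiers in English only scores the same (0.75) as an event covered by two tiers in both languages, so broad English-language attention alone does not outrank Farsi-documented events.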

Replicability: Iran Validates the Methodology

The Iran case study validates core design choices:

Build Assessment

Buildable Now

Core Archive

PostgreSQL + Static Site

Schema, timeline visualization, source linking. Straightforward web development with mature tools.

4-8 weeks, solo developer

Data Integration

GDELT + ACLED Ingest

Both offer well-documented, free APIs with structured data. Ingest pipelines buildable in days.

1-2 weeks

Search

PostgreSQL Full-Text

Covers 80% of search needs without additional infrastructure. Multilingual with custom dictionaries.

1-2 weeks

Needs Development

Highest Effort

Social Media Ingest

Twitter/X API costs are high ($5K/mo) and platform reliability is declining. Telegram scraping requires careful engineering. This is the highest-effort component.

4-6 weeks + ongoing maintenance

NLP Challenge

Persian Text Pipeline

Off-the-shelf NER has limited Persian accuracy. Fine-tuning requires labeled training data that does not exist yet.

6-8 weeks including data labeling

Core Value Prop

Verification Pipeline

Automated cross-referencing and confidence scoring. Requires temporal reasoning, independence checking, and disinformation handling.

4-6 weeks for V1, ongoing refinement

Technical Risks

  1. API instability: Twitter/X pricing has changed repeatedly. Mitigation: treat as one input among many, archive aggressively.
  2. Persian NLP accuracy: low-resource language processing. Mitigation: human-in-the-loop verification for Persian content.
  3. Data volume surges: conflict events can generate millions of posts in weeks. Mitigation: Kafka/Redis buffering with raw data lake as safety net.
  4. Legal and ethical risks: consent, source safety, data retention. Mitigation: anonymization features, consultation with digital rights organizations (EFF, Access Now).

MVP Recommendation

Phase 1 (8-12 weeks, 2-3 developers, under $200/mo): GDELT + ACLED ingest, PostgreSQL storage, static HTML report, basic search, manual curation interface.

Phase 2: Telegram integration (most relevant for Iran, cheapest to access).

Phase 3: Twitter/X integration, contingent on API pricing stability. The NLP pipeline evolves alongside the data: start with manual curation, build training data, then automate incrementally.