How I Built Free Apify Actors to Scrape Congressional Stock Trading Data Directly from Government Sources

From Usahobs, the free encyclopedia of technology

Introduction

Accessing the stock trading activities of U.S. Congress members has become a hot topic for transparency, but many aggregators charge hefty fees for data that is actually public domain. After relying on QuiverQuant's API for my personal trading dashboard—paying $30 per month for inconsistent endpoints and occasional errors during market hours—I decided to take matters into my own hands. Over two weekends, I built two Apify actors that scrape Senate and House Periodic Transaction Reports (PTRs) directly from official government websites, providing clean JSON data at roughly one-tenth the cost of commercial alternatives.

Source: dev.to

Why I Built This

The STOCK Act of 2012 requires every member of Congress to disclose most stock trades within 45 days of the transaction. The data is freely available from two government sources:

Senate: efdsearch.senate.gov – a Django-based search portal for all Senate disclosure forms.
House: disclosures-clerk.house.gov – a daily-updated ZIP file containing all House PTRs.

Although the data is public, most resellers, such as QuiverQuant, charge monthly subscriptions for cleaned and normalized versions of it. My personal dashboard needed per-transaction granularity and stable uptime, which would have required a higher-tier paid plan. Instead, I built my own pipeline on Apify's serverless platform, eliminating subscription costs and giving me full control over the data format.

Output Schema

Both actors normalize every transaction into a consistent JSON object. Here is the structure:

id – SHA-256 hash of politician|type|transaction_date|ticker|amount_min|amount_max|owner. This allows idempotent syncing to a database without duplicates.
politician – Full name of the member (e.g., "Mark Alford").
transaction_date – Date of the trade (YYYY-MM-DD).
filing_date – Date the report was filed (YYYY-MM-DD).
ticker – Stock ticker symbol (e.g., "AMZN"), or null for bonds, municipal securities, and structured notes.
asset_name – Full description of the asset (e.g., "Amazon.com, Inc. – Common Stock").
asset_type – Type of asset: Stock, Bond, Municipal Security, etc.
type – buy or sell (source labels map "Purchase" to buy, and "Sale (Full)" and "Sale (Partial)" to sell).
amount_min – Lower bound of the transaction value in whole dollars, parsed from the standard disclosure brackets (e.g., 1001 for $1,001).
amount_max – Upper bound (e.g., 15000 for $15,000). Set to null for “Over $X” unbounded disclosures.
owner – Who owns the asset: self, joint, spouse, or child.
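To make the id definition concrete, here is a minimal Python sketch that computes it with the standard hashlib module. It is an illustration, not the actors' actual code. The field order follows the list above, but serializing nulls as empty strings is an assumption, and the sample record (including the name "Jane Doe") is made up.

```python
import hashlib

# Field order taken from the id definition in the schema above.
ID_FIELDS = ("politician", "type", "transaction_date", "ticker",
             "amount_min", "amount_max", "owner")


def record_id(tx: dict) -> str:
    """Return the hex SHA-256 digest of the pipe-joined identity fields.

    Assumption: None (e.g. a null ticker or an unbounded amount_max) is
    serialized as an empty string; the real actors may encode nulls differently.
    """
    parts = ["" if tx.get(field) is None else str(tx[field]) for field in ID_FIELDS]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


# Illustrative, made-up record.
sample = {
    "politician": "Jane Doe", "type": "buy", "transaction_date": "2024-01-02",
    "ticker": "AMZN", "amount_min": 1001, "amount_max": 15000, "owner": "self",
}
print(record_id(sample))  # 64 hex characters, identical on every run
```

Because the digest depends only on those seven fields, scraping the same trade again yields the same id. That determinism is what makes database upserts idempotent.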

This schema is identical for both Senate and House actors, making it easy to combine datasets or run them independently.
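The type and amount fields require the most parsing. The Python sketch below shows one way to handle them under stated assumptions. Only the labels named in the schema are mapped, so anything else (an exchange, for example) returns None. An "Over $X" bracket becomes (X, None), and source dates are assumed to arrive as MM/DD/YYYY. The helper names are hypothetical and do not come from the actors.

```python
import re
from datetime import datetime
from typing import Optional, Tuple

# Only the labels named in the schema; unknown labels map to None.
TYPE_MAP = {
    "purchase": "buy",
    "sale (full)": "sell",
    "sale (partial)": "sell",
}

# Matches dollar figures such as "$1,001" or "$50,000,000".
_DOLLARS = re.compile(r"\$([\d,]+)")


def parse_type(label: str) -> Optional[str]:
    """Map a source transaction label to 'buy'/'sell', or None if unmapped."""
    return TYPE_MAP.get(label.strip().lower())


def parse_amount(bracket: str) -> Tuple[Optional[int], Optional[int]]:
    """Parse '$1,001 - $15,000' into whole dollars: (1001, 15000).

    Assumption: 'Over $X' yields (X, None) because the upper bound is open.
    Anything unrecognized yields (None, None).
    """
    values = [int(v.replace(",", "")) for v in _DOLLARS.findall(bracket)]
    if bracket.strip().lower().startswith("over") and len(values) == 1:
        return values[0], None
    if len(values) == 2:
        return values[0], values[1]
    return None, None


def parse_date(us_date: str) -> str:
    """Convert an MM/DD/YYYY date (assumed source format) to YYYY-MM-DD."""
    return datetime.strptime(us_date.strip(), "%m/%d/%Y").strftime("%Y-%m-%d")
```

For example, parse_amount("$1,001 - $15,000") gives (1001, 15000), matching the amount_min and amount_max examples in the schema.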

How the Actors Work

Senate Actor

The Senate disclosure system at efdsearch.senate.gov is a Django application that imposes several obstacles:

Akamai bot protection: Plain HTTP clients such as cURL receive a 403 Forbidden, and Apify's default datacenter proxy pool is blocked as well. The actor therefore routes the entire session through a single residential exit IP, pinned with a unique session ID.
Terms-acceptance gate: Every session must first POST prohibition_agreement=1 to /search/home/. Without this, all disclosure URLs fail. The session cookie expires quickly and silently, so the actor refreshes it when needed.
Two-stage data flow: The search endpoint (/search/report/data/) returns a list of filings (each representing a PTR document). Each filing has a link to a separate detail page containing the actual transactions. The actor crawls the search results, then parses each detail page to extract individual trades.

With these workarounds in place, the actor outputs clean JSON in the schema described above.
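The two-stage flow, from search results to detail pages, can be sketched in Python using only the standard library. The assumptions are significant. Search results are treated as rows of HTML cell strings. Electronic PTR detail pages are assumed to live under /search/view/ptr/, so other links (such as paper filings) are skipped. Transactions are assumed to sit in ordinary <td> cells. Verify all three assumptions against the live site before relying on this sketch.

```python
import re
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import urljoin

BASE_URL = "https://efdsearch.senate.gov"
# Assumption: electronic PTR detail pages live under /search/view/ptr/.
_PTR_HREF = re.compile(r'href="(/search/view/ptr/[^"]+)"')


def ptr_links(rows: List[List[str]]) -> List[str]:
    """Stage 1: collect absolute detail-page URLs from search-result rows."""
    links = []
    for row in rows:
        for cell in row:
            for path in _PTR_HREF.findall(cell):
                links.append(urljoin(BASE_URL, path))
    return links


class _TableRows(HTMLParser):
    """Stage 2: collect the text of every <td> cell, grouped by <tr>."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._cell: Optional[List[str]] = None  # text chunks of the open cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            if self.rows:  # ignore a stray <td> outside any <tr>
                # Collapse internal whitespace and newlines into single spaces.
                self.rows[-1].append(" ".join("".join(self._cell).split()))
            self._cell = None


def transaction_cells(detail_html: str) -> List[List[str]]:
    """Return non-empty rows of cell text; <th>-only header rows are dropped."""
    parser = _TableRows()
    parser.feed(detail_html)
    return [row for row in parser.rows if row]
```

Each resulting list of cells can then be mapped into the output schema by column position.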

House Actor

The House provides a simpler structure: a daily ZIP file containing XML files for each disclosure period. The actor:

Downloads the ZIP from disclosures-clerk.house.gov.
Extracts the XML files, which contain all transactions in a standardized schema.
Parses the XML and normalizes it into the same JSON output schema as the Senate actor.
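The following minimal Python sketch illustrates the extract-and-parse steps using only the standard library. It works on the ZIP's bytes in memory, skips non-XML entries, and turns each record element into a dictionary of child-tag text. The record tag name is a caller-supplied assumption, so check the actual files for the real tag and field names before mapping the results into the shared schema.

```python
import io
import xml.etree.ElementTree as ET
import zipfile
from typing import Dict, Iterator


def iter_zip_xml_records(zip_bytes: bytes, record_tag: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per <record_tag> element across all .xml files in a ZIP.

    Each dict maps child tag -> stripped text ('' for empty elements).
    record_tag is an assumption about the XML layout, supplied by the caller.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".xml"):
                continue  # skip any non-XML files bundled in the archive
            root = ET.fromstring(archive.read(name))
            for element in root.iter(record_tag):
                yield {child.tag: (child.text or "").strip() for child in element}
```

Keeping the archive in memory avoids temporary files, which suits a short-lived serverless run.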

Because the House data is already machine-readable, the actor runs faster and requires fewer workarounds than the Senate version.

Using the Actors

Both actors are available on Apify:

Senate actor: apify.com/seralifatih/congress-trading-pipeline
House actor: apify.com/seralifatih/congress-trading-pipeline-1

You can run either actor independently or run both and merge their output into a complete dataset. Each run returns an array of transaction objects, ready for ingestion into your own database or analysis pipeline.
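Because every record carries a deterministic id, syncing can be a plain insert that skips ids already stored. Here is a minimal sketch using Python's built-in sqlite3 module, assuming a hypothetical trades table whose columns mirror the output schema. In PostgreSQL, the equivalent clause would be ON CONFLICT (id) DO NOTHING.

```python
import sqlite3
from typing import Any, Dict, Iterable

# Columns mirror the shared output schema; "trades" is a hypothetical table.
COLUMNS = ("id", "politician", "transaction_date", "filing_date", "ticker",
           "asset_name", "asset_type", "type", "amount_min", "amount_max", "owner")


def create_table(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trades ("
        "id TEXT PRIMARY KEY, politician TEXT, transaction_date TEXT, "
        "filing_date TEXT, ticker TEXT, asset_name TEXT, asset_type TEXT, "
        "type TEXT, amount_min INTEGER, amount_max INTEGER, owner TEXT)"
    )


def sync(conn: sqlite3.Connection, records: Iterable[Dict[str, Any]]) -> int:
    """Insert records, skipping any id already stored; return rows added."""
    placeholders = ", ".join("?" for _ in COLUMNS)
    sql = f"INSERT OR IGNORE INTO trades ({', '.join(COLUMNS)}) VALUES ({placeholders})"
    before = conn.total_changes
    with conn:  # commits on success, rolls back on error
        # Missing keys become NULL via dict.get.
        conn.executemany(sql, ([r.get(c) for c in COLUMNS] for r in records))
    return conn.total_changes - before
```

Feeding the combined Senate and House output through sync() twice adds rows only the first time, so repeated or overlapping runs are harmless.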

Conclusion

By building these two Apify actors, I eliminated the need for expensive third-party APIs while gaining full control over data quality and format. Running them costs roughly one-tenth of QuiverQuant's monthly fee. Because the data comes straight from official government websites with no reseller in between, it is as accurate and timely as the filings themselves. If you're a developer or investor interested in congressional trading transparency, these actors provide a free, open alternative to commercial data feeds.
