How I use LLMs and Firecrawl to dig into company culture

tl;dr: I explored company culture by feeding their blog posts to an LLM. When web search broke, I scraped them with Firecrawl and re-ran the analyses. The workflow I built is really useful when you're looking at a new company.

When you're considering working at another company, understanding its culture is crucial. Pick the right one and you'll thrive and become a multiplier for the team.

One way I found to get a feel for a company's culture is to collect blog posts from that company and ask an AI to analyze them, focusing on the people and culture:

  • Author's role, tone, experience, and inferred psychology

  • What the article suggests about the company culture

  • 3 main ideas, why they matter, and what they reveal psychologically

These analyses often reinforce what I already think about the company. Sometimes they highlight important traits I was missing.

Here's an example that reinforced my perception of Cursor ↗:

Highly ambitious, research-heavy, and growth-focused, with pride in scale and a belief in "magical" product experiences.

[Screenshot: Cursor's "Past, Present, and Future" blog post, highlighting the Series D funding round, AI developer tools roadmap, and growth plans.]

To run these analyses, I use the prompt below and pass in the post URLs of the company I'm interested in. If you want to try it, you can copy-paste it into ChatGPT or run my Python script, which uses the OpenAI Responses API with its built-in web search tool.
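The heart of that script is a single Responses API call with the web_search tool enabled. A minimal sketch, distilled from the full script further down (the URL and instruction are just examples):

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    response = client.responses.create(
        model="gpt-5.1",
        tools=[{"type": "web_search"}],
        input="Analyze https://cursor.com/blog/series-d focusing on company culture.",
    )
    print(response.output_text)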


Amazing, right?

Yes, but sometimes there are blog posts that can't be analyzed this way. Either they are not indexed yet, or they cannot be retrieved.

For instance, with the Perplexity API—sonar-pro model—I got:

I cannot complete this analysis as requested. The search results provided do not contain the content from the specific articles you've listed.

And with the OpenAI API—gpt-5-search-api model—I got:

Here are the analyses for the subset of articles I was able to access; for the others there was an access error so some info is "Unknown."

In these cases, I use a different strategy.

I retrieve the blog posts as Markdown and pass them as context to the LLM along with the prompt. This way, I can still get a complete analysis.

I do this with the Firecrawl ↗ and Bright Data ↗ services. You give them a URL, and they give you back the web page as Markdown.
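With Firecrawl, that round trip is a single call. A minimal sketch, using the same client setup as the full script below (it assumes FIRECRAWL_API_KEY is set in the environment):

    import os
    from firecrawl import Firecrawl

    firecrawl = Firecrawl(api_key=os.getenv("FIRECRAWL_API_KEY"))

    page = firecrawl.scrape("https://cursor.com/blog/series-d", formats=["markdown"])
    print(page.markdown)  # the page content as LLM-ready Markdown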

So far, Firecrawl has given me LLM-ready Markdown files, with only the content the LLM needs. But sometimes it trims information like the author's name and post date.

Bright Data has always given me complete data, but often much more than the LLM needs, sometimes doubling the number of characters compared to Firecrawl.
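You can make that tradeoff concrete with a quick character count. A sketch, assuming firecrawl and brightdata are the clients constructed as in the scripts below:

    # Compare how much Markdown each service returns for the same page
    url = "https://cursor.com/blog/series-d"

    firecrawl_md = firecrawl.scrape(url, formats=["markdown"]).markdown
    brightdata_md = brightdata.scrape(url=url, data_format="markdown")

    print(f"Firecrawl:   {len(firecrawl_md)} characters")
    print(f"Bright Data: {len(brightdata_md)} characters")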

There are always tradeoffs. That's how it is.

Check How I uncovered Zapier's best AI automation articles from 2025 with LLMs (part 2) to see another way I use to scrape web pages into Markdown, this time without using any third-party API services.
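If you just want a rough idea of what that looks like, here's a minimal sketch using the requests and html2text packages (not necessarily what the linked post does):

    # Convert a page to Markdown locally, without a scraping API.
    # Expect noisier output: navigation and boilerplate are kept.
    import requests
    import html2text

    html = requests.get("https://cursor.com/blog/series-d", timeout=30).text
    print(html2text.html2text(html))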

Company blog posts analyzer prompt

Perform this analysis for every article listed below.  Do not skip any
of them.

For each article I provide, do the following and present the results
in a separate section per article:

- Author: (or "Unknown" if not stated)
- Date: publication date (or "Unknown")
- Role: author's role (job title or inferred role)
- Style: Style, tone, experience level, and inferred author psychology
  (1 sentence)
- Company culture: What does the article suggest about the company
  culture?  (1 sentence)
- Summary: summarize the article as 3 actionable items, each formatted
  as:

  - What: [max 12 words]
  - Why: [max 12 words]
  - Psychology: [max 12 words]

If information is missing or ambiguous, state that explicitly and
briefly explain your inference, if any.

Analysis with web search tool

If you want to try the script yourself:

  1. Add your OpenAI API key to the .env file:

    OPENAI_API_KEY=...
  2. Update the URLS variable.

  3. Run the following commands:

    $ uv init
    $ uv add openai python-dotenv
    $ uv run company_culture_with_web_search.py
  4. The blog post analyses are saved to the summaries_with_web_search.md file.

company_culture_with_web_search.py

# company_culture_with_web_search.py
import logging
from dotenv import load_dotenv
from openai import OpenAI

URLS = [
    "https://cursor.com/blog/series-d",
    "https://cursor.com/blog/productivity",
    "https://cursor.com/blog/enterprise",
]

logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")

load_dotenv()

client = OpenAI()

base_prompt = """For each article I provide, do the following and present the results in a separate section per article:

- Author: (or "Unknown" if not stated)
- Date: publication date (or "Unknown")
- Role: author's role (job title or inferred role)
- Style: Style, tone, experience level, and inferred author psychology (1 sentence)
- Company culture: What does the article suggest about the company culture?  (1 sentence)
- Summary: summarize the article as 3 actionable items, each formatted as:

  - What: [max 12 words]
  - Why: [max 12 words]
  - Psychology: [max 12 words]

If information is missing or ambiguous, state that explicitly and briefly explain your inference, if any.
"""

prompt_with_urls = (
    "Perform this analysis for every article listed below.  Do not skip any of them.\n\n"
    + base_prompt
    + "\n\n"
    + "\n".join(URLS)
)

response = client.responses.create(
    model="gpt-5.1",
    tools=[{"type": "web_search"}],
    include=["web_search_call.action.sources"],
    input=prompt_with_urls,
)

with open("summaries_with_web_search.md", "w", encoding="utf-8") as f:
    f.write(response.output_text)

# Log URLs that aren't among the sources used by the web_search tool.
sources = []
for item in response.output:
    if item.type == "web_search_call":
        if action := getattr(item, "action", None):
            for s in getattr(action, "sources", []):
                if url := getattr(s, "url", None):
                    sources.append(url)

all_urls_included = True
for url in URLS:
    if url not in sources:
        all_urls_included = False
        logging.info(f"URL not included in web search sources: {url}")

if all_urls_included:
    logging.info("All URLs were included in web search sources.")
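When the web search tool manages to visit every URL, the script's log ends with a single confirmation line (illustrative run):

    $ uv run company_culture_with_web_search.py
    INFO: All URLs were included in web search sources.

Otherwise, you get one INFO line per missing URL, which tells you exactly which posts to fetch with the Markdown strategy below.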

Analysis with Markdown posts as model input

If you want to try the script yourself:

  1. Add your Firecrawl and OpenAI API keys to the .env file:

    FIRECRAWL_API_KEY=...
    OPENAI_API_KEY=...
  2. Update the URLS variable.

  3. Run the following commands:

    $ uv init
    $ uv add openai python-dotenv firecrawl-py
    $ uv run company_culture.py
  4. The blog posts are saved to the articles.md file.

  5. The blog post analyses are saved to the summaries.md file.

company_culture.py

# company_culture.py
import os
import logging
from dotenv import load_dotenv
from firecrawl import Firecrawl
from openai import OpenAI

URLS = [
    "https://cursor.com/blog/series-d",
    "https://cursor.com/blog/productivity",
    "https://cursor.com/blog/enterprise",
]

logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")

load_dotenv()

# 1) Download articles in markdown format with Firecrawl
api_key = os.getenv("FIRECRAWL_API_KEY")
firecrawl = Firecrawl(api_key=api_key)

# Write every article to a single file, overwriting any previous run
with open("articles.md", "w", encoding="utf-8") as f:
    for url in URLS:
        logging.info(f"Downloading {url}")
        try:
            page = firecrawl.scrape(url, formats=["markdown"])
            f.write("<|article_begin|>\n")
            f.write(page.markdown)
            f.write("\n<|article_end|>\n\n")
        except Exception as e:
            logging.error(f"Downloading {url}: {e}")

# 2) Summarize, focusing on company culture

client = OpenAI()

base_prompt = """For each article I provide, do the following and present the results in a separate section per article:

- Author: (or "Unknown" if not stated)
- Date: publication date (or "Unknown")
- Role: author's role (job title or inferred role)
- Style: Style, tone, experience level, and inferred author psychology (1 sentence)
- Company culture: What does the article suggest about the company culture?  (1 sentence)
- Summary: summarize the article as 3 actionable items, each formatted as:

  - What: [max 12 words]
  - Why: [max 12 words]
  - Psychology: [max 12 words]

If information is missing or ambiguous, state that explicitly and briefly explain your inference, if any.
"""

with open("articles.md", "r", encoding="utf-8") as f:
    articles = f.read()

prompt_with_articles = (
    "Perform this analysis for every article in the following Markdown snippets.  Do not skip any.  Each article is wrapped in `<|article_begin|>...<|article_end|>`.\n\n"
    + base_prompt
    + "\n\n"
    + articles
)

response = client.responses.create(
    model="gpt-5.1",
    input=prompt_with_articles,
)

with open("summaries.md", "w", encoding="utf-8") as f:
    f.write(response.output_text)

Retrieving posts with Bright Data

To retrieve the blog posts with Bright Data:

  1. Add the brightdata-sdk dependency:

    $ uv add brightdata-sdk
  2. Add your Bright Data API key to the .env file:

    BRIGHTDATA_API_TOKEN=...
  3. Replace the section "Download articles in markdown format with Firecrawl" in the company_culture.py file with:

    from brightdata import bdclient
    
    api_token = os.getenv("BRIGHTDATA_API_TOKEN")
    brightdata = bdclient(api_token=api_token)
    
    # Write every article to a single file, overwriting any previous run
    with open("articles.md", "w", encoding="utf-8") as f:
        for url in URLS:
            logging.info(f"Downloading {url}")
            try:
                page = brightdata.scrape(
                    url=url,
                    data_format="markdown",
                    async_request=False,
                    timeout=60,
                )
                f.write("<|article_begin|>\n")
                f.write(page)
                f.write("\n<|article_end|>\n\n")
            except Exception as e:
                logging.error(f"Downloading {url}: {e}")

Example: Past, Present, and Future (Series D) from Cursor blog

This is the analysis I got for Cursor's recent blog post Past, Present, and Future (Series D) ↗.

## Past, Present, and Future (Series D)
URL: https://cursor.com/blog/series-d

- **Author:** Cursor Team (collective/byline, no individual named) ([cursor.com](https://cursor.com/blog/series-d))
- **Date:** Nov 13, 2025 ([cursor.com](https://cursor.com/blog/series-d))
- **Role:** Company leadership / communications team representing Cursor as a whole (inferred from “Cursor Team” byline and funding‑round content). ([cursor.com](https://cursor.com/blog/series-d))
- **Style:** Vision‑driven, confident, and celebratory, written from an experienced, ambitious, long‑term–oriented founder/exec mindset.
- **Company culture:** Highly ambitious, research‑heavy, and growth‑focused, with pride in scale and a belief in “magical” product experiences. ([cursor.com](https://cursor.com/blog/series-d))

**Summary (3 actionable items)**

1.
   - **What:** Communicate a bold, long‑term product vision to your stakeholders.
   - **Why:** Clear ambition attracts talent, investors, and early, committed users.
   - **Psychology:** Appeals to dreamers seeking meaning, not just incremental improvement.

2.
   - **What:** Use major funding milestones to reaffirm strategy and momentum.
   - **Why:** Milestones legitimize progress and reset confidence internally and externally.
   - **Psychology:** Reassures risk‑averse stakeholders that they’re backing a winner.

3.
   - **What:** Invest deeply in research and differentiated core technology.
   - **Why:** Proprietary capabilities create durable advantage and justify aggressive valuations. ([cursor.com](https://cursor.com/blog/series-d?utm_source=openai))
   - **Psychology:** Signals conviction and patience for long‑horizon, compounding payoffs.


That's all I have for today! Talk soon 👋
