How I implemented real-time file summaries using Python and the OpenAI API

tl;dr: I wrote my first AI-powered auto-summarizer in Python for new text files. After sorting out OpenAI API updates, I added parallel event processing. Watching real-time summaries was super fun!

Yesterday, I implemented my first AI Automation. Yeah! Nothing fancy! I just wanted some code that runs and uses an LLM API.

I asked o4-mini (an OpenAI LLM) for a "simple working example of AI automation." It returned an auto_summarize.py script that watches the incoming directory for new .txt files. When it finds one, it summarizes the contents and writes the result to the outgoing directory using the OpenAI API.

It almost worked, but the script used the old (pre-1.0) interface of the openai library, which led to the following error:

You tried to access openai.ChatCompletion, but this is no longer
supported in openai>=1.0.0 - see the README at
https://github.com/openai/openai-python for the API.

You can run `openai migrate` to automatically upgrade your codebase to
use the 1.0.0 interface.

I knew how to fix it, but I tried the openai migrate command out of curiosity. It didn't work.
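For context, here is a minimal sketch of the kind of change the error message is asking for. The helper name `summarize` is hypothetical, and `client` is assumed to be an `openai.OpenAI()` instance. The old call appears only as a comment.

```python
# Pre-1.0 style, which triggers the error above on openai>=1.0.0:
#
#   import openai
#   resp = openai.ChatCompletion.create(model="gpt-4.1", messages=messages)
#
# 1.0+ style: calls go through a client object instead of the module.

def summarize(client, text, model="gpt-4.1"):
    """Summarize `text` with a 1.0+ client (hypothetical helper).

    `client` is assumed to be an `openai.OpenAI()` instance.
    """
    messages = [
        {"role": "system", "content": "You are a helpful assistant that summarizes text."},
        {"role": "user", "content": text},
    ]
    resp = client.chat.completions.create(model=model, messages=messages)
    return resp.choices[0].message.content
```

With the real library, this would be called as `summarize(OpenAI(), "some text")`, where `OpenAI()` picks up `OPENAI_API_KEY` from the environment.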

I wanted to manage the project and dependencies with uv, but o4-mini didn't know about it. So I switched to GPT-4.1 (another OpenAI LLM), which gave a satisfying answer. In the end, I followed the official uv docs directly.

Basic script

After tweaking the code, here's the Python script I ended up with.

It uses gorakhargosh/watchdog, specifically the Observer class, to monitor the incoming directory. It defines a TextFileSummaryHandler class that inherits from FileSystemEventHandler and overrides on_created to summarize new .txt files.

auto_summarize.py

import os
import time
from openai import OpenAI
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

IN_DIR  = "incoming"
OUT_DIR = "outgoing"

class TextFileSummaryHandler(FileSystemEventHandler):
    """Generate summaries for newly created `.txt` files.
    Use the OpenAI API to create a summary, saving the output
    in the `OUT_DIR` directory using the original filename
    with `.txt` replaced by `_summary.txt`."""

    def __init__(self, client):
        super().__init__()
        self.client = client

    def summary(self, text):
        resp = self.client.chat.completions.create(
            model="gpt-4.1",
            messages=[
                {"role": "system", "content": "You are a helpful assistant that summarizes text."},
                {"role": "user",   "content": text}
            ],
            temperature=0.3,
            max_completion_tokens=200)

        return resp.choices[0].message.content

    def on_created(self, event):
        if event.is_directory or not event.src_path.endswith(".txt"):
            return

        filepath = event.src_path
        filename = os.path.basename(filepath)
        print(f"Detected {filename}, summarizing…")

        with open(filepath, "r", encoding="utf-8") as f:
            text = f.read()

        summary = self.summary(text)

        out_path = os.path.join(OUT_DIR, filename.replace(".txt", "_summary.txt"))
        with open(out_path, "w", encoding="utf-8") as f:
            f.write(summary)

        print(f"Summary written to {out_path}")

if __name__ == "__main__":
    os.makedirs(IN_DIR, exist_ok=True)
    os.makedirs(OUT_DIR, exist_ok=True)

    client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
    event_handler = TextFileSummaryHandler(client)
    observer = Observer()
    observer.schedule(event_handler, IN_DIR, recursive=False)
    observer.start()
    print(f"Watching '{IN_DIR}/' for new .txt files. Press Ctrl+C to stop.")

    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        pass
    finally:
        observer.stop()
        observer.join()

Set up the project and start monitoring

To try it, first export the OpenAI API key like this:

$ export OPENAI_API_KEY='<your_openai_api_key>'

Then, to set up the project and start monitoring, run the following in the directory containing the auto_summarize.py script:

$ uv init
$ uv add openai watchdog
$ uv run auto_summarize.py
Watching 'incoming/' for new .txt files. Press Ctrl+C to stop.

In another terminal, create the incoming/foo.txt file like this:

$ echo 'The sky is red.  The sea is green.' > incoming/foo.txt

Back in the first terminal, you'll see:

Watching 'incoming/' for new .txt files. Press Ctrl+C to stop.
Detected foo.txt, summarizing…
Summary written to outgoing/foo_summary.txt

Finally, the outgoing/foo_summary.txt file might contain:

The sky is described as red and the sea as green.

Processing files in parallel

Well.

I immediately thought: are the newly created files actually processed in parallel? The great GPT-4.1 will tell you no:

No, by default, the FileSystemEventHandler.on_created method in watchdog is not called in parallel. All event handling is done sequentially, in the same observer thread.

Details:

  • watchdog uses a single background thread (the Observer) to monitor filesystem events.

  • When files are created, the observer invokes the handler's methods (on_created, on_modified, etc.) one after another, not concurrently.

  • If you drop many files quickly, the events are processed in order. If on_created takes a long time (e.g., calling a slow AI API), later events have to wait.

This is true. But how can we actually check this claim?

We could look at the source code of the watchdog package. Or, we could just drop a bunch of files into the incoming directory and see how long it takes for them all to be processed. If the handling is sequential, the total time should be roughly the number of files times the latency of a single OpenAI API call.

If you want to try it yourself, run:

$ for i in {1..5}; do echo 'foo bar baz' > incoming/foo-$i.txt; done

If you'd rather not use up your API credits, you can replace this line before creating those files:

summary = self.summary(text)

with these lines:

time.sleep(2)
summary = "The summary."
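To put a number on it rather than eyeballing the terminal, here is a minimal sketch that drops the files and times how long it takes for all the summaries to show up. The helper names `wait_for_summaries` and `drop_and_time` are hypothetical. It assumes the watcher is already running in another terminal and that the directory and file names match the ones used above.

```python
import os
import time

def wait_for_summaries(out_dir, expected, timeout=60.0, poll=0.1):
    """Block until every file name in `expected` exists in `out_dir`.

    Returns the elapsed time in seconds, or raises TimeoutError.
    """
    start = time.monotonic()
    remaining = set(expected)
    while True:
        remaining = {n for n in remaining
                     if not os.path.exists(os.path.join(out_dir, n))}
        elapsed = time.monotonic() - start
        if not remaining:
            return elapsed
        if elapsed > timeout:
            raise TimeoutError(f"still missing: {sorted(remaining)}")
        time.sleep(poll)

def drop_and_time(in_dir="incoming", out_dir="outgoing", count=5):
    """Create `count` input files, then time how long the summaries take."""
    inputs = [f"foo-{i}.txt" for i in range(1, count + 1)]
    expected = [name.replace(".txt", "_summary.txt") for name in inputs]
    # Remove leftovers from an earlier run: stale summaries would be counted,
    # and rewriting an existing input file would not be a "created" event.
    for path in ([os.path.join(out_dir, n) for n in expected]
                 + [os.path.join(in_dir, n) for n in inputs]):
        if os.path.exists(path):
            os.remove(path)
    for name in inputs:
        with open(os.path.join(in_dir, name), "w", encoding="utf-8") as f:
            f.write("foo bar baz\n")
    return wait_for_summaries(out_dir, expected)
```

With the time.sleep(2) stand-in and sequential handling, `print(drop_and_time())` should report somewhere around 10 seconds (5 files × 2 seconds each).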

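So, how can we handle these events in parallel? Well, we can use the ThreadPoolExecutor class: on_created only submits the file to a pool of worker threads and returns right away, so the observer thread is free to dispatch the next event. Like this: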

import os
import time
from openai import OpenAI
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler
from concurrent.futures import ThreadPoolExecutor

IN_DIR  = "incoming"
OUT_DIR = "outgoing"

class TextFileSummaryHandler(FileSystemEventHandler):
    """Generate summaries for newly created `.txt` files.
    Use the OpenAI API to create a summary, saving the output
    in the `OUT_DIR` directory using the original filename
    with `.txt` replaced by `_summary.txt`."""

    def __init__(self, client, executor):
        super().__init__()
        self.client = client
        self.executor = executor

    def summary(self, text):
        resp = self.client.chat.completions.create(
            model="gpt-4.1",
            messages=[
                {"role": "system", "content": "You are a helpful assistant that summarizes text."},
                {"role": "user",   "content": text}
            ],
            temperature=0.3,
            max_completion_tokens=200)

        return resp.choices[0].message.content

    def process_file(self, filepath):
        filename = os.path.basename(filepath)
        print(f"Detected {filename}, summarizing…")

        with open(filepath, "r", encoding="utf-8") as f:
            text = f.read()

        summary = self.summary(text)

        out_path = os.path.join(OUT_DIR, filename.replace(".txt", "_summary.txt"))
        with open(out_path, "w", encoding="utf-8") as f:
            f.write(summary)

        print(f"Summary written to {out_path}")

    def on_created(self, event):
        if event.is_directory or not event.src_path.endswith(".txt"):
            return

        self.executor.submit(self.process_file, event.src_path)

if __name__ == "__main__":
    os.makedirs(IN_DIR, exist_ok=True)
    os.makedirs(OUT_DIR, exist_ok=True)

    client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
    executor = ThreadPoolExecutor(max_workers=4)
    event_handler = TextFileSummaryHandler(client, executor)
    observer = Observer()
    observer.schedule(event_handler, IN_DIR, recursive=False)
    observer.start()
    print(f"Watching '{IN_DIR}/' for new .txt files. Press Ctrl+C to stop.")

    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        pass
    finally:
        observer.stop()
        observer.join()
        executor.shutdown()
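To see the difference without watchdog or the API, here is a small stdlib-only sketch. `fake_process` is a hypothetical stand-in for process_file that just sleeps. The sketch uses executor.map for brevity, whereas the script above uses submit because events arrive one at a time.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_process(name, latency=0.2):
    """Stand-in for process_file: sleep instead of calling the API."""
    time.sleep(latency)
    return f"{name}: The summary."

def run_sequential(names):
    """Handle the files one after another, like the basic script."""
    start = time.monotonic()
    results = [fake_process(n) for n in names]
    return results, time.monotonic() - start

def run_pooled(names, max_workers=4):
    """Handle the files on a pool of worker threads."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        results = list(executor.map(fake_process, names))
    return results, time.monotonic() - start

names = [f"foo-{i}.txt" for i in range(1, 6)]
_, t_seq = run_sequential(names)
_, t_pool = run_pooled(names)
print(f"sequential: {t_seq:.2f}s, pooled: {t_pool:.2f}s")
```

With 5 files and 4 workers, the pooled run needs two waves (4 files, then 1), so it should take roughly two latencies where the sequential run takes five.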

That's all I have for today! Talk soon 👋
