I made a CLI to enrich French-English phrasebooks with AI translations, audio, and images
There's no French-to-English vocabulary set that's perfect for you.
They're always too broad, yet still incomplete. They include too many rare sentences you'll never use. And they miss too many simple, key ones.
That's why these learning materials often get paired with phrasebooks. They're collections of sentences you run into and want to remember.
While phrasebooks work well, they're limited to what you put in.
But does it have to be that way?
The other day, I thought:
- What if I could enrich each phrasebook item with a couple of related sentences that highlight a grammar point, useful nouns or verbs, or alternative phrasing?
- What if all these sentences were paired with the matching audio?
- What if they all came with an illustrative image?
That would be the perfect dataset I need to improve my English. Something tailored to my needs.
I don't think I would've had this idea before the AI boom. Because to make it happen, I'd need to hire an English teacher to come up with these related sentences. I'd need an English speaker to record them. And I'd need a designer to make the images.
Today, none of that is required. You can automate this phrasebook enrichment with AI. A few calls to the OpenAI API ↗ and you're done. More precisely, for each record in your phrasebook, you'll make:
- One call to the gpt-5.2 model ↗ to generate a couple of related translations,
- Three calls to the gpt-4o-mini-tts model ↗ to turn English sentences into audios,
- Three calls to the gpt-image-1.5 model ↗ to generate images that describe your sentences.
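That's seven API calls per record. To make the budget concrete, here's a tiny sketch; the model names are the ones listed above, but the `plan_calls` helper and its dict layout are made up for this illustration and are not the tool's actual code:

```python
# Illustration: the API calls implied by one phrasebook record.
# Model names come from the list above; everything else is a sketch.

def plan_calls(english_sentences):
    """One translation call, then one TTS and one image call per
    sentence (the original plus the two generated ones)."""
    calls = [{"model": "gpt-5.2", "task": "related-translations"}]
    for sentence in english_sentences:
        calls.append({"model": "gpt-4o-mini-tts", "task": "tts", "input": sentence})
        calls.append({"model": "gpt-image-1.5", "task": "image", "input": sentence})
    return calls

plan = plan_calls(["original", "related 1", "related 2"])
print(len(plan))  # 7 calls per enriched record
```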
You wrap this in a nice CLI program that manages these API calls. It stores the enriched translations in a table, and the audios and images in a media directory. And you're done.
That's exactly what I did with the phrasebook-fr-to-en ↗ program.
It's a Python CLI you can install with uv ↗ like this:
$ uv tool install phrasebook-fr-to-en
Once you've generated an OpenAI API key on the OpenAI platform ↗ and been verified, you can export that key. Then you call the CLI with your phrasebook, which must be a TSV file (TAB separation) with the 3 columns date, french, and english. Let's call it my-phrasebook.tsv.
$ export OPENAI_API_KEY=<your-api-key>
$ phrasebook-fr-to-en my-phrasebook.tsv
You'll get the enriched data saved in enriched_phrasebook.tsv, and all the audios and images saved in the media directory. They sit next to your original phrasebook, which stays unchanged.
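To make the expected input and output layout concrete, here's a minimal standard-library sketch of the header check the CLI performs and where the results land. The helper names (`check_phrasebook`, `output_paths`) are mine, not the tool's:

```python
import csv
import tempfile
from pathlib import Path

EXPECTED_COLUMNS = ["date", "french", "english"]

def check_phrasebook(path: Path) -> list[dict]:
    """Read a TAB-separated phrasebook and verify its header."""
    with path.open(newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t")
        if reader.fieldnames != EXPECTED_COLUMNS:
            raise ValueError(f"Expected {EXPECTED_COLUMNS}, got {reader.fieldnames}")
        return list(reader)

def output_paths(phrasebook: Path) -> tuple[Path, Path]:
    """The enriched table and the media directory sit next to the phrasebook."""
    return phrasebook.parent / "enriched_phrasebook.tsv", phrasebook.parent / "media"

with tempfile.TemporaryDirectory() as d:
    demo = Path(d) / "my-phrasebook.tsv"
    demo.write_text("date\tfrench\tenglish\n2025-12-15\tJ'aime l'eau.\tI like water.\n")
    print(check_phrasebook(demo)[0]["english"])  # I like water.
```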
For instance, with a record in the original phrasebook containing the English sentences "We need to get dressed. Put on your coat.", phrasebook-fr-to-en would generate 2 new related translations like these:
- "Dépêche-toi de t'habiller, on va être en retard." -> "Hurry up and get dressed—we're going to be late."
- "N'oublie pas ton écharpe : il fait froid dehors." -> "Don't forget your scarf—it's cold outside."
It would also generate 3 audios and 3 images. Here are the ones that match the original sentence "We need to get dressed. Put on your coat.".
Don't you think this is super cool?
We live in an era where learning no longer has to be top-down. What do you want to learn? How do you want to learn it? Go build your own learning material. Boom. That's it. With AI, the sky's the limit.
In the next posts, I'll share some lessons learned while building this phrasebook-fr-to-en CLI.
Stay tuned.
That's all I have for today! Talk soon 👋
For Anki users
The enriched table created from the original phrasebook includes two columns, anki_audio and anki_img. They're really handy for Anki ↗ users. (Anki is a flashcard program that helps you remember things. It's great for learning a language.)
They contain formatted fields for audio and image that you can use directly in your Anki decks:
[sound:phrasebook-fr-to-en-1.mp3]
<img src="phrasebook-fr-to-en-1.png">
This way, you can import enriched_phrasebook.tsv directly into Anki. No changes are needed to get audio played and images displayed.
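For reference, these two fields are plain string templates keyed on the media filename. The templates below are taken from the CLI's code (shown later in this post); the `anki_fields` helper wrapping them is just for illustration:

```python
# Templates from the CLI's code; the wrapper function is illustrative.
ANKI_AUDIO_TEMPLATE = "[sound:{}]"
ANKI_IMG_TEMPLATE = '<img src="{}">'

def anki_fields(record_id: int, prefix: str = "phrasebook-fr-to-en-") -> tuple[str, str]:
    """Build the Anki-ready audio and image fields for one record."""
    audio_filename = f"{prefix}{record_id}.mp3"
    img_filename = f"{prefix}{record_id}.png"
    return ANKI_AUDIO_TEMPLATE.format(audio_filename), ANKI_IMG_TEMPLATE.format(img_filename)

print(anki_fields(1)[0])  # [sound:phrasebook-fr-to-en-1.mp3]
```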
Note that this only works:
- If you enable the "Allow HTML" option when importing the enriched file,
- If you copy the audios and images from the media directory to your Anki collection.media directory.
See Anki docs:
- https://docs.ankiweb.net/importing/text-files.html#importing-media
- https://docs.ankiweb.net/files.html
Creating a phrasebook with ChatGPT
There are many ways to create the original phrasebook you can then use with phrasebook-fr-to-en. Here's how I do it with ChatGPT. It's not fancy. It just works.
- I have a chat where I asked ChatGPT to just translate to English what I dictate in French.
- Each time I want to remember something, I take my phone and dictate it in French.
- Then every day I ask ChatGPT to give me the list of the last translated sentences, after the last list, in order of appearance, in TSV format (TAB-separated), with 2 columns: french and english. At first I had to explain this in detail. Now I just say: "make the list".
- Then I copy and paste it into my phrasebook on my computer, adding a first column with the date.
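The last step amounts to prefixing each TAB-separated line with a date. If you'd rather script that part, here's a tiny sketch (the function name is mine):

```python
def add_date_column(rows: str, day: str) -> str:
    """Prefix each 'french<TAB>english' line with the given date."""
    return "\n".join(f"{day}\t{line}" for line in rows.strip().splitlines())

chatgpt_output = "J'aime l'eau.\tI like water.\nIl fait froid.\tIt is cold."
print(add_date_column(chatgpt_output, "2025-12-15"))
# 2025-12-15	J'aime l'eau.	I like water.
# 2025-12-15	Il fait froid.	It is cold.
```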
phrasebook-fr-to-en code
This software has 100% test coverage.
Prompts
- Instruction to generate related translations:

# Role and Objective
You are a bilingual (French/English) teacher specializing in practical language learning. Your task is to help expand a French-to-English phrasebook by creating relevant sentence pairs and highlighting key language aspects.
# Instructions
- For each prompt, you will get a French sentence and its English translation.
- Your tasks:
1. Generate exactly two related English sentences, each with its French translation.
2. Use these to show:
- An English grammar point, or
- Useful nouns, verbs, or
- Alternative phrasing (formality, slang, etc.).
3. Ensure all English examples are natural and suitable for daily use.
# Context
- The learner is a native French speaker advancing in English.
- The goal is to create a learner-friendly, practical phrasebook.

- Instruction to generate audio from English sentences:

Speak in a neutral General American accent at a natural conversational pace. Use clear, natural intonation with a neutral tone, and avoid emotional coloring, character impressions, and whispering.

- Prompt to generate images:

Create a clean, minimal flat vector illustration representing: "{english}". Use 2-4 simple objects/characters maximum, solid colors, white background. No text, no letters, no numbers, no icons that resemble writing.
cli.py
from __future__ import annotations
import os
import base64
import logging
import time
from pathlib import Path
from typing import TYPE_CHECKING, Any, Annotated, Iterator
import contextlib
import typer
from watchdog.events import FileSystemEvent, FileSystemEventHandler
from watchdog.observers import Observer
__version__ = "0.3.0"
if TYPE_CHECKING:
from pydantic import BaseModel
import pandas as pd
from openai import OpenAI
logger = logging.getLogger(__name__)
MEDIA_PREFIX = "phrasebook-fr-to-en-"
# See https://docs.ankiweb.net/importing/text-files.html#importing-media
ANKI_AUDIO_TEMPLATE = "[sound:{}]"
ANKI_IMG_TEMPLATE = '<img src="{}">'
ENRICHED_COLUMNS: list[str] = [
"french",
"english",
"anki_audio",
"anki_img",
"generated_from",
"id",
"audio_filename",
"img_filename",
"date",
]
PHRASEBOOK_COLUMNS: list[str] = ["date", "french", "english"]
app = typer.Typer(pretty_exceptions_enable=False)
def enriched_path_func(phrasebook_path: Path) -> Path:
return (phrasebook_path.parent / "enriched_phrasebook.tsv").absolute()
def phrasebook_dir_func(phrasebook_path: Path) -> Path:
return phrasebook_path.parent.absolute()
def media_dir_func(phrasebook_path: Path) -> Path:
return phrasebook_path.parent.absolute() / "media"
def read_phrasebook(phrasebook_path: Path) -> pd.DataFrame:
import pandas as pd
try:
df = pd.read_csv(phrasebook_path, sep="\t", dtype="string")
    except FileNotFoundError:
        raise
    except Exception as err:
        raise ValueError(f"Invalid file {phrasebook_path}: {err}") from err
if list(df.columns) != PHRASEBOOK_COLUMNS:
raise ValueError(
f"Invalid header in {phrasebook_path}. Expected {PHRASEBOOK_COLUMNS}, got {list(df.columns)}"
)
return df
def read_enriched(enriched_path: Path) -> pd.DataFrame:
import pandas as pd
if not enriched_path.exists():
return pd.DataFrame(columns=pd.Index(ENRICHED_COLUMNS), dtype="string")
try:
df = pd.read_csv(enriched_path, sep="\t", dtype="string")
    except Exception as err:
        raise ValueError(f"Invalid file {enriched_path}: {err}") from err
if list(df.columns) != ENRICHED_COLUMNS:
raise ValueError(
f"Invalid header in {enriched_path}. Expected {ENRICHED_COLUMNS}, got {list(df.columns)}"
)
df["id"] = df["id"].astype("Int64")
df["generated_from"] = df["generated_from"].astype("Int64")
return df
@contextlib.contextmanager
def log_request_info_when_api_error_raised() -> Iterator[None]:
from openai import APIError
try:
yield
except APIError as exc:
logger.error(f"{exc.request!r}")
# It's safe to log httpx headers because its repr
        # sets 'authorization' to '[secure]', hiding our API key
logger.error(f"Request headers - {exc.request.headers!r}")
logger.error(f"Request body - {exc.request.content.decode()}")
raise exc
def generate_translations(
record_original: tuple[str, str, str], client: OpenAI
) -> list[tuple[str, str]]:
from pydantic import BaseModel
class Translation(BaseModel):
french: str
english: str
class Translations(BaseModel):
# DON'T USE: conlist(tuple[str, str], min_length=2, max_length=2)
# This broke OpenAI API which generated outputs with 128,000 tokens.
# Mostly, whitespaces and newlines.
translations: list[Translation]
_, french, english = record_original
model = "gpt-5.2"
instructions = """# Role and Objective
You are a bilingual (French/English) teacher specializing in practical language learning. Your task is to help expand a French-to-English phrasebook by creating relevant sentence pairs and highlighting key language aspects.
# Instructions
- For each prompt, you will get a French sentence and its English translation.
- Your tasks:
1. Generate exactly two related English sentences, each with its French translation.
2. Use these to show:
- An English grammar point, or
- Useful nouns, verbs, or
- Alternative phrasing (formality, slang, etc.).
3. Ensure all English examples are natural and suitable for daily use.
# Context
- The learner is a native French speaker advancing in English.
- The goal is to create a learner-friendly, practical phrasebook."""
input_msg = f"{french} -> {english}"
logger.info(f"Generating translations for record {record_original}")
attempt = 1
translations = []
while not translations:
with log_request_info_when_api_error_raised():
response = client.responses.parse(
model=model,
instructions=instructions,
input=input_msg,
text_format=Translations,
max_output_tokens=256,
)
# If we decided to use gpt-5-nano with the same low max_output_tokens,
# tokens would be consumed by the reasoning, we would get
# no text output, and this would result in output_parsed being None.
if not response.output_parsed:
if attempt < 3:
logger.info(
f"No translations were returned by the model at attempt {attempt}."
)
attempt += 1
continue
else:
raise ValueError(
f"No translations were returned by the model.\nResponse: {response.to_json()}"
)
translations = response.output_parsed.translations
if (tlen := len(translations)) < 2:
if attempt < 3:
logger.info(
f"Wrong number of translations returned by the model at attempt {attempt}."
)
attempt += 1
translations = []
continue
else:
raise ValueError(
(
f"Wrong number of translations: {tlen}. 2 were expected.\n"
f"Response: {response.to_json()}"
)
)
logger.info(
f"Translations generated for record {record_original} using model {model} and input '{input_msg}'"
)
return [(t.french, t.english) for t in translations][:2]
def generate_audio(record: dict[str, Any], media_dir: Path, client: OpenAI) -> None:
id_record = record["id"]
input_msg = record["english"]
audio_path = media_dir / record["audio_filename"]
media_dir.mkdir(parents=True, exist_ok=True)
logger.info(f"Generating audio '{input_msg}' for record {id_record}")
with log_request_info_when_api_error_raised():
with client.audio.speech.with_streaming_response.create(
model="gpt-4o-mini-tts",
voice="cedar",
input=input_msg,
instructions="Speak in a neutral General American accent at a natural conversational pace. Use clear, natural intonation with a neutral tone, and avoid emotional coloring, character impressions, and whispering.",
) as response:
response.stream_to_file(audio_path)
logger.info(f"Audio has been generated: {audio_path}.")
return None
def generate_img(record: dict[str, Any], media_dir: Path, client: OpenAI) -> None:
id_record = record["id"]
english = record["english"]
img_path = media_dir / record["img_filename"]
media_dir.mkdir(parents=True, exist_ok=True)
logger.info(f"Generating image '{english}' for record {id_record}")
prompt = (
f'Create a clean, minimal flat vector illustration representing: "{english}".\n'
"Use 2-4 simple objects/characters maximum, solid colors, white background.\n"
"No text, no letters, no numbers, no icons that resemble writing.\n"
)
with log_request_info_when_api_error_raised():
response = client.images.generate(
model="gpt-image-1.5",
prompt=prompt,
size="1024x1024",
quality="low",
output_format="png",
)
image_base64 = response.data[0].b64_json
image_bytes = base64.b64decode(image_base64)
with open(img_path, "wb") as f:
f.write(image_bytes)
logger.info(f"Image has been generated: {img_path}.")
return None
def next_id(enriched_df: pd.DataFrame) -> int:
if enriched_df.empty:
return 1
max_id = int(enriched_df["id"].max())
return max_id + 1
def build_record(
record_id: int,
generated_from: Any,
date: str,
french: str = "",
english: str = "",
) -> dict[str, Any]:
audio_filename = f"{MEDIA_PREFIX}{record_id}.mp3"
img_filename = f"{MEDIA_PREFIX}{record_id}.png"
return {
"id": record_id,
"french": french,
"english": english,
"anki_audio": ANKI_AUDIO_TEMPLATE.format(audio_filename),
"anki_img": ANKI_IMG_TEMPLATE.format(img_filename),
"generated_from": generated_from,
"audio_filename": audio_filename,
"img_filename": img_filename,
"date": date,
}
def enrich_record(
record_original: tuple[str, str, str],
next_id: int,
media_dir: Path,
client: OpenAI,
) -> list[dict[str, Any]]:
import pandas as pd
date, french, english = record_original
try:
translations = generate_translations(record_original, client)
except Exception:
logger.exception(
f"Failed to generate translations while processing record {record_original}"
)
return []
id_record_original = next_id
new_records: list[dict[str, Any]] = [
build_record(
record_id=id_record_original,
french=french,
english=english,
generated_from=pd.NA,
date=date,
)
]
for i in range(len(translations)):
new_records.append(
build_record(
record_id=id_record_original + i + 1,
french=translations[i][0],
english=translations[i][1],
generated_from=id_record_original,
date=date,
)
)
try:
for record in new_records:
generate_audio(record, media_dir, client)
except Exception:
logger.exception(
f"Failed to generate audios while processing record {record_original}"
)
return []
try:
for record in new_records:
generate_img(record, media_dir, client)
except Exception:
logger.exception(
f"Failed to generate images while processing record {record_original}"
)
return []
return new_records
def save_new_records(
new_records: list[dict[str, Any]], enriched_df: pd.DataFrame, enriched_path: Path
) -> pd.DataFrame:
import pandas as pd
new_df = pd.DataFrame(
new_records, columns=pd.Index(ENRICHED_COLUMNS), dtype="string"
)
new_df["id"] = new_df["id"].astype("Int64")
new_df["generated_from"] = new_df["generated_from"].astype("Int64")
updated = (
pd.concat([enriched_df, new_df], ignore_index=True)
if not enriched_df.empty
else new_df
)
updated.to_csv(enriched_path, sep="\t", index=False)
return updated
def enrich_phrasebook(phrasebook_path: Path, client: OpenAI) -> bool:
media_dir = media_dir_func(phrasebook_path)
enriched_path = enriched_path_func(phrasebook_path)
try:
phrasebook_df = read_phrasebook(phrasebook_path)
enriched_df = read_enriched(enriched_path)
except Exception as err:
logger.exception(err)
return False
existing_english: set[str] = (
set(enriched_df["english"].dropna().to_list())
if not enriched_df.empty
else set()
)
for record_original in phrasebook_df.itertuples(index=False, name=None):
_, _, english = record_original
if english in existing_english:
logger.info(f"Skip existing record: {record_original}")
continue
new_records = enrich_record(
record_original, next_id(enriched_df), media_dir, client
)
if not new_records:
return False
try:
enriched_df = save_new_records(new_records, enriched_df, enriched_path)
except Exception:
logger.exception(
f"Failed to save enriched records from record {record_original} in file {enriched_path}"
)
return False
existing_english.add(english)
logger.info(f"Record has been enriched: {record_original} -> {enriched_path}")
return True
def watch_phrasebook(phrasebook_path: Path, client: OpenAI) -> None:
class Handler(FileSystemEventHandler):
def on_modified(self, event: FileSystemEvent) -> None:
if event.src_path == str(phrasebook_path):
enrich_phrasebook(phrasebook_path, client)
observer = Observer()
observer.schedule(
Handler(), str(phrasebook_dir_func(phrasebook_path)), recursive=False
)
observer.start()
logger.info(f"Start watching file {phrasebook_path}")
try:
while True:
time.sleep(1)
finally: # pragma: no cover
observer.stop()
observer.join()
def version_callback(version: bool):
if version:
print(f"phrasebook-fr-to-en {__version__}")
raise typer.Exit()
def setup_logging(log_file: Path | None = None):
log_format = "%(asctime)s %(levelname)s %(name)s %(message)s"
log_datefmt = "%Y-%m-%d %H:%M:%S"
if log_file:
log_file.parent.mkdir(parents=True, exist_ok=True)
logging.basicConfig(
format=log_format, datefmt=log_datefmt, filename=log_file.absolute()
)
else:
logging.basicConfig(format=log_format, datefmt=log_datefmt)
logger.setLevel(logging.INFO)
@app.command()
def run(
file: Annotated[
Path,
typer.Argument(
help=(
"Filename of the phrasebook to be enriched. It must be a TSV format file (TAB separation) with the columns: date, french, english. For instance:\n\n\n\n"
"date french english\n\n"
"2025-12-15 J'aime l'eau. I like water.\n\n"
"2025-12-16 Il fait froid. It is cold.\n\n"
)
),
],
watch: Annotated[
bool,
typer.Option(
"--watch",
help="Watch the original phrasebook file for changes, and enrich any new record added to it.",
),
] = False,
log_file: Annotated[
Path | None,
typer.Option(
help="Log to this file if provided. Default is stderr.",
),
] = None,
version: Annotated[
bool,
typer.Option("--version", callback=version_callback, is_eager=True),
] = False,
) -> None:
# We escape \[sound:...] because this is a reserved syntax for Rich.
# If we don't, [sound:...] is removed from the help message.
# But when we do this we get a SyntaxWarning, so we use raw string
# r"""...""".
r"""
Enrich French to English phrasebooks with OpenAI API.
-----
[IMPORTANT] This program uses the OpenAI API with the following models:
- https://platform.openai.com/docs/models/gpt-5.2
- https://platform.openai.com/docs/models/gpt-4o-mini-tts
- https://platform.openai.com/docs/models/gpt-image-1.5
To use it, you need to register with OpenAI, be verified as an organization (required for the image model), and create an API key: https://platform.openai.com.
Once you've done this, set OPENAI_API_KEY as an environment variable before you run the program, like this:
$ export OPENAI_API_KEY=<your-api-key>
-----
This program takes a TSV (TAB separation) file as input. Each row is a French to English translation. It uses the following columns: date, french, english.
For each translation (each row), two new related translations are generated. The goal is to show:
- An English grammar point, or
- Useful nouns, verbs, or
- Alternative phrasing (formality, slang, etc.).
These new translations, along with the original, are saved in the file "enriched_phrasebook.tsv". It sits next to your phrasebook file. Records in your original phrasebook whose english field matches a record in the enriched phrasebook are skipped.
Your original phrasebook is left unchanged.
For all translations (original and AI generated), an English audio and an image are generated. They are saved in a "media" directory next to the original phrasebook file.
For instance, if "my-phrasebook.tsv" contains the following record
(columns separated by tabs)
date french english
2025-12-15 Montez les escaliers. Climb the stairs.
and you run the following commands:
$ export OPENAI_API_KEY=<your-api-key>
$ phrasebook-fr-to-en my-phrasebook.tsv
This will produce the file "enriched_phrasebook.tsv" with AI generated translations. It has the following columns: french, english, anki_audio, anki_img, generated_from, id, audio_filename, img_filename, date.
french english anki_audio anki_img generated_from id audio_filename img_filename date
Montez les escaliers. Climb the stairs. \[sound:phrasebook-fr-to-en-1.mp3] "<img src=""phrasebook-fr-to-en-1.png"">" 1 phrasebook-fr-to-en-1.mp3 phrasebook-fr-to-en-1.png 2025-12-15
Prenez les escaliers, s'il vous plaît. Please take the stairs. \[sound:phrasebook-fr-to-en-2.mp3] "<img src=""phrasebook-fr-to-en-2.png"">" 1 2 phrasebook-fr-to-en-2.mp3 phrasebook-fr-to-en-2.png 2025-12-15
Montez deux étages et tournez à gauche. Go up two floors and turn left. \[sound:phrasebook-fr-to-en-3.mp3] "<img src=""phrasebook-fr-to-en-3.png"">" 1 3 phrasebook-fr-to-en-3.mp3 phrasebook-fr-to-en-3.png 2025-12-15
This also generates 3 audios and 3 images.
Your directory then looks like this:
.
├── enriched_phrasebook.tsv
├── my-phrasebook.tsv
└── media
├── phrasebook-fr-to-en-1.mp3
├── phrasebook-fr-to-en-1.png
├── phrasebook-fr-to-en-2.mp3
├── phrasebook-fr-to-en-2.png
├── phrasebook-fr-to-en-3.mp3
└── phrasebook-fr-to-en-3.png
For Anki users. Did you notice the columns "anki_audio" and "anki_img"? They contain formatted fields for audio and image that you can use directly in your Anki decks:
\[sound:phrasebook-fr-to-en-1.mp3]
<img src="phrasebook-fr-to-en-1.png">
This way you can import "enriched_phrasebook.tsv" directly into Anki. No changes are needed to get audio played and images displayed.
Note that this only works:
1) If you enable "Allow HTML" option when importing the enriched file,
2) If you copy the audios and images from the "media" directory to your Anki "collection.media" directory.
See Anki docs:
- https://docs.ankiweb.net/importing/text-files.html#importing-media
- https://docs.ankiweb.net/files.html
"""
    from openai import OpenAI
setup_logging(log_file)
if not os.getenv("OPENAI_API_KEY"):
logger.error("Set OPENAI_API_KEY environment variable to run the app.")
raise typer.Exit(code=1)
client = OpenAI()
phrasebook_path = file.absolute()
if not enrich_phrasebook(phrasebook_path, client):
raise typer.Exit(code=1)
if watch: # pragma: no cover
watch_phrasebook(phrasebook_path, client)
return
def main() -> None: # pragma: no cover
app()
if __name__ == "__main__": # pragma: no cover
main()
test_cli.py
import os
import contextlib
import threading
import time
from pathlib import Path
import logging
from typing import Any
from unittest.mock import Mock
import pytest
import pandas as pd
import phrasebook_fr_to_en.cli as cli
from typer.testing import CliRunner
import re
import httpx
from respx import MockRouter
from openai import OpenAI, APIError
from pydantic import TypeAdapter, conlist
from dotenv import load_dotenv
from mutagen.mp3 import MP3
from mutagen import MutagenError
from PIL import Image
if os.getenv("OPENAI_LIVE") == "1":
load_dotenv() # For OPENAI_API_KEY variable
runner = CliRunner()
## Utils
@contextlib.contextmanager
def disable_log_capture():
from pytest import MonkeyPatch
logger = logging.getLogger()
with MonkeyPatch().context() as mp:
mp.setattr(logger, "disabled", False)
mp.setattr(logger, "handlers", [])
mp.setattr(logger, "level", logging.NOTSET)
yield
def is_mp3(path: Path) -> bool:
try:
MP3(path)
return True
except MutagenError:
return False
def is_png(path: Path) -> bool:
try:
with Image.open(path) as im:
return im.format == "PNG"
except Exception:
return False
## Test `cli.enrich_record`
def test_enrich_record_ok(monkeypatch: pytest.MonkeyPatch, tmp_path: Path) -> None:
mock_generate_translations = Mock(return_value=[("fr1", "en1"), ("fr2", "en2")])
mock_generate_audio = Mock(return_value=None)
mock_generate_img = Mock(return_value=None)
monkeypatch.setattr(cli, "generate_translations", mock_generate_translations)
monkeypatch.setattr(cli, "generate_audio", mock_generate_audio)
monkeypatch.setattr(cli, "generate_img", mock_generate_img)
record_original = ("2025-01-01", "bonjour", "hello")
    # The client isn't used because the generate_* functions are mocked,
    # but we still have to pass it to `enrich_record`.
    client = OpenAI(api_key="foo-api-key")
    phrasebook_dir = tmp_path  # Not used either, but required as an argument
records = cli.enrich_record(record_original, 10, phrasebook_dir, client)
assert mock_generate_audio.call_count == 3
assert mock_generate_img.call_count == 3
assert records == [
{
"id": 10,
"french": "bonjour",
"english": "hello",
"anki_audio": "[sound:phrasebook-fr-to-en-10.mp3]",
"anki_img": '<img src="phrasebook-fr-to-en-10.png">',
"generated_from": pd.NA,
"audio_filename": "phrasebook-fr-to-en-10.mp3",
"img_filename": "phrasebook-fr-to-en-10.png",
"date": "2025-01-01",
},
{
"id": 11,
"french": "fr1",
"english": "en1",
"anki_audio": "[sound:phrasebook-fr-to-en-11.mp3]",
"anki_img": '<img src="phrasebook-fr-to-en-11.png">',
"generated_from": 10,
"audio_filename": "phrasebook-fr-to-en-11.mp3",
"img_filename": "phrasebook-fr-to-en-11.png",
"date": "2025-01-01",
},
{
"id": 12,
"french": "fr2",
"english": "en2",
"anki_audio": "[sound:phrasebook-fr-to-en-12.mp3]",
"anki_img": '<img src="phrasebook-fr-to-en-12.png">',
"generated_from": 10,
"audio_filename": "phrasebook-fr-to-en-12.mp3",
"img_filename": "phrasebook-fr-to-en-12.png",
"date": "2025-01-01",
},
]
def test_enrich_record_translation_error(
monkeypatch: pytest.MonkeyPatch, tmp_path: Path, caplog: pytest.LogCaptureFixture
) -> None:
mock_generate_translations = Mock(
side_effect=RuntimeError("Failed in `generate_translations`")
)
mock_generate_audio = Mock(return_value=None)
mock_generate_img = Mock(return_value=None)
monkeypatch.setattr(cli, "generate_translations", mock_generate_translations)
monkeypatch.setattr(cli, "generate_audio", mock_generate_audio)
monkeypatch.setattr(cli, "generate_img", mock_generate_img)
record_original = ("2025-01-01", "bonjour", "hello")
    # The client isn't used because the generate_* functions are mocked,
    # but we still have to pass it to `enrich_record`.
    client = OpenAI(api_key="foo-api-key")
    phrasebook_dir = tmp_path  # Not used either, but required as an argument
records = cli.enrich_record(record_original, 10, phrasebook_dir, client)
assert mock_generate_translations.called
assert "Failed to generate translations while processing record" in caplog.text
assert not mock_generate_audio.called
assert not mock_generate_img.called
assert records == []
def test_enrich_record_audio_error(
monkeypatch: pytest.MonkeyPatch, tmp_path: Path, caplog: pytest.LogCaptureFixture
) -> None:
mock_generate_translations = Mock(return_value=[("fr1", "en1"), ("fr2", "en2")])
mock_generate_audio = Mock(side_effect=RuntimeError("Failed in `generate_audio`"))
mock_generate_img = Mock(return_value=None)
monkeypatch.setattr(cli, "generate_translations", mock_generate_translations)
monkeypatch.setattr(cli, "generate_audio", mock_generate_audio)
monkeypatch.setattr(cli, "generate_img", mock_generate_img)
record_original = ("2025-01-01", "bonjour", "hello")
    # The client isn't used because the generate_* functions are mocked,
    # but we still have to pass it to `enrich_record`.
    client = OpenAI(api_key="foo-api-key")
    phrasebook_dir = tmp_path  # Not used either, but required as an argument
records = cli.enrich_record(record_original, 10, phrasebook_dir, client)
assert mock_generate_translations.called
assert mock_generate_audio.called
assert "Failed to generate audios while processing record" in caplog.text
assert not mock_generate_img.called
assert records == []
def test_enrich_record_img_error(
monkeypatch: pytest.MonkeyPatch, tmp_path: Path, caplog: pytest.LogCaptureFixture
) -> None:
mock_generate_translations = Mock(return_value=[("fr1", "en1"), ("fr2", "en2")])
mock_generate_audio = Mock(return_value=None)
mock_generate_img = Mock(side_effect=RuntimeError("Failed in `generate_img`"))
monkeypatch.setattr(cli, "generate_translations", mock_generate_translations)
monkeypatch.setattr(cli, "generate_audio", mock_generate_audio)
monkeypatch.setattr(cli, "generate_img", mock_generate_img)
record_original = ("2025-01-01", "bonjour", "hello")
    # The client isn't used because the generate_* functions are mocked,
    # but we still have to pass it to `enrich_record`.
    client = OpenAI(api_key="foo-api-key")
    phrasebook_dir = tmp_path  # Not used either, but required as an argument
records = cli.enrich_record(record_original, 10, phrasebook_dir, client)
assert mock_generate_translations.called
assert mock_generate_audio.called
assert mock_generate_img.called
assert "Failed to generate images while processing record" in caplog.text
assert records == []
## Test `cli.app`
def test_app_help():
result = runner.invoke(cli.app, ["--help"], catch_exceptions=False)
assert result.exit_code == 0, result.output
assert "Enrich French to English phrasebooks with OpenAI API." in result.output
# help of `file` argument
assert re.search(r"file.*Filename of the phrasebook to be enriched.", result.output)
result = runner.invoke(cli.app, catch_exceptions=False)
assert result.exit_code == 2, result.output
assert "Missing argument 'FILE'." in result.output
def test_app_version() -> None:
result = runner.invoke(cli.app, ["--version"], catch_exceptions=False)
assert result.exit_code == 0, result.output
assert "phrasebook-fr-to-en " in result.output
def test_app_log_file_option(
tmp_path: Path, monkeypatch: pytest.MonkeyPatch, caplog: pytest.LogCaptureFixture
) -> None:
monkeypatch.delenv("OPENAI_API_KEY", raising=False)
# We exit with status 1 because OPENAI_API_KEY not set.
# And we log the error to stderr (by default) because we don't
# provide a --log-file.
with disable_log_capture():
result = runner.invoke(cli.app, ["some-filename"], catch_exceptions=False)
assert result.exit_code == 1, result.output
assert "Set OPENAI_API_KEY environment variable to run the app." in result.output
# We exit with status 1 because OPENAI_API_KEY not set.
# And we log the error in log_file file using --log-file option
log_file = tmp_path / "logs"
with disable_log_capture():
result = runner.invoke(
cli.app,
["--log-file", str(log_file), "some-filename"],
catch_exceptions=False,
)
assert result.exit_code == 1, result.output
assert (
"Set OPENAI_API_KEY environment variable to run the app." not in result.output
)
assert (
"Set OPENAI_API_KEY environment variable to run the app."
in log_file.read_text()
)
def test_app_errors(
tmp_path_factory: pytest.TempPathFactory,
monkeypatch: pytest.MonkeyPatch,
caplog: pytest.LogCaptureFixture,
):
# OPENAI_API_KEY must be set to run the app
monkeypatch.delenv("OPENAI_API_KEY", raising=False)
result = runner.invoke(cli.app, ["some-filename"], catch_exceptions=False)
assert result.exit_code == 1, result.output
assert "Set OPENAI_API_KEY environment variable to run the app." in caplog.text
caplog.clear()
    # As we mock the `generate_...` functions, we never hit the OpenAI API,
    # so we don't need a real API key.
monkeypatch.setenv("OPENAI_API_KEY", "foo-api-key")
# Phrasebook file doesn't exist
tmp_path = tmp_path_factory.mktemp("phrasebook")
phrasebook_path = tmp_path / "do-not-exist"
result = runner.invoke(cli.app, [str(phrasebook_path)], catch_exceptions=False)
assert result.exit_code == 1, result.output
assert re.search(r"No such file or directory.*do-not-exist", caplog.text)
caplog.clear()
# Phrasebook file empty
tmp_path = tmp_path_factory.mktemp("phrasebook")
phrasebook_path = tmp_path / "phrasebook.tsv"
phrasebook_path.touch()
result = runner.invoke(cli.app, [str(phrasebook_path)], catch_exceptions=False)
assert result.exit_code == 1, result.output
assert "Invalid file" in caplog.text
assert "No columns to parse from file" in caplog.text # pandas error msg
caplog.clear()
# Phrasebook file exists but with wrong header fields
tmp_path = tmp_path_factory.mktemp("phrasebook")
phrasebook_path = tmp_path / "phrasebook.tsv"
phrasebook_path.write_text("wrong_field_name\tfrench\tenglish")
result = runner.invoke(cli.app, [str(phrasebook_path)], catch_exceptions=False)
assert result.exit_code == 1, result.output
assert "Invalid header" in caplog.text
assert "Expected ['date', 'french', 'english']" in caplog.text
assert "got ['wrong_field_name', 'french', 'english']" in caplog.text
caplog.clear()
# enriched_phrasebook.tsv file empty
tmp_path = tmp_path_factory.mktemp("phrasebook")
phrasebook_path = tmp_path / "phrasebook.tsv"
phrasebook_path.write_text("date\tfrench\tenglish\n2025-12-15\tfr-foo\ten-foo")
enriched_path = cli.enriched_path_func(phrasebook_path)
enriched_path.touch()
result = runner.invoke(cli.app, [str(phrasebook_path)], catch_exceptions=False)
assert result.exit_code == 1, result.output
assert "Invalid file" in caplog.text
assert "No columns to parse from file" in caplog.text # pandas error msg
caplog.clear()
# enriched_phrasebook.tsv exists but with wrong header fields
tmp_path = tmp_path_factory.mktemp("phrasebook")
phrasebook_path = tmp_path / "phrasebook.tsv"
phrasebook_path.write_text("date\tfrench\tenglish\n2025-12-15\tfr-foo\ten-foo")
enriched_path = cli.enriched_path_func(phrasebook_path)
enriched_path.write_text("foo\tbar\tbaz")
result = runner.invoke(cli.app, [str(phrasebook_path)], catch_exceptions=False)
assert result.exit_code == 1, result.output
assert "Invalid header" in caplog.text
assert (
"Expected ['french', 'english', 'anki_audio', 'anki_img', 'generated_from', 'id', 'audio_filename', 'img_filename', 'date']"
in caplog.text
)
assert "got ['foo', 'bar', 'baz']" in caplog.text
caplog.clear()
# `enrich_record` failed in some way so returns None and we exit
tmp_path = tmp_path_factory.mktemp("phrasebook")
phrasebook_path = tmp_path / "phrasebook.tsv"
phrasebook_path.write_text("date\tfrench\tenglish\n2025-12-15\tfr-foo\ten-foo")
mock_enrich_record = Mock(return_value=None)
monkeypatch.setattr(cli, "enrich_record", mock_enrich_record)
result = runner.invoke(cli.app, [str(phrasebook_path)], catch_exceptions=False)
assert result.exit_code == 1, result.output
assert mock_enrich_record.called
caplog.clear()
# `save_new_records` raising an error we exit
tmp_path = tmp_path_factory.mktemp("phrasebook")
phrasebook_path = tmp_path / "phrasebook.tsv"
phrasebook_path.write_text("date\tfrench\tenglish\n2025-12-15\tfr-foo\ten-foo")
mock_enrich_record = Mock(
return_value=True
) # To continue in `cli.enrich_phrasebook` function. Note that in real use this would be a list of records
mock_save_new_records = Mock(
side_effect=RuntimeError("Failed in `save_new_records`")
)
monkeypatch.setattr(cli, "enrich_record", mock_enrich_record)
monkeypatch.setattr(cli, "save_new_records", mock_save_new_records)
result = runner.invoke(cli.app, [str(phrasebook_path)], catch_exceptions=False)
assert mock_enrich_record.called
assert mock_save_new_records.called
assert result.exit_code == 1, result.output
assert "Failed to save enriched records from record" in caplog.text
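The enriched-file fixtures used by these tests store the Anki `<img>` field with doubled quotes (`""`). That escaping is ordinary CSV/TSV quoting, which can be checked with the stdlib `csv` module (an illustrative sketch, not part of the test suite):

```python
import csv
import io

# A field containing double quotes is itself quoted, and each inner
# quote is doubled -- the same '""' escaping seen in the
# enriched_phrasebook.tsv fixtures below.
buf = io.StringIO()
writer = csv.writer(buf, delimiter="\t")
writer.writerow(["fr1", '<img src="x.png">'])
print(repr(buf.getvalue()))  # 'fr1\t"<img src=""x.png"">"\r\n'
```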
@pytest.mark.parametrize(
"phrasebook_content,translations,enriched_content,enriched_expected,logs",
[
# 1 record to be enriched + enriched_phrasebook.tsv doesn't exist
(
"date\tfrench\tenglish\n2025-12-15\tfr1\ten1",
[[("fr2", "en2"), ("fr3", "en3")]],
None,
[
("fr1", "en1", "[sound:phrasebook-fr-to-en-1.mp3]", "<img src=\"phrasebook-fr-to-en-1.png\">", pd.NA, 1, "phrasebook-fr-to-en-1.mp3", "phrasebook-fr-to-en-1.png", "2025-12-15"),
("fr2", "en2", "[sound:phrasebook-fr-to-en-2.mp3]", "<img src=\"phrasebook-fr-to-en-2.png\">", 1, 2, "phrasebook-fr-to-en-2.mp3", "phrasebook-fr-to-en-2.png", "2025-12-15"),
("fr3", "en3", "[sound:phrasebook-fr-to-en-3.mp3]", "<img src=\"phrasebook-fr-to-en-3.png\">", 1, 3, "phrasebook-fr-to-en-3.mp3", "phrasebook-fr-to-en-3.png", "2025-12-15"),
],
["Record has been enriched: ('2025-12-15', 'fr1', 'en1')"],
),
# 1 record / enriched file 3 records (corresponding to 1 record from a previous run)
# english NOT in phrasebook_content
# should create 3 new enriched records and keep the original 3
(
"date\tfrench\tenglish\n2025-12-15\tfr4\ten4",
[[("fr5", "en5"), ("fr6", "en6")]],
(
"french\tenglish\tanki_audio\tanki_img\tgenerated_from\tid\taudio_filename\timg_filename\tdate\n"
'fr1\ten1\t[sound:phrasebook-fr-to-en-1.mp3]\t"<img src=""phrasebook-fr-to-en-1.png"">"\t\t1\tphrasebook-fr-to-en-1.mp3\tphrasebook-fr-to-en-1.png\t2025-12-01\n'
'fr2\ten2\t[sound:phrasebook-fr-to-en-2.mp3]\t"<img src=""phrasebook-fr-to-en-2.png"">"\t1\t2\tphrasebook-fr-to-en-2.mp3\tphrasebook-fr-to-en-2.png\t2025-12-01\n'
'fr3\ten3\t[sound:phrasebook-fr-to-en-3.mp3]\t"<img src=""phrasebook-fr-to-en-3.png"">"\t1\t3\tphrasebook-fr-to-en-3.mp3\tphrasebook-fr-to-en-3.png\t2025-12-01'
),
[
("fr1", "en1", "[sound:phrasebook-fr-to-en-1.mp3]", "<img src=\"phrasebook-fr-to-en-1.png\">", pd.NA, 1, "phrasebook-fr-to-en-1.mp3", "phrasebook-fr-to-en-1.png", "2025-12-01",),
("fr2", "en2", "[sound:phrasebook-fr-to-en-2.mp3]", "<img src=\"phrasebook-fr-to-en-2.png\">", 1, 2, "phrasebook-fr-to-en-2.mp3", "phrasebook-fr-to-en-2.png", "2025-12-01"),
("fr3", "en3", "[sound:phrasebook-fr-to-en-3.mp3]", "<img src=\"phrasebook-fr-to-en-3.png\">", 1, 3, "phrasebook-fr-to-en-3.mp3", "phrasebook-fr-to-en-3.png", "2025-12-01"),
("fr4", "en4", "[sound:phrasebook-fr-to-en-4.mp3]", "<img src=\"phrasebook-fr-to-en-4.png\">", pd.NA, 4, "phrasebook-fr-to-en-4.mp3", "phrasebook-fr-to-en-4.png", "2025-12-15",),
("fr5", "en5", "[sound:phrasebook-fr-to-en-5.mp3]", "<img src=\"phrasebook-fr-to-en-5.png\">", 4, 5, "phrasebook-fr-to-en-5.mp3", "phrasebook-fr-to-en-5.png", "2025-12-15"),
("fr6", "en6", "[sound:phrasebook-fr-to-en-6.mp3]", "<img src=\"phrasebook-fr-to-en-6.png\">", 4, 6, "phrasebook-fr-to-en-6.mp3", "phrasebook-fr-to-en-6.png", "2025-12-15"),
],
["Record has been enriched: ('2025-12-15', 'fr4', 'en4')"],
),
# 1 record / enriched file 3 records (corresponding to 1 record from a previous run)
# 'en1' english field is present in both phrasebook_content and enriched_content
# should not create new enriched records
# "Skip..." in the logs
(
"date\tfrench\tenglish\n2025-12-15\tfr_whatever\ten1",
None, # No translations generated
(
"french\tenglish\tanki_audio\tanki_img\tgenerated_from\tid\taudio_filename\timg_filename\tdate\n"
'fr1\ten1\t[sound:phrasebook-fr-to-en-1.mp3]\t"<img src=""phrasebook-fr-to-en-1.png"">"\t\t1\tphrasebook-fr-to-en-1.mp3\tphrasebook-fr-to-en-1.png\t2025-12-01\n'
'fr2\ten2\t[sound:phrasebook-fr-to-en-2.mp3]\t"<img src=""phrasebook-fr-to-en-2.png"">"\t1\t2\tphrasebook-fr-to-en-2.mp3\tphrasebook-fr-to-en-2.png\t2025-12-01\n'
'fr3\ten3\t[sound:phrasebook-fr-to-en-3.mp3]\t"<img src=""phrasebook-fr-to-en-3.png"">"\t1\t3\tphrasebook-fr-to-en-3.mp3\tphrasebook-fr-to-en-3.png\t2025-12-01'
),
[
("fr1", "en1", "[sound:phrasebook-fr-to-en-1.mp3]", "<img src=\"phrasebook-fr-to-en-1.png\">", pd.NA, 1, "phrasebook-fr-to-en-1.mp3", "phrasebook-fr-to-en-1.png", "2025-12-01"),
("fr2", "en2", "[sound:phrasebook-fr-to-en-2.mp3]", "<img src=\"phrasebook-fr-to-en-2.png\">", 1, 2, "phrasebook-fr-to-en-2.mp3", "phrasebook-fr-to-en-2.png", "2025-12-01"),
("fr3", "en3", "[sound:phrasebook-fr-to-en-3.mp3]", "<img src=\"phrasebook-fr-to-en-3.png\">", 1, 3, "phrasebook-fr-to-en-3.mp3", "phrasebook-fr-to-en-3.png", "2025-12-01"),
],
["Skip existing record: ('2025-12-15', 'fr_whatever', 'en1')"],
),
# phrasebook_content 3 records / no enriched file
# should create 9 enriched records
(
"date\tfrench\tenglish\n"
"2025-12-15\tfr1\ten1\n"
"2025-12-16\tfr4\ten4\n"
"2025-12-17\tfr7\ten7",
[
[("fr2", "en2"), ("fr3", "en3")],
[("fr5", "en5"), ("fr6", "en6")],
[("fr8", "en8"), ("fr9", "en9")],
],
None,
[
("fr1", "en1", "[sound:phrasebook-fr-to-en-1.mp3]", "<img src=\"phrasebook-fr-to-en-1.png\">", pd.NA, 1, "phrasebook-fr-to-en-1.mp3", "phrasebook-fr-to-en-1.png", "2025-12-15"),
("fr2", "en2", "[sound:phrasebook-fr-to-en-2.mp3]", "<img src=\"phrasebook-fr-to-en-2.png\">", 1, 2, "phrasebook-fr-to-en-2.mp3", "phrasebook-fr-to-en-2.png", "2025-12-15"),
("fr3", "en3", "[sound:phrasebook-fr-to-en-3.mp3]", "<img src=\"phrasebook-fr-to-en-3.png\">", 1, 3, "phrasebook-fr-to-en-3.mp3", "phrasebook-fr-to-en-3.png", "2025-12-15"),
("fr4", "en4", "[sound:phrasebook-fr-to-en-4.mp3]", "<img src=\"phrasebook-fr-to-en-4.png\">", pd.NA, 4, "phrasebook-fr-to-en-4.mp3", "phrasebook-fr-to-en-4.png", "2025-12-16"),
("fr5", "en5", "[sound:phrasebook-fr-to-en-5.mp3]", "<img src=\"phrasebook-fr-to-en-5.png\">", 4, 5, "phrasebook-fr-to-en-5.mp3", "phrasebook-fr-to-en-5.png", "2025-12-16"),
("fr6", "en6", "[sound:phrasebook-fr-to-en-6.mp3]", "<img src=\"phrasebook-fr-to-en-6.png\">", 4, 6, "phrasebook-fr-to-en-6.mp3", "phrasebook-fr-to-en-6.png", "2025-12-16"),
("fr7", "en7", "[sound:phrasebook-fr-to-en-7.mp3]", "<img src=\"phrasebook-fr-to-en-7.png\">", pd.NA, 7, "phrasebook-fr-to-en-7.mp3", "phrasebook-fr-to-en-7.png", "2025-12-17"),
("fr8", "en8", "[sound:phrasebook-fr-to-en-8.mp3]", "<img src=\"phrasebook-fr-to-en-8.png\">", 7, 8, "phrasebook-fr-to-en-8.mp3", "phrasebook-fr-to-en-8.png", "2025-12-17"),
("fr9", "en9", "[sound:phrasebook-fr-to-en-9.mp3]", "<img src=\"phrasebook-fr-to-en-9.png\">", 7, 9, "phrasebook-fr-to-en-9.mp3", "phrasebook-fr-to-en-9.png", "2025-12-17"),
],
[
"Record has been enriched: ('2025-12-15', 'fr1', 'en1')",
"Record has been enriched: ('2025-12-16', 'fr4', 'en4')",
"Record has been enriched: ('2025-12-17', 'fr7', 'en7')",
],
),
# phrasebook_content 3 records / enriched file 6 records (corresponding to 2 records from a previous run)
# english NOT in phrasebook_content
# should create 9 enriched records and keep the original 6
(
"date\tfrench\tenglish\n"
"2025-12-15\tfr7\ten7\n"
"2025-12-16\tfr10\ten10\n"
"2025-12-17\tfr13\ten13",
[
[("fr8", "en8"), ("fr9", "en9")],
[("fr11", "en11"), ("fr12", "en12")],
[("fr14", "en14"), ("fr15", "en15")],
],
(
"french\tenglish\tanki_audio\tanki_img\tgenerated_from\tid\taudio_filename\timg_filename\tdate\n"
'fr1\ten1\t[sound:phrasebook-fr-to-en-1.mp3]\t"<img src=""phrasebook-fr-to-en-1.png"">"\t\t1\tphrasebook-fr-to-en-1.mp3\tphrasebook-fr-to-en-1.png\t2025-12-01\n'
'fr2\ten2\t[sound:phrasebook-fr-to-en-2.mp3]\t"<img src=""phrasebook-fr-to-en-2.png"">"\t1\t2\tphrasebook-fr-to-en-2.mp3\tphrasebook-fr-to-en-2.png\t2025-12-01\n'
'fr3\ten3\t[sound:phrasebook-fr-to-en-3.mp3]\t"<img src=""phrasebook-fr-to-en-3.png"">"\t1\t3\tphrasebook-fr-to-en-3.mp3\tphrasebook-fr-to-en-3.png\t2025-12-01\n'
'fr4\ten4\t[sound:phrasebook-fr-to-en-4.mp3]\t"<img src=""phrasebook-fr-to-en-4.png"">"\t\t4\tphrasebook-fr-to-en-4.mp3\tphrasebook-fr-to-en-4.png\t2025-12-02\n'
'fr5\ten5\t[sound:phrasebook-fr-to-en-5.mp3]\t"<img src=""phrasebook-fr-to-en-5.png"">"\t4\t5\tphrasebook-fr-to-en-5.mp3\tphrasebook-fr-to-en-5.png\t2025-12-02\n'
'fr6\ten6\t[sound:phrasebook-fr-to-en-6.mp3]\t"<img src=""phrasebook-fr-to-en-6.png"">"\t4\t6\tphrasebook-fr-to-en-6.mp3\tphrasebook-fr-to-en-6.png\t2025-12-02'
),
[
("fr1", "en1", "[sound:phrasebook-fr-to-en-1.mp3]", "<img src=\"phrasebook-fr-to-en-1.png\">", pd.NA, 1, "phrasebook-fr-to-en-1.mp3", "phrasebook-fr-to-en-1.png", "2025-12-01",),
("fr2", "en2", "[sound:phrasebook-fr-to-en-2.mp3]", "<img src=\"phrasebook-fr-to-en-2.png\">", 1, 2, "phrasebook-fr-to-en-2.mp3", "phrasebook-fr-to-en-2.png", "2025-12-01",),
("fr3", "en3", "[sound:phrasebook-fr-to-en-3.mp3]", "<img src=\"phrasebook-fr-to-en-3.png\">", 1, 3, "phrasebook-fr-to-en-3.mp3", "phrasebook-fr-to-en-3.png", "2025-12-01",),
("fr4", "en4", "[sound:phrasebook-fr-to-en-4.mp3]", "<img src=\"phrasebook-fr-to-en-4.png\">", pd.NA, 4, "phrasebook-fr-to-en-4.mp3", "phrasebook-fr-to-en-4.png", "2025-12-02",),
("fr5", "en5", "[sound:phrasebook-fr-to-en-5.mp3]", "<img src=\"phrasebook-fr-to-en-5.png\">", 4, 5, "phrasebook-fr-to-en-5.mp3", "phrasebook-fr-to-en-5.png", "2025-12-02",),
("fr6", "en6", "[sound:phrasebook-fr-to-en-6.mp3]", "<img src=\"phrasebook-fr-to-en-6.png\">", 4, 6, "phrasebook-fr-to-en-6.mp3", "phrasebook-fr-to-en-6.png", "2025-12-02",),
("fr7", "en7", "[sound:phrasebook-fr-to-en-7.mp3]", "<img src=\"phrasebook-fr-to-en-7.png\">", pd.NA, 7, "phrasebook-fr-to-en-7.mp3", "phrasebook-fr-to-en-7.png", "2025-12-15"),
("fr8", "en8", "[sound:phrasebook-fr-to-en-8.mp3]", "<img src=\"phrasebook-fr-to-en-8.png\">", 7, 8, "phrasebook-fr-to-en-8.mp3", "phrasebook-fr-to-en-8.png", "2025-12-15"),
("fr9", "en9", "[sound:phrasebook-fr-to-en-9.mp3]", "<img src=\"phrasebook-fr-to-en-9.png\">", 7, 9, "phrasebook-fr-to-en-9.mp3", "phrasebook-fr-to-en-9.png", "2025-12-15"),
("fr10", "en10", "[sound:phrasebook-fr-to-en-10.mp3]", "<img src=\"phrasebook-fr-to-en-10.png\">", pd.NA, 10, "phrasebook-fr-to-en-10.mp3", "phrasebook-fr-to-en-10.png", "2025-12-16"),
("fr11", "en11", "[sound:phrasebook-fr-to-en-11.mp3]", "<img src=\"phrasebook-fr-to-en-11.png\">", 10, 11, "phrasebook-fr-to-en-11.mp3", "phrasebook-fr-to-en-11.png", "2025-12-16"),
("fr12", "en12", "[sound:phrasebook-fr-to-en-12.mp3]", "<img src=\"phrasebook-fr-to-en-12.png\">", 10, 12, "phrasebook-fr-to-en-12.mp3", "phrasebook-fr-to-en-12.png", "2025-12-16"),
("fr13", "en13", "[sound:phrasebook-fr-to-en-13.mp3]", "<img src=\"phrasebook-fr-to-en-13.png\">", pd.NA, 13, "phrasebook-fr-to-en-13.mp3", "phrasebook-fr-to-en-13.png", "2025-12-17"),
("fr14", "en14", "[sound:phrasebook-fr-to-en-14.mp3]", "<img src=\"phrasebook-fr-to-en-14.png\">", 13, 14, "phrasebook-fr-to-en-14.mp3", "phrasebook-fr-to-en-14.png", "2025-12-17"),
("fr15", "en15", "[sound:phrasebook-fr-to-en-15.mp3]", "<img src=\"phrasebook-fr-to-en-15.png\">", 13, 15, "phrasebook-fr-to-en-15.mp3", "phrasebook-fr-to-en-15.png", "2025-12-17"),
],
[
"Record has been enriched: ('2025-12-15', 'fr7', 'en7')",
"Record has been enriched: ('2025-12-16', 'fr10', 'en10')",
"Record has been enriched: ('2025-12-17', 'fr13', 'en13')",
],
),
# phrasebook_content 3 records / enriched file 6 records (corresponding to 2 records from a previous run)
# 'en1' english field is present in both phrasebook_content and enriched_content
# should create only 6 new enriched records for the other 2 phrasebook records
# "Skip..." in the logs
(
"date\tfrench\tenglish\n"
"2025-12-15\tfr_whatever\ten1\n"
"2025-12-16\tfr7\ten7\n"
"2025-12-17\tfr10\ten10",
[
[("fr8", "en8"), ("fr9", "en9")],
[("fr11", "en11"), ("fr12", "en12")],
],
(
"french\tenglish\tanki_audio\tanki_img\tgenerated_from\tid\taudio_filename\timg_filename\tdate\n"
'fr1\ten1\t[sound:phrasebook-fr-to-en-1.mp3]\t"<img src=""phrasebook-fr-to-en-1.png"">"\t\t1\tphrasebook-fr-to-en-1.mp3\tphrasebook-fr-to-en-1.png\t2025-12-01\n'
'fr2\ten2\t[sound:phrasebook-fr-to-en-2.mp3]\t"<img src=""phrasebook-fr-to-en-2.png"">"\t1\t2\tphrasebook-fr-to-en-2.mp3\tphrasebook-fr-to-en-2.png\t2025-12-01\n'
'fr3\ten3\t[sound:phrasebook-fr-to-en-3.mp3]\t"<img src=""phrasebook-fr-to-en-3.png"">"\t1\t3\tphrasebook-fr-to-en-3.mp3\tphrasebook-fr-to-en-3.png\t2025-12-01\n'
'fr4\ten4\t[sound:phrasebook-fr-to-en-4.mp3]\t"<img src=""phrasebook-fr-to-en-4.png"">"\t\t4\tphrasebook-fr-to-en-4.mp3\tphrasebook-fr-to-en-4.png\t2025-12-02\n'
'fr5\ten5\t[sound:phrasebook-fr-to-en-5.mp3]\t"<img src=""phrasebook-fr-to-en-5.png"">"\t4\t5\tphrasebook-fr-to-en-5.mp3\tphrasebook-fr-to-en-5.png\t2025-12-02\n'
'fr6\ten6\t[sound:phrasebook-fr-to-en-6.mp3]\t"<img src=""phrasebook-fr-to-en-6.png"">"\t4\t6\tphrasebook-fr-to-en-6.mp3\tphrasebook-fr-to-en-6.png\t2025-12-02'
),
[
("fr1", "en1", "[sound:phrasebook-fr-to-en-1.mp3]", "<img src=\"phrasebook-fr-to-en-1.png\">", pd.NA, 1, "phrasebook-fr-to-en-1.mp3", "phrasebook-fr-to-en-1.png", "2025-12-01",),
("fr2", "en2", "[sound:phrasebook-fr-to-en-2.mp3]", "<img src=\"phrasebook-fr-to-en-2.png\">", 1, 2, "phrasebook-fr-to-en-2.mp3", "phrasebook-fr-to-en-2.png", "2025-12-01",),
("fr3", "en3", "[sound:phrasebook-fr-to-en-3.mp3]", "<img src=\"phrasebook-fr-to-en-3.png\">", 1, 3, "phrasebook-fr-to-en-3.mp3", "phrasebook-fr-to-en-3.png", "2025-12-01",),
("fr4", "en4", "[sound:phrasebook-fr-to-en-4.mp3]", "<img src=\"phrasebook-fr-to-en-4.png\">", pd.NA, 4, "phrasebook-fr-to-en-4.mp3", "phrasebook-fr-to-en-4.png", "2025-12-02",),
("fr5", "en5", "[sound:phrasebook-fr-to-en-5.mp3]", "<img src=\"phrasebook-fr-to-en-5.png\">", 4, 5, "phrasebook-fr-to-en-5.mp3", "phrasebook-fr-to-en-5.png", "2025-12-02",),
("fr6", "en6", "[sound:phrasebook-fr-to-en-6.mp3]", "<img src=\"phrasebook-fr-to-en-6.png\">", 4, 6, "phrasebook-fr-to-en-6.mp3", "phrasebook-fr-to-en-6.png", "2025-12-02",),
("fr7", "en7", "[sound:phrasebook-fr-to-en-7.mp3]", "<img src=\"phrasebook-fr-to-en-7.png\">", pd.NA, 7, "phrasebook-fr-to-en-7.mp3", "phrasebook-fr-to-en-7.png", "2025-12-16"),
("fr8", "en8", "[sound:phrasebook-fr-to-en-8.mp3]", "<img src=\"phrasebook-fr-to-en-8.png\">", 7, 8, "phrasebook-fr-to-en-8.mp3", "phrasebook-fr-to-en-8.png", "2025-12-16"),
("fr9", "en9", "[sound:phrasebook-fr-to-en-9.mp3]", "<img src=\"phrasebook-fr-to-en-9.png\">", 7, 9, "phrasebook-fr-to-en-9.mp3", "phrasebook-fr-to-en-9.png", "2025-12-16"),
("fr10", "en10", "[sound:phrasebook-fr-to-en-10.mp3]", "<img src=\"phrasebook-fr-to-en-10.png\">", pd.NA, 10, "phrasebook-fr-to-en-10.mp3", "phrasebook-fr-to-en-10.png", "2025-12-17"),
("fr11", "en11", "[sound:phrasebook-fr-to-en-11.mp3]", "<img src=\"phrasebook-fr-to-en-11.png\">", 10, 11, "phrasebook-fr-to-en-11.mp3", "phrasebook-fr-to-en-11.png", "2025-12-17"),
("fr12", "en12", "[sound:phrasebook-fr-to-en-12.mp3]", "<img src=\"phrasebook-fr-to-en-12.png\">", 10, 12, "phrasebook-fr-to-en-12.mp3", "phrasebook-fr-to-en-12.png", "2025-12-17"),
],
[
"Skip existing record: ('2025-12-15', 'fr_whatever', 'en1')",
"Record has been enriched: ('2025-12-16', 'fr7', 'en7')",
"Record has been enriched: ('2025-12-17', 'fr10', 'en10')",
],
),
],
ids=[
"1_record_no_enriched_file",
"1_record_enriched_file_3_records_english_not_in_phrasebook_creates_3_keeps_3",
"1_record_enriched_file_3_records_english_same_skips_creates_0",
"3_records_no_enriched_file_creates_9",
"3_records_enriched_file_6_records_english_not_in_phrasebook_creates_9_keeps_6",
"3_records_enriched_file_6_records_first_english_same_skips_1_creates_6",
],
) # fmt: skip
def test_app_records_saved(
tmp_path_factory: pytest.TempPathFactory,
monkeypatch: pytest.MonkeyPatch,
caplog: pytest.LogCaptureFixture,
phrasebook_content,
translations,
enriched_content,
enriched_expected,
logs,
):
caplog.set_level(logging.INFO, logger="phrasebook_fr_to_en.cli")
# OPENAI_API_KEY must be set to run the app.
# As we mock the `generate_...` functions, we never hit the OpenAI API,
# so we don't have to use a real API key
monkeypatch.setenv("OPENAI_API_KEY", "foo-api-key")
mock_generate_translations = Mock(side_effect=translations)
mock_generate_audio = Mock(return_value=None)
mock_generate_img = Mock(return_value=None)
monkeypatch.setattr(cli, "generate_translations", mock_generate_translations)
monkeypatch.setattr(cli, "generate_audio", mock_generate_audio)
monkeypatch.setattr(cli, "generate_img", mock_generate_img)
tmp_path = tmp_path_factory.mktemp("phrasebook")
phrasebook_path = tmp_path / "phrasebook.tsv"
phrasebook_path.write_text(phrasebook_content)
enriched_path = cli.enriched_path_func(phrasebook_path)
if enriched_content:
enriched_path.write_text(enriched_content)
result = runner.invoke(cli.app, [str(phrasebook_path)], catch_exceptions=False)
assert result.exit_code == 0, result.output
enriched_df = pd.read_csv(enriched_path, sep="\t", dtype="string")
# Match the dtypes produced by save_new_records
enriched_df["id"] = enriched_df["id"].astype("Int64")
enriched_df["generated_from"] = enriched_df["generated_from"].astype("Int64")
enriched_df_expected = pd.DataFrame(
enriched_expected,
columns=pd.Index(cli.ENRICHED_COLUMNS),
dtype="string",
)
enriched_df_expected["id"] = enriched_df_expected["id"].astype("Int64")
enriched_df_expected["generated_from"] = enriched_df_expected[
"generated_from"
].astype("Int64")
pd.testing.assert_frame_equal(enriched_df, enriched_df_expected, check_dtype=True)
for log in logs:
assert log in caplog.text
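The `Int64` casts above are needed because `pd.read_csv(..., dtype="string")` reads every column as the nullable string dtype, while the expected frame must hold `pd.NA` in `generated_from` for root records. A minimal sketch of that normalization, using a two-column stand-in for the enriched file (illustrative only):

```python
import io

import pandas as pd

# Read a tiny stand-in for the enriched TSV: every column comes back as
# the nullable "string" dtype, including the empty generated_from cell.
tsv = "id\tgenerated_from\n1\t\n2\t1\n"
df = pd.read_csv(io.StringIO(tsv), sep="\t", dtype="string")

# Cast to the nullable integer dtype so pd.NA survives and the frame
# can be compared with assert_frame_equal(check_dtype=True).
df["id"] = df["id"].astype("Int64")
df["generated_from"] = df["generated_from"].astype("Int64")
print(df["generated_from"].isna().tolist())  # [True, False]
```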
## Test `cli.watch_phrasebook`
def test_watch_phrasebook(
tmp_path_factory: pytest.TempPathFactory,
monkeypatch: pytest.MonkeyPatch,
caplog: pytest.LogCaptureFixture,
) -> None:
caplog.set_level(logging.INFO, logger="phrasebook_fr_to_en.cli")
translations: list[list[tuple[str, str]]] = [
[("fr1-a", "en1-a"), ("fr1-b", "en1-b")],
[("fr2-a", "en2-a"), ("fr2-b", "en2-b")],
[("fr3-a", "en3-a"), ("fr3-b", "en3-b")],
]
mock_generate_translations = Mock(side_effect=translations)
mock_generate_audio = Mock(return_value=None)
mock_generate_img = Mock(return_value=None)
monkeypatch.setattr(cli, "generate_translations", mock_generate_translations)
monkeypatch.setattr(cli, "generate_audio", mock_generate_audio)
monkeypatch.setattr(cli, "generate_img", mock_generate_img)
tmp_path = tmp_path_factory.mktemp("phrasebook")
phrasebook_path = tmp_path / "phrasebook.tsv"
phrasebook_path.write_text("date\tfrench\tenglish\n2025-12-15\tfr1\ten1")
enriched_path = cli.enriched_path_func(phrasebook_path)
# We don't use the client because the generate functions are mocked,
# but we still have to pass it as an argument to `watch_phrasebook`
client = OpenAI(api_key="foo-api-key")
# We start watching
# `enriched_path` file doesn't exist
thread = threading.Thread(
target=cli.watch_phrasebook,
args=[phrasebook_path.absolute(), client],
daemon=True,
)
thread.start()
deadline = time.monotonic() + 5.0
while (
time.monotonic() < deadline
and f"Start watching file {phrasebook_path}" not in caplog.text
):
time.sleep(0.01)
assert f"Start watching file {phrasebook_path}" in caplog.text
caplog.clear()
def read_enriched_english_values() -> list[str]:
# While `enriched_path` is being written or before it has been
# created, `pd.read_csv()` raises an error
try:
enriched_df = pd.read_csv(enriched_path, sep="\t", dtype="string")
return enriched_df["english"].dropna().to_list()
except Exception:
return []
# enrich "en1" and "en2" records
phrasebook_path.write_text(
"date\tfrench\tenglish\n2025-12-15\tfr1\ten1\n2025-12-16\tfr2\ten2"
)
deadline = time.monotonic() + 5.0
while time.monotonic() < deadline and (
"en1" not in read_enriched_english_values()
or "en2" not in read_enriched_english_values()
):
time.sleep(0.01)
assert "Skip existing record: ('2025-12-15', 'fr1', 'en1')" not in caplog.text
assert "en1" in read_enriched_english_values()
assert "en2" in read_enriched_english_values()
caplog.clear()
# Invalid file logged due to last row having 4 fields instead of 3
phrasebook_path.write_text(
"date\tfrench\tenglish\n"
"2025-12-15\tfr1\ten1\n"
"2025-12-16\tfr2\ten2\n"
"2025-12-17\tfr3\ten3\twrong-extra-field"
)
deadline = time.monotonic() + 5.0
while time.monotonic() < deadline and (
"Invalid file" not in caplog.text
or "Expected 3 fields in line 4, saw 4" not in caplog.text
):
time.sleep(0.01)
assert f"Invalid file {phrasebook_path}" in caplog.text
assert "Expected 3 fields in line 4, saw 4" in caplog.text # pandas error msg
caplog.clear()
# Skip existing "en1" and "en2" records and enrich the new "en3" record
phrasebook_path.write_text(
"date\tfrench\tenglish\n"
"2025-12-15\tfr1\ten1\n"
"2025-12-16\tfr2\ten2\n"
"2025-12-17\tfr3\ten3"
)
deadline = time.monotonic() + 5.0
while time.monotonic() < deadline and "en3" not in read_enriched_english_values():
time.sleep(0.01)
assert "Skip existing record: ('2025-12-15', 'fr1', 'en1')" in caplog.text
assert "Skip existing record: ('2025-12-16', 'fr2', 'en2')" in caplog.text
assert "en3" in read_enriched_english_values()
assert thread.is_alive()
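The poll-until-deadline loops above could be factored into a small helper. A sketch under that assumption (`wait_until` is a hypothetical name, not part of the CLI or its tests):

```python
import time
from typing import Callable


def wait_until(
    condition: Callable[[], bool], timeout: float = 5.0, interval: float = 0.01
) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    # One last check so a condition that became true right at the
    # deadline is not reported as a timeout.
    return condition()
```

With it, the watcher assertions could read `assert wait_until(lambda: "en3" in read_enriched_english_values())`.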
@pytest.mark.respx(base_url="https://api.openai.com/v1/")
def test_generate_translations(
respx_mock: MockRouter, caplog: pytest.LogCaptureFixture
):
caplog.set_level(logging.INFO, logger="phrasebook_fr_to_en.cli")
record = ("2025-12-15", "fr1", "en1")
client = OpenAI(api_key="foo-api-key")
def partial_json_response(output_id: str, output_text: str):
return {
"output": [
{
"type": "message",
"id": f"msg_{output_id}",
"status": "completed",
"role": "assistant",
"content": [
{
"type": "output_text",
"text": output_text,
"annotations": [],
}
],
}
]
}
# We receive exactly 2 translations and this is what we want
respx_mock.post("/responses").mock(
return_value=httpx.Response(
200,
json=partial_json_response(
"id_1",
'{"translations":[{"french":"fr2","english":"en2"},{"french":"fr3","english":"en3"}]}',
),
)
)
translations = cli.generate_translations(record, client)
assert translations == [("fr2", "en2"), ("fr3", "en3")]
assert (
"Generating translations for record ('2025-12-15', 'fr1', 'en1')" in caplog.text
)
assert (
"Translations generated for record ('2025-12-15', 'fr1', 'en1')" in caplog.text
)
assert "using model gpt-5.2 and input 'fr1 -> en1'" in caplog.text
caplog.clear()
# First request returns 3 translations -> we keep the first 2.
# A second request would return 2 translations, which would also be
# fine, but it is never sent because we stop after the first response.
respx_mock.post("/responses").mock(
side_effect=[
httpx.Response(
200,
json=partial_json_response(
"id_1",
'{"translations":[{"french":"fr2","english":"en2"},{"french":"fr3","english":"en3"}, {"french":"fr4","english":"en4"}]}',
),
),
httpx.Response(
200,
json=partial_json_response(
"id_2",
'{"translations":[{"french":"frA","english":"enA"},{"french":"frB","english":"enB"}]}',
),
),
]
)
translations = cli.generate_translations(record, client)
assert translations == [("fr2", "en2"), ("fr3", "en3")]
# 3 retries with the 3rd OK
# First request returns 1 translation -> should be 2 so we retry
# Second request returns no translation -> should be 2 so we retry
# Third request returns 2 translations -> this is ok
respx_mock.post("/responses").mock(
side_effect=[
httpx.Response(
200,
json=partial_json_response(
"id_1",
'{"translations":[{"french":"fr2","english":"en2"}]}',
),
),
httpx.Response(
200,
json={
"incomplete_details": {"reason": "max_output_tokens"},
"output": [
{
"id": "rs_0c8b0343bd64d781006971f5c6041c8194b28661972de6acc2",
"summary": [],
"type": "reasoning",
}
],
},
),
httpx.Response(
200,
json=partial_json_response(
"id_2",
'{"translations":[{"french":"fr2","english":"en2"},{"french":"fr3","english":"en3"}]}',
),
),
]
)
translations = cli.generate_translations(record, client)
assert translations == [("fr2", "en2"), ("fr3", "en3")]
assert "No translations were returned by the model at attempt 2." in caplog.text
assert (
"Wrong number of translations returned by the model at attempt 1."
in caplog.text
)
# Raise an error because we receive only one translation pair on every
# request to the API (the error is raised at the 3rd attempt)
respx_mock.post("/responses").mock(
return_value=httpx.Response(
200,
json=partial_json_response(
"id_1",
'{"translations":[{"french":"fr2","english":"en2"}]}',
),
),
)
with pytest.raises(ValueError, match="Wrong number of translations: 1."):
translations = cli.generate_translations(record, client)
# Raise an error because we receive no translation pair on any request
# to the API (the error is raised at the 3rd attempt).
# This can happen if you use, for instance, gpt-5-nano with a limited
# output token budget that is entirely consumed by the reasoning.
# We're not using gpt-5.2's reasoning, but we guard against this just in case.
respx_mock.post("/responses").mock(
return_value=httpx.Response(
200,
json={
"incomplete_details": {"reason": "max_output_tokens"},
"output": [
{
"id": "rs_0c8b0343bd64d781006971f5c6041c8194b28661972de6acc2",
"summary": [],
"type": "reasoning",
}
],
},
)
)
with pytest.raises(ValueError, match="No translations were returned by the model."):
translations = cli.generate_translations(record, client)
@pytest.mark.respx(base_url="https://api.openai.com/v1/")
def test_generate_translations_request_logged_when_api_error_raised(
respx_mock: MockRouter, caplog: pytest.LogCaptureFixture
):
caplog.set_level(logging.INFO, logger="phrasebook_fr_to_en.cli")
respx_mock.post("/responses").mock(
return_value=httpx.Response(
401, json={"error": {"message": "Incorrect API key provided"}}
)
)
record = ("2025-12-15", "fr1", "en1")
client = OpenAI(api_key="foo-api-key", max_retries=0)
with pytest.raises(APIError):
translations = cli.generate_translations(record, client)
# Log httpx request
assert "<Request('POST', 'https://api.openai.com/v1/responses')>" in caplog.text
# Log httpx headers: Headers({'host': 'api.openai.com', ...})
assert "Request headers -" in caplog.text
assert "'host': 'api.openai.com'" in caplog.text
# Ensure the API key is not logged in the headers
assert "foo-api-key" not in caplog.text
# Log httpx body request
assert re.search(r"Request body - .*\"input\"\s*:\s*\"fr1 -> en1\"", caplog.text)
@pytest.mark.skipif(
os.getenv("OPENAI_LIVE") != "1",
reason="Requires OPENAI_LIVE=1, in which case we make real calls to the OpenAI API.",
)
def test_generate_translations_real(caplog: pytest.LogCaptureFixture):
caplog.set_level(logging.INFO, logger="phrasebook_fr_to_en.cli")
record = ("2025-12-15", "Il est beau.", "He is handsome.")
client = OpenAI()
translations = cli.generate_translations(record, client)
# Raises an error if `translations` is not valid against `TranslationList`
TranslationList = conlist(tuple[str, str], min_length=2, max_length=2)
TypeAdapter(TranslationList).validate_python(translations)
assert (
"Translations generated for record ('2025-12-15', 'Il est beau.', 'He is handsome.')"
in caplog.text
)
assert (
"using model gpt-5.2 and input 'Il est beau. -> He is handsome.'" in caplog.text
)
@pytest.mark.skipif(
os.getenv("OPENAI_LIVE") != "1",
reason="Requires OPENAI_LIVE=1, in which case we make real calls to the OpenAI API.",
)
def test_generate_audio_real(caplog: pytest.LogCaptureFixture, tmp_path: Path):
caplog.set_level(logging.INFO, logger="phrasebook_fr_to_en.cli")
record = cli.build_record(
record_id=1,
french="Il est beau.",
english="He is handsome.",
generated_from=pd.NA,
date="2025-12-15",
)
client = OpenAI()
media_dir = tmp_path
audio_path = media_dir / record["audio_filename"]
cli.generate_audio(record, media_dir, client)
assert "Generating audio 'He is handsome.' for record 1" in caplog.text
assert media_dir.exists()
assert is_mp3(audio_path)
assert f"Audio has been generated: {audio_path}." in caplog.text
@pytest.mark.skipif(
os.getenv("OPENAI_LIVE") != "1",
reason="Requires OPENAI_LIVE=1, in which case we make real calls to the OpenAI API.",
)
def test_generate_img_real(caplog: pytest.LogCaptureFixture, tmp_path: Path):
caplog.set_level(logging.INFO, logger="phrasebook_fr_to_en.cli")
record = cli.build_record(
record_id=1,
french="Il est beau.",
english="He is handsome.",
generated_from=pd.NA,
date="2025-12-15",
)
client = OpenAI()
media_dir = tmp_path
img_path = media_dir / record["img_filename"]
cli.generate_img(record, media_dir, client)
assert "Generating image 'He is handsome.' for record 1" in caplog.text
assert media_dir.exists()
assert is_png(img_path)
assert f"Image has been generated: {img_path}." in caplog.text
@pytest.mark.skipif(
os.getenv("OPENAI_LIVE") != "1",
reason="Requires OPENAI_LIVE=1, in which case we make real calls to the OpenAI API.",
)
def test_enrich_record_ok_real(tmp_path: Path, caplog: pytest.LogCaptureFixture):
caplog.set_level(logging.INFO, logger="phrasebook_fr_to_en.cli")
record = ("2025-12-15", "Il est beau.", "He is handsome.")
media_dir = tmp_path
client = OpenAI()
records = cli.enrich_record(record, 10, media_dir, client)
assert len(records) == 3
for rec in records:
audio_path = tmp_path / rec["audio_filename"]
img_path = tmp_path / rec["img_filename"]
assert is_mp3(audio_path)
assert is_png(img_path)
@pytest.mark.skipif(
os.getenv("OPENAI_LIVE") != "1",
reason="Requires OPENAI_LIVE=1, in which case we make real calls to the OpenAI API.",
)
def test_app_real(tmp_path: Path, caplog: pytest.LogCaptureFixture):
caplog.set_level(logging.INFO, logger="phrasebook_fr_to_en.cli")
phrasebook_path = tmp_path / "phrasebook.tsv"
phrasebook_path.write_text("date\tfrench\tenglish\n2025-12-15\tbeau\thandsome")
enriched_path = cli.enriched_path_func(phrasebook_path)
result = runner.invoke(cli.app, [str(phrasebook_path)], catch_exceptions=False)
assert result.exit_code == 0, result.stdout
enriched_df = pd.read_csv(enriched_path, sep="\t", dtype="string")
pd.testing.assert_index_equal(enriched_df.columns, pd.Index(cli.ENRICHED_COLUMNS))
assert len(enriched_df) == 3
media_dir = cli.media_dir_func(phrasebook_path)
for audio_path in enriched_df["audio_filename"]:
assert is_mp3(media_dir / audio_path)
for img_filename in enriched_df["img_filename"]:
assert is_png(media_dir / img_filename)
assert "Record has been enriched: ('2025-12-15', 'beau', 'handsome')" in caplog.text
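The `anki_audio` and `anki_img` values asserted throughout these tests follow Anki's media-reference syntax: a `[sound:...]` tag for audio and a plain HTML `<img>` tag for images. The formatting can be reproduced with two one-liners (an illustrative sketch; the real formatting lives in the CLI module and these helper names are hypothetical):

```python
def anki_audio(filename: str) -> str:
    # Anki plays the file when the card shows a [sound:...] tag.
    return f"[sound:{filename}]"


def anki_img(filename: str) -> str:
    # Anki renders card fields as HTML, so a plain <img> tag works.
    return f'<img src="{filename}">'


print(anki_audio("phrasebook-fr-to-en-1.mp3"))  # [sound:phrasebook-fr-to-en-1.mp3]
print(anki_img("phrasebook-fr-to-en-1.png"))  # <img src="phrasebook-fr-to-en-1.png">
```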
References
- phrasebook-fr-to-en
- OpenAI docs
- Python libraries
- For the tests
- Anki