Assess Relevance for a Batch of Study Identifiers
df_assess_batch.Rd
Runs the relevance assessment workflow (df_assess_relevance) for multiple
DOIs/PMIDs, leveraging caching and providing a summary of results.
Usage
df_assess_batch(
chat,
identifiers,
metawoRld_path,
force_fetch = FALSE,
force_assess = FALSE,
email = NULL,
ncbi_api_key = NULL,
stop_on_error = FALSE,
...
)
Arguments
- chat
A chat object used to issue the LLM requests.
- identifiers
Character vector. A vector of DOIs and/or PMIDs.
- metawoRld_path
Character string. Path to the root of the metawoRld project.
- force_fetch
Logical. If TRUE, bypass the metadata cache for all identifiers.
- force_assess
Logical. If TRUE, bypass the assessment cache for all identifiers.
- email
Character string (optional). Email for NCBI Entrez.
- ncbi_api_key
Character string (optional). NCBI API key.
- stop_on_error
Logical. If TRUE, the batch process stops if any single assessment fails. If FALSE (default), it attempts to process all identifiers and reports errors in the summary.
- ...
Additional arguments passed down to df_assess_relevance and subsequently to the LLM API call function (e.g., temperature).
- service
Character string. The LLM service to use (e.g., "openai").
- model
Character string. The specific LLM model name.
Value
A data frame (tibble) summarizing the assessment results for each identifier, with columns:
- identifier
The DOI or PMID.
- status
"Success" or "Failure".
- decision
Assessment decision ("Include", "Exclude", etc.) if status is "Success".
- score
Confidence score if status is "Success".
- rationale
LLM rationale if status is "Success".
- error_message
The error message if status is "Failure".
Also prints progress and summary information to the console. Assessment
results are saved to the cache within the metawoRld project.
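Because failures are reported per row rather than raised as errors (when stop_on_error = FALSE), the returned tibble lends itself to triage. A minimal sketch in base R, assuming a `batch_results` object with the columns listed above (the data here are illustrative, not real assessment output):

```r
# Hypothetical batch_results with the documented columns (illustrative values).
batch_results <- data.frame(
  identifier    = c("31772108", "invalid_pmid"),
  status        = c("Success", "Failure"),
  error_message = c(NA, "Fetch failed: unrecognized identifier"),
  stringsAsFactors = FALSE
)

# Separate failed identifiers for inspection or a later re-run.
failed <- subset(batch_results, status == "Failure")
print(failed[, c("identifier", "error_message")])

# A re-run of only the failures might then look like (not executed here):
# retry <- df_assess_batch(chat, failed$identifier, metawoRld_path,
#                          force_fetch = TRUE)
```

Keeping the failed rows separate makes it easy to distinguish fetch problems (bad identifiers) from assessment problems before forcing a re-run.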
Examples
if (FALSE) { # \dontrun{
# --- Prerequisites ---
# 1. Set API key: usethis::edit_r_environ("project") -> add OPENAI_API_KEY=sk-... -> Restart R
# 2. Create a dummy metawoRld project
proj_path <- file.path(tempdir(), "assess_batch_proj")
metawoRld::create_metawoRld(
proj_path,
project_name = "Test Batch Assessment",
project_description = "Testing DataFindR batch assessment",
inclusion_criteria = c("Human study", "Pregnancy", "Serum or Plasma", "Cytokine measurement"),
exclusion_criteria = c("Animal study", "Review article", "Non-English")
)
# --- Identifiers from a hypothetical search ---
ids_to_assess <- c(
"31772108", # Should likely be Include
"25376210", # Should likely be Include
"invalid_pmid", # Should fail fetch
"10.1038/nature14539" # Example DOI (Nature review, likely Exclude)
)
# --- Run Batch Assessment ---
batch_results <- df_assess_batch(
identifiers = ids_to_assess,
metawoRld_path = proj_path,
email = "your.email@example.com", # Replace with your email
service = "openai",
model = "gpt-3.5-turbo",
stop_on_error = FALSE # Continue processing even if one fails
)
# --- View Results ---
print(batch_results)
# --- Clean up ---
unlink(proj_path, recursive = TRUE)
} # }