Runs the relevance assessment workflow (df_assess_relevance) for multiple DOIs/PMIDs, leveraging caching and providing a summary of results.

Usage

df_assess_batch(
  chat,
  identifiers,
  metawoRld_path,
  force_fetch = FALSE,
  force_assess = FALSE,
  email = NULL,
  ncbi_api_key = NULL,
  stop_on_error = FALSE,
  ...
)

Arguments

chat

An LLM chat object (e.g., as created by ellmer::chat_openai()) used to perform the relevance assessments; passed on to df_assess_relevance.

identifiers

Character vector. A vector of DOIs and/or PMIDs.

metawoRld_path

Character string. Path to the root of the metawoRld project.

force_fetch

Logical. If TRUE, bypass the metadata cache for all identifiers.

force_assess

Logical. If TRUE, bypass the assessment cache for all identifiers.

email

Character string (optional). Email for NCBI Entrez.

ncbi_api_key

Character string (optional). NCBI API key.

stop_on_error

Logical. If TRUE, the batch process stops if any single assessment fails. If FALSE (default), it attempts to process all identifiers and reports errors in the summary.

...

Additional arguments passed down to df_assess_relevance and, from there, to the LLM API call (e.g., temperature). The following are commonly supplied through ...:

service

Character string. The LLM service to use (e.g., "openai").

model

Character string. The specific LLM model name.

Value

A data frame (tibble) summarizing the assessment results for each identifier, with columns:

identifier

The DOI or PMID.

status

"Success" or "Failure".

decision

Assessment decision ("Include", "Exclude", etc.) if status is "Success".

score

Confidence score if status is "Success".

rationale

LLM rationale if status is "Success".

error_message

The error message if status is "Failure".

Also prints progress and summary information to the console. Assessment results are saved to the cache within the metawoRld project.
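
For downstream triage, the summary columns above lend themselves to simple base-R filtering. A minimal sketch on mock data (the rows below are illustrative values, not real output):

```r
# Mock of the documented summary columns (illustrative values only)
batch_results <- data.frame(
  identifier    = c("31772108", "invalid_pmid"),
  status        = c("Success", "Failure"),
  decision      = c("Include", NA),
  score         = c(0.92, NA),
  rationale     = c("Human pregnancy serum cytokine study", NA),
  error_message = c(NA, "Metadata fetch failed"),
  stringsAsFactors = FALSE
)

# Identifiers that need manual follow-up
failures <- subset(batch_results, status == "Failure")
failures[, c("identifier", "error_message")]
```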

Examples

if (FALSE) { # \dontrun{
# --- Prerequisites ---
# 1. Set API key: usethis::edit_r_environ("project") -> add OPENAI_API_KEY=sk-... -> Restart R
# 2. Create a dummy metawoRld project
proj_path <- file.path(tempdir(), "assess_batch_proj")
metawoRld::create_metawoRld(
  proj_path,
  project_name = "Test Batch Assessment",
  project_description = "Testing DataFindR batch assessment",
  inclusion_criteria = c("Human study", "Pregnancy", "Serum or Plasma", "Cytokine measurement"),
  exclusion_criteria = c("Animal study", "Review article", "Non-English")
)

# --- Identifiers from a hypothetical search ---
ids_to_assess <- c(
  "31772108", # Should likely be Include
  "25376210", # Should likely be Include
  "invalid_pmid", # Should fail fetch
  "10.1038/nature14539" # Example DOI (Nature review, likely Exclude)
)

# --- Run Batch Assessment ---
# Create a chat object for the LLM calls (ellmer is assumed here; any
# supported chat constructor and model should work)
chat <- ellmer::chat_openai(model = "gpt-3.5-turbo")

batch_results <- df_assess_batch(
  chat = chat,
  identifiers = ids_to_assess,
  metawoRld_path = proj_path,
  email = "your.email@example.com", # Replace with your email
  stop_on_error = FALSE # Continue processing even if one fails
)

# --- View Results ---
print(batch_results)

# --- Clean up ---
unlink(proj_path, recursive = TRUE)
} # }
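
Because the batch accepts a mix of DOIs and PMIDs, it can be useful to sanity-check identifiers before submitting. A rough, hypothetical helper (not part of DataFindR) based on simple patterns:

```r
# Rough classification: PMIDs are all digits, DOIs start with "10."
classify_id <- function(x) {
  ifelse(grepl("^[0-9]+$", x), "pmid",
         ifelse(grepl("^10\\.", x), "doi", "unknown"))
}

classify_id(c("31772108", "10.1038/nature14539", "invalid_pmid"))
```

Entries flagged "unknown" are likely to surface later as status == "Failure" rows in the batch summary.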