Assess Relevance for a Batch of Study Identifiers
df_assess_batch.Rd
Runs the relevance assessment workflow (df_assess_relevance) for multiple
DOIs/PMIDs, leveraging caching and providing a summary of results.
Usage
df_assess_batch(
chat,
identifiers,
metawoRld_path,
force_fetch = FALSE,
force_assess = FALSE,
email = NULL,
ncbi_api_key = NULL,
stop_on_error = FALSE,
...
)
Arguments
- chat
A chat object used to issue the LLM requests.
- identifiers
Character vector. A vector of DOIs and/or PMIDs.
- metawoRld_path
Character string. Path to the root of the metawoRld project.
- force_fetch
Logical. If TRUE, bypass the metadata cache for all identifiers.
- force_assess
Logical. If TRUE, bypass the assessment cache for all identifiers.
- email
Character string (optional). Email for NCBI Entrez.
- ncbi_api_key
Character string (optional). NCBI API key.
- stop_on_error
Logical. If TRUE, the batch process stops if any single assessment fails. If FALSE (default), it attempts to process all identifiers and reports errors in the summary.
- ...
Additional arguments passed down to df_assess_relevance and subsequently to the LLM API call function (e.g., temperature).
- service
Character string. The LLM service to use (e.g., "openai").
- model
Character string. The specific LLM model name.
Value
A data frame (tibble) summarizing the assessment results for each identifier, with columns:
- identifier
The DOI or PMID.
- status
"Success" or "Failure".
- decision
Assessment decision ("Include", "Exclude", etc.) if status is "Success".
- score
Confidence score if status is "Success".
- rationale
LLM rationale if status is "Success".
- error_message
The error message if status is "Failure".
Also prints progress and summary information to the console. Assessment
results are saved to the cache within the metawoRld project.
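Because failures are reported per row rather than raised as errors (when stop_on_error = FALSE), the returned tibble lends itself to triage. A minimal sketch in base R, assuming a `batch_results` object with the columns listed above (the data here are illustrative, not real assessment output):

```r
# Hypothetical batch_results with the documented columns (illustrative values).
batch_results <- data.frame(
  identifier    = c("31772108", "invalid_pmid"),
  status        = c("Success", "Failure"),
  error_message = c(NA, "Fetch failed: unrecognized identifier"),
  stringsAsFactors = FALSE
)

# Separate failed identifiers for inspection or a later re-run.
failed <- subset(batch_results, status == "Failure")
print(failed[, c("identifier", "error_message")])

# A re-run of only the failures might then look like (not executed here):
# retry <- df_assess_batch(chat, failed$identifier, metawoRld_path,
#                          force_fetch = TRUE)
```

Keeping the failed rows separate makes it easy to distinguish fetch problems (bad identifiers) from assessment problems before forcing a re-run.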
Examples
if (FALSE) { # \dontrun{
# --- Prerequisites ---
# 1. Set API key: usethis::edit_r_environ("project") -> add OPENAI_API_KEY=sk-... -> Restart R
# 2. Create a dummy metawoRld project
proj_path <- file.path(tempdir(), "assess_batch_proj")
metawoRld::create_metawoRld(
proj_path,
project_name = "Test Batch Assessment",
project_description = "Testing DataFindR batch assessment",
inclusion_criteria = c("Human study", "Pregnancy", "Serum or Plasma", "Cytokine measurement"),
exclusion_criteria = c("Animal study", "Review article", "Non-English")
)
# --- Identifiers from a hypothetical search ---
ids_to_assess <- c(
"31772108", # Should likely be Include
"25376210", # Should likely be Include
"invalid_pmid", # Should fail fetch
"10.1038/nature14539" # Example DOI (Nature review, likely Exclude)
)
# --- Run Batch Assessment ---
batch_results <- df_assess_batch(
identifiers = ids_to_assess,
metawoRld_path = proj_path,
email = "your.email@example.com", # Replace with your email
service = "openai",
model = "gpt-3.5-turbo",
stop_on_error = FALSE # Continue processing even if one fails
)
# --- View Results ---
print(batch_results)
# --- Clean up ---
unlink(proj_path, recursive = TRUE)
} # }