Assess Study Relevance using LLM
df_assess_relevance.Rd
Fetches metadata for an identifier, builds a prompt from the project's inclusion/exclusion criteria, calls an LLM API to assess relevance from the title and abstract, parses the response, and caches the result.
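In outline, a minimal call might look like the sketch below (the ellmer::chat_openai() constructor and the placeholder DOI and path are assumptions; see Examples for a full walkthrough):

chat <- ellmer::chat_openai() # any ellmer chat object works, per the Usage block
res <- df_assess_relevance(chat, "10.1234/example-doi", "path/to/metawoRld_project")
res$decision # include/exclude decision, per the Value section below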
Usage
df_assess_relevance(
chat,
identifier,
metawoRld_path,
force_fetch = FALSE,
force_assess = FALSE,
email = NULL,
ncbi_api_key = NULL,
...
)
Arguments
- chat
An ellmer chat object.
- identifier
Character string. The DOI or PMID of the study.
- metawoRld_path
Character string. Path to the root of the metawoRld project.
- force_fetch
Logical. If TRUE, bypass the metadata cache and re-fetch from online sources. Defaults to FALSE.
- force_assess
Logical. If TRUE, bypass the assessment cache and re-run the LLM assessment. Defaults to FALSE.
- email
Character string (optional). Email address for NCBI Entrez requests.
- ncbi_api_key
Character string (optional). NCBI API key.
- ...
Additional arguments passed to the underlying LLM API call function (e.g., temperature and max_tokens, forwarded to .call_llm_openai); see the sketch after this list.
- service
Character string. The LLM service to use (currently only "openai").
- model
Character string. The specific LLM model name.
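For example, a hedged sketch of forwarding options through ... (whether a given option is honored depends on what .call_llm_openai accepts; the chat constructor, model name, and values are illustrative):

chat <- ellmer::chat_openai(model = "gpt-4o-mini") # constructor assumed; model illustrative
df_assess_relevance(
chat = chat,
identifier = "31772108",
metawoRld_path = "path/to/metawoRld_project",
temperature = 0, # more deterministic screening decisions
max_tokens = 500 # cap response length
)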
Value
A list containing the structured assessment result (decision, score, rationale). Aborts on critical failure.
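For orientation, a result might print like the hypothetical sketch below (field names follow the description above; values and types are illustrative):

str(result)
#> List of 3
#>  $ decision : chr "include"
#>  $ score    : num 0.9
#>  $ rationale: chr "Human pregnancy study measuring serum cytokines."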
Examples
if (FALSE) { # \dontrun{
# --- Prerequisites ---
# 1. Set API key: usethis::edit_r_environ("project") -> add OPENAI_API_KEY=sk-... -> Restart R
# 2. Create a dummy metawoRld project
proj_path <- file.path(tempdir(), "assess_test_proj")
metawoRld::create_metawoRld(
proj_path,
project_name = "Test Assessment",
project_description = "Testing DataFindR assessment",
inclusion_criteria = c("Human study", "Pregnancy", "Serum or Plasma", "Cytokine measurement"),
exclusion_criteria = c("Animal study", "Review article", "Non-English")
)
# --- Run Assessment ---
pmid <- "31772108" # Example PMID relevant to cytokines/pregnancy
tryCatch({
assessment_res <- df_assess_relevance(
chat = chat,
identifier = pmid,
metawoRld_path = proj_path,
email = "your.email@example.com", # Replace with your email
service = "openai",
model = "gpt-3.5-turbo" # Use a cheaper model for testing initially
)
print(assessment_res)
# --- Run again (should use cache) ---
assessment_res_cached <- df_assess_relevance(chat, pmid, proj_path, email = "your.email@example.com")
print(assessment_res_cached)
# --- Force re-assessment ---
assessment_res_forced <- df_assess_relevance(chat, pmid, proj_path, email = "your.email@example.com", force_assess = TRUE)
print(assessment_res_forced)
}, error = function(e) {
message("Assessment failed: ", e$message)
})
# --- Clean up ---
unlink(proj_path, recursive = TRUE)
} # }