### Install DataQualityDashboard from GitHub Source: https://ohdsi.github.io/DataQualityDashboard/index.html Install the development version of DataQualityDashboard directly from its GitHub repository using the 'remotes' package. Ensure 'remotes' is installed first. ```r install.packages("remotes") remotes::install_github("OHDSI/DataQualityDashboard") ``` -------------------------------- ### Install http-server Source: https://ohdsi.github.io/DataQualityDashboard/articles/DataQualityDashboard.html Installs the http-server package globally using npm. This is required for launching the dashboard on a web server. ```bash npm install -g http-server ``` -------------------------------- ### Run http-server Source: https://ohdsi.github.io/DataQualityDashboard/articles/DataQualityDashboard.html Starts a simple HTTP server from the command line. Navigate to the directory containing the results.json file before running. ```bash http-server ``` -------------------------------- ### SQL-Only Mode Connection Details Example Source: https://ohdsi.github.io/DataQualityDashboard/articles/DataQualityDashboard.html When sqlOnly is TRUE, connectionDetails only requires dbms and pathToDriver. This example shows the minimal parameters needed for generating SQL scripts without a live connection. ```r connectionDetails <- DatabaseConnector::createConnectionDetails( dbms = "", # specify your dbms pathToDriver = "/" ) ``` -------------------------------- ### Install DataQualityDashboard from CRAN Source: https://ohdsi.github.io/DataQualityDashboard/index.html Use this command to install the latest stable version of the DataQualityDashboard package from the Comprehensive R Archive Network (CRAN). ```r install.packages("DataQualityDashboard") ``` -------------------------------- ### Analyze Future Dates by Condition Attributes Source: https://ohdsi.github.io/DataQualityDashboard/articles/checks/plausibleValueHigh.html This SQL query groups and counts records with future condition start dates by various condition attributes. It helps identify if future dates are associated with specific condition concepts or types. ```sql SELECT co.condition_concept_id, co.condition_type_concept_id, co.condition_status_concept_id, COUNT(1) FROM condition_occurrence co WHERE condition_start_date > current_date GROUP BY co.condition_concept_id, co.condition_type_concept_id, co.condition_status_concept_id ORDER BY COUNT(1) DESC ; ``` -------------------------------- ### Define Database Parameters for SQL Templates Source: https://ohdsi.github.io/DataQualityDashboard/articles/SqlOnly.html Sets up database schema and table names using '@-syntax' for templated SQL queries. These parameters are pre-filled during query generation. ```r # Database parameters that are pre-filled in the written queries # Use @-syntax if creating a template-sql at execution-time (e.g. "@cdmDatabaseSchema") cdmDatabaseSchema <- "@cdmDatabaseSchema" # the fully qualified database schema name of the CDM resultsDatabaseSchema <- "@resultsDatabaseSchema" # the fully qualified database schema name of the results schema (that you can write to) writeTableName <- "@writeTableName" ``` -------------------------------- ### Violated Rows Query for Plausible Start Before End Source: https://ohdsi.github.io/DataQualityDashboard/articles/checks/plausibleStartBeforeEnd.html This SQL query identifies records where the specified start date field is after the specified end date field within a given CDM table. Use this to pinpoint data inconsistencies. ```sql SELECT '@cdmTableName.@cdmFieldName' AS violating_field, cdmTable.* FROM @schema.@cdmTableName cdmTable WHERE cdmTable.@cdmFieldName IS NOT NULL AND cdmTable.@plausibleStartBeforeEndFieldName IS NOT NULL AND cdmTable.@cdmFieldName > cdmTable.@plausibleStartBeforeEndFieldName ``` -------------------------------- ### Create Connection Details for SQL Dialect Source: https://ohdsi.github.io/DataQualityDashboard/articles/SqlOnly.html Establishes connection details, specifying the database management system (DBMS) and the path to the driver. This is a prerequisite for SQL-only mode. ```r dbmsConnectionDetails <- DatabaseConnector::createConnectionDetails( dbms = "sql server", # can be rendered to any dbms upon execution pathToDriver = "/" ) ``` -------------------------------- ### Investigate Future Dates in Condition Occurrence Source: https://ohdsi.github.io/DataQualityDashboard/articles/checks/plausibleValueHigh.html This SQL query calculates the median difference in days between condition start dates and the current date for records with future dates. It aids in understanding the magnitude of date discrepancies. ```sql SELECT MEDIAN(DATEDIFF(day, condition_start_date, current_date)) FROM condition_occurrence WHERE condition_start_date > current_date ; ``` -------------------------------- ### Configure SQL-Only Mode Parameters Source: https://ohdsi.github.io/DataQualityDashboard/articles/SqlOnly.html Sets various flags and parameters to enable and configure the SQL-only mode. This includes specifying the output SQL folder, CDM source name, and options for incremental inserts and query unioning. ```r sqlFolder <- "./results_sql_only" cdmSourceName <- "Synthea" sqlOnly <- TRUE sqlOnlyIncrementalInsert <- TRUE # this will generate an insert SQL query for each check type that will compute check results and insert them into a database table sqlOnlyUnionCount <- 100 # this unions up to 100 queries in each insert query verboseMode <- TRUE cdmVersion <- "5.4" checkLevels <- c("TABLE", "FIELD", "CONCEPT") tablesToExclude <- c() checkNames <- c() ``` -------------------------------- ### Launch Log Viewer Source: https://ohdsi.github.io/DataQualityDashboard/articles/DataQualityDashboard.html Launches the log viewer to inspect the dashboard's logs. Ensure the outputFolder and cdmSourceName are correctly defined. ```r ParallelLogger::launchLogViewer(logFileName = file.path(outputFolder, cdmSourceName, sprintf("log_DqDashboard_%s.txt", cdmSourceName))) ``` -------------------------------- ### Query for Violated Rows in withinVisitDates Check Source: https://ohdsi.github.io/DataQualityDashboard/articles/checks/withinVisitDates.html This SQL query identifies rows where an event's date falls outside a 7-day window around the associated visit's start or end date. It helps pinpoint specific instances of date mismatches for further investigation. ```sql SELECT '@cdmTableName.@cdmFieldName' AS violating_field, vo.visit_start_date, vo.visit_end_date, vo.person_id, cdmTable.* FROM @cdmDatabaseSchema.@cdmTableName cdmTable JOIN @cdmDatabaseSchema.visit_occurrence vo ON cdmTable.visit_occurrence_id = vo.visit_occurrence_id WHERE cdmTable.@cdmFieldName < dateadd(day, -7, vo.visit_start_date) OR cdmTable.@cdmFieldName > dateadd(day, 7, vo.visit_end_date) ``` -------------------------------- ### View Dashboard as Shiny App Source: https://ohdsi.github.io/DataQualityDashboard/articles/DataQualityDashboard.html Launches the Data Quality Dashboard as a Shiny application. Requires the fully-qualified path to the JSON results file. ```r # Use the fully-qualified path to the JSON results file DataQualityDashboard::viewDqDashboard(jsonFilePath) ``` -------------------------------- ### Define Output Folder and File Source: https://ohdsi.github.io/DataQualityDashboard/articles/DataQualityDashboard.html Specifies the directory where results and logs will be stored and the name of the JSON results file. ```R outputFolder <- "output" outputFile <- "results.json" ``` -------------------------------- ### Configure Incremental SQL Inserts Source: https://ohdsi.github.io/DataQualityDashboard/articles/DataQualityDashboard.html When running in SQL-only mode, set this to TRUE to generate SQL queries that insert results into a specified database table. This is required for storing metadata and check results. ```r sqlOnlyIncrementalInsert <- FALSE # set to TRUE if you want the generated SQL queries to calculate DQD results and insert them into a database table (@resultsDatabaseSchema.@writeTableName) ``` -------------------------------- ### Execute Generated SQL Queries and Convert Results to JSON Source: https://ohdsi.github.io/DataQualityDashboard/articles/SqlOnly.html This R script demonstrates how to execute the SQL files generated by DQD in SQL-only mode. It connects to a database, creates the results table, runs each check query, and then exports the results to a JSON file for use with the DQD Shiny app. Note that this uses non-exported DQD functions. ```r library(DatabaseConnector) cdmSourceName <- "" sqlFolder <- "./results_sql_only" jsonOutputFolder <- sqlFolder jsonOutputFile <- "sql_only_results.json" dbms <- Sys.getenv("DBMS") server <- Sys.getenv("DB_SERVER") port <- Sys.getenv("DB_PORT") user <- Sys.getenv("DB_USER") password <- Sys.getenv("DB_PASSWORD") pathToDriver <- Sys.getenv("PATH_TO_DRIVER") connectionDetails <- DatabaseConnector::createConnectionDetails( dbms = dbms, server = server, port = port, user = user, password = password, pathToDriver = pathToDriver ) cdmDatabaseSchema <- '' resultsDatabaseSchema <- '' writeTableName <- 'dqd_results' # or whatever you want to name your results table c <- DatabaseConnector::connect(connectionDetails) # Create results table ddlFile <- file.path(sqlFolder, "ddlDqdResults.sql") DatabaseConnector::renderTranslateExecuteSql( connection = c, sql = readChar(ddlFile, file.info(ddlFile)$size), resultsDatabaseSchema = resultsDatabaseSchema, writeTableName = writeTableName ) # Run checks dqdSqlFiles <- Sys.glob(file.path(sqlFolder, "*.sql")) for (dqdSqlFile in dqdSqlFiles) { if (dqdSqlFile == ddlFile) { next } print(dqdSqlFile) tryCatch( expr = { DatabaseConnector::renderTranslateExecuteSql( connection = c, sql = readChar(dqdSqlFile, file.info(dqdSqlFile)$size), cdmDatabaseSchema = cdmDatabaseSchema, resultsDatabaseSchema = resultsDatabaseSchema, writeTableName = writeTableName ) }, error = function(e) { print(sprintf("Writing table failed for check %s with error %s", dqdSqlFile, e$message)) } ) } # Extract results table to JSON file for viewing or secondary use DataQualityDashboard::writeDBResultsToJson( c, connectionDetails, resultsDatabaseSchema, cdmDatabaseSchema, writeTableName, jsonOutputFolder, jsonOutputFile ) jsonFilePath <- R.utils::getAbsolutePath(file.path(jsonOutputFolder, jsonOutputFile)) DataQualityDashboard::viewDqDashboard(jsonFilePath) ``` -------------------------------- ### Execute DQD with Custom Thresholds Source: https://ohdsi.github.io/DataQualityDashboard/articles/Thresholds.html Use this function call to run the Data Quality Dashboard with custom threshold files. Specify the fully qualified locations for table, field, and concept check threshold files using the respective parameters. To use default thresholds, remove these parameters or set them to 'default'. ```r DataQualityDashboard::executeDqChecks(connectionDetails = connectionDetails, cdmDatabaseSchema = cdmDatabaseSchema, resultsDatabaseSchema = resultsDatabaseSchema, cdmSourceName = cdmSourceName, numThreads = numThreads, sqlOnly = sqlOnly, outputFolder = outputFolder, verboseMode = verboseMode, writeToTable = writeToTable, checkLevels = checkLevels, tablesToExclude = tablesToExclude, checkNames = checkNames, tableCheckThresholdLoc = location of the table check file, fieldCheckThresholdLoc = location of the field check file, conceptCheckThresholdLoc = location of the concept check file) ``` -------------------------------- ### Execute Data Quality Checks Source: https://ohdsi.github.io/DataQualityDashboard/articles/DataQualityDashboard.html Configure connection details and database schemas to execute data quality checks. Ensure the CDM_SOURCE table has at least one row. ```r # fill out the connection details ----------------------------------------------------------------------- connectionDetails <- DatabaseConnector::createConnectionDetails( dbms = "", user = "", password = "", server = "", port = "", extraSettings = "", pathToDriver = "" ) cdmDatabaseSchema <- "yourCdmSchema" # the fully qualified database schema name of the CDM resultsDatabaseSchema <- "yourResultsSchema" # the fully qualified database schema name of the results schema (that you can write to) cdmSourceName <- "Your CDM Source" # a human readable name for your CDM source cdmVersion <- "5.4" # the CDM version you are targetting. Currently supports 5.2, 5.3, and 5.4 # determine how many threads (concurrent SQL sessions) to use ---------------------------------------- numThreads <- 1 # on Redshift, 3 seems to work well ``` -------------------------------- ### Investigate Source Concept Record Completeness Failures Source: https://ohdsi.github.io/DataQualityDashboard/articles/checks/sourceConceptRecordCompleteness.html This SQL query helps identify source codes that failed to map to an OMOP concept. It summarizes standard concept names, source concept IDs, source values, and their counts where the source concept ID is 0. ```sql SELECT concept.concept_name AS standard_concept_name, cdmTable._concept_id, -- standard concept ID field for the table c2.concept_name AS source_value_concept_name, cdmTable._source_value, -- source value field for the table COUNT(*) FROM @cdmDatabaseSchema.@cdmTableName cdmTable LEFT JOIN @vocabDatabaseSchema.concept ON concept.concept_id = cdmTable._concept_id -- WARNING this join may cause fanning if a source value exists in multiple vocabularies LEFT JOIN @vocabDatabaseSchema.concept c2 ON concept.concept_code = cdmTable._source_value AND c2.domain_id = WHERE cdmTable.@cdmFieldName = 0 GROUP BY 1,2,3 ORDER BY 4 DESC ``` -------------------------------- ### viewDqDashboard Source: https://ohdsi.github.io/DataQualityDashboard/reference/viewDqDashboard.html Launches the Data Quality Dashboard in a web browser, allowing users to visualize the results of data quality checks. The dashboard is powered by a JSON file generated by the `executeDqChecks` function. ```APIDOC ## viewDqDashboard ### Description Launches the Data Quality Dashboard, which visualizes the results of executed data quality checks. This function requires a JSON file containing the check results as input. ### Function Signature `viewDqDashboard(jsonPath, launch.browser = NULL, display.mode = NULL, ...)` ### Arguments * **jsonPath** (string) - Required - The fully-qualified path to the JSON file produced by `executeDqChecks`. * **launch.browser** (logical or character) - Optional - Passed on to `shiny::runApp`. Determines if and how the browser should be launched. * **display.mode** (character) - Optional - Passed on to `shiny::runApp`. Specifies the display mode for the Shiny application. * **...** - Optional - Extra parameters for `shiny::runApp()`, such as "port" or "host". ### Value NULL (The function launches a Shiny application and does not return a value directly.) ### Example ```r # Assuming dq_results.json is the output from executeDqChecks viewDqDashboard(jsonPath = "path/to/dq_results.json", launch.browser = TRUE) ``` ``` -------------------------------- ### Configure Writing Results to SQL Table Source: https://ohdsi.github.io/DataQualityDashboard/articles/DataQualityDashboard.html Determines if results should be written to a SQL table. Set to FALSE to skip this step. Also specifies the name of the results table. ```R writeToTable <- TRUE # set to FALSE if you want to skip writing to a SQL table in the results schema writeTableName <- "dqdashboard_results" ``` -------------------------------- ### Execute Data Quality Checks Source: https://ohdsi.github.io/DataQualityDashboard/articles/DataQualityDashboard.html Run the Data Quality Dashboard checks with specified connection details, database schemas, source name, version, and output configurations. Ensure all necessary parameters like `connectionDetails`, `cdmDatabaseSchema`, and `resultsDatabaseSchema` are properly set. ```R DataQualityDashboard::executeDqChecks(connectionDetails = connectionDetails, cdmDatabaseSchema = cdmDatabaseSchema, resultsDatabaseSchema = resultsDatabaseSchema, cdmSourceName = cdmSourceName, cdmVersion = cdmVersion, numThreads = numThreads, sqlOnly = sqlOnly, sqlOnlyUnionCount = sqlOnlyUnionCount, sqlOnlyIncrementalInsert = sqlOnlyIncrementalInsert, outputFolder = outputFolder, outputFile = outputFile, verboseMode = verboseMode, writeToTable = writeToTable, writeToCsv = writeToCsv, csvFile = csvFile, checkLevels = checkLevels, checkSeverity = checkSeverity, tablesToExclude = tablesToExclude, checkNames = checkNames) ``` -------------------------------- ### Configure SQL Execution Mode Source: https://ohdsi.github.io/DataQualityDashboard/articles/DataQualityDashboard.html Set this flag to TRUE to generate SQL scripts instead of executing them against a database. This is useful for debugging or manual execution. ```r sqlOnly <- FALSE # set to TRUE if you just want to get the SQL scripts and not actually run the queries ``` -------------------------------- ### Configure Redshift Bulk Loading Environment Variables Source: https://ohdsi.github.io/DataQualityDashboard/articles/DataQualityDashboard.html Sets environment variables for bulk loading data into Redshift, including AWS credentials and bucket information. This snippet is commented out and requires specific values to be provided. ```R # Sys.setenv("AWS_ACCESS_KEY_ID" = "", # "AWS_SECRET_ACCESS_KEY" = "", # "AWS_DEFAULT_REGION" = "", # "AWS_BUCKET_NAME" = "", # "AWS_OBJECT_KEY" = "", # "AWS_SSE_TYPE" = "AES256", # "USE_MPP_BULK_LOAD" = TRUE) ``` -------------------------------- ### executeDqChecks Source: https://ohdsi.github.io/DataQualityDashboard/reference/executeDqChecks.html Executes data quality checks against a CDM database. It can generate SQL, run checks, and output results to various formats. ```APIDOC ## executeDqChecks ### Description Connects to the database, generates SQL scripts, and runs data quality checks. By default, results are written to a JSON file and a database table. ### Arguments - **connectionDetails** (object) - A connectionDetails object for connecting to the CDM database. - **cdmDatabaseSchema** (string) - The fully qualified database name of the CDM schema. - **resultsDatabaseSchema** (string) - The fully qualified database name of the results schema. - **vocabDatabaseSchema** (string) - The fully qualified database name of the vocabulary schema (default is to set it as the cdmDatabaseSchema). - **cdmSourceName** (string) - The name of the CDM data source. - **numThreads** (integer) - The number of concurrent threads to use to execute the queries (default is 1). - **sqlOnly** (boolean) - Should the SQLs be executed (FALSE) or just returned (TRUE)? - **sqlOnlyUnionCount** (integer) - (OPTIONAL) In sqlOnlyIncrementalInsert mode, how many SQL commands to union in each query to insert check results into results table (can speed processing when queries done in parallel). Default is 1. - **sqlOnlyIncrementalInsert** (boolean) - (OPTIONAL) In sqlOnly mode, boolean to determine whether to generate SQL queries that insert check results and associated metadata into results table. Default is FALSE. - **outputFolder** (string) - The folder to output logs, SQL files, and JSON results file to. - **outputFile** (string) - (OPTIONAL) File to write results JSON object. - **verboseMode** (boolean) - Boolean to determine if the console will show all execution steps. Default is FALSE. - **writeToTable** (boolean) - Boolean to indicate if the check results will be written to the dqdashboard_results table in the resultsDatabaseSchema. Default is TRUE. - **writeTableName** (string) - The name of the results table. Defaults to `dqdashboard_results`. Used when sqlOnly or writeToTable is True. - **writeToCsv** (boolean) - Boolean to indicate if the check results will be written to a csv file. Default is FALSE. - **csvFile** (string) - (OPTIONAL) CSV file to write results. - **checkLevels** (array) - Choose which DQ check levels to execute. Default is all 3 (TABLE, FIELD, CONCEPT). - **checkNames** (array) - (OPTIONAL) Choose which check names to execute. Names can be found in inst/csv/OMOP_CDM_v[cdmVersion]_Check_Descriptions.csv. Note that "cdmTable", "cdmField" and "measureValueCompleteness" are always executed. - **checkSeverity** (array) - Choose which DQ check severity levels to execute. Default is all 3 (fatal, convention, characterization). - **cohortDefinitionId** (array) - The cohort definition id for the cohort you wish to run the DQD on. The package assumes a standard OHDSI cohort table with the fields cohort_definition_id and subject_id. - **cohortDatabaseSchema** (string) - The schema where the cohort table is located. - **cohortTableName** (string) - The name of the cohort table. Defaults to `cohort`. - **tablesToExclude** (array) - (OPTIONAL) Choose which CDM tables to exclude from the execution. - **cdmVersion** (string) - The CDM version to target for the data source. Options are "5.2", "5.3", or "5.4". By default, "5.3" is used. - **tableCheckThresholdLoc** (string) - The location of the threshold file for evaluating the table checks. If not specified the default thresholds will be applied. - **fieldCheckThresholdLoc** (string) - The location of the threshold file for evaluating the field checks. If not specified the default thresholds will be applied. - **conceptCheckThresholdLoc** (string) - The location of the threshold file for evaluating the concept checks. If not specified the default thresholds will be applied. ### Value A list object of results ``` -------------------------------- ### SQL Query Template for Data Quality Dashboard Source: https://ohdsi.github.io/DataQualityDashboard/articles/AddNewCheck.html This template demonstrates the expected structure for SQL queries within the Data Quality Dashboard. It includes placeholders for input parameters and calculates violated rows, percentage, and denominator rows. ```sql SELECT num_violated_rows , CASE WHEN denominator.num_rows = 0 THEN 0 ELSE 1.0*dqd_check.num_violated_rows/denominator.num_rows END AS pct_violated_rows, denominator.num_rows as num_denominator_rows FROM ( ) dqd_check CROSS JOIN ( SELECT COUNT_BIG(*) AS num_rows FROM @cdmDatabaseSchema.@cdmTableName cdmTable ) denominator; ``` -------------------------------- ### Configure Writing Results to CSV File Source: https://ohdsi.github.io/DataQualityDashboard/articles/DataQualityDashboard.html Determines if results should be written to a CSV file. Set to FALSE to skip this step. The CSV file name is specified separately. ```R writeToCsv <- FALSE # set to FALSE if you want to skip writing to csv file csvFile <- "" # only needed if writeToCsv is set to TRUE ``` -------------------------------- ### List Data Quality Checks Source: https://ohdsi.github.io/DataQualityDashboard/reference/listDqChecks.html Call the `listDqChecks` function to retrieve descriptions and thresholds for various data quality checks. Default thresholds are applied if specific locations are not provided. ```R listDqChecks( cdmVersion = "5.3", tableCheckThresholdLoc = "default", fieldCheckThresholdLoc = "default", conceptCheckThresholdLoc = "default" ) ``` -------------------------------- ### Find Standard Concept Mapping Source: https://ohdsi.github.io/DataQualityDashboard/articles/checks/standardConceptRecordCompleteness.html Use this query to find the standard concept mapping for a given source concept ID. If no results are returned, it may indicate a vocabulary issue or a problem with local source-to-concept mappings. ```sql SELECT concept_id AS standard_concept_mapping FROM @vocabDatabaseSchema.concept_relationship JOIN @vocabDatabaseSchema.concept ON concept.concept_id = c oncept_relationship.concept_id_2 AND relationship_id = 'Maps to' WHERE concept_relationship.concept_id_1 = ``` -------------------------------- ### Execute DQD Checks in SQL-Only Mode Source: https://ohdsi.github.io/DataQualityDashboard/articles/SqlOnly.html This R code executes Data Quality Dashboard checks with `sqlOnly` and `sqlOnlyIncrementalInsert` set to TRUE. It generates SQL files for each check in the specified output folder. ```r DataQualityDashboard::executeDqChecks( connectionDetails = dbmsConnectionDetails, cdmDatabaseSchema = cdmDatabaseSchema, resultsDatabaseSchema = resultsDatabaseSchema, writeTableName = writeTableName, cdmSourceName = cdmSourceName, sqlOnly = sqlOnly, sqlOnlyUnionCount = sqlOnlyUnionCount, sqlOnlyIncrementalInsert = sqlOnlyIncrementalInsert, outputFolder = sqlFolder, checkLevels = checkLevels, verboseMode = verboseMode, cdmVersion = cdmVersion, tablesToExclude = tablesToExclude, checkNames = checkNames ) ``` -------------------------------- ### Execute DQD for a Specific Cohort Source: https://ohdsi.github.io/DataQualityDashboard/articles/DqdForCohorts.html Use this function to run the Data Quality Dashboard on a specific cohort. Ensure `cohortDefinitionId` and `cohortDatabaseSchema` are correctly set to point to your cohort's location and ID. The `cdmSourceName` and `writeTableName` should reflect the cohort name to avoid confusion with database-wide results. ```r DataQualityDashboard::executeDqChecks(connectionDetails = connectionDetails, cdmDatabaseSchema = cdmDatabaseSchema, resultsDatabaseSchema = resultsDatabaseSchema, cdmSourceName = "IBM_CCAE_cohort_123", cohortDefinitionId = 123, cohortDatabaseSchema = "IBM_CCAE.results", cohortTableName = "cohort", numThreads = numThreads, sqlOnly = sqlOnly, outputFolder = outputFolder, verboseMode = verboseMode, writeToTable = writeToTable, writeTableName = "dqdashboard_results_123", checkLevels = checkLevels, tablesToExclude = tablesToExclude, checkNames = checkNames) ``` -------------------------------- ### View DQ Dashboard Function Signature Source: https://ohdsi.github.io/DataQualityDashboard/reference/viewDqDashboard.html This is the function signature for viewDqDashboard, used to launch the DQ Dashboard. It requires the path to the JSON results file and can accept additional arguments for shiny::runApp. ```r viewDqDashboard(jsonPath, launch.browser = NULL, display.mode = NULL, ...) ``` -------------------------------- ### Execute DQ Checks Function Signature Source: https://ohdsi.github.io/DataQualityDashboard/reference/executeDqChecks.html This is the function signature for executeDqChecks, outlining all available parameters and their default values. Use this to understand the full range of customization options. ```r executeDqChecks( connectionDetails, cdmDatabaseSchema, resultsDatabaseSchema, vocabDatabaseSchema = cdmDatabaseSchema, cdmSourceName, numThreads = 1, sqlOnly = FALSE, sqlOnlyUnionCount = 1, sqlOnlyIncrementalInsert = FALSE, outputFolder, outputFile = "", verboseMode = FALSE, writeToTable = TRUE, writeTableName = "dqdashboard_results", writeToCsv = FALSE, csvFile = "", checkLevels = c("TABLE", "FIELD", "CONCEPT"), checkNames = c(), checkSeverity = c("fatal", "convention", "characterization"), cohortDefinitionId = c(), cohortDatabaseSchema = resultsDatabaseSchema, cohortTableName = "cohort", tablesToExclude = c("CONCEPT", "VOCABULARY", "CONCEPT_ANCESTOR", "CONCEPT_RELATIONSHIP", "CONCEPT_CLASS", "CONCEPT_SYNONYM", "RELATIONSHIP", "DOMAIN"), cdmVersion = "5.3", tableCheckThresholdLoc = "default", fieldCheckThresholdLoc = "default", conceptCheckThresholdLoc = "default" ) ``` -------------------------------- ### Search Concept Table by Source Code Source: https://ohdsi.github.io/DataQualityDashboard/articles/checks/standardConceptRecordCompleteness.html This query searches the concept table for a given source value. It may return false positives if the same value exists in multiple vocabularies. This is applicable when the source value column is populated only with a vocabulary code. ```sql -- may return false positives if the same value exists in multiple vocabularies -- only applicable in the case where the source value column is populated only with a vocabulary code SELECT * FROM @vocabDatabaseSchema.concept WHERE concept_code = ``` -------------------------------- ### Generate SQL for Data Quality Checks Source: https://ohdsi.github.io/DataQualityDashboard/articles/SqlOnly.html Use this function to generate SQL queries for Data Quality Dashboard checks without executing them. A dummy connectionDetails object is required, where only the dbms is used. The sqlOnlyUnionCount parameter controls the number of check SQLs to union in a single query. ```r library(DataQualityDashboard) connectionDetails <- DatabaseConnector::createConnectionDetails( dbms = "sql server", server = "server", user = "user", password = "password" ) DataQualityDashboard::dqdSqlOnly( connectionDetails = connectionDetails, cdmDatabaseSchema = "results_schema", resultsDatabaseSchema = "results_schema", sqlOnly = TRUE, sqlOnlyIncrementalInsert = FALSE, sqlOnlyUnionCount = 100 ) ``` -------------------------------- ### Configure Logging Verbosity Source: https://ohdsi.github.io/DataQualityDashboard/articles/DataQualityDashboard.html Controls whether detailed logs are printed to the console. Set to FALSE to suppress console logging. ```R verboseMode <- TRUE # set to FALSE if you don't want the logs to be printed to the console ``` -------------------------------- ### List Data Quality Checks Source: https://ohdsi.github.io/DataQualityDashboard/articles/DataQualityDashboard.html Lists all available data quality checks for a specified CDM version using R. This helps in understanding the checks performed by the dashboard. ```r checks <- DataQualityDashboard::listDqChecks(cdmVersion = "5.3") # Put the version of the CDM you are using ``` -------------------------------- ### Write JSON Results to Table Source: https://ohdsi.github.io/DataQualityDashboard/articles/DataQualityDashboard.html Optionally writes the JSON results file to a database table. Specify the connection details, results database schema, and the JSON file path. ```r jsonFilePath <- "" DataQualityDashboard::writeJsonResultsToTable(connectionDetails = connectionDetails, resultsDatabaseSchema = resultsDatabaseSchema, jsonFilePath = jsonFilePath) ``` -------------------------------- ### Query to Identify Violated Rows for fkDomain Check Source: https://ohdsi.github.io/DataQualityDashboard/articles/checks/fkDomain.html This SQL query helps identify specific concepts within a table that have an incorrect domain_id. It counts distinct primary keys and person IDs associated with these mis-sorted concepts. Use this to understand the scope of the issue and investigate ETL errors. ```sql -- @cdmTableName.@cdmFieldName is the standard concept ID field in the table -- @cdmTableName.@cdmTablePk is the primary key field in the table SELECT concept.concept_id, concept.domain_id, concept.concept_name, concept.concept_code, COUNT(DISTINCT @cdmTableName.@cdmTablePk), COUNT(DISTINCT @cdmTableName.person_id) FROM @cdmDatabaseSchema.@cdmTableName cdmTable LEFT JOIN @vocabSchema.concept on @cdmTableName.@cdmFieldName = concept.concept_id AND concept.domain_id != {fkDomain} AND concept.concept_id != 0 GROUP BY concept.concept_id, concept.domain_id, concept.concept_name, concept.concept_code ``` -------------------------------- ### writeDBResultsToJson Source: https://ohdsi.github.io/DataQualityDashboard/reference/writeDBResultsToJson.html Writes DQD results from a specified database table to a JSON file in the given output folder. ```APIDOC ## writeDBResultsToJson ### Description Writes the Data Quality Dashboard (DQD) results from a database table to a JSON file. ### Arguments * **connection**: A connection object to the database. * **resultsDatabaseSchema** (string): The fully qualified database name of the results schema. * **cdmDatabaseSchema** (string): The fully qualified database name of the CDM schema. * **writeTableName** (string): The name of the DQD results table in the database to read from. * **outputFolder** (string): The folder where the JSON results file will be saved. * **outputFile** (string): The desired filename for the output JSON results file. ### Value NULL. The function's output is the creation of a JSON file. ```