From c3ab4d463a44f87ef3a4e608baf842a6b5ee581a Mon Sep 17 00:00:00 2001
From: Olivier Leroy
Date: Fri, 12 Jul 2024 14:08:29 -0400
Subject: [PATCH 1/2] review metadata and start with how

---
 Metadata.md | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/Metadata.md b/Metadata.md
index e7259e5..84061c2 100644
--- a/Metadata.md
+++ b/Metadata.md
@@ -1,8 +1,13 @@
+# How ?
+
+
+
+
 ## What is Metadata?
 The MDA tracks a variety of metadata on high-traffic tables. Metadata is information _about_ the data that we are sourcing and/or creating, which may include:
  * Where the data came from (source)
- * When the data was acquired
+ * When the data was acquired ->
  * What the table names mean
  * What the field names mean
  * Any information about how the data was created/generated/derived
@@ -85,3 +90,16 @@ present in 3 tables and save a 3 csv for them with select row
 - update our tools in production: BEAD / CH
 - cori.utils check
 - side not unsure if dplyr::add_rows should be added here
+
+First steps:
+
+- we should focus first on:
+
+ * tables in DB
+
+ * bucket in s3
+
+- setup automation for that (write db)
+
+- see what we do with pull data that is writing to temp/disk
+

From 04c9637f940dcb392856d6be2f55928e9a6afae2 Mon Sep 17 00:00:00 2001
From: John Hall
Date: Fri, 12 Jul 2024 14:19:33 -0400
Subject: [PATCH 2/2] update mda data infrastructure diagrams

---
 MDA-Data-Infrastructure-Overview.qmd   | 22 +++++++++++++-----
 _quarto.yml                            | 31 +++++++++++++-------------
 img/MDA Data Flow - Current.drawio.svg |  4 ++++
 img/MDA Data Flow - Future.drawio.svg  |  4 ++++
 4 files changed, 41 insertions(+), 20 deletions(-)
 create mode 100644 img/MDA Data Flow - Current.drawio.svg
 create mode 100644 img/MDA Data Flow - Future.drawio.svg

diff --git a/MDA-Data-Infrastructure-Overview.qmd b/MDA-Data-Infrastructure-Overview.qmd
index 93af20b..82d9800 100644
--- a/MDA-Data-Infrastructure-Overview.qmd
+++ b/MDA-Data-Infrastructure-Overview.qmd
@@ -7,15 +7,27 @@
execute: warning: false --- -
-![](img/MDA%20Data%20Infrastructure.svg) +## Current + +
+ +![](img/MDA%20Data%20Flow%20-%20Current.drawio.svg)

+ +## Future + +
+ +![](img/MDA%20Data%20Flow%20-%20Future.drawio.svg) \ No newline at end of file +mermaid-diagram +--> + +
+
\ No newline at end of file
diff --git a/_quarto.yml b/_quarto.yml
index d1fc62c..647cb50 100644
--- a/_quarto.yml
+++ b/_quarto.yml
@@ -21,23 +21,24 @@ website:
  - CORI-Ontology.qmd
  - section: "Infrastructure"
  contents:
+ - MDA-Data-Infrastructure-Overview.qmd
  - Ansible.md
  - CORI-Data-API.md
- # - section: "CORI Data API"
- # contents:
- # - CORI-Data-API.md
- ## - Adding-New-Endpoints-&-Services-to-the-Python-RESTApi.md
- ## - Apollo-Studio-Setup.md
- ## - Configuring-our-AWS-CLI-(CDK-and-SAM)-Credentials.md
- ## - Example-React-Map-Container-with-REACT-MAP-GL-Package.md
- ## - Migrating-from-CDK-V1-=--V2.md
- ## - Postman-Workspace-Setup-and-Configuration.md
- ## - R-and-Python-API-Usage.md
- ## - React,-AWS-Amplify-and-API-Usage.md
- ## - SAML-User-Pool-IdP-Authentication-Flow.md
- ## - Session-2-‐-2023‐10‐23.md
- ## - Support-Sessions.md
- ## - Working-with-the-GraphQL-Schemas.md
+ # - section: "CORI Data API"
+ # contents:
+ # - CORI-Data-API.md
+ ## - Adding-New-Endpoints-&-Services-to-the-Python-RESTApi.md
+ ## - Apollo-Studio-Setup.md
+ ## - Configuring-our-AWS-CLI-(CDK-and-SAM)-Credentials.md
+ ## - Example-React-Map-Container-with-REACT-MAP-GL-Package.md
+ ## - Migrating-from-CDK-V1-=--V2.md
+ ## - Postman-Workspace-Setup-and-Configuration.md
+ ## - R-and-Python-API-Usage.md
+ ## - React,-AWS-Amplify-and-API-Usage.md
+ ## - SAML-User-Pool-IdP-Authentication-Flow.md
+ ## - Session-2-‐-2023‐10‐23.md
+ ## - Support-Sessions.md
+ ## - Working-with-the-GraphQL-Schemas.md
  - Data-Security.qmd
  - duckDB.qmd
  - PostgreSQL-RDS-Managment.md
diff --git a/img/MDA Data Flow - Current.drawio.svg b/img/MDA Data Flow - Current.drawio.svg
new file mode 100644
index 0000000..7b79ed0
--- /dev/null
+++ b/img/MDA Data Flow - Current.drawio.svg
@@ -0,0 +1,4 @@
+
+
+
+
[SVG body omitted: draw.io diagram "MDA Data Flow - Current". Text labels: RII/RIN teams; PUBLIC; Internet; "On prem" (Remote work environment); MDA team; BB team; CORI Data API; BEAD & BB Climate Risk Apps; Amplify; CORI-RISI DB (RDS); AWS; Data Sources; API; HTTP/FTP Direct Download; Web Scraping; S3 Direct Download / AWS API; PostgreSQL 15; Ubuntu 20 / R version 4+; Lambda; CartoDB 4.32.0 (2019-12-27)]
\ No newline at end of file
diff --git a/img/MDA Data Flow - Future.drawio.svg b/img/MDA Data Flow - Future.drawio.svg
new file mode 100644
index 0000000..88fc529
--- /dev/null
+++ b/img/MDA Data Flow - Future.drawio.svg
@@ -0,0 +1,4 @@
+
+
+
+
[SVG body omitted: draw.io diagram "MDA Data Flow - Future". Text labels: RII/RIN teams; PUBLIC; Internet; "On prem" (Remote work environment); MDA team; BB team; CORI-RISI DB (RDS); AWS; Data Sources; API; HTTP/FTP Direct Download; Web Scraping; S3 Direct Download / AWS API; arrow / duckdb / targets (R packages); CORI Data API; BEAD & BB Climate Risk Apps; Amplify; Lambda; (EC2), shown twice; Prep for AI & ML (MLOps); Self-Service Data Analytics]
\ No newline at end of file