Data Infrastructure Overview
- The central repository for our data is a single RDS cluster running PostgreSQL and hosting multiple database instances. These databases can be accessed in two ways: through regular login roles, which receive their privileges via membership in PostgreSQL-managed group roles, or through a small number of administrative login roles that are managed with Active Directory (AD). The administrative roles are restricted to access via AD login services within the CORI-RISI VPC, because the AD domain controllers do not have any public network interfaces.
- The CORI-RISI R server is an EC2/Linux virtual machine accessible to all MDA team members via SSH keys. It is joined to the AD domain, so administrators have direct access to the database from this remote server. It also runs R pipelines and workloads and hosts a shared instance of R Server, although most day-to-day R analytics happens on our individual workstations (laptops).
- An EC2/Windows virtual machine is available for more advanced administration of Active Directory.
- AWS hosts an AWS-managed Active Directory instance that includes two domain controllers. The servers joined to this domain (aws.ruralinnovation.us) include the RDS cluster, the CORI-RISI R server, and the Windows administrative server.
- The AWS Cognito service handles user authentication and profiles. It also integrates with Google login, which is currently our main way of tracking non-anonymous users. The CORI Data API endpoints connect to Cognito to validate authorized access to the API via temporary tokens that Cognito issues.
- CORI Data API development endpoint services. Each endpoint service routes requests to Lambda functions, which perform the backend processing and query the database. These are divided by API protocol:
  - GraphQL (TypeScript)
  - REST (Python)
- CORI Data API production endpoint services. These are divided by API protocol:
  - GraphQL (TypeScript)
  - REST (Python)
- Public-facing web applications are deployed on AWS Amplify, a frontend hosting service. These applications are built with React and use the CORI Data API for backend data support.
- We maintain a standalone instance of the open-source CartoDB map server, hosted on an EC2/Linux virtual machine. The data used by this server is self-contained, and the server itself has no connection to our database.
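
As a rough illustration of the token flow described for the CORI Data API, the sketch below shows how a client might attach a Cognito-issued token to a REST request. It is a minimal example under stated assumptions, not the actual client code: the base URL `https://api.example.org`, the path `/rest/counties`, the helper names `build_api_request` and `fetch_json`, and the use of a `Bearer` Authorization header are all hypothetical and should be checked against the deployed API.

```python
import json
import urllib.request

# Hypothetical placeholder; the real base URL comes from the deployed
# development or production endpoint service.
API_BASE_URL = "https://api.example.org"


def build_api_request(path, token, base_url=API_BASE_URL):
    """Build (but do not send) a GET request that carries a Cognito token."""
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    headers = {
        # Assumption: the API reads the temporary token from a
        # Bearer Authorization header.
        "Authorization": "Bearer " + token,
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_json(request, timeout=10):
    """Send the request and decode a JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A caller would first obtain a temporary token from a Cognito sign-in (for example through Google login), then pass it to `build_api_request("/rest/counties", token)` and hand the result to `fetch_json`. Because the tokens are temporary, a client should expect to refresh them and retry when the API rejects an expired one.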