Database Management - kubernetes/kubernetes

Kubernetes Database Schema

Kubernetes does not use a relational database or a SQL-style schema. Instead, it stores cluster state in etcd, a distributed key-value store.

etcd provides a simple API for storing and retrieving key-value pairs. The Kubernetes API server uses this API to persist cluster data, including:

  • Configuration: API objects such as Deployments, Pods, and Services.
  • Cluster State: The reported status of nodes, pods, and other objects.
  • Access Control: Authorization data such as RBAC Roles and RoleBindings, along with ServiceAccounts. Kubernetes does not store normal user accounts as API objects.

While etcd doesn’t have a defined schema in the traditional sense, Kubernetes uses a specific format to represent data. Here’s a breakdown:

  • Resource Definitions: Kubernetes resources, such as Pods, Deployments, and Services, are objects whose structure is defined by the Kubernetes API. They are usually written as YAML or JSON when you create them.
  • etcd Key Structure: Keys in etcd are hierarchical and follow a specific pattern based on the resource type and, for namespaced resources, the namespace. With the default /registry prefix, a pod named mypod in the default namespace would have the key /registry/pods/default/mypod.
  • Data Storage: The serialized object is stored as the value under the corresponding key. By default, built-in resources are stored in a binary protobuf encoding, while custom resources are stored as JSON.
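
The key layout described above can be sketched as a small shell helper. This is an illustration under assumptions, not the API server's actual code: etcd_key is a hypothetical function, and /registry is assumed as the key prefix (it is the kube-apiserver default and can be changed with the --etcd-prefix flag). Some built-in types use keys that do not match their resource name, so treat the output as indicative only.

```shell
# Hypothetical helper: print the etcd key for a namespaced object.
# Arguments: <resource> <namespace> <name> [prefix]
# The prefix defaults to /registry (an assumption about the cluster).
etcd_key() {
  resource="$1"
  namespace="$2"
  name="$3"
  prefix="${4:-/registry}"
  printf '%s/%s/%s/%s\n' "$prefix" "$resource" "$namespace" "$name"
}

etcd_key pods default mypod   # prints /registry/pods/default/mypod
```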

Example

Let’s consider a simple Pod resource:

apiVersion: v1
kind: Pod
metadata:
  name: mypod
  namespace: default
spec:
  containers:
  - name: nginx
    image: nginx:1.14.2

This pod would be stored in etcd under a key such as /registry/pods/default/mypod (assuming the default key prefix), and its value would be the serialized Pod object as defined by the Kubernetes API.

Retrieving Information

You can read this data with the kubectl command-line tool. kubectl does not query etcd itself; it calls the Kubernetes API server, which reads the objects from etcd. For example:

kubectl get pods -n default

This command retrieves all Pods in the default namespace and displays them in a user-friendly format.
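
kubectl sends REST requests to the API server, which reads the stored object and returns it. As a rough sketch of that mapping, the hypothetical helper below builds the REST path for a Pod in the core v1 API. The function name is invented for illustration, and the kubectl calls shown in the comments need a running cluster.

```shell
# Hypothetical helper: REST path of a Pod in the core v1 API.
# Arguments: <namespace> <pod-name>
pod_api_path() {
  printf '/api/v1/namespaces/%s/pods/%s\n' "$1" "$2"
}

pod_api_path default mypod   # prints /api/v1/namespaces/default/pods/mypod

# Against a live cluster, either command returns the Pod as JSON:
#   kubectl get --raw "$(pod_api_path default mypod)"
#   kubectl get pod mypod -n default -o json
```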

Note:

While etcd stores data in a key-value format, the stored values are not meant to be read or edited directly. etcd's role is to provide a reliable and consistent data store for Kubernetes.

The Kubernetes API server is the component that talks to etcd. Other clients, including kubectl, go through the Kubernetes API, so the underlying storage layout is largely internal and not directly exposed.

This documentation does not cover the details of how to access or interact with etcd directly.

The rest of this documentation focuses on querying application databases that run inside Kubernetes. You reach such a database through a pod, through a service, or by working with the data stored on a persistent volume claim.

Accessing Databases Through Pods

test/e2e/upgrades/apps/mysql.go

const mysqlManifestPath = "test/e2e/testing-manifests/statefulset/mysql-upgrade"

You can use kubectl to exec into a pod running your database and query it directly. For example, to exec into a pod named mysql-0 and run a query:

kubectl exec -it mysql-0 -n <namespace> -c mysql -- bash -c "mysql -u root -p <database_name> -e 'SELECT * FROM your_table'"

The parts of this command are:

  • kubectl exec: Execute a command in a container of a pod.
  • -it: Enable interactive mode and allocate a pseudo terminal.
  • mysql-0: The name of the pod.
  • -n <namespace>: The namespace of the pod.
  • -c mysql: The name of the container within the pod.
  • --: Separates kubectl's own options from the command to run in the container.
  • bash -c: Execute a shell command.
  • "mysql -u root -p <database_name> -e 'SELECT * FROM your_table'": The command string. It connects as root, prompts for the password (-p), selects <database_name>, and runs the statement given with -e.
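
Because kubectl exec passes everything after -- to the container unchanged, the inner bash -c wrapper and its nested quoting can often be avoided. The sketch below wraps that pattern in a hypothetical run_query function; the function name, the DRY_RUN variable, and the example arguments are assumptions made for illustration. It also assumes the container is named mysql and that the mysql client is on its PATH.

```shell
# Hypothetical wrapper: run one SQL statement in a MySQL pod.
# Arguments: <pod> <namespace> <database> <query>
# Set DRY_RUN=1 to print the command instead of running it.
run_query() {
  pod="$1"; namespace="$2"; database="$3"; query="$4"
  # Build the argument list; -p with no value makes mysql prompt for the password.
  set -- kubectl exec -it "$pod" -n "$namespace" -c mysql -- \
    mysql -u root -p "$database" -e "$query"
  if [ -n "${DRY_RUN:-}" ]; then
    # Printed for inspection only; shell quoting is not preserved.
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage (needs a running cluster):
#   run_query mysql-0 default mydb 'SELECT * FROM your_table'
```

Passing the query as a single argument keeps it intact even when it contains spaces or quotes.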

Accessing Databases Through Services

test/e2e/testing-manifests/statefulset/mysql-upgrade/service.yaml

apiVersion: v1
kind: Service
metadata:
  name: mysql
  labels:
    app: mysql
spec:
  ports:
  - name: mysql
    port: 3306
  clusterIP: None
  selector:
    app: mysql
---
apiVersion: v1
kind: Service
metadata:
  name: mysql-read
  labels:
    app: mysql
spec:
  ports:
  - name: mysql
    port: 3306
  selector:
    app: mysql
  type: LoadBalancer

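A service gives the database a stable DNS name that clients can use as the host. In the manifest above, mysql is a headless service (clusterIP: None). When a StatefulSet names it as its governing service, each pod gets its own DNS entry of the form <pod-name>.mysql (for example, mysql-0.mysql), which lets you target a specific pod. The mysql-read service distributes connections across all pods matching app: mysql. To run a query from a pod that has the mysql client installed, pass the service name as the host:

kubectl exec -it <pod-name> -n <namespace> -c <container-name> -- bash -c "mysql -h mysql -u root -p <database_name> -e 'SELECT * FROM your_table'"

Compared with the previous command, the new option is:

  • -h mysql: The host to connect to, here the name of the service. Use a per-pod name such as mysql-0.mysql to reach one specific pod.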
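
Inside the cluster, a service name resolves through cluster DNS. The following sketch prints the fully qualified names that the short forms expand to. The helper names are hypothetical, and cluster.local is an assumed cluster domain (a common default that can be configured differently); the per-pod form applies only to pods of a StatefulSet governed by a headless service.

```shell
# Assumed cluster domain; many clusters use this default, but it is configurable.
CLUSTER_DOMAIN="cluster.local"

# Hypothetical helper: DNS name of a Service. Arguments: <service> <namespace>
service_fqdn() {
  printf '%s.%s.svc.%s\n' "$1" "$2" "$CLUSTER_DOMAIN"
}

# Hypothetical helper: DNS name of one StatefulSet pod behind a headless Service.
# Arguments: <pod> <service> <namespace>
pod_fqdn() {
  printf '%s.%s\n' "$1" "$(service_fqdn "$2" "$3")"
}

service_fqdn mysql default        # prints mysql.default.svc.cluster.local
pod_fqdn mysql-0 mysql default    # prints mysql-0.mysql.default.svc.cluster.local
```

Either name can be passed to the client with -h.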

Accessing Databases Through Persistent Volume Claims

test/e2e/testing-manifests/statefulset/mysql-upgrade/configmap.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: mysql
  labels:
    app: mysql
data:
  master.cnf: |
    [mysqld]
    log-bin
  slave.cnf: |
    [mysqld]
    super-read-only

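A persistent volume claim (PVC) does not provide a query interface of its own. It supplies the storage that is mounted into the pod, where the database server keeps its data files. The ConfigMap above supplies MySQL configuration snippets (master.cnf and slave.cnf) rather than data. To query data held on a PVC, run the client against the database server in the pod that has the volume mounted:

kubectl exec -it <pod-name> -n <namespace> -c <container-name> -- bash -c "mysql -u root -p <database_name> -e 'SELECT * FROM your_table'"

You will need to know where the data is mounted within the pod if you want to inspect the files directly.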
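
One way to find the mount points is to read them from the pod spec. The sketch below uses kubectl's JSONPath output for that; list_mounts is a hypothetical helper name, and the command needs access to a running cluster.

```shell
# Hypothetical helper: print "<volume-name><TAB><mount-path>" for every
# volume mount in a pod. Arguments: <pod> <namespace>
list_mounts() {
  kubectl get pod "$1" -n "$2" \
    -o jsonpath='{range .spec.containers[*].volumeMounts[*]}{.name}{"\t"}{.mountPath}{"\n"}{end}'
}

# Usage (needs a running cluster):
#   list_mounts mysql-0 default
#   kubectl get pvc -n default    # lists the claims in the namespace
```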

Using Tools for Database Access

Several tools can be used alongside kubectl exec to simplify access to different types of databases:

  • SQL Clients: Clients like mysql and psql can be used to connect and query the database.
  • Database Management Tools: Tools like DataGrip or DBeaver offer a visual interface for interacting with databases, including querying and data visualization.

Best Practices

  • Limit Permissions: Apply the principle of least privilege to your database users.
  • Secure Access: Implement security measures like TLS/SSL for database connections.
  • Database Backups: Regularly back up your database to prevent data loss.
  • Monitoring: Monitor your database performance and resource usage.
  • Security Patches: Apply security patches regularly to address vulnerabilities.
  • Use Kubernetes Secrets: Store database credentials in Kubernetes Secrets rather than hard-coding them in manifests or images.
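
As one possible way to follow the last point, credentials can be created as a Secret and read back when needed. This is a sketch under assumptions: the helper names, the Secret name mysql-credentials, and the key password are all hypothetical. Values in a Secret's data field are base64-encoded, which is an encoding and not encryption.

```shell
# Hypothetical helper: store a database password in a Secret.
# Arguments: <secret-name> <namespace> <password>
# Note: --from-literal leaves the value in shell history; --from-file avoids that.
create_db_secret() {
  kubectl create secret generic "$1" -n "$2" --from-literal=password="$3"
}

# Hypothetical helper: read the password back and decode it.
# Arguments: <secret-name> <namespace>
read_db_password() {
  kubectl get secret "$1" -n "$2" -o jsonpath='{.data.password}' | base64 -d
}

# Usage (needs a running cluster):
#   create_db_secret mysql-credentials default 'example-password'
#   read_db_password mysql-credentials default
```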

This information provides a high-level overview of accessing and querying databases within Kubernetes. Choose the method that best suits your needs and follow best practices to ensure the security and performance of your database.