A data warehouse is a central repository that integrates a business's data from many sources and stores it in organized form. Data analysts need access to this organized repository to derive insights. In data warehouse applications, data security covers three main areas: storage, transit, and access.
At each of these stages, data must remain secure and confidential, keep its integrity, and stay available. Google Cloud’s new security update for its data warehouse, BigQuery, promises new protective measures in these areas.
The BigQuery security scheme has four layers, as shown in Google Cloud’s official block diagram. The four layers are discussed below.
At the first layer, the “landing area layer”, Google Cloud aims to protect data arriving from various sources at the ingestion stage. This includes data that has already been collected (batch data) and stored in Google Cloud, as well as new data streams (streaming data) in Google Pub/Sub.
At this layer, BigQuery offers three protection measures: Identity and Access Management (IAM), Access Control Lists (ACLs), and encryption of data in transit and at rest.
At the storage level, the “data warehouse layer”, Google Cloud stores data securely and processes it for de-identification. De-identification is the process of removing the link between data and the identity of the user it describes, and it supports compliance with the California Consumer Privacy Act (CCPA). This protects users’ identities in the event of a breach and also keeps them hidden from internal administrators.
While data is de-identified in one pipeline, Google Cloud allows it to be re-identified in a second, completely separate pipeline for certain confidential use cases. These specific projects are accessible only to a limited number of data analysts or warehouse administrators.
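The split between a de-identification pipeline and a separate re-identification pipeline can be illustrated with a minimal sketch. This is not Google Cloud's implementation: the function names, the in-memory `vault` dictionary, and the hard-coded `SECRET_KEY` are all assumptions made for illustration, and a real system would rely on a managed key service and a protected token store.

```python
import hashlib
import hmac

# Hypothetical key for illustration only; a real deployment would fetch
# this from a key management service rather than hard-coding it.
SECRET_KEY = b"example-key-not-for-production"


def make_token(value: str, key: bytes = SECRET_KEY) -> str:
    """Derive a deterministic, keyed token from a sensitive value."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # shortened for readability in this sketch


def deidentify(rows, sensitive_fields, vault):
    """Pipeline 1: replace sensitive fields with tokens.

    The token -> original mapping is written to `vault`, which only the
    re-identification pipeline is meant to read.
    """
    out = []
    for row in rows:
        clean = dict(row)  # copy so the input rows are left untouched
        for name in sensitive_fields:
            if name in clean:
                token = make_token(str(clean[name]))
                vault[token] = clean[name]
                clean[name] = token
        out.append(clean)
    return out


def reidentify(rows, sensitive_fields, vault):
    """Pipeline 2: restore original values for callers who hold the vault."""
    out = []
    for row in rows:
        restored = dict(row)
        for name in sensitive_fields:
            if name in restored:
                restored[name] = vault[restored[name]]
        out.append(restored)
    return out
```

Analysts working with the output of `deidentify` see only tokens. Restoring the original values requires access to the vault, which mirrors the idea of limiting re-identification to a small, separately authorized group.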
The “classification and data governance layer” encrypts data and classifies it as confidential or non-confidential. Categorizing data in this way allows confidential data to be isolated and restricts access to it from unrelated projects. This reduces companies’ exposure to data breaches, such as unauthorized copying or transfer of data.
The category of the data (confidential or non-confidential), along with the access information, is recorded in the Data Catalog. Each category is identified by labels called policy tags.
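How tag-based classification can restrict access is easier to see in a small model. The sketch below is a hypothetical in-memory stand-in, not the Data Catalog or BigQuery API: the `Column` and `Catalog` classes, the tag names, and the principals are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    policy_tag: str  # hypothetical tag, e.g. "confidential"


@dataclass
class Catalog:
    # Maps each policy tag to the principals allowed to read columns
    # carrying that tag.
    readers: dict = field(default_factory=dict)

    def grant(self, policy_tag: str, principal: str) -> None:
        self.readers.setdefault(policy_tag, set()).add(principal)

    def can_read(self, principal: str, column: Column) -> bool:
        return principal in self.readers.get(column.policy_tag, set())


def visible_columns(catalog, principal, columns):
    """Return the names of the columns this principal may read."""
    return [c.name for c in columns if catalog.can_read(principal, c)]
```

In this model, a principal granted only the non-confidential tag would see a column such as an order total but not one tagged confidential, and a principal with no grants would see nothing. Access is denied by default, so a column with an unknown tag is never exposed.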
Finally, the fourth layer is the “security posture layer”, which is designed to handle threat detection, warehouse monitoring, and incident response for Google Cloud’s data warehouse.
One main feature of the security update is its use of “Infrastructure as Code (IaC)”, which is the management of infrastructure through code rather than manual configuration. Customers find IaC easy to use, and it lets them monitor their current security compliance and track security KPIs.
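Because IaC describes infrastructure as data, compliance can be checked by comparing declared settings against a required baseline. The sketch below assumes a hypothetical baseline (`REQUIRED`) and hypothetical resource fields. It does not reflect any particular IaC tool or Google Cloud's own checks.

```python
# Hypothetical security baseline that every declared resource must meet.
REQUIRED = {
    "encryption_at_rest": True,
    "public_access": False,
    "audit_logging": True,
}


def check_resource(resource):
    """Return the settings on this resource that deviate from REQUIRED."""
    return [key for key, expected in REQUIRED.items()
            if resource.get(key) != expected]


def compliance_report(resources):
    """Summarize compliance as a score plus per-resource violations.

    Resource names are assumed to be unique. The score is the fraction
    of resources with no violations, a simple example of a security KPI.
    """
    violations = {}
    for resource in resources:
        failed = check_resource(resource)
        if failed:
            violations[resource["name"]] = failed
    total = len(resources)
    score = (total - len(violations)) / total if total else 1.0
    return {"score": score, "violations": violations}
```

Running such a report on every configuration change gives a number that can be tracked over time, and a missing setting counts as a violation rather than being silently accepted.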
More interestingly, experts noted that Google Cloud’s layered security design has been specifically modeled so that business administrators can understand it clearly and easily.
Google has taken steps to ensure that its data warehouse security scheme works reliably. The comprehensive layered security is tested by the Google Cybersecurity Action Team and a third-party security organization. Earlier, in 2016, Pakistani ethical hacker Rafay Baloch had identified a security fault in Google’s Android OS.