How do I analyze my audit logs using Redshift Spectrum?


I want to use Amazon Redshift Spectrum to analyze my audit logs.

Short description

Before you use Redshift Spectrum, complete the following tasks:

1.    Turn on your audit logs.

Note: It might take some time for your audit logs to appear in your Amazon Simple Storage Service (Amazon S3) bucket.

2.    Create an AWS Identity and Access Management (IAM) role.

3.    Associate the IAM role to your Amazon Redshift cluster.

To query your audit logs in Redshift Spectrum, create external tables that point to the common Amazon S3 folder that contains your log files. Then, use the hidden $path column and regex functions to create views that generate the rows for your analysis.

Resolution

To query your audit logs in Redshift Spectrum, follow these steps:

1.    Create an external schema:

create external schema s_audit_logs
from data catalog
database 'audit_logs'
iam_role 'arn:aws:iam::your_account_number:role/role_name'
create external database if not exists;

Replace your_account_number with your AWS account number. For role_name, specify the IAM role that's attached to your Amazon Redshift cluster.
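
To confirm that the external schema was registered, you can run a quick check (not required for the setup) against the SVV_EXTERNAL_SCHEMAS system view:

select schemaname, databasename
from svv_external_schemas
where schemaname = 's_audit_logs';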

2.    Create the external tables.

Note: In the following examples, replace bucket_name, your_account_id, and region with your bucket name, account ID, and AWS Region.

Create a user activity logs table:

create external table s_audit_logs.user_activity_log(
	logrecord varchar(max)
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS INPUTFORMAT
	'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
	'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
	's3://bucket_name/logs/AWSLogs/your_account_id/redshift/region';
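
After you create the table, you can run a sanity check to confirm that Redshift Spectrum sees your log files. The following query (a sketch) uses the hidden $path column to list the files behind the table:

select "$path", count(*)
from s_audit_logs.user_activity_log
group by "$path"
limit 10;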

Create a connection log table:

CREATE EXTERNAL TABLE s_audit_logs.connections_log(
	event varchar(60), recordtime varchar(60),
	remotehost varchar(60), remoteport varchar(60),
	pid int, dbname varchar(60),
	username varchar(60), authmethod varchar(60),
	duration int, sslversion varchar(60),
	sslcipher varchar(150), mtu int,
	sslcompression varchar(70), sslexpansion varchar(70),
	iamauthguid varchar(50), application_name varchar(300))
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS INPUTFORMAT
	'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
	'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://bucket_name/logs/AWSLogs/your_account_id/redshift/region';

Create a user log table:

create external table s_audit_logs.user_log(
	userid varchar(255),
	username varchar(500),
	oldusername varchar(500),
	usecreatedb varchar(50),
	usesuper varchar(50),
	usecatupd varchar(50),
	valuntil varchar(50),
	pid varchar(50),
	xid varchar(50),
	recordtime varchar(50))
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS INPUTFORMAT
	'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
	'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://bucket_name/logs/AWSLogs/your_account_id/redshift/region';
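
To verify that all three external tables exist, you can query the SVV_EXTERNAL_TABLES system view (a quick check, not part of the original setup):

select schemaname, tablename, location
from svv_external_tables
where schemaname = 's_audit_logs';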

3.    Create a local schema to view the audit logs:

create schema audit_logs_views;

4.    To access the external tables, create views in your local database with the WITH NO SCHEMA BINDING option:

CREATE VIEW audit_logs_views.v_connections_log AS
select *
FROM s_audit_logs.connections_log
WHERE "$path" like '%connectionlog%'
with no schema binding;

The hidden $path column restricts the returned files to those that match the connectionlog entries.
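
You can then query the view like a local table. For example, the following query (a sketch) summarizes connection events by type:

select event, count(*) as event_count
from audit_logs_views.v_connections_log
group by event
order by event_count desc;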

In the following example, the hidden $path column and regex function restrict the files that are returned for v_useractivitylog:

CREATE or REPLACE VIEW audit_logs_views.v_useractivitylog AS
SELECT    logrecord,
          substring(logrecord,2,24) as recordtime,
          replace(regexp_substr(logrecord,'db=[^" "]*'),'db=','') as db,
          replace(regexp_substr(logrecord,'user=[^" "]*'),'user=','') AS "user",
          replace(regexp_substr(logrecord, 'pid=[^" "]*'),'pid=','') AS pid,
          replace(regexp_substr(logrecord, 'userid=[^" "]*'),'userid=','') AS userid,
          replace(regexp_substr(logrecord, 'xid=[^" "]*'),'xid=','') AS xid,
          replace(regexp_substr(logrecord, '][^*]*'),']','') AS query
FROM s_audit_logs.user_activity_log
WHERE "$path" like '%useractivitylog%'
with no schema binding;

The returned files match the useractivitylog entries.
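
You can then query the parsed columns directly. For example, the following query (a sketch; awsuser is a placeholder user name) lists statements that a specific user ran:

select recordtime, db, "user", query
from audit_logs_views.v_useractivitylog
where "user" = 'awsuser'
limit 50;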

Note: There's a known limitation related to multi-row queries in user activity logs. It's a best practice to query the logrecord column directly.
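
For example, the following query (a sketch; the search string is a placeholder) filters the raw logrecord column directly instead of the parsed columns:

select logrecord
from s_audit_logs.user_activity_log
where logrecord like '%CREATE TABLE%'
limit 20;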

Related information

Analyze database audit logs for security and compliance using Amazon Redshift Spectrum

STL_CONNECTION_LOG
