AWS RDS MySQL slow query monitoring on Kibana using Logstash


AWS RDS is an amazing service: it provides an uptime of almost 100%, and about the only ways to bring it down are pushing bad code or “dropping the database”. We have been trying to improve HackerRank's performance by optimising a lot of core parts of our application, but before any optimisation can be done you need plenty of logging and analytics to figure out what exactly is slow. There are logs, but they are not helpful unless parsed and analysed. MySQL can write its logs to either a TABLE or a FILE, controlled by the log_output parameter. We earlier used the TABLE output to analyse slow queries, but as our systems grew that became difficult to monitor, so we decided to set up proper slow query analytics on Kibana. We looked around a lot for an existing service that could do this, only to be disappointed.
So we decided to come up with our own solution.

Steps to setup:

1) Configuring RDS

Go to RDS and, in the parameter group attached to your database, change log_output to FILE.

RDS writes the slow query log to a file called “mysql-slowquery.log”. Every hour it rotates the log by renaming the file to “mysql-slowquery.log.<n>”, where <n> runs from 0 to 23.
We are going to read that rotated file every hour and dump it into our Elasticsearch.
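If you would rather not click through the console, the same change can be made with the rds-cli tools used below. A rough sketch (it assumes your parameter group is called hackerrank-params, and that you also want the slow query log enabled with a 1-second threshold; adjust names and values to taste):

rds-modify-db-parameter-group hackerrank-params \
  --parameters "name=slow_query_log, value=1, method=immediate" \
  --parameters "name=long_query_time, value=1, method=immediate" \
  --parameters "name=log_output, value=FILE, method=immediate"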

2) Getting logs from RDS

You will also need the rds-cli tools installed on the system from which you are going to run this.

You can check which log files are available for your DB using

rds-describe-db-log-files <db identifier>

It will show the list of log files available for the past 24 hours:

slowquery $ rds-describe-db-log-files hackerrank
DBLOGFILES  error/mysql-error.log             Wed Nov 26 15:35:00 IST 2014  0
DBLOGFILES  slowquery/mysql-slowquery.log     Wed Nov 26 15:35:36 IST 2014  98396
DBLOGFILES  slowquery/mysql-slowquery.log.0   Wed Nov 26 05:29:47 IST 2014  2193813
DBLOGFILES  slowquery/mysql-slowquery.log.1   Wed Nov 26 06:29:41 IST 2014  2206979
DBLOGFILES  slowquery/mysql-slowquery.log.10  Wed Nov 26 15:29:58 IST 2014  1945483
DBLOGFILES  slowquery/mysql-slowquery.log.11  Tue Nov 25 16:29:58 IST 2014  2811275
DBLOGFILES  slowquery/mysql-slowquery.log.12  Tue Nov 25 17:29:42 IST 2014  2045696
DBLOGFILES  slowquery/mysql-slowquery.log.13  Tue Nov 25 18:29:59 IST 2014  1537427
DBLOGFILES  slowquery/mysql-slowquery.log.14  Tue Nov 25 19:29:54 IST 2014  1664541
DBLOGFILES  slowquery/mysql-slowquery.log.15  Tue Nov 25 20:34:59 IST 2014  2600035
DBLOGFILES  slowquery/mysql-slowquery.log.16  Tue Nov 25 21:29:58 IST 2014  2307096
DBLOGFILES  slowquery/mysql-slowquery.log.17  Tue Nov 25 22:29:59 IST 2014  3030610
DBLOGFILES  slowquery/mysql-slowquery.log.18  Tue Nov 25 23:29:59 IST 2014  5128621
DBLOGFILES  slowquery/mysql-slowquery.log.19  Wed Nov 26 00:29:59 IST 2014  5376034
DBLOGFILES  slowquery/mysql-slowquery.log.2   Wed Nov 26 07:34:58 IST 2014  1792204
DBLOGFILES  slowquery/mysql-slowquery.log.20  Wed Nov 26 01:29:59 IST 2014  5381321
DBLOGFILES  slowquery/mysql-slowquery.log.21  Wed Nov 26 02:29:59 IST 2014  10707608
DBLOGFILES  slowquery/mysql-slowquery.log.22  Wed Nov 26 03:29:55 IST 2014  4671459
DBLOGFILES  slowquery/mysql-slowquery.log.23  Wed Nov 26 04:29:59 IST 2014  4835133
DBLOGFILES  slowquery/mysql-slowquery.log.3   Wed Nov 26 08:29:56 IST 2014  953020
DBLOGFILES  slowquery/mysql-slowquery.log.4   Wed Nov 26 09:29:59 IST 2014  3713110
DBLOGFILES  slowquery/mysql-slowquery.log.5   Wed Nov 26 10:29:57 IST 2014  3002565
DBLOGFILES  slowquery/mysql-slowquery.log.6   Wed Nov 26 11:29:56 IST 2014  2063385
DBLOGFILES  slowquery/mysql-slowquery.log.7   Wed Nov 26 12:29:57 IST 2014  1903671
DBLOGFILES  slowquery/mysql-slowquery.log.8   Wed Nov 26 13:34:59 IST 2014  1573170
DBLOGFILES  slowquery/mysql-slowquery.log.9   Wed Nov 26 14:29:47 IST 2014  1501473

To download any of the log files from the above list, simply use

rds-download-db-logfile <db identifier> --log-file-name <log file name>
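For example, here is a rough sketch for grabbing the most recently rotated hour (it assumes the numeric suffix tracks the UTC hour of rotation, which is what the listing above suggests):

DB=hackerrank
HOUR=$(( 10#$(date -u +%H) ))   # rotated files appear to be numbered by UTC hour
rds-download-db-logfile $DB --log-file-name slowquery/mysql-slowquery.log.$HOUR > /tmp/mysql-slowquery.log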

3) Generalising the queries for better analytics

Thanks to Nathanial Hendler for that beautiful query generalisation parser: http://www.percona.com/blog/files/utils/mysql_slow_log_parser. What the script does is this: if you have queries like

SELECT id FROM table WHERE a = 'something';
SELECT id FROM table WHERE a = 'something2';
SELECT id FROM table WHERE a = 'some3'

It will generalise them by replacing the parameter values with XXX:

SELECT id FROM table WHERE a = 'XXX'
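Roughly speaking, the substitution is similar to what this GNU sed one-liner would do, swapping quoted strings and standalone numbers for XXX (just an illustration; the actual Perl parser handles far more cases, and slow-query.log here is a hypothetical input file):

sed -E "s/'[^']*'/'XXX'/g; s/\b[0-9]+\b/XXX/g" slow-query.log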

This is really helpful when you want to identify query patterns and focus on the ones that occur most often and consume the most time. The parser was good, but we modified it a bit to include more information, like the user, host and database name. We also needed the output on a single line for our Logstash to parse. Logstash can read multi-line input as well, but that is a more complicated configuration, so we avoided it. 😀

You can get the modified parser from here:
https://gist.github.com/theprogrammerin/e3206a4ec7a7a4086ac2

Our modified parser reads slow query log entries like this:

# Time: 141119  7:33:09
# User@Host: master[master] @  [10.10.1.1]  Id: 1749090
# Query_time: 0.008349  Lock_time: 0.000033 Rows_sent: 1  Rows_examined: 27467
SET timestamp=1416382389;
SELECT `questions`.`id` FROM `questions` INNER JOIN `question_mappings` ON `questions`.`id` = `question_mappings`.`question_id` WHERE `question_mappings`.`team_id` = 211;
/*request_uid:pk3gjacs,application:HackerRank,controller:submission,action:show,line:/app/models/file.rb:359:in some_method'*/;

You can run the parser using

DATABASE=hackerrank ./mysql_slow_log_parser test.log

which will parse the above log into the following format:

Database: hackerrank | User: master[master] | Host: 10.10.1.1 | NoQuery: 1 | TotalTime: 0.008349 | AverageTime: 0.008349 | TimeTaken: 0.008349  seconds to complete  | RowsAnalyzed: 27467 | Pattern: SET timestamp=XXX; SELECT `questions`.`id` FROM `questions` INNER JOIN `question_mappings` ON `questions`.`id` = `question_mappings`.`question_id` WHERE `question_mappings`.`team_id` = XXX; /*request_uid:pkXXXgjacs,application:HackerRank,controller:submission,action:show,line:/app/models/file.rb:XXX:in some_method'*/;  | Orignal: SET timestamp=1416382389; SELECT `questions`.`id` FROM `questions` INNER JOIN `question_mappings` ON `questions`.`id` = `question_mappings`.`question_id` WHERE `question_mappings`.`team_id` = 211; /*request_uid:pk3gjacs,application:HackerRank,controller:submission,action:show,line:/app/models/file.rb:359:in some_method'*/;

4) Pushing to Kibana/ElasticSearch

This is the last step for getting those analytics, and it is a fairly simple process: using Logstash we parse the above output and push the data to Elasticsearch for Kibana. We used this config file to parse the generalised log: https://gist.github.com/theprogrammerin/034a3efd849112d166ea

There is one issue with the parsed output: it does not carry the timestamp of when the query was executed. Thankfully, every entry includes a SET timestamp=...; statement right before the query, so our Logstash config extracts that timestamp and indexes the entry with it.
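The relevant filter section looks roughly like this (a trimmed-down sketch rather than the full config from the gist above; the field names are illustrative):

filter {
  grok {
    # split the pipe-separated line produced by mysql_slow_log_parser (only a few fields shown)
    match => [ "message", "Database: %{WORD:database} \| User: %{DATA:user} \| Host: %{IP:host} \| NoQuery: %{NUMBER:query_count} \| TotalTime: %{NUMBER:total_time} \| %{GREEDYDATA:rest}" ]
  }
  grok {
    # pull the unix timestamp out of the SET timestamp=...; statement in the original query
    match => [ "message", "SET timestamp=%{NUMBER:query_timestamp};" ]
  }
  date {
    # index the event with the query's own execution time instead of the ingestion time
    match => [ "query_timestamp", "UNIX" ]
  }
}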

We also use marginalia, which appends some useful information from our app to each query, like the controller name, the method invoked, the line the query came from, and a unique request uid. Through this we can map a query back to the web request that issued it.

We push all our logs (webserver, slowquery, etc.) to the same Elasticsearch instance and namespace them using the namespace key.

5) Automating all of the above

For automation purposes we use this bash script: https://gist.github.com/theprogrammerin/b82f368414024315e07b

To invoke it, you simply call

./slowquery_logger.sh <dbname>

What this does is fetch the log file for that hour (the hour in which the script was executed), generalise it using mysql_slow_log_parser, and finally push it to Elasticsearch. We run this script on the 15th minute of every hour.
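In practice that is just a cron entry along these lines (the install path is illustrative):

# run 15 minutes past every hour
15 * * * * /opt/slowquery/slowquery_logger.sh hackerrank >> /var/log/slowquery_logger.log 2>&1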

If you are impatient and cannot afford to wait an hour for the logs, RDS also provides a live stream of the log through rds-watch-db-logfile, which tails the log in realtime. You just need to pipe its output to mysql_slow_log_parser, and in turn pipe that to Logstash, and you will get a live feed of your queries.
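Something along these lines should do it (only a sketch of the idea; it assumes rds-watch-db-logfile takes the same --log-file-name option as the download command, that the parser reads from stdin, and that slowquery-stdin.conf is a Logstash config with a stdin input):

rds-watch-db-logfile hackerrank --log-file-name slowquery/mysql-slowquery.log \
  | DATABASE=hackerrank ./mysql_slow_log_parser \
  | /opt/logstash/bin/logstash -f slowquery-stdin.conf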

When I started on this I had the fairly simple idea of pulling entries from the MySQL table log and pushing them to Kibana, but sp2hari came up with the idea of generalisation, and now I doubt whether my original idea was really that simple or whether it would have actually worked.