Author: Michael Blood
Bulk MX and DNS lookup enhanced
Bulk MX and DNS lookup enhanced
As our bulk MX and DNS tool has become more popular, Google has sent many people to our site looking for all sorts of DNS lookup tools. It seems that I was not the only one that needed a simple way to quickly gather DNS information on multiple domains at once
Many people have been able to quickly find all of the information they need about a domain (A, MX, NS and WHOIS records) and they move on. It turns out that even though the tool had the ability to lookup records for multiple domains at once, it is also useful for getting a quick DNS overview of a single domain. Web professionals often bookmark the tool and come back often.
The tool evolves as we get special requests, so my request to you is: let me know if I can add some functionality to the tool for you.
Several new features have been added in the last month.
- A premium option allows you to buy credits to remove usage limits
- Whois lookups are turned off on larger requests in order to avoid abuse restrictions
- Large scale lookups can be started by uploading a text file with one domain per line.
- For each file upload a background process builds a downloadable CSV file listing one dns record result per lin
- A row for each www and @ A record with the ip address
- One row for each MX record
- One row for each NS record
- One row displaying the ip addresses for each MX record.
- As the process builds the records file in the background , you can refresh the page withe your premium code in it and watch the progress.
Future upgrades may include
- The ability to choose to only lookup certain record types (A, MX or NS)
- Multithreading options to speed up background lookups.
- They ability to choose the download format of the DNS results list.
If you find the tool useful or you would like a certain type of enhancement, please leave a comment or send us a comment.
If you happen to be a techie that has to look at log files with timestamps, take a look at the tool that bulk converts timestamps to readable dates. You can just cut an paste a big block of text from you log and click
Postfix Configuration: The argument against reject_unverified_sender
Postfix Configuration: The argument against reject_unverified_sender
When configuring a mail server there techniques available which can help you to reduce spam.
A popular spam reduction technique is to ‘greylist’ emails. This method temporarily rejects an email if the server has never seen a prior attempt to send the same combination of From Email, To Email and From IP address , legitimate emails are tried again after a few moments and the email goes through no problem.
Another option in the postfix system which can be used to reduce spam is the ‘reject_unverified_sender’ option. As postfix confirming that the ‘Sender’ is a valid user, a connection is made to the server associated with the Senders domain (by MX record). It goes through the email sending process far enough to find out if the server accepts or rejects the Sender email (RCPT TO: sender@email.com).
While it seems like a good idea to confirm that the email address that is sending email to us is valid, if the sender’s server has greylisting on their domain they would reject the ‘verification’ connection, which would then ‘reject’ the attempted message.
For this reason, we choose not to globally install postfix servers with reject_unverified_sender.
————————–
There is some argument though that this does not cause a permanent problem, because when the reject_unverified_sender_code is set to 450. Because the rejection of the email will only be temporary and when the email is attempted again, the sender verification should pass the grey listing.
However, this is not good enough for me because there are other reasons that the sender verification could fail. Such as the fact that the server does not accept the MAIL FROM email from the verifying server. This could be because the doublebounce@ email address used for verification is not accepted by the server for other reasons such as the fact that THEY may be doing some sender verification, which would fail when an email is attempted to doublebounce, additionally the verifying server could end up getting a bad ‘reputation’ by only checking to see if email addresses are deliverable, but then never actually delivering a mail message.
For these reasons, I recommend skipping this setting and perhaps only using the reject_unknown_sender_domain which just checks whether there is a valid MX record for a domain.
While postfix is powerful with a plethora of options, not all of them immediately helpful.
Ubuntu – Base Secure Apache
Ubuntu – Base Secure Apache
In order to install a server that is able to pass the many SSL problems out there you can not install the default servers.
apt-get install make gcc
Install the latest open ssl from the openssl site first.
– download it to a directory
extract , config and install
then install apache2
Quiz App For Dentist Offices Celebrity Smiles
Matraex Dental has developed a customizable quiz app for dentist offices Celebrity Smiles.
Our latest web app for dentist offices has several features to assist in up selling your dental services. Your patients can test their skills in recognizing the smiles of popular male or female celebrities. This drives the marketing of the “celebrity smile” for the patient. In this case, the button is foremost in the user interface.
I WANT A CELEBRITY SMILE
“I Want A Celebrity Smile” is a prominently labeled button in the web app and can be customized to your needs. The text of the button can be altered as well as the target landing page of the button. Do you want to promote your Dental Implants? Cosmetic Teeth Whitening? Crowns and Bridges?
Custom Logo Branding
The content and branding will be completely customized for your dentist office. Here is an example customized for the Norwalk Dental Center. Your brand logo can be integrated throughout the web app as well.
Interested in obtaining a custom version for your practice? Get a free quote today!
Custom Web App For Dentist Office Waiting Rooms
A new solution for a custom web app for dentist office waiting rooms is our Origins of Dentistry web app. This web app has several key features:
- Rich media content (high resolution photos and graphics)
- Easy user interface (Intuitive controls)
- Fast loading times (No waiting for the next page)
- Intermittent advertisements custom to your dentist office (You control how frequent your custom ads display)
- Branding for your dentist office (Your logo and office information integrated into the web app)
Are your patients tired of the same old magazines in your waiting room? Is the daytime television showing advertisements for other businesses?
With the Origins of Dentistry, your waiting room will come alive as patients begin to discuss the interesting and sometimes comical history of dentistry.
This customizable web app can be integrated into your custom dental website and is mobile-friendly. Your custom Origin of Dentistry web app is compatible with all web accessible devices like iPhones, Androids, Blackberrys, or any tablets and all web browsers.
The Origins of Dentistry historical eras go back as far as ancient Egypt and Sumeria when historians refer to the first accepted dentist, Hesy-Re, the “greatest of those who deal with teeth”.
Another historical feature takes place in early American history when Paul Revere was the first documented physician to use Forensic Dentistry to identify his friend and former patient, Dr. Joseph Warren.
Intermittent advertisements for your dental services can be set to display between views of the different historical events. Do you want to showcase your dental implant services? Prompt to remind patients to make appointments before they leave? Announce a company event or even introduce your staff? These options are easily integrated into the web app.
This custom web app for dentist office waiting rooms has many features and options to add value to your dental business.
Preg_Match Visual Composer PHP Error
We started getting a preg_match Visual Composer PHP error that was: Warning: preg_match() expects parameter 2 to be string, array given in /wp-content/themes/bridge/vc_templates/vc_empty_space.php on line 12.
I was able to determine that when using the empty space feature for managing site contents in the design, if you saved “px” in the pixel value for the size of the empty space, WordPress results would throw this error. For example, if I saved 16px for the size, we would get this error message. However, if I went back in and just saved the value as 16 the system would work as expected.
This is only a slight fix, I have discovered that in a recent update to Visual Composer, the issue was resolved. So be sure you have updated to (at this time) v4.8
Here is a quick link to the Visual Composer plugin.
Recovering / Resyncing a distributed DRBD dual primary Split Brain – [servera] has a different data from [serverb]
Recovering / Resyncing a distributed DRBD dual primary Split Brain – [servera] has a different data from [serverb]
A client had a pair of servers running drbd in order to keep a large file system syncronized and highly available. However at some point in time the drbd failed and the two servers got out of sync and it went unnoticed for long enough, that new files were written on both ‘servera’ and on ‘serverb’.
At this point both servers believe that they are the primary, and the servers are running in what you call a ‘Split Brain’
To determine that split brain has happened you can run several commands. In our scenario we have two servers servera and serverb
servera#drbd-overview 0:r0/0 WFConnection Primary/Unknown UpToDate/DUnknown C r----- /data ocfs2 1.8T 1001G 799G 56% serverb#drbd-overview 0:r0/0 StandAlone Primary/Unknown UpToDate/DUnknown r----- /data ocfs2 1.8T 1.1T 757G 58%
From the output above (color added) we can see that servera knows that it is in StandAlone mode, the server realizes that it can not connect. We can research the logs and we can find out why it things it is in StandAlone d. To do this we grep the syslog.
serverb#grep split /var/log/syslog Nov 2 10:15:26 serverb kernel: [41853948.860147] block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0 Nov 2 10:15:26 serverb kernel: [41853948.862910] block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0 exit code 0 (0x0) Nov 2 10:15:26 serverb kernel: [41853948.862934] block drbd0: Split-Brain detected but unresolved, dropping connection! Nov 2 10:15:26 serverb kernel: [41853948.862950] block drbd0: helper command: /sbin/drbdadm split-brain minor-0 Nov 2 10:15:26 serverb kernel: [41853948.865829] block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
This set of log entries lets us know that when serverb attempted to connect to servera, it detected a situation where both file systems had been written to, so it could no longer synchronize. it made these entries and put itself into Standalone mode.
servera on the other hand says that it is waiting for a connection WFConnection.
The next step is to determine which of the two servers has the ‘master’ set of data. This set of data will sync OVER THE TOP of the other server.
In our client’s case we had to do some investigation in order to determine what differences there were on the two servers.
After some discovery we realized that in our case serverb had the most up to date information, except in the case of one directory, we simply copied that data from servera to serverb, and then serverb was ready to become our primary. In the terminology of DRBD, servera is our ‘split-brain victim’ and serverb is our ‘splitbrain survivor’ we will need to run a set of commands which
- ensures the status of the victim to ‘Standalone’ (currently it is ‘WFConnection’)
- umount the drive on the victim(servera) so that the filesystem is no longer accessible
- sets the victim to be ‘secondary’ server, this will allow us to sync from the survivor to victim KNOWING the direction the data will go.
- start the victim (servera) and let the let the ‘split brain detector’ know that it is okay to overwrite the data on the victim(servera) with the data on the survivor (serverb)
- start the survivor(serverb) (if the serverb server was in WFConnection mode, it would not need to be started, however ours was in StandAlone mode so it will need to be restarted)
At first we were concerned that we would have to resync 1.2 TB of data, however we read here that
The split brain victim is not subjected to a full device synchronization. Instead, it has its local modifications rolled back, and any modifications made on the split brain survivor propagate to the victim.
The client runs a dual primary, however as we rebuild the synced pair, we need to ensure that the ‘victim’ is rebuilt from the survivor, so we move the victim from a primary, to a secondary. And it seems that we are unable to mount a drive (using our ocfs2 filesystem) while it is a secondary. So we had to ‘umount’ the drive, and we were unable to remount it while it is a secondary. In a future test (in which restoring data redundancy primary / primary is less critical), we will find out whether we are able to keep the primary/primary status while we are rebuilding from a split brain.
While the drbd-overview tool shows all of the ‘resources’ we are required to use a third parameter specifying the ‘resource’ to operate on . If you have more than one drbd resource defined you will need to identify which resource you are working with. You can look in your /etc/drbd.conf file or in your /etc/drbd.d/disk.res (your file may be named differently). The file has the form of
resource r0 {
....................
}
where r0 is your resource name, you can also see this buried in your output of drbd-overview
servera# drbd-overview
0:r0/0 WFConnection Primary/Unknown UpToDate/DUnknown C r----- /data ocfs2 1.8T 1001G 799G 56%
So we ran the following commands on servera to prepare it as the victim
servera# drbd-overview #check the starting status of the victim 0:r0/0 WFConnection Primary/Unknown UpToDate/DUnknown C r----- /data ocfs2 1.8T 1001G 799G 56% serverb# drbd-overview #check the starting status of the survivor 0:r0/0 StandAlone Primary/Unknown UpToDate/DUnknown r----- /data ocfs2 1.8T 1.1T 760G 58%
From this above we can see that serverb has 58% usage and 760GB free, were server a has 56% usage and 799GB free.
Based on what I know about the difference between servera and serverb, this helps me to confirm that serverb has more data and is the ‘survivor’
servera# drbdadm disconnect r0 # 1. ensures the victim is standalone servera# drbd-overview #confirm it is now StandAlone 0:r0/0 StandAlone Primary/Unknown UpToDate/DUnknown r----- /data ocfs2 1.8T 1001G 799G 56% servera# umount /data # 2. we can not mount the secondary drive with read write servera# drbdadm secondary r0 # 3. ensures the victim is the secondary servera# drbd-overview #confirm it is now secondary 0:r0/0 StandAlone Secondary/Unknown UpToDate/DUnknown r----- servera# drbdadm connect --discard-my-data r0 # 4. start / connect the victim up again knowing that its data should be overwritten with a primary servera# drbd-overview #confirm the status and that it it is now connected [WFConnection] 0:r0/0 WFConnection Secondary/Unknown UpToDate/DUnknown C r-----
I also checked the logs to confirm the status change
servera#grep drbd /var/log/syslog|tail -4 Nov 4 05:14:03 servera kernel: [278068.555213] drbd r0: conn( StandAlone -> Unconnected ) Nov 4 05:14:03 servera kernel: [278068.555247] drbd r0: Starting receiver thread (from drbd_w_r0 [19105]) Nov 4 05:14:03 servera kernel: [278068.555331] drbd r0: receiver (re)started Nov 4 05:14:03 servera kernel: [278068.555364] drbd r0: conn( Unconnected -> WFConnection )
Next we simply have to run this command on serverb to let it know that it can connect as the survivor (like I mentioned above, if the survivor was in WFConnection mode, it would automatically reconnect, however we were in StandAlone mode)
serverb# drbd-overview #check one more time that serverb is not yet connected
0:r0/0 StandAlone Primary/Unknown UpToDate/DUnknown r----- /data ocfs2 1.8T 1.1T 760G 58%
serverb# drbdadm connect r0 # 5. start the surviving server to ensure that it reconnects
serverb# drbd-overview #confirm serverb and servera are communicating again
0:r0/0 SyncSource Primary/Secondary UpToDate/Inconsistent C r----- /data ocfs2 1.8T 1.1T 760G 58%
[>....................] sync'ed: 0.1% (477832/478292)M
servera# drbd-overview #check that servera confirms what serverb says about communicating again
0:r0/0 SyncTarget Secondary/Primary Inconsistent/UpToDate C r-----
[>....................] sync'ed: 0.3% (477236/478292)M
Another way to confirm that the resync started happening is to check the logs
servera# grep drbd /var/log/syslog|grep resync Nov 4 05:18:09 servera kernel: [278314.571951] block drbd0: Began resync as SyncTarget (will sync 489771348 KB [122442837 bits set]). serverb# grep drbd /var/log/syslog|grep resync Nov 4 05:18:09 serverb kernel: [42008909.652451] block drbd0: Began resync as SyncSource (will sync 489771348 KB [122442837 bits set]).
Finally, we simply run a command to promote servera to be a primary again, and then both servers will be writable
servera#drbdadm primary r0 servera# drbd-overview 0:r0/0 Connected Primary/Primary UpToDate/UpToDate C r----- servera# mount /data #remount the data drive we unmounted previously
Now that we ‘started’ recovering from the split-brain issue we just have to watch the two servers to confirm once they have fully recovered. once that is complete we will put in place log watchers and FileSystem tests to send out a notification to the system administrator if it should happen again.
Matraex – Halloween Costume Contest – 2015
Matraex – Halloween Costume Contest – 2015
The crew here at Matraex had a costume contest at lunch today- here is the result.
We all voted and the Winner was – Fat Jeremy by a landslide.
There was a first second and participation prizes for the rest of us.
Debugging Multiple MySQL query in a stored procedure – unusually long runtime
Debugging Multiple MySQL query in a stored procedure – unusually long runtime
I recently was asked to look into an issue where a client had recently moved a mysql database from one server to another server, and the run time of a stored procedure (which internally ran 2000+ queries ) was running 90 seconds, where it had previously only taken 0.6 seconds.
What follows is an ongoing journal of the debugging, trouble shooting and discovery as we walked through the process of addressing issues. Along the way we had lots of small wins, and at the end we had one major win which basically allowed us to undo all of the other changes.
The result is a documented discussion of many different methods to trouble shoot issues, as well as some solutions to different issues. I find these type of blogs to be useful to me in the future as they help me to shortcut addressing different issues where I don’t have to come up similar troubleshooting steps again.
Here is a list of commands I ran that help me evaluate the overall server
- top – to determine whether there was significant load and adequate memory free (in this case there was a load average hovering between .2 and 1 and there was adequate memory)
- mysqladmin variables|grep datadir – the location of the MySQL datadirectory
- mysqladmin variables|grep storage – to find out what storage engine is being used (innodb or myisam)
- mysqladmin variables|grep binlog – to find out what synchronization or replication the server might be doing
- df -h – to determine what disks are mounted and used to store you mysql instance
- cat /proc/mdstat – to find out if there are any software raid instances
- du -h /var/lib/mysql – to show the total size of the databases
- ls -lt /var/lib/mysql/*/* – to find out the size of tables and which tables and logs are update most often (in my case there were no large tables being updates during the last few days)
- echo ‘desc tblname’|mysql databasename – to ensure there is an index on the column I am using to update with
- dd if=/dev/zero of=/tmp/test bs=1048576 count=2048 – to check the write speed (I have had to run this command multiple times before in order to get accurate disk write speeds because of File System caching)
- dd if=test of=/dev/null bs=1048576 – to check the read speed
- iostat -x 1 – to watch the disc write speed and usage (run this in the other window to watch dd run) -x gives more table based output
- echo “show create procedure procedurename” |mysql databasename – to evaluate any stored procedures that are causing the issues
- wc -l /var/lib/mysql/mysql-bin.* – find out how big the bin logs are (if any) to help determine what kind of mysql overhead is required to keep the bin logs up to date.
I created two connections to the server so I could evaluate what was happening as I executed the 180 second run time of the query.
- Based on the output of the dd commands the write speed on the /var/lib/mysql folder is about double (60+ MB/s) than thta of the /home dir. They are separate partitions however we are not doing any software raid, so I assume some sort of a Hardware Raid with striping.
I additionally started to dig into the stored procedure that started having the issue, adding debugging statements, commenting out capabilities to evaluate the time each takes.
- As I ran the stored procedure, I also ran wc -l /var/lib/mysql/mysql-bin.000current and found that the binlog was growing rapidly during the running of the stored procedure. It turns out that the binlog file was only growing by less than 100 K during the time that 100s of queries were running,
- Next I evaluated the contents of the binlogs and it turns out that it is actually adding 1000s of lines to the bin logs, multiple every time that one of the ‘update’ queries inside of the stored procedure ran.
Now I recognize that somehow, inside of the stored procedure, the system is syncing the results of an update to the disk before it is done.
- The server it am running on is 5.5.xx so there are not any advanced binlog write delay features like in 5.7 so I had to come up with another solution.
By using a Start Transaction and Commit, I could avoid writing / committing the transaction to disk until the very end of stored procedure, so I added this to the begging and end.
This was IT! by using transactions I avoided having to evaluate and reconfigure the server (which is running a master slave configuration for backup) and I was able to reduce SP run times from
180 seconds to sub seconds.
One important thing to note about the amount of time that it takes to execute – Each update statement that it runs (more than 2000) do not actually end up making any changes to the database. Perhaps this is much easier for MySQL to write to the binlog after the transaction is complete, because the server has the ability to say that the NET CHANGE is …. nothing.
Digging Deeper
It turns out that even though I was able to find a way to make the multiple sql statements go faster on the new server. We still have other uses of the same SQL Statements which require a faster run time. (we are talking about an execution time of a single query which takes .15 seconds on the new server and .01 seconds on the old server. Some additional commands that I ran (on both the old and new servers) in order to figure out differences and why it may be running a bit slower on the new server.
- tune2fs -l /dev/mapper/DEVICE |grep -i block
- mysql.databasename> stop slave; – tried this temporarily to determine whether the running replication is causing a delay – this did not stop the writes to the bin log.
- mysqladmin variables|grep innodb_buffer_pool_size – confirms that the buffer is big enough to fit the entire set of tables into the server memory
- mysql.databasename> SET sql_log_bin = 0; – this stops the current connection from logging to the bin log – this can help to evaluate the amount of time that it takes to run the update event when the bin file is not being updated
- mysql.databasename> SET sql_log_bin = 1; — reset this to 1 so that future updates are bin logged.
I noticed with this testing that the sql_log_bin = 0 reduced the update time by 50%! I have two possible assumptions from this, which may note be mutually exclusive
- That the disk seek and write speed is slower on the new server to the point that the additional work that has to be done by writing to the bin file is slowing the server down.
- That the server is simply computationally slower and CPU work is simply slower, so this additional work takes longer.
Related to the sql_log_bin is a varialbe I found which seems to make a difference (from a run time of 0.15 second to 0.10) is sync_binlog. I updated this to set it to 0 and this apparently makes it so that MySQL does not have to wait until the binlog is synced to the file system before it returns.
Some notes about what I feel comfortable ruling out:
- There is adequate memory, and memory is not being swapped around by this very mild update. Infact, I can run the exact same update statement a dozen times, to where there is actually NO change happening to the database.
- When an update is made to the database, no change is being immediately pushed to a slave. the only change that happens is a write to the binlog file
So since no change is happening to the database, i wonder, what is causing the overhead? Why would turning off the sql_log_bin (0) help anything if we are not actually logging a change?
I decided to look into the binlog to see what was being tracked, even though there is no real need to do anything on the slave server.
- mysqlbinlog mysql-bin.0000CURRENT|grep MYTABLENAME
The result showed me a plethora of updates to the same table , even though the change did not make a difference to the underlying database. With more research I found that binlog_format accepts either ‘STATEMENT’ or ‘ROW’. It seems that if you use ‘ROW’ the actual database changes are somehow appended to the log file, instead of the statements which generate the change.
- mysql> SET binlog_format = ‘ROW’;
- mysql> SET binlog_format = ‘STATEMENT’;
I immediately noticed a change in the amount of time it took to run the statements. However the time it took is smaller ONLY when there is NOT an update to the database.
To dig deeper here and find out which of the two methods is better, we will need to do a more indepth analysis of how many of the updates actually manipulate the database, and how many updates do not. At this point, I am not yet comfortable changing the binlog method from statement to row permanently because I am not comfortable that the log files will be properly replicated. I would want to be able to have a more robust test setup where I am not only evaluating a single statement, but rather a group of them so that I could determine what method of updating is more optimal.
Version Differences
As we look into the difference between the update statement on the new server and the old server, we can quickly tell that there is an issue which how quickly things are writing to the disk, when we remove some of the features which write to the disk, we quickly find out that the queries operate more quickly. To break this down to a something we can verify, we stopped all other services which might write to the disk and we ran on a second screen
- iostat 1 – this shows the statistics of how many writes are happening within one second
We found that there were no writes until we ran an update statement
- update TABLENAME set column=’xxxx’ where id = 10;
This updates a single rows, and we were able to see the number of bytes that it wrote to the disk. We could run the query multiple times and we could see it write to the disk multiple times. However, when we run the same test on the original server, it would write to the disk the first time the update statement ran, but not the second, or subsequent time. Logically this makes sense, the row / column does not need to be updated the second time because the column already = ‘xxxx’ for id = 10.
With this in mind, we have a test that I came up with we can do to confirm that our different servers behave differently to the same query. Since we are running innodb we can check to see some stats about how many times innodb has written something to the disk. and we can check this to see whether it increments as we make updates
Old Server – MySQL 5.0
- echo “show engine innodb status”|mysql databasename|sed -e’s/\\n/\n/g’|grep ‘OS file’
29744505 OS file reads, 10036031 OS file writes, 6213811 OS fsyncs - [root@auth1 ~]# echo ” update TABLENAME set column=’xxxx‘ where id = 10;”|mysql databasename
- [root@auth1 ~]# echo “show engine innodb status”|mysql databasename|sed -e’s/\\n/\n/g’|grep ‘OS file’
29744505 OS file reads, 10036033 OS file writes, 6213813 OS fsyncs - [root@auth1 ~]# echo ” update TABLENAME set column=’xxxx‘ where id = 10;”|mysql databasename
- [root@auth1 ~]# echo “show engine innodb status”|mysql databasename|sed -e’s/\\n/\n/g’|grep ‘OS file’
29744505 OS file reads, 10036033 OS file writes, 6213813 OS fsyncs - [root@auth1 ~]# echo ” update TABLENAME set column=’1234‘ where id = 10;”|mysql databasename
- [root@auth1 ~]# echo “show engine innodb status”|mysql databasename|sed -e’s/\\n/\n/g’|grep ‘OS file’
29744505 OS file reads, 10036035 OS file writes, 6213815 OS fsyncs
Note that the first time that value is set the number of writes to the OS increments, but the SECOND time it does not, because there is no actual update to the data
However, If I run the exact same set of updates on the new server, the number of file writes increments EACH time that an update statement is run, regardless of whether the update statement represents a change in the data
New Server – MySQL 5.5
- echo “show engine innodb status”|mysql databasename|sed -e’s/\\n/\n/g’|grep ‘OS file’
28371 OS file reads, 27188 OS file writes, 9678 OS fsyncs - [root@auth1 ~]# echo ” update TABLENAME set column=’xxxx‘ where id = 10;”|mysql databasename
- [root@auth1 ~]# echo “show engine innodb status”|mysql databasename|sed -e’s/\\n/\n/g’|grep ‘OS file’
28371 OS file reads, 27190 OS file writes, 9680 OS fsyncs - [root@auth1 ~]# echo ” update TABLENAME set column=’xxxx‘ where id = 10;”|mysql databasename
- [root@auth1 ~]# echo “show engine innodb status”|mysql databasename|sed -e’s/\\n/\n/g’|grep ‘OS file’
28371 OS file reads, 27192 OS file writes, 9682 OS fsyncs - [root@auth1 ~]# echo ” update TABLENAME set column=’1234‘ where id = 10;”|mysql databasename
- [root@auth1 ~]# echo “show engine innodb status”|mysql databasename|sed -e’s/\\n/\n/g’|grep ‘OS file’
28371 OS file reads, 27194 OS file writes, 9684 OS fsyncs
Basically this means that unless we can find out why the update is writing each time, we are not goig to be able to reduce the query time on the new server. It seems like this issue is either
- Built into the new version innodb engine
- Built into the new version of MySQL
- Either way, it could be an option configurable via my.cnf
As an aside this means that we could potentially write sql which DOES NOT select a row if it does not need to update it.Forexample, note that the following SQL statement has somewhat of a redundancy in it, but it results in NOT touching a row in the way that we are seeing the row be touched in the New Server
- update TABLENAME set column=’xxxx‘ where id = 10 and column <> ‘xxxx’
I started looking into a couple of innodb variables to try to find some way of addressing this. This article seemed to help and led me to the ultimate solution – 8.5.8 Optimizing InnoDB Disk I/O
The article suggested innodb_flush_method
parameter to O_DSYNC
. I updated this and immediately my queries that were running at 50 MS went down to 20 MS, consistanly. In thinking about this, we potentially have a small possibility of data loss between the time that the innodb receives the update and when it is flushed to disk. In our case with the type of data we are tracking, that is acceptable. Elsewhere on the page, I found another variable which seemed to go one more step. innodb_flush_log_at_trx_commit. This setting intuitively explains that the log must be flushed to the drive each time that the transaction is commited. The option did not even exist in 5.0 so I am unsure of what the default behavior was, however when setting innodb_flush_log_at_trx_commit to 0, I was able to get the query time down to 0.00.
Basically this means that, after an update query is run, innodb is allowed to return immediately before the change is flushed and stored to disk. This addresses our concerns completely.
Run this command to see mysql variables that control how things are flushed.
- mysqladmin variables|grep flush
So drastic was the change I made using the innodb_flush_log_at_trx_commit variable set to 0, I decided to undo some of the other testing I had done, to see how performance was without them.
- When I restore innodb_flush_method to default (NULL), the time to update stayed at 0.00
- The OS file writes DO NOT always increment immediately, however they will increment after a second or two. I could confirm this by running a statement to look at the number of srv_master_thread log flush and writes: in the innodb status, immediately before and immediately after an update statement and they would be the same, however if I wait a couple of seconds, they would increment.
- echo “show engine innodb status”|mysql radius|sed -e’s/\\n/\n/g’|grep srv_master_thread
- I updated restored sync_binlog from 0 back to 1 in the /etc/my.cnf. to determine whether syncing the binlog makes a difference. This is where I ran into a bit of discrepancy. I have actualy been testing with two separate servers this whole time, one is under load, and one is not under load. until now everything has been the same, however changing the sync_binlog variable affects the performance differently on the two servers.
- On the server WITHOUT load, the sync_binlog can be 1, and the server still has an update time of 0.00
- On the server WITH load, the sync_binlog must be 0 to keep the update time to 0.00
This makes some sense as a server that does not have much load on it, may have more File System caching available or other resources available in order to be able to quickly respond to a single infrequent disk write. This is likely a similar principle to the concept where we must run dd a couple of times in order to get past the FS cache.