weblog
JBoss woes
cornet — Sat, 2009-11-07 02:09
JBoss, on the whole, does hold up surprisingly well. This can probably be attributed to our skilled developers who, to be fair, I don't always give enough credit to.
However, every so often JBoss plays up and the symptoms seem to point to an obvious problem, when all is not as it appears.
It's release day and shiny new software is just itching to be deployed. The sysadmin gets up early and rocks up to the office to go through the standard deployment procedure. If only things always went to plan!
We have a cluster of several servers and use the JBoss Farm Deployment service to deploy applications. It's fairly straightforward: you build and deploy .ear, .war, .spring, etc. files to the $JBOSS_HOME/server/default/farm/ directory and all the nodes pick up the new code.
Here comes the first gotcha, which we have known about for quite a while now. If you re-farm an already running package then, by default, its classes are not freed from the PermGen heap, so repeated redeploys will eventually run you out of PermGen memory.
The solution we have is to make sure we restart every node after deployment to clear out this memory.
What I've found out recently is this issue can be resolved by setting the following Java options:
-XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled
which will free up the PermGen memory.
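As a rough sketch of where these options might live: JBoss start scripts normally source a shell config file such as $JBOSS_HOME/bin/run.conf, so the flags can be appended to JAVA_OPTS there. The file location and the existing contents of JAVA_OPTS are assumptions about the setup, not something taken from our config. Note that both flags belong to the CMS collector, so as far as I know they only have an effect when -XX:+UseConcMarkSweepGC is also enabled; it is added here on that assumption.

```shell
# Hypothetical fragment for a sourced config file such as $JBOSS_HOME/bin/run.conf.
# Keep whatever JAVA_OPTS already holds and append the class-unloading flags.
# -XX:+UseConcMarkSweepGC is an assumption: the two CMS* flags are CMS-specific.
JAVA_OPTS="${JAVA_OPTS:-} -XX:+UseConcMarkSweepGC"
JAVA_OPTS="$JAVA_OPTS -XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled"
export JAVA_OPTS
```

Switching collector can change pause behaviour, so this is one to try on a test node before anything else.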
This will be going into testing on our dev and test environments shortly and hopefully onto live (once we are sure there are no adverse effects). This should mean no restart is required in most cases.
I say in most cases as we do have some applications we can't "hot deploy". We have been instructed to do the following:
- Shut down the build node
- Build and farm the application
- Shut down all other nodes
- Bring up the build node
- Bring up the other nodes
This obviously leads to a complete outage lasting a few minutes, but we can live with that for the most part.
One such morning a co-worker followed this procedure and all appeared to go fine. However, not long after, some of our applications started throwing "Broken Pipe" exceptions. The applications in question were communicating with our JBoss cluster using RMI. From the exceptions this initially looked like a network issue. The load balancers (LVS ones) were checked but showed no issues. More investigation required...
The nodes throwing the exception were part of a 6 node Tomcat cluster communicating with a 3 node JBoss cluster. On closer inspection only nodes 1 and 4 of the Tomcat cluster were throwing exceptions. These were restarted but to no avail.
Then I remembered that we do source hashing on our LVS nodes. Source hashing is used to make sure the same clients always hit the same servers, normally for session tracking purposes. This helped with diagnosis.
I found that one JBoss cluster node was at fault, but there were no exceptions in its logs. Furthermore, most transactions were working fine. Just to be safe JBoss was restarted on the offending node, but it made no difference. On with the investigation I guess...
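To show why source hashing narrows things down, here is a deliberately simplified sketch. It is not the algorithm the LVS scheduler actually uses; the function name pick_node, the use of cksum as the hash and the example IP addresses are all made up for illustration. The point is only that a given client address always maps to the same backend, so if only some clients are failing, they are probably all being sent to the same backend node.

```shell
#!/bin/sh
# Simplified illustration of source hashing (NOT the real LVS scheduler).
# pick_node is a hypothetical helper: it maps a client IP to a node number
# between 1 and node_count, and always gives the same answer for the same IP.
pick_node() {
    client_ip=$1
    node_count=$2
    # cksum prints "<crc> <byte count>"; keep only the CRC as our hash value.
    hash=$(printf '%s' "$client_ip" | cksum | cut -d ' ' -f 1)
    echo $(( hash % node_count + 1 ))
}

# Made-up client addresses standing in for the Tomcat nodes.
for ip in 10.0.0.1 10.0.0.4 10.0.0.1; do
    echo "$ip -> JBoss node $(pick_node "$ip" 3)"
done
```

The repeated address in the loop lands on the same node both times, which is exactly the property that lets you work backwards from "which clients are broken" to "which server is broken".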
Eventually I found something that didn't make sense. $JBOSS_HOME/server/default/tmp/deploy had a timestamp older than I expected.
This directory is used to hold the expanded files from $JBOSS_HOME/server/default/farm/ and should disappear when JBoss is shut down. I shut down JBoss again and, for whatever reason, it still remained. So I deleted the directory by hand and started up JBoss. Sure enough the "Broken Pipe" exceptions disappeared.
I shut down JBoss on the offending node again and this time it removed the directory. Started up again and all fine.
After much playing around I've no idea what causes this. I know that if JBoss doesn't shut down correctly then this directory can remain, causing clustering issues (which really don't make sense to me), but I've seen a number of occasions on our test environment where this directory has remained after a successful shutdown.
To make sure this doesn't happen again I've modified our start scripts to check for the presence of this directory and refuse to start up if it exists.
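The check itself is only a few lines. This is a minimal sketch rather than our actual start script: the function name check_deploy_dir and the wording of the message are invented here, and only the directory path comes from the description above.

```shell
#!/bin/sh
# Hypothetical pre-start check: fail if a stale expanded-deploy directory
# has been left behind by a previous JBoss run.
check_deploy_dir() {
    deploy_dir=$1
    if [ -d "$deploy_dir" ]; then
        echo "Refusing to start: $deploy_dir still exists, remove it first" >&2
        return 1
    fi
    return 0
}

# Intended use near the top of a start script (left commented out so the
# sketch can be sourced safely):
# check_deploy_dir "$JBOSS_HOME/server/default/tmp/deploy" || exit 1
```

Refusing to start, rather than deleting the directory automatically, means a human gets to look at what was left behind before it goes.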
Fingers crossed we won't see this issue again.
Committing to Git
cornet — Thu, 2009-10-22 22:48
Unfortunately for me I still have to use and administer CVS at work for our developers. Personally I ditched it a long time ago in favour of SVN and then Bazaar.
Lately I've been looking at Git because, well... just because really (us techs have a habit of trying out things for the hell of it).
So I've decided to commit to using it. I've also gone and got myself a GitHub account which I will use to store configurations, scripts and any public dev work that might be of use.
Will see how it all pans out, I'll hopefully get round to adding more repositories soon.
For now probably the only one worth a look is my vim configuration.
Of course you can get a copy by just doing:
git clone git://github.com/cornet/dotvim.git
Microsoft's BPOS Using Postfix
cornet — Wed, 2009-06-10 13:18
Nice to see MS using open source software :)
Email received today from someone using BPOS:
Received: from mail187-tx2-R.bigfish.com (10.9.14.251) by TX2EHSOBE009.bigfish.com (10.9.40.29) with Microsoft SMTP Server id 8.1.340.0; Wed, 10 Jun 2009 10:33:02 +0000
Received: from mail187-tx2 (localhost.localdomain [127.0.0.1]) by mail187-tx2-R.bigfish.com (Postfix) with ESMTP id 520131100037 for <******@*******.***>; Wed, 10 Jun 2009 10:33:02 +0000 (UTC)
Looks like bigfish.com are using the Postfix mail server.
Quick whois on bigfish.com reveals:
Registrant:
Microsoft Corporation
Domain Administrator
One Microsoft Way
Redmond, WA 98052
US
Email: domains@microsoft.com
Most interesting ;)
Failover hosts using Xen, DRBD and Heartbeat
cornet — Wed, 2009-04-08 20:19
After quite a lot of reading and a morning playing I managed to get failover Xen hosts working.
The idea was to have 2 physical servers to run 2 (or more) Xen hosts between them. If one server was to die or needed some work doing on it then the domU would automatically move to the other node.
I've done some testing and all appears to work fine. However, let me stress that this is not live migration, so you would suffer an outage of about a minute or so (not really a big deal in the grand scheme of things).
Click the "Read More" button for full details on the setup.
MySQL AUTO_INCREMENT Madness
cornet — Thu, 2009-04-02 11:04
Can anyone explain this behaviour?
mysql> select version();
+--------------+
| version()    |
+--------------+
| 5.1.30-2-log |
+--------------+
1 row in set (0.00 sec)
mysql> create database foo;
Query OK, 1 row affected (0.01 sec)
mysql> use foo;
Database changed
mysql> CREATE TABLE test (
  id bigint(20) NOT NULL AUTO_INCREMENT,
  stuff varchar(10) DEFAULT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB;
Query OK, 0 rows affected (0.04 sec)
mysql> SHOW CREATE TABLE test \G
*************************** 1. row ***************************
Table: test
Create Table: CREATE TABLE `test` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`stuff` varchar(10) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
mysql> INSERT INTO test (id, stuff) VALUES (-1,'Hello!');
Query OK, 1 row affected (0.04 sec)
mysql> SELECT * FROM test;
+----+--------+
| id | stuff  |
+----+--------+
| -1 | Hello! |
+----+--------+
1 row in set (0.00 sec)
mysql> INSERT INTO test (id, stuff) VALUES (0,'Hello!');
ERROR 1467 (HY000): Failed to read auto-increment value from storage engine
mysql> SHOW CREATE TABLE test \G
*************************** 1. row ***************************
Table: test
Create Table: CREATE TABLE `test` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`stuff` varchar(10) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=18446744073709551615 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
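One observation on that AUTO_INCREMENT value: 18446744073709551615 is 2^64 - 1, which is what -1 turns into when it is read as an unsigned 64-bit number. That may suggest the counter was derived from the -1 we inserted, though this is a guess and not something confirmed here. A quick way to see the arithmetic from a shell on a 64-bit system (the variable names are just for illustration):

```shell
#!/bin/sh
# Reinterpret -1 as an unsigned integer and compare it with the
# AUTO_INCREMENT value shown by SHOW CREATE TABLE above.
wrapped=$(printf '%u' -1)
seen=18446744073709551615

if [ "$wrapped" = "$seen" ]; then
    echo "-1 as unsigned matches the AUTO_INCREMENT value"
else
    echo "no match on this system: got $wrapped"
fi
```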
We have found a workaround (sort of):
mysql> DROP TABLE test;
Query OK, 0 rows affected (0.04 sec)
mysql> CREATE TABLE test (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  stuff varchar(10) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;
Query OK, 0 rows affected (0.10 sec)
mysql> INSERT INTO test (id, stuff) VALUES (-1,'Hello!');
Query OK, 1 row affected (0.03 sec)
mysql> SHOW CREATE TABLE test \G
*************************** 1. row ***************************
Table: test
Create Table: CREATE TABLE `test` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`stuff` varchar(10) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
mysql> INSERT INTO test (id, stuff) VALUES (0,'Hello!');
Query OK, 1 row affected (0.03 sec)
mysql> SELECT * FROM test;
+----+--------+
| id | stuff  |
+----+--------+
| -1 | Hello! |
|  1 | Hello! |
+----+--------+
2 rows in set (0.00 sec)
For those that missed it, the difference is the backticks in the CREATE TABLE line.
Also I've yet to test with other versions.
Update
Looks like this is a bug that was fixed in 5.1.31:
http://bugs.mysql.com/bug.php?id=41841
http://bugs.mysql.com/bug.php?id=36411