
Channel Description:

my life with MySQL

Latest Articles in this Channel:

  • 06/12/07--22:59: Changing everything (chan 1825419)
  • This article does not even contain the words database or MySQL. I still believe it is somewhat interesting.

    Mail has, for some reason, always played a big role in my life. In 1988 I was running mail for two people, my girlfriend and me. In 1992, while setting up a citizens' network, I was running mail for 20 and then 200 people. Later I designed and built mail systems for corporations of 2,000 and 20,000 people, and planned mail server clusters for 200,000 and 2 million users. Just before I became a consultant at MySQL, I worked for a shop that did mail for a living, for 20 million users.

    Mail is a very simple and well-defined collection of services. You accept incoming messages for local users, you implement relaying for your local users with POP-before-SMTP and SMTP AUTH, you build POP, IMAP and webmail access, and you deploy spam filters and virus scanners for incoming and outgoing messages. This collection of services hardly changes when you go from 2 to 20 million users. The larger systems may also provide additional services such as portal services, a news server or other more targeted stuff, but that is just fluff outside the scope of the mail system. The solutions, though, are very different, and very much dependent on the scale of your operations.

    Continue reading "Changing everything"

  • 06/15/07--03:52: Innodb cache preloading using blackhole
  • MyISAM has LOAD INDEX INTO CACHE; for InnoDB this does not work. For benchmarking I often need a way to preload the InnoDB buffer pool with the primary key and data after a server restart, to shorten the warmup phase.

    According to Blackhole Specialist Kai, the following should work:

    CODE:
    mysql> create table t like innodbtable;
    mysql> alter table t engine = blackhole;
    mysql> insert into t select * from innodbtable;

    Another win for the unbreakable BLACKHOLE storage engine.

  • 07/10/07--22:42: Replication - now and then
  • One of the major contributing factors to the success of MySQL is the ease and simplicity of its replication. Read slaves for scale-out and backup slaves for non-interrupting backups are the norm in every MySQL installation I have seen in the last two years.

    So how does replication work? And how should it be expanded in the future?

    What is available?

    The binlog written by MySQL currently logs all statements that change the tablespace; it is a serialization of all tablespace changes. The binlog position, expressed as (binlog name, offset), is a database-global timestamp. A timestamp expressed as seconds.fraction does not work at any precision, because on a multi-core machine multiple things can happen at the same time.
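    Because a position is just a pair, ordering two positions is cheap. The following Python sketch is illustrative only: BinlogPos and precedes() are hypothetical names, not a MySQL API, and it assumes all binlog files share one base name and carry the numeric sequence suffix MySQL appends, as in mysql-bin.000042.

```python
from typing import NamedTuple, Tuple


class BinlogPos(NamedTuple):
    """A binlog position: (binlog file name, byte offset within that file)."""
    file: str
    offset: int

    def key(self) -> Tuple[int, int]:
        # Assumption: file names look like "<base>.<sequence>", e.g.
        # "mysql-bin.000042", and all files share the same base name.
        # Order by the numeric sequence first, then by the offset.
        _base, _dot, seq = self.file.rpartition(".")
        return (int(seq), self.offset)


def precedes(a: BinlogPos, b: BinlogPos) -> bool:
    """True if position a lies strictly before position b in the binlog."""
    return a.key() < b.key()


# A late offset in an older file still comes before an early offset in a newer one.
print(precedes(BinlogPos("mysql-bin.000041", 98_000_000),
               BinlogPos("mysql-bin.000042", 4)))  # True
```

    Unlike two wall-clock readings taken on different cores, two distinct positions never tie, so every logged change has exactly one place in the order.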

    If you want to make a consistent full backup of the database, the database must not change during the backup. That is, it must be possible to associate one and exactly one binlog position with the backup.

    In fact, if you have such a backup (one associated with a binlog position) and you also have the binlogs from that position until now, you can do a point-in-time (PIT) recovery: you restore the full backup and then replay the binlog from the backup's binlog position until now. That is why it is important to store the binlog on a filesystem that fails independently of the rest of your MySQL installation. That is also why you must not filter the binlog written by MySQL using binlog-do-db and binlog-ignore-db: if you do, you get an incomplete binlog that is useless in a PIT recovery scenario.

    A slave in MySQL is currently nothing but a binlog downloader and executor. The slave must be restored from a PIT-capable full backup. It is then told the binlog position of that backup and where to log in to fetch the missing binlog. The slave's IO_THREAD logs into the master server and downloads the binlog to local disk as fast as possible, storing it as the relay log. The slave's SQL_THREAD executes the relay log as fast as possible. Replication can thus be thought of as an ongoing live recovery.
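    The slave side can be reduced to a toy model in Python. This is a sketch under loud assumptions, not MySQL's implementation: events are reduced to (key, value) writes, positions are (file sequence, offset) tuples naming where an event starts, the function names are hypothetical, and the two "threads" run one after the other instead of concurrently. A backup taken at position p contains every event that starts before p, so replay begins at p itself.

```python
from collections import deque


def io_thread(master_binlog, start_pos, relay_log):
    """Download every master event at or after start_pos into the relay log."""
    for pos, event in master_binlog:
        if pos >= start_pos:
            relay_log.append((pos, event))


def sql_thread(relay_log, data):
    """Execute relay log events in order against the slave's copy of the data.

    Returns the position of the last event applied, or None if there was none.
    """
    last_applied = None
    while relay_log:
        pos, (key, value) = relay_log.popleft()
        data[key] = value  # "execute" the logged change
        last_applied = pos
    return last_applied


# Master binlog: ((file sequence, offset), (key, value)) in commit order.
master_binlog = [((1, 4), ("a", 1)), ((1, 120), ("b", 2)), ((2, 4), ("a", 3))]

# The full backup was taken at position (1, 120): it already contains a = 1.
slave_data = {"a": 1}
relay_log = deque()
io_thread(master_binlog, (1, 120), relay_log)
print(sql_thread(relay_log, slave_data), slave_data)  # (2, 4) {'a': 3, 'b': 2}
```

    Calling io_thread and sql_thread again whenever new events arrive is what turns this one-off recovery into ongoing replication.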


    Continue reading "Replication - now and then"

  • 07/11/07--09:25: Rubyisms
  • I recently had the opportunity to evaluate a very large Ruby installation that was also growing very quickly. A lot of the work performed on site was specific to that site, but other observations hold for the platform no matter what is being done on it. This article is about Ruby on Rails and its interaction with MySQL in general.

    Continue reading "Rubyisms"

  • 10/31/07--06:35: Seven times faster commit speed in Windows?
  • According to my findings in Bug #31876, MySQL on Windows does not commit data to disk using the same method that MS SQL Server and DB/2 use. The method MySQL uses appears to be seven times slower in pathological scenarios.

    The bug report contains a patch. Thanks to the MySQL WTF (Windows Task Force), and to the customer for providing the lab, for helping me find this.

    Does this work for you? I want to hear about your test results.

  • 11/26/08--05:15: CREATE TEMPORARY TABLE
  • If you have a slave, that slave is probably running with the read-only flag set in the [mysqld] section of your my.cnf. To be able to write to a read-only slave, you need to be the replication SQL_THREAD or have the SUPER privilege.

    Since MySQL 5.0.16, CREATE TEMPORARY TABLE can still be executed on a read-only slave, so the CREATE TEMPORARY TABLES privilege also allows you to write to a read-only slave in a limited and controlled way.

    If you want to process a lot of data in a temporary table, you probably create the temporary table without any indexes, then INSERT ... SELECT data into it, and then run ALTER TABLE ... ADD INDEX afterwards, because that is usually faster than inserting data into a table that already has indexes. The catch: you cannot ALTER TABLE a temporary table without the ALTER privilege, even on a server that is not read-only. Running ALTER TABLE on any table, temporary ones included, requires the ALTER privilege, which you might not want to hand out lightly.

    There is no reason at all to check the ALTER privilege for an ALTER TABLE operation on a temporary table, because that table is visible only to your connection and cannot be shared. It is also dropped when you disconnect. In fact, there is no reason at all to check permissions for temporary tables. But it is done.

    Because it is done, you can grant ALTER on a single table that exists, or even on a table that does not exist yet. If you grant the ALTER privilege on an existing table on a read-only server, you still cannot alter that table, because the server is read-only. If you then use CREATE TEMPORARY TABLE to create a table with that name, the temporary table shadows the existing persistent table, and the persistent table becomes inaccessible for your connection.

    The meaning of the GRANT changes, though: it now applies to the temporary table, which is writable on a read-only server because it is temporary, and alterable because of a grant that was never meant to apply to it. Problem solved: I can now ALTER TABLE my temporary table on the read-only server after I have finished my data load.

    All is well? Not!

    There are multiple things at work here which I consider broken:

    1. GRANTs are applied to temporary tables. This makes no sense at all in my book. Temporary tables are connection-local objects, and grants that referred to persistent objects when they were made should not apply to them.
    2. Temporary tables can shadow persistent tables in the namespace of a connection. Because GRANTs are tied to objects via the object's name and not via the object's UUID or another truly unique object identifier, a GRANT can refer to different objects over time even though the grant itself does not change. This feels somehow broken, as in "not properly normalized". Does RENAME TABLE edit the grant tables as well? I have to check!
    3. By granting the CREATE TEMPORARY TABLES privilege to a user, I allow that user to shadow any other object within a schema. The temporary table then picks up any rights granted to the shadowed object for the duration of its lifetime. This cannot be good.
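    The interaction described in these three points can be reduced to a toy model in Python. It is a sketch, not server code: the user loader, the table db.stage and all function names are hypothetical, and privileges are looked up by (user, table name) only, which is exactly the property the list complains about.

```python
def resolve(name, temp_tables, persistent_tables):
    """Return the object a table name refers to for this connection.

    A temporary table shadows a persistent table with the same name.
    """
    if name in temp_tables:
        return temp_tables[name]
    return persistent_tables.get(name)


def alter_table(user, name, grants, temp_tables, persistent_tables, read_only):
    """Simulate ALTER TABLE name on a server with the given read_only setting."""
    obj = resolve(name, temp_tables, persistent_tables)
    if obj is None:
        return "error: no such table"
    # The privilege check only sees the name, never the object behind it.
    if "ALTER" not in grants.get((user, name), set()):
        return "error: ALTER denied"
    # read_only blocks persistent tables, but temporary tables stay writable.
    if read_only and not obj["temporary"]:
        return "error: server is read-only"
    return "altered " + ("temporary" if obj["temporary"] else "persistent") + " table"


grants = {("loader", "db.stage"): {"ALTER"}}
persistent = {"db.stage": {"temporary": False}}
temp = {}

print(alter_table("loader", "db.stage", grants, temp, persistent, read_only=True))
# error: server is read-only

temp["db.stage"] = {"temporary": True}  # CREATE TEMPORARY TABLE db.stage ...
print(alter_table("loader", "db.stage", grants, temp, persistent, read_only=True))
# altered temporary table
```

    The grants dictionary is identical in both calls; only the name resolution changed.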

  • 03/18/09--01:34: DELETE, innodb_max_purge_lag and a case for PARTITIONS
  • Where I work, Merlin is an important tool that provides a lot of insight that other, more generic monitoring tools do not. We love it; in fact, we love it so much that we have about 140 database agents on about 120 different machines reporting into Merlin 2.0. That results in a data influx of about 1.2G a day without QUAN, and about 6G a day with QUAN enabled on a selected set of machines.

    This completely overwhelms the Merlin data purge process, so the merlin database grows without bound, which is quite unfortunate, because our disk space is very much bounded.

    The immediate answer to our purge problem was to disable Merlin's internal purge and, with the kind help of MySQL Support, to create a script that generates a list of record ids to delete. These ids end up in a number of DELETE statements with very large WHERE ... IN (...) clauses that do the actual deleting.
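    The id-list-to-DELETE step might look roughly like the Python sketch below. This is not the script MySQL Support provided: the table name, the id column and the batch size are assumptions, and the ids are expected to come from whatever query selects the records to purge.

```python
from typing import Iterable, Iterator, List


def _render(table: str, batch: List[int]) -> str:
    id_list = ", ".join(str(i) for i in batch)
    return f"DELETE FROM {table} WHERE id IN ({id_list});"


def delete_statements(table: str, ids: Iterable[int], batch_size: int = 1000) -> Iterator[str]:
    """Group record ids into DELETE ... WHERE id IN (...) statements.

    Each statement carries at most batch_size ids. The table name is
    interpolated as-is and must come from a trusted source.
    """
    batch: List[int] = []
    for record_id in ids:
        batch.append(int(record_id))  # int() keeps arbitrary text out of the SQL
        if len(batch) == batch_size:
            yield _render(table, batch)
            batch = []
    if batch:
        yield _render(table, batch)


for stmt in delete_statements("some_table", range(1, 6), batch_size=2):
    print(stmt)
# DELETE FROM some_table WHERE id IN (1, 2);
# DELETE FROM some_table WHERE id IN (3, 4);
# DELETE FROM some_table WHERE id IN (5);
```

    A smaller batch_size means more, shorter statements; a larger one means fewer, longer-running ones.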

    This band-aid fix does work in a way, but it also has unintended consequences. Or, as we like to say around here: 'That also breaks, but in a different and interesting way.'
    Continue reading "DELETE, innodb_max_purge_lag and a case for PARTITIONS"

  • 05/19/09--03:46: Connection Scoped State in MySQL
  • This is the translation of an article from my German-language blog. It is not a literal translation, but has been amended and changed a bit to take more recent information into account.

    It started out as a discussion in the German-language MySQL group on Usenet. There, the eternal question came up: why does phpMyAdmin get no love at all from the helpers and regulars in that group? My answer was:

    phpMyAdmin (PMA), like many other GUI tools for MySQL, has a number of limitations. For a web tool such as PMA, these follow from its operating principles and can hardly be changed. But let's start at the beginning:

    In MySQL, the connection is a special context, or scope, for many things. At least the following things are part of the connection scope:

    • Transactions. A disconnect implies a ROLLBACK.
    • Locks. Transactions acquire locks through writing statements or SELECT ... FOR UPDATE. Table locks are acquired by LOCK TABLES. A disconnect releases the locks.
    • The number returned by LAST_INSERT_ID() is cached within the connection context. It is unavailable after a disconnect.
    • Tables created by CREATE TEMPORARY TABLE. They are deleted on disconnect.
    • Prepared statements and the parsed bytecode of stored routines are kept per connection, even if that is the wrong solution from a performance point of view.
    • @-variables. SET @bla = 10 or SELECT @bla := count(*) FROM cookie; define variables within the context of a connection. They are lost on disconnect.
    • SESSION parameters. SET SESSION myisam_sort_buffer_size = 1024*1024*64 or SET @@session.myisam_sort_buffer_size = 1024*1024*64 set configuration parameters with session scope. They are lost on disconnect.
    • Replication-specific SET commands such as SET TIMESTAMP or SET LAST_INSERT_ID can affect the behavior of functions such as SELECT NOW() or SELECT LAST_INSERT_ID().
    I call all of that connection-scoped state.

    A client that disconnects in an uncontrolled way, or that does not report a disconnect properly, is defective in the sense that all functionality depending on connection-scoped state is unavailable to it. The opposite case also exists: a client that reuses existing connections may be handed connections with unclean state, which may or may not be a security risk.
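    The reuse hazard can be illustrated with a toy model in Python. Session and Pool are hypothetical classes, not a MySQL client API; they track only a few items from the list above and show how state leaks from one user of a reused connection to the next unless it is reset on checkout.

```python
class Session:
    """Toy stand-in for the connection-scoped state of one MySQL connection."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Everything here is lost on disconnect, or must be cleared explicitly.
        self.user_vars = {}          # @-variables
        self.session_vars = {}       # SET SESSION ...
        self.temp_tables = set()     # CREATE TEMPORARY TABLE ...
        self.in_transaction = False  # an open transaction is rolled back on disconnect
        self.last_insert_id = 0      # LAST_INSERT_ID()


class Pool:
    """Hands out reused sessions, optionally resetting them on checkout."""

    def __init__(self, size, reset_on_checkout):
        self.idle = [Session() for _ in range(size)]
        self.reset_on_checkout = reset_on_checkout

    def checkout(self):
        session = self.idle.pop()
        if self.reset_on_checkout:
            session.reset()
        return session

    def checkin(self, session):
        self.idle.append(session)


for reset in (False, True):
    pool = Pool(size=1, reset_on_checkout=reset)
    first = pool.checkout()
    first.user_vars["bla"] = 10  # SET @bla = 10 by the first user
    pool.checkin(first)
    second = pool.checkout()     # the next user gets the same connection
    print(reset, second.user_vars)
# False {'bla': 10}
# True {}
```

    Resetting closes the leak at the cost of rebuilding session state on every checkout; a real pool would also have to roll back any open transaction before handing the connection out.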

    Continue reading "Connection Scoped State in MySQL"

  • 03/18/10--08:38: Getting SQL from a SPAN port
  • Recently I needed the query stream hitting a very busy master. Normally I would have used MySQL Proxy to collect queries, but on a very busy machine the Proxy is as much part of the problem as it is part of the solution, so I chose a different approach.

    I had a SPAN port configured for the master, which is Ciscospeak for a port on a switch that mirrors all traffic of one or more other ports. I had an otherwise idle machine listening to the SPAN port on a spare network card. That way it is possible to collect traffic to and from the master without any interference with the master.

    On the listener box, I had tcpflow collecting data to my master (and only traffic to, not from the master):

    CODE:
    tcpflow -i eth1 dst master and port 3306

    These tcpflow files then need to be turned into a slow-log-like format for further processing. For that, I wrote a very simple processor in C, after experiments with tcpdump and mk-query-digest had shown them to be too slow to keep up.

    The processor is called extract_queries, and its source can be found below. It is used like this:
    CODE:
    # mkdir flow
    # cd flow
    # tcpflow -i eth1 dst master and port 3306
    (wait 1h)
    (break)
    # cd ..
    # find flow -type f -print0 | xargs -0 extract_queries -u > slow
    # mysqldumpslow -s c slow > stats


    The Source: (extract_queries.c)
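    The source of extract_queries is not reproduced here. To give a rough idea of the parsing involved, here is a Python sketch; it is not the author's C program, and it would be far too slow for the job, which is why the original is written in C. It relies on the MySQL client/server protocol framing: every packet starts with a 3-byte little-endian payload length and a 1-byte sequence id, and a command packet has sequence id 0 and begins with a command byte, 0x03 (COM_QUERY) for a plain text query. The sketch ignores compression, TLS, packets split at 16 MB and captures that start mid-packet, and it prints plain statements rather than a slow-log-like format.

```python
import sys

COM_QUERY = 0x03  # MySQL protocol command byte for a text query


def extract_queries(stream: bytes):
    """Yield the SQL text of every COM_QUERY packet in a client->server stream."""
    pos = 0
    while pos + 4 <= len(stream):
        # Packet header: 3-byte little-endian payload length, 1-byte sequence id.
        length = int.from_bytes(stream[pos:pos + 3], "little")
        seq = stream[pos + 3]
        payload = stream[pos + 4:pos + 4 + length]
        if len(payload) < length:
            break  # packet truncated at the end of the capture
        # A new command starts with sequence id 0; this skips the handshake
        # response (sequence id 1) and other non-command packets.
        if seq == 0 and payload[:1] == bytes([COM_QUERY]):
            yield payload[1:].decode("utf-8", errors="replace")
        pos += 4 + length


def main(paths):
    # tcpflow writes one file per connection and direction.
    for path in paths:
        with open(path, "rb") as flow:
            for query in extract_queries(flow.read()):
                print(query.rstrip().rstrip(";") + ";")


if __name__ == "__main__":
    main(sys.argv[1:])
```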

  • 04/22/10--02:02: Down the dirty road
  • Ok. So it all begins with somebody who is using INSERT ... ON DUPLICATE KEY UPDATE. That guy wants to count how many rows the statement updated, as opposed to inserted.

    We could have used mysql_info() to fetch that information. Instead, we rig the UPDATE clause:

    CODE:
    root@localhost [kris]> create table t ( 
      id integer unsigned not null primary key, 
      d integer unsigned not null 
    ) engine = innodb;
    Query OK, 0 rows affected (0.16 sec)

    root@localhost [kris]> insert into t values ( 1, 1), (2,2), (3,3);
    Query OK, 3 rows affected (0.00 sec)
    Records: 3  Duplicates: 0  Warnings: 0

    root@localhost [kris]> set @x = 0;
    Query OK, 0 rows affected (0.00 sec)

    root@localhost [kris]> insert into t values (4,4), (2,1), (3, 1) 
    -> on duplicate key update 
    -> d= values (d) + 0* ( @x := @x +1 );
    Query OK, 5 rows affected (0.00 sec)
    Records: 3  Duplicates: 2  Warnings: 0

    root@localhost [kris]> select @x;
    +------+
    | @x   |
    +------+
    |    2 |
    +------+
    1 row in set (0.00 sec)

    Wonderful side effects! And this is only the beginning.
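    For comparison, the mysql_info() route needs no side effects at all. For this statement mysql_info() returns a string such as "Records: 3  Duplicates: 2  Warnings: 0", and MySQL documents that INSERT ... ON DUPLICATE KEY UPDATE counts 1 affected row per inserted row, 2 per updated row and 0 per row left unchanged. A Python sketch of the arithmetic follows; iodku_counts is a hypothetical helper, and it assumes the client did not set CLIENT_FOUND_ROWS, which changes the count for unchanged rows.

```python
import re

INFO_RE = re.compile(r"Records:\s*(\d+)\s+Duplicates:\s*(\d+)\s+Warnings:\s*(\d+)")


def iodku_counts(info: str, affected_rows: int) -> dict:
    """Split an INSERT ... ON DUPLICATE KEY UPDATE result into row outcomes.

    info is the mysql_info() string, affected_rows the mysql_affected_rows()
    value. Assumes 1 per insert, 2 per update, 0 per unchanged row.
    """
    m = INFO_RE.search(info)
    if m is None:
        raise ValueError(f"unexpected info string: {info!r}")
    records, duplicates, _warnings = map(int, m.groups())
    inserted = records - duplicates
    updated = (affected_rows - inserted) // 2
    return {"inserted": inserted, "updated": updated, "unchanged": duplicates - updated}


# The statement above: "5 rows affected", "Records: 3  Duplicates: 2  Warnings: 0".
print(iodku_counts("Records: 3  Duplicates: 2  Warnings: 0", 5))
# {'inserted': 1, 'updated': 2, 'unchanged': 0}
```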
    Continue reading "Down the dirty road"

  • 03/31/11--22:51: Fighting the mysqld init script
  • Today we discovered a particularly subtle way of fucking up a server restart. After a routine configuration change and RPM upgrade, a colleague tried to restart an important master. That failed. The message:

    CODE:
    [root@master ~]# /etc/init.d/mysql start
    Starting MySQLCouldn't find MySQL manager (//bin/mysqlmanag[FAILED]erver (//bin/mysqld_safe)

    The colleague tried multiple times, and finally resorted to manually typing a
    CODE:
    nohup mysqld_safe ...
    into a screen console, which he detached.

    That took care of production for now and left us with an investigation. Why is the init script trying to start the MySQL manager?

    It is not, and never tried to. What happen?


    Continue reading "Fighting the mysqld init script"

  • 07/20/11--06:32: Make me a MEM replication delay screen
  • "List me all databases that have a current replication delay of more than 10 seconds."
    "Easy. Let's fetch the data from Merlin."

    And that is how it started.

    The mem schema has a table inventory_attributes, which decodes reported attribute names into attribute_ids:

    Continue reading "Make me a MEM replication delay screen"

  • 08/10/11--04:53: LOAD DATA INFILE (and mysqldump)
  • A colleague of mine has been benchmarking mysqldump data load against various versions of LOAD DATA INFILE. He created sample data as text files with either 100k or 20M rows of five integers each, the first column being the PK.

    CODE:
    # tab-separated columns, the default field separator of LOAD DATA INFILE
    perl -MList::Util=shuffle -e '@k=shuffle(1..20e6);
      for (@k) {
        print $_, "\t", join("\t", map int(rand(1e9)), 0..3), "\n";
    }' > loadme_nonpkorder.txt

    perl -e 'print ++$i, "\t", join("\t", map int(rand(1e9)), 0..3), "\n"
      for 1..20e6' > loadme_pkorder.txt

    All insertions were done into new, empty tables. The text files were read at least once beforehand to warm up the OS disk cache. The tables have two non-unique single-column indexes. Everything ran on an idle-ish DB master with substantial memory and a NetApp hosting the datadir (via XFS and LVM).

    He benchmarked four cases:
    1. Insertion in PK order.
    2. Insertion in PK order, dropping indexes before insertion and re-adding them later.
    3. Insertion in random order.
    4. Insertion in random order, dropping indexes before insertion and re-adding them later.

    Summary: the result is not surprising. Both inserting in PK order and dropping/re-adding indexes improve performance considerably. PK-order insertion becomes more and more crucial as the dataset grows (which is not at all surprising if you think about what happens when a record is added to the InnoDB PK tree).
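    The PK-tree argument can be made concrete with a deliberately crude Python model that counts how often an insert has to touch a leaf page not currently held in an LRU-managed buffer pool. Everything here is an assumption for illustration (a fixed number of rows per leaf, leaves laid out in PK order from the start, no page splits, no secondary indexes); it is not how InnoDB accounts for I/O.

```python
import random
from collections import OrderedDict


def leaf_misses(keys, rows_per_leaf, pool_pages):
    """Count inserts whose target leaf page is not cached in an LRU pool.

    Toy model: the leaf for a key is key // rows_per_leaf.
    """
    pool = OrderedDict()
    misses = 0
    for key in keys:
        page = key // rows_per_leaf
        if page in pool:
            pool.move_to_end(page)        # mark as most recently used
        else:
            misses += 1                   # page must be read (or created)
            pool[page] = True
            if len(pool) > pool_pages:
                pool.popitem(last=False)  # evict the least recently used page
    return misses


random.seed(1)
for n in (5_000, 100_000):  # 50 vs. 1000 leaves, buffer pool of 50 pages
    keys = list(range(n))
    in_order = leaf_misses(keys, rows_per_leaf=100, pool_pages=50)
    random.shuffle(keys)
    shuffled = leaf_misses(keys, rows_per_leaf=100, pool_pages=50)
    print(n, in_order, shuffled)  # row count, misses in PK order, misses in random order
```

    While all leaves fit into the pool, both orders miss exactly once per leaf; once they do not, PK order still misses once per leaf, while random order turns almost every insert into a page fetch.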


    Continue reading "LOAD DATA INFILE (and mysqldump)"

  • 09/22/11--00:40: Call for best practice: Talking to r/o slaves through a load-balancer
  • I am looking for people who have a bunch of r/o slaves running, and who are using a load balancer to distribute queries across them.

    The typical setup would be a PHP or Perl type of deployment with transient connections that end when page generation finishes, and a reconnect for the next request served. The connect goes to the load balancer, which forwards it to any suitable database in the pool.

    I am looking for people who are actually deploying this, and for the strategies they use to cope with potential problems. I would also like to better understand which common problems they needed to address.

    Things I can imagine off the top of my head:

    - Slave lag. Slave lag can happen on single boxes due to individual failures (the battery on the RAID controller expires) or on many boxes at once (an ALTER TABLE logjams the replication hierarchy). In the latter case, boxes cannot be dropped from the load balancer, lest you end up with an empty pool.

    - Identifying problematic machines and isolating faults. At the moment, problematic machines sending requests are easily identified: we can SHOW PROCESSLIST, see the problem query and the host and port it is coming from, then run lsof on the offending source machine to see which process it is. With an LB in between, we lose this ability, unless we do fearsome layer 2 magic at the LB. How do you elegantly identify sources of disruption and track them down to take them out?

    - What is a good pool size? We could merge any number of individual cells, up to an entire data center's capacity, into one single supercell, but we think that this may be too big a setup. What sizing guidelines should be used here?
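    For the slave lag point, one conceivable pool policy is sketched below in Python. It is an assumption, not a recommendation from the text: drop slaves above a lag threshold, but never shrink the pool below a minimum size; in a logjam, keep the least-lagged slaves instead. eligible_slaves, max_lag and min_pool are hypothetical names.

```python
def eligible_slaves(lag_by_host, max_lag, min_pool):
    """Choose the slaves that stay in the load-balancer pool.

    lag_by_host maps host -> seconds behind master, or None when replication
    is not running (such hosts are never eligible).
    """
    running = {host: lag for host, lag in lag_by_host.items() if lag is not None}
    healthy = sorted(host for host, lag in running.items() if lag <= max_lag)
    if len(healthy) >= min_pool:
        return healthy
    # Logjam: too few healthy slaves, so keep the least-lagged ones rather
    # than ending up with an empty pool.
    least_lagged = sorted(running, key=lambda host: (running[host], host))
    return sorted(least_lagged[:min_pool])


print(eligible_slaves({"db1": 0, "db2": 1, "db3": 300}, max_lag=10, min_pool=2))
# ['db1', 'db2']
print(eligible_slaves({"db1": 400, "db2": 350, "db3": 500}, max_lag=10, min_pool=2))
# ['db1', 'db2']
```

    Keeping lagged slaves during a logjam trades stale reads for availability; whether that is acceptable depends on the application.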

    What else am I missing here?

  • 12/06/11--04:43: pam modules for MySQL: What is wrong with these people?
  • Percona just released their MySQL PAM authentication insanity for MySQL 5.5, just as Oracle did before, and MariaDB is no better.

    The Oracle module requires a plugin to be loaded into your client, which happens automatically if the plugin is present and the server supports PAM auth. The plugin is ominously called "mysql_clear_password" and does what it says on the tin: your database server access password is henceforth sent from the client to the server in clear text, not encrypted, hashed, salted or otherwise protected.

    I suppose the Percona module does the same, although this is not mentioned in the docs at all (or at least I have not been able to find it there). They also openly suggest running the database server as root, as that is the only way for an in-process PAM auth module to access /etc/shadow.

    *headdesk*

    Do any of you know what SASL is and why it was invented?

    I know it's a pain, but it is there for a reason. Many reasons. saslauthd, for example, reads your authentication secrets instead of your worker process, because you are unable to write and maintain a secure codebase the size of a database server. And by speaking SASL on the wire and then handing off an authenticated connection to your actual worker code, you gain access to a number of integrated mechanisms for communicating passwords in a compatible and secure manner, none of which involve clear-text passwords on the wire.
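    To illustrate the point about passwords on the wire, here is a Python sketch of CRAM-MD5 (RFC 2195), one of the older SASL mechanisms, chosen only because it fits in a few lines and not as a recommendation: MD5 is long outdated, and the server still has to hold the shared secret. The server sends a fresh challenge, and the client answers with its user name and an HMAC of that challenge keyed with the password, so the password itself never crosses the wire. The function names, user and password are hypothetical.

```python
import hmac
import secrets


def make_challenge(hostname: str) -> bytes:
    """Server: a fresh, unpredictable challenge for each authentication attempt."""
    return f"<{secrets.token_hex(8)}@{hostname}>".encode()


def client_response(user: str, password: str, challenge: bytes) -> str:
    """Client: user name plus HMAC-MD5(password, challenge) in hex."""
    digest = hmac.new(password.encode(), challenge, "md5").hexdigest()
    return f"{user} {digest}"


def server_verify(response: str, challenge: bytes, stored_passwords: dict) -> bool:
    """Server: recompute the HMAC from its own copy of the secret and compare."""
    user, _, digest = response.partition(" ")
    password = stored_passwords.get(user)
    if password is None:
        return False
    expected = hmac.new(password.encode(), challenge, "md5").hexdigest()
    return hmac.compare_digest(expected, digest)


secrets_db = {"kris": "s3cret"}  # hypothetical user and password
challenge = make_challenge("db.example.com")
response = client_response("kris", "s3cret", challenge)
print("s3cret" in response, server_verify(response, challenge, secrets_db))  # False True
```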

    Can we please bury these plugins, deep in the Mariana Trench, in a CASTOR, put a warning beacon over the site and then start over, doing it right this time?

    Thanks. I knew you would see the light eventually.