Blane D is responsible for loading data into a Vertica 8.1 database for analysis. Vertica is a distributed, column-oriented store, for data-warehousing applications, and its driver has certain quirks.

For example, a common task that you might need to perform is swapping storage partitions around between tables to facilitate bulk data-loading. Thus, there is a SWAP_PARTITIONS_BETWEEN_TABLES() stored procedure. Unfortunately, if you call this function from within a prepared statement, one of two things will happen: the individual node handling the request will crash, or the entire cluster will crash.

No problem, right? Just don’t use a prepared statement. Unfortunately, if you use the ODBC driver for Python, every statement is converted to a prepared statement. There’s a JDBC driver, and a bridge to enable it from within Python, but it also has that problem, and it has the added cost of requiring a a JVM running.

So Blane did what any of us would do in this situation: he created a hacky-workaround that does the job, but requires thorough apologies.

def run_horrible_partition_swap_hack(self, horrible_src_table, horrible_src_partition,
                                     terrible_dest_table, terrible_dest_partition):
    """
    First things first - I'm sorry, I am a terrible person.

    This is a horrible horrible hack to avoid Vertica's partition swap bug and should be removed once patched!

    What does this atrocity do?
    It passes our partition swap into the ODBC connection string so that it gets executed outside of a prepared
    statement... that's it... I'm sorry.
    """
    conn = self.get_connection(getattr(self, self.conn_name_attr))

    hacky_sql = "SELECT SWAP_PARTITIONS_BETWEEN_TABLES('{src_table}',{src_part},{dest_part},'{dest_table}')"\
        .format(src_table=horrible_src_table,
                src_part=horrible_src_partition,
                dest_table=terrible_dest_table,
                dest_part=terrible_dest_partition)

    even_hackier_sql = hacky_sql.replace(' ', '+')

    conn_string = ';'.join(["DSN={}".format(conn.host),
                            "DATABASE={}".format(conn.schema) if conn.schema else '',
                            "UID={}".format(conn.uid) if conn.uid else '',
                            "PWD={}".format(conn.password) if conn.password else '',
                            "ConnSettings={}".format(even_hackier_sql)])  # :puke:

    odbc_conn = pyodbc.connect(conn_string)

    odbc_conn.close()

Specifically, Blane leverages the driver’s ConnSettings option in the connection string, which allows you to execute a command when a client connects to the database. This is specifically meant for setting up session parameters, but it also has the added “benefit” of not performing that action using a prepared statement.

If it’s stupid but it works, it’s probably the vendor’s fault, I suppose.

[Advertisement] BuildMaster allows you to create a self-service release management platform that allows different teams to manage their applications. Explore how!