Some time ago, Marla was asked to take a look at a problematic data load to help figure out exactly what the heck was going wrong with it and report back to management.

The process was supposed to be simple: download an XML file from a web service and then merge the data to the database, updating and inserting as needed.

It worked, but the time it took to process a 500 KB test file wasn't exactly stellar - in all, it took several minutes to finish. Not necessarily an issue but when files are normally 50 MB or more, it was taking hours to process a single file.

After sifting through the project's VB code, Marla was able to narrow down the core WTF of the data load process:


Public Shared Sub LoadXMLtoDataset(ByVal xmlfile As String, ByVal schemafile As String, ByVal conn As String)

    For Each dataTable In dataSet.Tables
      dataTable.BeginLoadData()
    Next

    dataSet.ReadXml(xmlfile)
    For Each dataTable In dataSet.Tables
      Dim tablename As String = dataTable.TableName
      Dim row As DataRow
      For Each row In dataTable.Rows
        'Check if there is a new table found in the XML which cannot be found in the database.  
        If iTableProperty.CheckTableExists(dataTable.TableName, conn) = True Then
          Dim declareTableVar As String = "Declare "
          Dim setValues As String = Nothing
          Dim mergesqlcmd As String
          Dim selectsqlcmd As String = " Select "
          Dim updatesqlcmd As String = "Update Set "
          Dim insertsqlCmd As String = "Insert ("
          Dim matchfields As String = ""

          'Iterate the datatable columns in order to form an insert, a select, a declare and set statements of a merge 
          For i As Integer = 0 To dataTable.Columns.Count - 1
            If dataTable.Columns(i).ColumnName.ToString().Trim <> "ID" Then
              'Check if there is a new column found in the XML which cannot be found in the database.  
              If iTableProperty.CheckColumnsExists(dataTable.TableName, dataTable.Columns(i).ColumnName.ToString(), conn) = True Then   '<---- database call
                declareTableVar = declareTableVar + "@" + dataTable.Columns(i).ColumnName.ToString() + " " + iTableProperty.GetDataTypeLength(dataTable.TableName, dataTable.Columns(i).ColumnName.ToString(), conn) + ","    '<---- database call
                Dim ConvertedSetVal As String = Nothing
                If IsDBNull(row(i)) Then
                  ConvertedSetVal = "NULL"
                Else
                  ConvertedSetVal = iTableProperty.ConvertValue(dataTable.TableName, dataTable.Columns(i).ColumnName.ToString(), IIf(IsDBNull(row(i)), System.DBNull.Value, row(i).ToString().Replace("'", "''")), conn)    '<---- database call
                End If
                setValues = setValues + " Set " + "@" + dataTable.Columns(i).ColumnName.ToString() + "=" & ConvertedSetVal.ToString & ";"
                matchfields = matchfields + "source." + dataTable.Columns(i).ColumnName.ToString() + " <> target." + dataTable.Columns(i).ColumnName.ToString() + " Or "
                selectsqlcmd = selectsqlcmd + "@" + dataTable.Columns(i).ColumnName.ToString() + " as " + dataTable.Columns(i).ColumnName.ToString() + ","
                insertsqlCmd = insertsqlCmd + dataTable.Columns(i).ColumnName.ToString() + ","
              End If
            End If
          Next
          insertsqlCmd = insertsqlCmd.Substring(0, insertsqlCmd.Length - 1) & ") values ("
          selectsqlcmd = selectsqlcmd.Substring(0, selectsqlcmd.Length - 1)
          declareTableVar = declareTableVar.Substring(0, declareTableVar.Length - 1)
          setValues = setValues.Substring(0, setValues.Length - 1)
          matchfields = matchfields.Substring(0, matchfields.Length - 3)

          'Iterate the DataTable columns for constructing an insert statement.
          For x As Integer = 0 To dataTable.Columns.Count - 1
            ' Add the column value for this row
            If dataTable.Columns(x).ColumnName.ToString().Trim <> "ID" Then
              Dim ConvertedInsertval As Object = Nothing
              insertsqlCmd = (insertsqlCmd) & "@" + dataTable.Columns(x).ColumnName.ToString() & ","
            End If
          Next
          insertsqlCmd = insertsqlCmd.Substring(0, insertsqlCmd.Length - 1) & ");"
          'Iterate the DataTable columns for constructing an update statement.
          For i As Integer = 0 To dataTable.Columns.Count - 1
            ' Add the column name
            If dataTable.Columns(i).ColumnName.ToString().Trim <> "ID" Then
              If dataTable.Columns(i).ColumnName.ToString().Trim <> "ID" Then
                Dim ConvertedUpdateVal As Object = Nothing
                updatesqlcmd = updatesqlcmd + dataTable.Columns(i).ColumnName.ToString() & "=" & "@" + dataTable.Columns(i).ColumnName.ToString() & ","
              End If
            End If
          Next
          updatesqlcmd = updatesqlcmd.Substring(0, updatesqlcmd.Length - 1)
          'Form the merge statement and execute it.
          mergesqlcmd = " " & declareTableVar & ";" & _
             "" & setValues & "" & _
             " Merge " & "dbo." & getWord.findWord(dataTable.TableName) & " as Target" & _    '<----- database call
             " Using (" & selectsqlcmd & ") as Source" & _
             " On " + "Target." + iTableProperty.GetTablePrimaryKeyName(dataTable.TableName, conn) + "" & _  '<----- database call
             " When Matched and (" & matchfields & ") Then" & _
             " " & updatesqlcmd & "" & _
             " When not Matched by Target then" & _
             " " & insertsqlCmd & ""

          'Execute the merge query.
          Using connDB As New Data.SqlClient.SqlConnection(System.Configuration.ConfigurationManager.ConnectionStrings.Item(conn).ConnectionString)
            connDB.Open()
            Using cmd As New SqlClient.SqlCommand(mergesqlcmd, connDB)
              cmd.ExecuteNonQuery()
            End Using
          End Using
          'Clear the query statements before going to the next row.
          insertsqlCmd = ""
          updatesqlcmd = ""
          selectsqlcmd = ""
          declareTableVar = ""
          setValues = ""
        End If
      Next row
      dataTable.EndLoadData()
    Next

  End Sub

Marla ...didn't really know where to begin... So many problems. Merging data one row at a time...was the merge command really meant to work that way? Making calls to the database to check if columns and tables exist after the XML file has already been validated earlier in the program. And perhaps the most infuriating detail: THE FACT THAT THE ORIGINAL DEVELOPER DIDN'T THINK THAT MAKING OVER 40,000 CALLS TO THE SQL SERVER TO PROCESS A SMALL DATA FILE WAS GOING TO LEAD TO “SLOW” PERFORMANCE!

But Marla - being the cool headed professional that she is - simply identified the offending procedure in its home source file, called out a handful of the really big problems, and sent her report off to management.

The reply from her higher ups was surprisingly swift. As she opened the reply, Marla half expected to find that she had been "nominated" to author the solution, which wouldn't have been that bad of a task, but instead, it wasn't anything like what she expected.

________________________________________________________
From: [email protected]
To: [email protected]
Subject: RE: Dataload Performance Analysis

Thanks for that great analysis, Marla!

We haven't heard anything from the users so we're going to sit on this one until someone complains.

Regards,
Herb Jenkins
Senior Project Lead
Magenta Corporation

[Advertisement] BuildMaster allows you to create a self-service release management platform that allows different teams to manage their applications. Explore how!