Some time ago, Marla was asked to take a look at a problematic data load to help figure out exactly what the heck was going wrong with it and report back to management.
The process was supposed to be simple: download an XML file from a web service, then merge its data into the database, updating and inserting rows as needed.
It worked, but its speed wasn't exactly stellar: a 500 KB test file took several minutes to finish. Not necessarily an issue on its own, but production files are normally 50 MB or more, so a single file took hours to process.
After sifting through the project's VB code, Marla was able to narrow down the core WTF of the data load process:
Public Shared Sub LoadXMLtoDataset(ByVal xmlfile As String, ByVal schemafile As String, ByVal conn As String)
    For Each dataTable In dataSet.Tables
        dataTable.BeginLoadData()
    Next

    dataSet.ReadXml(xmlfile)

    For Each dataTable In dataSet.Tables
        Dim tablename As String = dataTable.TableName
        Dim row As DataRow
        For Each row In dataTable.Rows
            'Check if there is a new table found in the XML which cannot be found in the database.
            If iTableProperty.CheckTableExists(dataTable.TableName, conn) = True Then
                Dim declareTableVar As String = "Declare "
                Dim setValues As String = Nothing
                Dim mergesqlcmd As String
                Dim selectsqlcmd As String = " Select "
                Dim updatesqlcmd As String = "Update Set "
                Dim insertsqlCmd As String = "Insert ("
                Dim matchfields As String = ""

                'Iterate the datatable columns in order to form an insert, a select, a declare and set statements of a merge
                For i As Integer = 0 To dataTable.Columns.Count - 1
                    If dataTable.Columns(i).ColumnName.ToString().Trim <> "ID" Then
                        'Check if there is a new column found in the XML which cannot be found in the database.
                        If iTableProperty.CheckColumnsExists(dataTable.TableName, dataTable.Columns(i).ColumnName.ToString(), conn) = True Then '<---- database call
                            declareTableVar = declareTableVar + "@" + dataTable.Columns(i).ColumnName.ToString() + " " + iTableProperty.GetDataTypeLength(dataTable.TableName, dataTable.Columns(i).ColumnName.ToString(), conn) + "," '<---- database call

                            Dim ConvertedSetVal As String = Nothing
                            If IsDBNull(row(i)) Then
                                ConvertedSetVal = "NULL"
                            Else
                                ConvertedSetVal = iTableProperty.ConvertValue(dataTable.TableName, dataTable.Columns(i).ColumnName.ToString(), IIf(IsDBNull(row(i)), System.DBNull.Value, row(i).ToString().Replace("'", "''")), conn) '<---- database call
                            End If

                            setValues = setValues + " Set " + "@" + dataTable.Columns(i).ColumnName.ToString() + "=" & ConvertedSetVal.ToString & ";"
                            matchfields = matchfields + "source." + dataTable.Columns(i).ColumnName.ToString() + " <> target." + dataTable.Columns(i).ColumnName.ToString() + " Or "
                            selectsqlcmd = selectsqlcmd + "@" + dataTable.Columns(i).ColumnName.ToString() + " as " + dataTable.Columns(i).ColumnName.ToString() + ","
                            insertsqlCmd = insertsqlCmd + dataTable.Columns(i).ColumnName.ToString() + ","
                        End If
                    End If
                Next

                insertsqlCmd = insertsqlCmd.Substring(0, insertsqlCmd.Length - 1) & ") values ("
                selectsqlcmd = selectsqlcmd.Substring(0, selectsqlcmd.Length - 1)
                declareTableVar = declareTableVar.Substring(0, declareTableVar.Length - 1)
                setValues = setValues.Substring(0, setValues.Length - 1)
                matchfields = matchfields.Substring(0, matchfields.Length - 3)

                'Iterate the DataTable columns for constructing an insert statement.
                For x As Integer = 0 To dataTable.Columns.Count - 1
                    ' Add the column value for this row
                    If dataTable.Columns(x).ColumnName.ToString().Trim <> "ID" Then
                        Dim ConvertedInsertval As Object = Nothing
                        insertsqlCmd = (insertsqlCmd) & "@" + dataTable.Columns(x).ColumnName.ToString() & ","
                    End If
                Next
                insertsqlCmd = insertsqlCmd.Substring(0, insertsqlCmd.Length - 1) & ");"

                'Iterate the DataTable columns for constructing an update statement.
                For i As Integer = 0 To dataTable.Columns.Count - 1
                    ' Add the column name
                    If dataTable.Columns(i).ColumnName.ToString().Trim <> "ID" Then
                        If dataTable.Columns(i).ColumnName.ToString().Trim <> "ID" Then
                            Dim ConvertedUpdateVal As Object = Nothing
                            updatesqlcmd = updatesqlcmd + dataTable.Columns(i).ColumnName.ToString() & "=" & "@" + dataTable.Columns(i).ColumnName.ToString() & ","
                        End If
                    End If
                Next
                updatesqlcmd = updatesqlcmd.Substring(0, updatesqlcmd.Length - 1)

                'Form the merge statement and execute it.
                mergesqlcmd = " " & declareTableVar & ";" & _
                    "" & setValues & "" & _
                    " Merge " & "dbo." & getWord.findWord(dataTable.TableName) & " as Target" & _ '<----- database call
                    " Using (" & selectsqlcmd & ") as Source" & _
                    " On " + "Target." + iTableProperty.GetTablePrimaryKeyName(dataTable.TableName, conn) + "" & _ '<----- database call
                    " When Matched and (" & matchfields & ") Then" & _
                    " " & updatesqlcmd & "" & _
                    " When not Matched by Target then" & _
                    " " & insertsqlCmd & ""

                'Execute the merge query.
                Using connDB As New Data.SqlClient.SqlConnection(System.Configuration.ConfigurationManager.ConnectionStrings.Item(conn).ConnectionString)
                    connDB.Open()
                    Using cmd As New SqlClient.SqlCommand(mergesqlcmd, connDB)
                        cmd.ExecuteNonQuery()
                    End Using
                End Using

                'Clear the query statements before going to the next row.
                insertsqlCmd = ""
                updatesqlcmd = ""
                selectsqlcmd = ""
                declareTableVar = ""
                setValues = ""
            End If
        Next row
        dataTable.EndLoadData()
    Next
End Sub
Marla didn't really know where to begin. So many problems. Merging data one row at a time...was the MERGE command really meant to work that way? Making calls to the database to check whether columns and tables exist, even though the XML file had already been validated earlier in the program. And perhaps the most infuriating detail: THE FACT THAT THE ORIGINAL DEVELOPER DIDN'T THINK THAT MAKING OVER 40,000 CALLS TO THE SQL SERVER TO PROCESS A SMALL DATA FILE WAS GOING TO LEAD TO "SLOW" PERFORMANCE!
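The article never shows Marla's fix, but the usual remedy for this pattern is to go set-based: stage the parsed rows as one batch and let the database reconcile them in a single statement, instead of issuing several metadata queries plus a one-row MERGE for every record. Here is a minimal sketch of the idea, using Python and SQLite purely so the example is self-contained; the table and column names (`customers`, `id`, `name`, `balance`) and the data are invented. On SQL Server, the equivalent would be a bulk load into a staging table followed by one MERGE.

```python
import sqlite3

# An existing target table with some rows already in it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, balance REAL)")
conn.execute("INSERT INTO customers VALUES (1, 'Alice', 10.0), (2, 'Bob', 20.0)")

# Rows parsed from the (hypothetical) XML file.
incoming = [(1, "Alice", 15.0),   # existing row with a changed value -> update
            (3, "Carol", 30.0)]   # new row -> insert

# One batched round trip for the whole file, not three database
# calls per column per row: the upsert decides insert-vs-update itself.
conn.executemany(
    "INSERT INTO customers (id, name, balance) VALUES (?, ?, ?) "
    "ON CONFLICT(id) DO UPDATE SET name = excluded.name, balance = excluded.balance",
    incoming,
)
conn.commit()

print(sorted(conn.execute("SELECT id, name, balance FROM customers")))
# -> [(1, 'Alice', 15.0), (2, 'Bob', 20.0), (3, 'Carol', 30.0)]
```

The point isn't the specific upsert syntax; it's that the row loop, the string concatenation, and the per-column existence checks all disappear, and the database sees a constant number of statements regardless of file size.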
But Marla - being the cool-headed professional that she is - simply identified the offending procedure in its home source file, called out a handful of the really big problems, and sent her report off to management.
The reply from her higher-ups was surprisingly swift. As she opened it, Marla half expected to find that she had been "nominated" to author the solution, which wouldn't have been that bad of a task. Instead, the reply was nothing like what she expected.
________________________________________________________
From: [email protected]
To: [email protected]
Subject: RE: Dataload Performance Analysis

Thanks for that great analysis, Marla! We haven't heard anything from the users so we're going to sit on this one until someone complains.

Regards,
Herb Jenkins
Senior Project Lead
Magenta Corporation