Parsing KML HTML Tables - anthonyblackham/GIS-Wiki GitHub Wiki

Have you ever tried to convert KML/KMZ files to a shapefile, feature class, etc. and found that the attributes did not carry over? You're not alone. The problem is that Keyhole Markup Language (KML) stores the attributes shown in its pop-ups as HTML tables, so when you import the file into a more GIS-centric format the whole table, tags included, ends up in a single field, which isn't very useful.

One way to handle this is to use a script that converts those HTML table cells into individual fields.
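
As a minimal sketch of that idea outside ArcMap, the snippet below uses Python's standard-library HTML parser to turn a pop-up table into name/value pairs. The sample HTML and the names `CellCollector` and `popup_to_dict` are made up for illustration. It assumes a flat two-column table with one attribute per row; real KML pop-ups may wrap the attribute table in extra cells that would need to be skipped first.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2.7, as shipped with ArcMap


class CellCollector(HTMLParser):
    """Collect the text of every <td> cell, in document order."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.cells = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.cells[-1] += data


def popup_to_dict(popup_html):
    """Pair up cells as name, value, name, value ... (assumed layout)."""
    parser = CellCollector()
    parser.feed(popup_html)
    parser.close()
    cells = [cell.strip() for cell in parser.cells]
    return dict(zip(cells[0::2], cells[1::2]))


# Hypothetical pop-up content, not taken from a real KML file.
sample = ("<table><tr><td>Name</td><td>Main St</td></tr>"
          "<tr><td>Length</td><td>12.5</td></tr></table>")
attributes = popup_to_dict(sample)
```

Unlike splitting the string on `<`, a parser also copes with `<td>` tags that carry attributes such as `style`, at the cost of a little more code.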

The following method uses a Python script in ArcMap.

First, the script, courtesy of Dwarburger from this thread. I saved it as kml2featureclass.py:

import arcpy, os

#this line is for using the script as a script tool in ArcMap
input_parameter = arcpy.GetParameterAsText(0)
#alternatively use input_parameter = r'C:\......\file.kmz' to run as a stand-alone script

direct = os.path.dirname(input_parameter)
#allow existing outputs to be overwritten; set this before any tool runs
arcpy.env.overwriteOutput = True
arcpy.conversion.KMLToLayer(input_parameter, direct)

database = input_parameter[:-3] + 'gdb'
dataset = database + '\\Placemarks'

arcpy.env.workspace = dataset
GCS_List = arcpy.ListFeatureClasses()

coord_sys = arcpy.GetParameter(1)
#in stand-alone script use arcpy.SpatialReference('desired Coord Sys name')

e_count = 0

for FC in GCS_List:

   arcpy.Project_management(FC, database + '\\' + FC + '_Proj', coord_sys)

arcpy.env.workspace = database
UTM_List = arcpy.ListFeatureClasses()

mxd = arcpy.mapping.MapDocument('CURRENT')
df = arcpy.mapping.ListDataFrames(mxd)[0]

keep_fields = ['OID', 'Shape', 'SHAPE', 'PopupInfo', 'Shape_Length', 'Shape_Area', 'SHAPE_Length', 'SHAPE_Area']

for FC in UTM_List:

   update_layer = arcpy.mapping.Layer(database + '\\' + FC)
   arcpy.mapping.AddLayer(df, update_layer)

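   # first read the field names from the PopupInfo of the first row;
   # start with empty lists so an empty feature class does not raise a NameError
   fields_array = []
   names_array = []

   SC = arcpy.SearchCursor(FC)
   for row in SC:

      pop_string = row.getValue("PopupInfo")
      pop_array = pop_string.split("<")

      for tag in pop_array:
         if "td>" in tag and "/td>" not in tag:
            fields_array.append(tag)
      # only the first row is needed to discover the field names
      break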

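   for fields in arcpy.ListFields(FC):

      # required fields (such as the object ID and geometry) cannot be deleted, so skip them
      if fields.name not in keep_fields and not fields.required:
         arcpy.DeleteField_management(FC,fields.name)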

   # skip the first two <td> entries; after that,
   # even indexes (starting at 0) are field names
   # and odd indexes are field values
   del fields_array[:2]
  
   for x in range(0,len(fields_array)):
      fields_array[x]=fields_array[x].replace("td>","")
      if x%2 == 0 and fields_array[x] not in keep_fields:
         names_array.append(fields_array[x])
         arcpy.AddField_management(FC, fields_array[x], "TEXT")

# default is all TEXT fields but I could change this later to reference the values
#now we update the values
   names_array.append("PopupInfo")
                      
   with arcpy.da.UpdateCursor(FC,names_array) as UC:

      for row in UC:

         pop_string = row[-1]
         pop_array = pop_string.split("<")
         fields_array = []
         values_array = []

         for segment in pop_array:
            if "td>" in segment and "/td>" not in segment:
               fields_array.append(segment)
            
         del fields_array[:2]
        
         for x in range(0,len(fields_array)):
            if x%2 != 0:
               if fields_array[x-1] not in keep_fields:
                  fields_array[x]=fields_array[x].replace("td>","")
                  values_array.append(fields_array[x])

         for y in range(0,len(values_array)):
            try:
               row[y] = values_array[y]
            except IndexError:
               e_count = e_count + 1
         # write the row once, after all of its values have been assigned
         UC.updateRow(row)
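
The script adds every field as TEXT, and its comment notes that the type could later be chosen from the values. One possible way to do that, sketched here in plain Python, is to check whether every value in a column parses as an integer or a float. The function names are hypothetical. The sketch assumes you have already collected one column's values as strings, which the script above does not do, and the returned strings are meant to be passed as the field type to `arcpy.AddField_management`.

```python
def _all_convert(values, convert):
    """Return True if convert() accepts every value without a ValueError."""
    try:
        for value in values:
            convert(value)
    except ValueError:
        return False
    return True


def guess_field_type(values):
    """Suggest "LONG", "DOUBLE" or "TEXT" for a list of string values."""
    cleaned = [v.strip() for v in values if v is not None and v.strip()]
    if not cleaned:
        return "TEXT"  # nothing to infer from, so keep the script's default
    if _all_convert(cleaned, int):
        # Caveat: this does not check that the numbers fit a 32-bit LONG field.
        return "LONG"
    if _all_convert(cleaned, float):
        return "DOUBLE"
    return "TEXT"


# Hypothetical column values for illustration.
length_type = guess_field_type(["12.5", "7", "30.25"])
name_type = guess_field_type(["Main St", "Oak Ave"])
```

Falling back to TEXT whenever a single value fails to parse keeps the behaviour of the original script for mixed columns.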

You can create a pseudo add-on in ArcMap with script tools.

In ArcCatalog you can create a toolbox and add a script tool that supplies the script's input parameters. Alternatively, you can hard-code the paths in the script and run it directly.

Manual Attempts

Copy all rows in the description field and paste them into an online validator such as [this](https://wet-boew.github.io/v4.0-ci/demos/tablevalidator/tablevalidator-en.html), which removes all the tags. Then paste the result into Notepad++.

Mark all fields:

Go to the Search menu > Find... > select the "Mark" tab. Activate regular expressions and search for ^<Path> (^ matches the start of a line). Don't forget to check "Bookmark lines", then press "Mark All".

==> All rows you want to keep now have a bookmark.
Go to the menu "Search - Bookmark - Inverse Bookmark".

==> All lines you want to delete are now bookmarked.
Go to the menu "Search - Bookmark - Remove Bookmarked lines".

==> All bookmarked lines are deleted.
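
The mark, invert, and remove steps above amount to keeping only the lines that match a regular expression. If you would rather script that than click through the menus, a rough Python equivalent could look like this. The function name and the sample text are made up, and `^<Path>` is simply the search expression quoted in the steps.

```python
import re


def keep_matching_lines(text, pattern):
    """Return only the lines of text in which pattern is found."""
    regex = re.compile(pattern)
    return "\n".join(line for line in text.splitlines() if regex.search(line))


# Made-up sample text; only lines that start with <Path> should survive.
sample = "<Path> a/b\nnoise\n<Path> c/d\n  <Path> indented"
filtered = keep_matching_lines(sample, r"^<Path>")
```

Because each line is tested on its own, `^` anchors to the start of every line without needing the multiline flag.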

Hold Ctrl+Alt while selecting to make a vertical (column) selection, which lets you delete sections of data.

Then paste the result back into a spreadsheet, save the spreadsheet as a CSV, join it to your data, and use the field calculator to populate new fields.
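
If the attributes are already available as Python dictionaries (for example from a parsing script), the spreadsheet step could be skipped by writing the CSV directly. This is a sketch for Python 3 using only the standard library. `rows_to_csv` and the sample rows are hypothetical, and the join itself would still be done in ArcMap on a key field that both tables share.

```python
import csv
import io


def rows_to_csv(rows, out):
    """Write a list of attribute dicts to the open text file object `out`."""
    # Build the header from every key seen, in first-seen order.
    fieldnames = []
    for row in rows:
        for name in row:
            if name not in fieldnames:
                fieldnames.append(name)
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical rows; the second one is missing a value on purpose.
rows = [{"Name": "Main St", "Length": "12.5"}, {"Name": "Oak Ave"}]
buffer = io.StringIO()
rows_to_csv(rows, buffer)
csv_text = buffer.getvalue()
```

To write a real file instead of an in-memory buffer, pass a handle opened with `open(path, "w", newline="")`. Rows that lack a field get an empty cell rather than raising an error.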
