# HW3 ‐ Importing CSV Files and Querying Data in Cloud Databases
## Overview
CSV is one of the most common formats for storing data. CSV files are supported by most databases and data analysis tools, making them a frequent choice for data import tasks. In this article, we'll walk through how to upload a CSV file and query the data in a cloud database using three different methods:
- SQL Server Management Studio (SSMS)
- Python (Jupyter Notebook)
- Databricks (Spark)
Each section walks through setup steps and example code, with tips for beginners.
## SQL Server Management Studio (SSMS)
There are several ways to upload a CSV file to a cloud database. For example, a data practitioner can use the `BULK INSERT` command to load the file directly into the server, or use the SSMS import wizard to first load the data into a local database and then transfer it to the cloud. This article focuses on the latter approach, as it is one of the simplest; a minimal `BULK INSERT` example is sketched below for comparison.
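The following is a rough sketch rather than a tested command: the file path is a placeholder, the target table must already exist with columns matching the CSV, and the `FORMAT = 'CSV'` option requires SQL Server 2017 or later.

```sql
-- Hypothetical BULK INSERT sketch; the path and table are placeholders.
BULK INSERT dbo.unemployment_raw       -- target table (must already exist)
FROM 'C:\data\unemployment.csv'        -- path as seen by the SQL Server machine
WITH (
    FORMAT = 'CSV',     -- parse the file as CSV (SQL Server 2017+)
    FIRSTROW = 2        -- skip the header row
);
```

Note that the path must be readable by the SQL Server service itself, which is why the wizard route described below is often easier for local files.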
### Importing CSV Files to a Local Instance
- Open SSMS and connect to your local SQL Server instance
- In the Object Explorer window, right-click on Databases and select New Database
- In the New Database window, name the database you are about to create
- Click OK
- Right-click on the newly created database and select Tasks > Import Data
- In the SQL Server Import and Export Wizard, select Flat File Source as the Data Source
- Choose a CSV file in the Browse window, review the data, then click Next
- In the Choose a Destination step, select Microsoft OLE DB Provider for SQL Server as the destination and pick a destination database
- Review and click Next > Finish
### Backing Up the Local Database & Uploading It to the Cloud
- Right-click on the database to upload, select Tasks > Export Data-tier Application
- In the Export Settings window, click Browse to choose where to save the .bacpac file on your local disk
- Click Next to review the summary and hit Finish
- Connect to the cloud server
- In the Object Explorer window, right-click on Databases, select Import Data-tier Application, and click Next
- In the Import Settings window, select the backup file you've exported.
- Review the Database Settings and Summary, click Finish
The database is now in the cloud, and you can query it from SSMS, as shown below.
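For example, a quick sanity check confirms the data landed intact. The table name here is the one queried later in this article; substitute your own.

```sql
-- Verify the import: total row count plus a sample of rows.
SELECT COUNT(*) AS row_count FROM [dbo].[unemployment_raw];
SELECT TOP 10 * FROM [dbo].[unemployment_raw];
```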
## Python (in Jupyter Notebook)
You can also import CSV files and query data in the cloud using Python in a Jupyter notebook. Jupyter provides an interactive environment that lets data practitioners view results as they code, making it easier to explore and debug data workflows.
Below is a code snippet demonstrating how to connect to the server and execute a SQL query with `pyodbc`.
```python
import pyodbc

# Connect to the cloud SQL Server instance; UID/PWD are placeholders.
# Depending on your setup, a newer driver name such as
# 'ODBC Driver 17 for SQL Server' may be required.
conn = pyodbc.connect(
    'DRIVER={SQL Server};SERVER=uwc-studentsql.continuum.uw.edu\\uwcbiddsql;'
    'DATABASE=gold_ytwang;UID={placeholder};PWD={placeholder}'
)

# Execute a query and print each row of the result set
cursor = conn.cursor()
cursor.execute("SELECT TOP 10 * FROM [dbo].[unemployment_raw]")
for row in cursor:
    print(row)
```
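If you prefer working with DataFrames, the same query can be pulled into pandas, reusing the `conn` object from above (a sketch assuming pandas is installed):

```python
import pandas as pd

# Load the query result into a DataFrame, reusing `conn` from above.
# pandas may warn that it officially supports SQLAlchemy connections,
# but a raw pyodbc connection works for simple reads like this.
df = pd.read_sql("SELECT TOP 10 * FROM [dbo].[unemployment_raw]", conn)
print(df.head())
```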
## Databricks (Spark)
In Databricks, you can load a CSV file with Spark, register it as a temporary view, and query it with Spark SQL. The steps below run in the Scala spark-shell or a Scala notebook:

```scala
// Load the CSV file, inferring column types from the data
val covid = spark.read.format("csv").option("inferSchema", "true").option("header", "true").load("bing_covid-19_data.csv")

// Show the first 10 rows of the loaded dataset
covid.show(10)

// Preview the inferred schema
covid.printSchema()

// Register the DataFrame as a temporary view so it can be queried with SQL
covid.createOrReplaceTempView("v_FactCovid")

// Run Spark SQL against the view
spark.sql("SELECT COUNT(*) FROM v_FactCovid WHERE admin_region_1 = 'Hawaii'").show()
```
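For readers using a Python notebook in Databricks, the same steps translate directly to PySpark (a sketch assuming the same file path and the built-in `spark` session that Databricks notebooks provide):

```python
# PySpark equivalent of the Scala steps above.
covid = (
    spark.read.format("csv")
    .option("inferSchema", "true")
    .option("header", "true")
    .load("bing_covid-19_data.csv")
)

covid.show(10)                                  # first 10 rows
covid.printSchema()                             # inferred column types
covid.createOrReplaceTempView("v_FactCovid")    # expose to Spark SQL

spark.sql(
    "SELECT COUNT(*) FROM v_FactCovid WHERE admin_region_1 = 'Hawaii'"
).show()
```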
## How to Leverage These Techniques in the Final Project
For my final project, I plan to build analyses and reports using the Bing COVID-19 dataset and the Business Formation Statistics weekly data.
Below are the action items to prepare the datasets for analysis and reports:
- Import the Bing COVID-19 and Business Formation Statistics Weekly Data to the local database using SSMS
- Perform ETL, then build the data warehouse tables, views, and stored procedures in SSMS
- Once the data warehouse is built, export the Data-tier Application and import it into the cloud
- Use Python and Spark to help transform data between steps (see the sketch below)
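As a concrete example of that last item, here is a small, hypothetical pandas transformation that could run between the import and warehouse-build steps. The cleaning rules are illustrative assumptions, not part of the actual project pipeline; only the file name and the `admin_region_1` column come from the code shown earlier.

```python
import pandas as pd

# Hypothetical pre-load cleanup for the Bing COVID-19 CSV.
covid = pd.read_csv("bing_covid-19_data.csv")

# Keep only rows with a state-level region and drop exact duplicates
# (illustrative rules; adjust to the analysis actually being built).
covid = covid[covid["admin_region_1"].notna()].drop_duplicates()

covid.to_csv("bing_covid-19_clean.csv", index=False)  # ready for import
```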