0100 Initial Data Collection and Scoping Questions - microsoft/CSS_SQL_Networking_Tools GitHub Wiki
Narrowing the issue as much as possible and having user and machine details available at your fingertips will help the troubleshooting process.
Note: Not all questions may be applicable to all issues but serve as a list of things to consider.
- Client Machine
- Server Machine
- User Information
- Logs
- Big Picture Questions / Actions
- New or Existing Problem
- Use the SQLCHECK tool to automatically gather such information as the computer name, IP addresses, domain, driver information, Alias, SQL instances, service accounts, and SPNs to kick-start your information gathering.
- Run it on every Windows computer involved with the issue: Client, web server, SQL Server.
-
Operating System name, edition, and version (Winver):
-
SQL Server Driver or Provider name and version:
-
Machine Name:
-
IP address:
-
Domain joined to:
-
Application run-time environment (IIS, Windows Forms, Web Sphere, SSIS Job, etc.):
-
Application language:
-
Connection String:
-
What type of authentication is used to connect to the server? NTLM, Kerberos, SQL, AAD, other?
-
If the application is a server/service, does it delegate user credentials to the back-end database?
-
Is constrained delegation being used?
-
Application service account and domain:
-
Physical, virtual, cloud (e.g. IaaS vs Web App vs Web Role vs Power BI, etc.):
-
Is the client driver JDBC or running on Linux or Mac? The workflows are more Windows-oriented at the moment.
-
Does the issue only affect legacy providers, such as Provider=SQLOLEBD or Driver={SQL Server} and not SQL Native client and newer drivers or vice versa?
-
Does the issue only happen in one application or does it occur in multiple or all applications?
-
Does a UDL file fail to connect to other SQL Servers or does it only fail to the problem SQL Server?
-
Can you remote into the SQL Server and connect Management Studio?
-
Does the issue only happen when using the NETBIOS name of the server and not when using the FQDN or vice versa? Does it work using the IP address?
-
If the client is Windows 10 Enterprise Edition, is the Credential Guard feature turned on? This will interfere with full delegation scenarios.
-
Operating System name, edition, and version (Winver):
-
Database Server Name and Version:
-
Machine Name:
-
IP address:
-
Domain joined to:
-
SQL Server service account and domain:
-
SQL Server instance name:
-
Protocols enabled:
-
Port the server listens on:
-
Server pipe name (from ERRORLOG):
-
Physical, virtual, cloud (e.g. IaaS (SQL in an Azure VM) or PaaS (Azure SQL Database)):
-
Is the database stand-alone, clustered, mirrored, Always On?
-
Failover Partner name and IP:
-
Virtual cluster name or Listener name and port:
-
Virtual IP/Listener IP:
-
Is the database on Windows or on Linux or Mac? This may affect data collection.
-
Is the database located in Azure?
-
Is the server on the latest Service Pack and CU? There’s no point in debugging an issue we’ve already fixed.
-
Has SQL Server been upgraded recently to support TLS 1.2? Were the clients patched also? Has TLS 1.0 been turned off?
-
Is the SQL Server service currently running?
-
Is the SQL Browser service running?
-
Is the issue service-account specific? If you run the server using a different service account, does the problem go away?
For a Linked Server, collect Server information for both the mid-tier server and the back-end server.
For an IIS-to-SQL delegation issue, collect information on the web server, including the web.config and authentication settings.
- Does the user log into the client machine directly or do they access it remotely, e.g. via browser?
- Is the user a service, such as SQL Agent? Is the process identity being used, or a stored credential?
- What type of authentication is used to connect to the client application? Windows, Form's auth, Azure Active Directory?
- Does the user connect to the server using integrated security?
- User name and domain name:
If the user is remote to the client application, get:
-
Machine name:
-
IP address:
-
Domain joined to:
-
Is the user connecting over a VPN or via a proxy server? Does the issue happen if connecting directly?
-
If connecting to a web server, is it load balanced?
-
Are sticky sessions/session affinity being used?
-
Is the user logging into a terminal server and running the application from there?
-
Does the issue only affect users in particular organizational units (OUs)?
-
Has the user, client, or server moved to a different Organization Unit in Active Directory?
-
Does the issue only affect non-Admin users?
-
Does the issue only affect users in a particular domain?
-
All or some users?
- Exact error message and call stack:
- Collect SQL Server ERRORLOG, ERRORLOG.1, etc.
- Collect Application and System event logs from client and server.
- Collect the client application's log files and configuration file, e.g. web.config, rsreportserver.config, .config or .ini, etc.
- Get a network diagram showing computers, routers, etc.
-
Does the issue only affect database connections or does it affect web and file share connections as well? Many cases come to the SQL team because it is seen on the database server, but the problem may not be database related at all and require more general Windows or Active Directory support.
-
If the User domain, client domain, or server domain are different, what is the trust relationship between them? External, Forest, one-way, two-way, none?
-
Does the connection work correctly if all the resources are in the same domain?
-
Is the issue intermittent (or periodic) or consistent?
-
Does the issue only occur if more than one user is using the application? Does it occur more often if more users?
-
Does the issue only happen at certain times of the day or days of the week?
-
Does the issue only happen when a backup is being taken or the database is being re-indexed?
-
Does the issue affect more than one server?
-
Does the issue only affect one node in a 5 node cluster? If so, perhaps rebuilding is more efficient.
-
Does the issue affect only one or two clients out of many? If so, perhaps rebuilding is more efficient.
-
Does the issue only affect Named Pipes and not TCP or vice versa?
-
Does the issue happen when using a SQL login and TCP/IP?
-
Is there a working case that can be compared against the failing case? How are the two systems different?
Has the issue always been present (new installation) or did it used to work and started failing recently?
If it used to work, what changes were made to the environment?
Example: Installed patches, domain controllers upgraded, changed the firewall settings, decommissioned domain controllers, moved to a different Organizational Unit (OU) in the domain, etc.