SafeQuery Explained - google/mug GitHub Wiki
The StringFormat
class offers compile-time safety with interpolation-style string formatting, using named placeholders.
What you shouldn't use it for, is to create SQL queries, because string interpolation with untrusted input can lead to SQL injection attack. For example:
private static final StringFormat FIND_USER_BY_ID =
new StringFormat("SELECT * FROM Users WHERE user_id = '{user_id}'");
...
String query = FIND_USER_BY_ID.format(userIdInput);
If the userIdInput
comes from untrusted sources, it can be used to steal information about other users. The attacker can just send a string like "' OR user_id = 'victim"
, and then they can see all information about the victim.
Instead, consider using the com.google.mu.safesql
package:
SafeQuery query = SafeQuery.of(
"SELECT * FROM Users WHERE user_id = '{user_id}'", userIdInput);
Benefits provided by the safesql
package:
-
SafeQuery
automatically escapes special characters to prevent injection attack. - You can interpolate not only literal values, but tables, columns and even sub-queries.
- The same set of compile-time checks ensure that you can't make human errors (like passing the user password as the user id).
- Extra sql-aware compile-time checks ensure that your SQL template is sane.
- You can compose smaller
SafeQuery
objects to create largeSafeQuery
objects, making it easier to manage complex queries. - The
GoogleSql
class provides extra GoogleSQL conversions such as translatingjava.time.Instant
to GoogleSQL'sTIMESTAMP()
function. You can useGoogleSql
to generateSafeQuery
objects for e.g. BigQuery.
If you are using Maven, make sure to put mug-errorprone
artifact in your annotationProcessorPath
in order to get the compile-time guardrails.
Then refer to the javadoc of SafeQuery
and GoogleSql
for the specifics of the API.
Overall, there are a few best practices and examples worth mentioning.
Create a client library that wraps the interface with the underlying database. The interface should accept SafeQuery
as the input instead of raw String
so that you can better enforce the protection. The client library internally can call SafeQuery#toString()
to get the wrapped query, but the callers of the client library must all pass in SafeQuery
. Example:
class BigQueryClient {
public BigQueryResponse sendQuery(SafeQuery query) {
// ...
}
}
...
// a caller that calls BigQuery
private static final StringFormat.Template<SafeQuery> JOBS_TIMELINE_BY_PROJECT_ID =
GoogleSql.template(
"""
SELECT period_start, period_slot_ms
FROM INFORMATION_SCHEMA.JOBS_TIMELINE
WHERE period_start BETWEEN {start_time} AND {end_time}
AND project_id = '{project_id}'
""");
private final BigQueryClient bigQuery;
// ...
Instant startTime = ...;
Instant endTime = ...;
String projectId = ...;
var response = bigQuery.sendQuery(
JOBS_TIMELINE_BY_PROJECT_ID.with(startTime, endTime, projectId));
If the query expects a string literal, do quote the placeholder in the template, like the '{project_id}'
in the above example. Failing to do so will not compile and even if you manage to fool the compile-time checks, SafeQuery
will throw IllegalArgumentException
.
This makes the query template clearer in terms of what's expected to be string.
To parameterize by table name, column names etc., use backticks to quote around the placeholder (or else you get a compile-time error).
For example:
private static StringFormat.To<SafeQuery> SLOT_MS_PER_PERIOD =
GoogleSql.template(
"""
SELECT sum(period_slot_ms) as total_slot_ms FROM `{timeline_table}`
GROUP BY period_start
""");
// ...
SafeQuery query = SLOT_MS_PER_PERIOD.with("JOBS_TIMELINE_BY_PROJECT");
The above example parameterizes the query by table name, but sometimes you might want to pass in a sub query instead.
For that, you can use a smaller SafeQuery
object to interpolate the larger SafeQuery
. Example:
private static StringFormat.Template<SafeQuery> WINDOW_TIMELINE =
GoogleSql.template(
"""
SELECT * FROM ({timeline_table})
WHERE period_start BETWEEN {start_time} AND {end_time}
""");
// ...
SafeQuery timelineInWindow = WINDOW_TIMELINE.with(timelineTable, startTime, endTime);
SafeQuery query = GoogleSql.from(
"""
SELECT * FROM ({timeline_table})
ORDER BY period_start
""",
timelineInWindow);
SafeQuery
and GoogleSql
only allow types that are known to be safe (numbers, enums, Instant
, ZonedDateTime
, LocalDate
, compile-time string literals, quoted string values, backtick-quoted identifiers etc., and other SafeQuery
objects).
But sometimes you may have string values that come from trusted sources (say, from a command-line flag). These string values are not compile-time string literals, so you won't be able to pass them to SafeQuery.of()
, or use them as unquoted placeholder values.
As a work-around, you can create your own TrustedSql
class. If you tightly control the construction of this class only from trusted sources, it'll be provably safe. You can then set the system property com.google.mu.safesql.SafeQuery.trusted_sql_type
to this class name. SafeQuery
will pick it up and objects of this type will be accepted.
For example:
package com.mycompany.safesql;
public final class TrustedSql {
private final String sql;
private TrustedSql(String sql) {
this.sql = sql;
}
public static TrustedSql fromFlag(Flag<String> stringFlag) {
return new TrustedSql(stringFlag.get());
}
public static TrustedSql loadFromResource(String resourceName)
throws IOException {
...
}
/** The class is expected to be annotated with @Query("SELECT ..."). */
public static TrustedSql loadFromQueryAnnotation(Class<?> annotatedClass) {
...
}
@Override public String toString() {
return sql;
}
}
// ...
SafeQuery query = SLOT_MS_PER_PERIOD.with(
TrustedSql.loadFromResource("com/google/mycompany/all_job_timeline.sql"));
As a convenience feature, when you pass an Iterable
as the value of a template placeholder, it'll be automatically expanded, using ,
as the delimiter.
For example:
private static final StringFormat.To<SafeQuery> GET_JOBS_BY_ID =
SafeQuery.template("SELECT * FROM Jobs WHERE job_id IN ({ids})");
...
List<Integer> ids = asList(1, 2, 3);
SafeQuery query = GET_JOBS_BY_ID.with(ids);
// produces "SELECT * FROM Jobs WHERE job_id IN (1, 2, 3)
If you expect the list elements to be string values, you need to quote the placeholder (the library will throw otherwise):
List<String> ids = asList("foo", "bar's");
SafeQuery query =
SafeQuery.of("SELECT * FROM Jobs WHERE job_id IN ('{ids}')", ids);
// produces "SELECT * FROM Jobs WHERE job_id IN ('foo', 'bar\'s')