Beeline教程 - zhongjiajie/zhongjiajie.github.com GitHub Wiki

beeline教程

beeline是hiveserver2推荐的命令行工具

brief introduction

HiveServer2 supports a command shell Beeline that works with HiveServer2. It's a JDBC client that is based on the SQLLine CLI. There’s detailed SQLLine CLI - documentation of SQLLine which is applicable to Beeline as well.

Replacing the Implementation of Hive CLI Using Beeline

The Beeline shell works in both embedded mode as well as remote mode. In the embedded mode, it runs an embedded Hive (similar to Hive CLI) whereas remote mode is for connecting to a separate HiveServer2 process over Thrift. Starting in Hive 0.14, when Beeline is used with HiveServer2, it also prints the log messages from HiveServer2 for queries it executes to STDERR. Remote HiveServer2 mode is recommended for production use, as it is more secure and doesn't require direct HDFS/metastore access to be granted for users.

Operating Mode Description
Embedded The Beeline client and the Hive installation both reside on the same host machine. No TCP connectivity is required
Remote Use remote mode to support multiple, concurrent clients executing queries against the same remote Hive installation. Remote transport mode supports authentication with LDAP and Kerberos. It also supports encryption with SSL. TCP connectivity is required

In remote mode HiveServer2 only accepts valid Thrift calls – even in HTTP mode, the message body contains Thrift payloads.

hiveserver2 Authentication Schemes with TCP Transport Mode

Authentication Scheme Description
Kerberos A network authentication protocol which operates that uses the concept of 'tickets' to allow nodes in a network to securely identify themselves. Administrators must specify hive.server2.authentication=kerberos in the hive-site.xml configuration file to use this authentication scheme.
LDAP The Lightweight Directory Access Protocol, an application-layer protocol that uses the concept of 'directory services' to share information across a network. Administrators must specify hive.server2.authentication=ldap in the hive-site.xml configuration file to use this type of authentication.
PAM Pluggable Authentication Modules, or PAM, allow administrators to integrate multiple authentication schemes into a single API. Administrators must specify hive.server2.authentication=pam in the hive-site.xml configuration file to use this authentication scheme.
Custom Authentication provided by a custom implementation of the org.apache.hive.service.auth.PasswdAuthenticationProvider interface. The implementing class must be available in the classpath for HiveServer2 and its name provided as the value of the hive.server2.custom.authentication.class property in the hive-site.xml configuration property file.
None The Beeline client performs no authentication with HiveServer2. Administrators must specify hive.server2.authentication=none in the hive-site.xml configuration file to use this authentication scheme.
NoSASL While running in TCP transport mode, HiveServer2 uses the Java Simple Authentication and Security Layer (SASL) protocol to establish a security layer between the client and server. However, HiveServer2 also supports connections in TCP transfer mode that do not use the SASL protocol Administrators must specify hive.server2.authentication=nosasl in the hive-site.xml configuration file to use this authentication scheme.
  • Connecting to Hive with Beeline

    The following examples demonstrate how to use Beeline to connect to Hive for all possible variations of these modes.

    • Embedded Client Use the following syntax to connect to Hive from Beeline in embedded mode:
    !connect jdbc:hive2://
    • Remote Client with HiveServer2 TCP Transport Mode and SASL Authentication Use the following syntax to connect to HiveServer2 in TCP mode from a remote Beeline client:
    !connect jdbc:hive2://<host>:<port>/<db>

    The default port for HiveServer2 in TCP mode is 10000, and db is the name of the database to which you want to connect.

    • Remote Client with HiveServer2 TCP Transport Mode and NoSASL Authentication Clients must explicitly specify the authentication mode in their connection string when HiveServer2 runs in NoSASL mode:
    !connect jdbc:hive2://<host>:<port>/<db>;auth=noSasl hiveuser pass org.apache.hive.jdbc.HiveDriver

    If users forget to include auth=noSasl in the JDBC connection string, the JDBC client API attempts to make an SASL connection to HiveServer2. This causes an open connection that eventually results in the client crashing with an Out Of Memory error.

    • Remote Client with HiveServer2 HTTP Transport Mode Use the following syntax to connect to HiveServer2 in HTTP mode from a remote Beeline client:
    !connect jdbc:hive2://<host>:<port>/<db>;hive.server2.transport.mode=http;hive.server2.thrift.http.path=<http_endpoint>

now to use

  • connect to hive use beeline
# function one
bin/beeline
beeline> !connect jdbc:hive2://localhost:10000 username password
0: jdbc:hive2://localhost:10000> show tables;

# function two
# means you can find the command with the connection string from your UNIX shell history.
beeline -u jdbc:hive2://localhost:10000/default -n username -w password

# function three : Beeline with NoSASL connection
bin/beeline
!connect jdbc:hive2://<host>:<port>/<db>;auth=noSasl hiveuser pass
  • Beeline Commands

    command Description
    !<SQLLine command> List of SQLLine commands .Example: !quit exits the Beeline client.
    !delimiter Set the delimiter for queries written in Beeline. Multi-character delimiters are allowed, but quotation marks, slashes, and -- are not allowed. Defaults to ; Usage: !delimiter $$

## Comparing Beeline to the Hive CLI

HDP supports two Hive clients: the Hive CLI and Beeline. The primary difference between the two involves how the clients connect to Hive.

* The Hive CLI, which connects directly to HDFS and the Hive Metastore, and can be used only on a host with access to those services.

* Beeline, which connects to HiveServer2 and requires access to only one .jar file: hive-jdbc-<version>-standalone.jar.

recommends using HiveServer2 and a JDBC client (such as Beeline) as the primary way to access Hive. This approach uses SQL standard-based authorization or Ranger-based authorization. However, some users may wish to access Hive data from other applications, such as Pig. For these use cases, use the Hive CLI and storage-based authorization.

beeline部分设置

outputformat

beeline -u jdbc:hive2://host:10000/default --outputformat=[table/vertical/csv/tsv/dsv/csv2/tsv2]可以设置相应的outputformat,详细输出例子见beeline-outputformat

FAQ

Ref

⚠️ **GitHub.com Fallback** ⚠️