Analyze Data - TheLadders/pipeline GitHub Wiki
Command Line
Cassandra's CQLSH
- Query Cassandra directly inside of Docker
root@docker$ cqlsh
cqlsh> USE pipeline; SELECT fromuserid, touserid, batchtime, rating FROM real_time_ratings LIMIT 10;
 fromuserid | touserid | batchtime | rating
------------+----------+-----------+--------
          1 |      133 |  24671840 |      8
          1 |      720 |  24671840 |      6
          1 |      971 |  24671840 |     10
          1 |     1095 |  24673840 |      7
          1 |     1616 |  24673840 |     10
          1 |     1978 |  24673840 |      7
          1 |     2145 |  24673840 |      8
          1 |     2211 |  24673840 |      8
          1 |     3751 |  24673840 |      7
          1 |     4062 |  24673840 |      3
(10 rows)
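The same query can also be run without opening an interactive session, which is handy for scripts. The following is a minimal sketch, assuming your `cqlsh` version supports the `-e` (execute) option. The helper name `ratings_query` is hypothetical and not part of the pipeline project. The keyspace, table, and column names are taken from the session above.

```shell
# Build the CQL statement for the ratings table.
# ratings_query is a hypothetical helper name used only for illustration.
ratings_query() {
  limit="${1:-10}"
  printf 'SELECT fromuserid, touserid, batchtime, rating FROM pipeline.real_time_ratings LIMIT %s;' "$limit"
}

# Only attempt the query when cqlsh is on the PATH (for example, inside the Docker container).
if command -v cqlsh >/dev/null 2>&1; then
  cqlsh -e "$(ratings_query 10)" || echo "cqlsh query failed" >&2
fi
```

Qualifying the table as `pipeline.real_time_ratings` avoids the separate `USE pipeline;` statement.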
Beeline's HiveQL CLI
- Query the Hive ThriftServer directly inside of Docker
root@docker$ beeline -u jdbc:hive2://127.0.0.1:10000 -n hiveuser -p ''
0: jdbc:hive2://127.0.0.1:10000> SELECT id, gender FROM gender_json_file LIMIT 100;
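Beeline can also execute a single statement and exit, which makes it easier to capture results in a file. This is a rough sketch, assuming the `-e` and `--outputformat` options are available in your Beeline version. The helper name `gender_query` is hypothetical. The JDBC URL, user, and table name are the ones shown above.

```shell
# Connection settings copied from the interactive session above.
HIVE_URL="jdbc:hive2://127.0.0.1:10000"
HIVE_USER="hiveuser"

# Build the HiveQL statement.
# gender_query is a hypothetical helper name used only for illustration.
gender_query() {
  limit="${1:-100}"
  printf 'SELECT id, gender FROM gender_json_file LIMIT %s;' "$limit"
}

# Only attempt the query when beeline is on the PATH (for example, inside the Docker container).
if command -v beeline >/dev/null 2>&1; then
  beeline -u "$HIVE_URL" -n "$HIVE_USER" -p '' --outputformat=csv2 -e "$(gender_query 100)" \
    || echo "beeline query failed" >&2
fi
```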
Using Notebooks for Ad Hoc Data Analysis
Spark-Notebook
- Get the IP of your Docker machine, then open Spark-Notebook in your browser
macosx-laptop$ docker-machine ip pipelinebythebay
macosx-laptop$ open http://<ip-from-above>:39000
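The two steps above can be combined so the IP does not need to be copied by hand. This is a small sketch, assuming a macOS shell where `open` is available. The helper name `notebook_url` is hypothetical. The machine name and port 39000 come from the commands above.

```shell
# Build the Spark-Notebook URL from a Docker machine IP (port 39000 as used above).
# notebook_url is a hypothetical helper name used only for illustration.
notebook_url() {
  printf 'http://%s:39000' "$1"
}

# Look up the machine IP and open the notebook, but only if docker-machine is installed.
if command -v docker-machine >/dev/null 2>&1; then
  ip="$(docker-machine ip pipelinebythebay)" \
    && open "$(notebook_url "$ip")" \
    || echo "could not open Spark-Notebook" >&2
fi
```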