Pycharm - youdar/How-to GitHub Wiki

PyCharm-related tips and tricks

Connect PyCharm to Cloud Services

Some cloud solutions use Jupyter notebooks or other tools that are fine for experimenting but may be less convenient and efficient than PyCharm for solution development.

Connect PyCharm to Google Cloud

how-to-use-pycharm-with-google-compute-engine
PyCharm notes

Set up PyCharm to work with SVN and a private SSH key

  • Use PuTTYgen to generate SSH-2 RSA private and public keys (do not set a passphrase)
    • Keep the private key somewhere on your Windows machine (you can create an ssh folder to store private keys)
    • The public key goes in <username>/.ssh/authorized_keys on "svn server name"
  • Make sure PuTTY is on the PATH environment variable
  • Make sure the TortoiseSVN configuration is OK
    • Right-click any folder and select TortoiseSVN -> Settings
    • Go to Network and, in the SSH client field, enter "C:\Program Files\TortoiseSVN\bin\TortoisePlink.exe"
    • In %APPDATA%\Subversion\config, under [tunnels], add ssh = "C:/Program Files/TortoiseSVN/bin/TortoisePlink.exe"
  • Use PuTTY to create an SSH connection for the SVN server
    • Start PuTTY
    • In Session -> Host Name enter: <user_name>@"svn server name"
    • In Connection -> SSH -> Authentication -> Private key file for authentication, browse to the private key you generated in the first step
    • Save the session
    • If the key does have a passphrase, use Pageant to hold it for automatic login
  • Save the session as 'xxx_svn'; this name is used when checking out files from the xxx SVN servers

Now when you update and commit from PyCharm, it will automatically use the saved 'xxx_svn' session.
From the command line you can check out using:
svn co svn+ssh://xxx_svn/repository_name

Developing on Cloudera Hadoop using pyspark

When developing on a Cloudera Hadoop cluster you need to:

  • Set up the project environment (remote environment)
  • Fill in Tools -> Deployment -> Configuration with the Hadoop cluster information (this allows copying the code to the cluster)

Set PyCharm to work with the Cloudera Hadoop PySpark

To find where Spark is installed on your Hadoop cluster, run:

cd /etc/spark/conf
cat spark-env.sh

Then in PyCharm go to Settings -> Project Interpreter, click the settings cogwheel -> More -> Show paths for the selected interpreter, and add the following paths (adjusted to your PySpark location):

/opt/cloudera/parcels/CDH/lib/spark
/opt/cloudera/parcels/CDH/lib/spark/python
/opt/cloudera/parcels/CDH/lib/spark/python/lib/py4j-0.9-src.zip

Then, when running Debug, you might need to add the environment variable SPARK_HOME with the value /opt/cloudera/parcels/CDH/lib/spark.
To do so, go to Run -> Edit Configurations.
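For scripts that run outside PyCharm, or to double-check the interpreter paths, the same path setup can be sketched in Python. This is a minimal sketch, not part of PyCharm or Spark: the helper names `spark_python_paths` and `add_spark_to_sys_path` are hypothetical, and the glob pattern assumes the py4j archive is named `py4j-*-src.zip` as in the paths above.

```python
import glob
import os
import sys


def spark_python_paths(spark_home):
    """Return the Spark directories and py4j archive(s) found under spark_home."""
    python_dir = os.path.join(spark_home, "python")
    # The py4j version differs between Spark releases, so match it with a glob.
    py4j_zips = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip")))
    return [spark_home, python_dir] + py4j_zips


def add_spark_to_sys_path(spark_home=None):
    """Prepend the Spark paths to sys.path, skipping any already present."""
    spark_home = spark_home or os.environ.get("SPARK_HOME")
    if not spark_home:
        raise RuntimeError("SPARK_HOME is not set")
    added = [p for p in spark_python_paths(spark_home) if p not in sys.path]
    sys.path[:0] = added
    return added
```

Calling `add_spark_to_sys_path()` before `import pyspark` relies on SPARK_HOME being set; passing the directory explicitly avoids that dependency.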

Set up Python and Spark environment variables on Hadoop

Add the following to the .bash_profile, .bash_login or .profile file on the Hadoop node:

export PYSPARK_PYTHON="/usr/bin/python"
export SPARK_HOME="/opt/cloudera/parcels/CDH/lib/spark"
# PYTHONPATH lists module directories and archives, not the interpreter binary
PYTHONPATH="$SPARK_HOME/python"
PYTHONPATH="${PYTHONPATH}:$SPARK_HOME/python/lib/py4j-0.9-src.zip"
export PYTHONPATH

The node that you are using must be both a YARN gateway and a Spark gateway.
Note: To deploy the Python program you should use spark-submit.
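As an illustration of the spark-submit note, the command can be assembled and launched from Python. This is a sketch under assumptions: the function names are hypothetical, and the `--master yarn` and `--deploy-mode client` values are guesses at a typical YARN setup that may need adjusting for your cluster.

```python
import subprocess


def build_spark_submit_command(script, master="yarn", deploy_mode="client", app_args=()):
    """Build the argument list for spark-submit (the command is not run here)."""
    command = ["spark-submit", "--master", master, "--deploy-mode", deploy_mode, script]
    # Anything after the script path is passed to the application itself.
    command.extend(app_args)
    return command


def submit(script, **options):
    """Run spark-submit and raise CalledProcessError if it exits with an error."""
    subprocess.check_call(build_spark_submit_command(script, **options))
```

Keeping the command construction separate from the call makes it easy to print and inspect the exact command before running it on the gateway node.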

Set up PyCharm to work with Anaconda Python on Cloudera Hadoop

Verify that Anaconda is installed on your Cloudera Hadoop cluster:
check that /opt/cloudera/parcels/Anaconda exists.
Add the following to the .bash_profile, .bash_login or .profile file on the Hadoop node,
instead of the PYTHONPATH and PYSPARK_PYTHON settings above:

export PYSPARK_PYTHON="/opt/cloudera/parcels/Anaconda/bin/python"
export PYTHONPATH=/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages
export PATH=/opt/cloudera/parcels/Anaconda/bin:$PATH
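To confirm that a run actually picks up the Anaconda interpreter, a small check like the following may help. It is only a sketch: `interpreter_is_under` is a hypothetical helper, and the parcel directory is the one assumed in this section.

```python
import os
import sys


def interpreter_is_under(prefix, executable=None):
    """Return True if the interpreter path lies inside the given directory."""
    # Resolve symlinks so that aliased paths compare consistently.
    executable = os.path.realpath(executable or sys.executable)
    prefix = os.path.realpath(prefix)
    return executable == prefix or executable.startswith(prefix + os.sep)
```

For example, running `print(sys.executable, interpreter_is_under("/opt/cloudera/parcels/Anaconda"))` from the PyCharm remote interpreter shows which Python is in use and whether it lives under the Anaconda parcel.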

Change the docstring style

In Settings -> Tools -> Python Integrated Tools,
change the Docstring format to Google.
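For reference, a Google-style docstring looks roughly like this. The function `scale` is a made-up example, used only to show the section layout (Args, Returns, Raises) that PyCharm generates stubs for once this format is selected.

```python
def scale(values, factor=1.0):
    """Multiply every value by a constant factor.

    Args:
        values (list of float): Numbers to scale.
        factor (float): Multiplier applied to each value. Defaults to 1.0.

    Returns:
        list of float: The scaled values, in the original order.

    Raises:
        TypeError: If an item in values cannot be multiplied by factor.
    """
    return [value * factor for value in values]
```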

Jupyter SSH tunneling

setting-up-ssh-tunnelling-for-your-jupyter-and-pycharm
