Connecting to a Local Node (for developers)

These are the steps I had to follow to connect to a local node from Eclipse running on my dev PC.

Specifically, this covers getting the v1 sync service running locally, pointed at a local node.

The class you want to run is com.ikanow.aleph2.data_import_manager.modules.DataImportManagerModule, with a single argument pointing to a config file. The config file needs to look something like this:

```properties
# SERVICES
service.CoreDistributedServices.interface=com.ikanow.aleph2.distributed_services.services.ICoreDistributedServices
service.CoreDistributedServices.service=com.ikanow.aleph2.distributed_services.services.CoreDistributedServices
service.StorageService.interface=com.ikanow.aleph2.data_model.interfaces.data_services.IStorageService
service.StorageService.service=com.ikanow.aleph2.storage_service_hdfs.services.HdfsStorageService
service.ManagementDbService.interface=com.ikanow.aleph2.data_model.interfaces.data_services.IManagementDbService
service.ManagementDbService.service=com.ikanow.aleph2.management_db.mongodb.services.MongoDbManagementDbService
service.CoreManagementDbService.interface=com.ikanow.aleph2.data_model.interfaces.data_services.IManagementDbService
service.CoreManagementDbService.service=com.ikanow.aleph2.management_db.services.CoreManagementDbService
service.SearchIndexService.interface=com.ikanow.aleph2.data_model.interfaces.data_services.ISearchIndexService
service.SearchIndexService.service=com.ikanow.aleph2.search_service.elasticsearch.services.ElasticsearchIndexService
service.TemporalService.interface=com.ikanow.aleph2.data_model.interfaces.data_services.ITemporalService
service.TemporalService.service=com.ikanow.aleph2.search_service.elasticsearch.services.ElasticsearchIndexService
service.ColumnarService.interface=com.ikanow.aleph2.data_model.interfaces.data_services.IColumnarService
service.ColumnarService.service=com.ikanow.aleph2.search_service.elasticsearch.services.ElasticsearchIndexService
service.SecurityService.interface=com.ikanow.aleph2.data_model.interfaces.shared_services.ISecurityService
service.SecurityService.service=com.ikanow.aleph2.data_model.interfaces.shared_services.MockSecurityService
service.StreamingEnrichmentService.interface=com.ikanow.aleph2.data_model.interfaces.data_analytics.IAnalyticsTechnologyService
service.StreamingEnrichmentService.service=com.ikanow.aleph2.analytics.storm.services.StormAnalyticTechnologyService
service.BatchEnrichmentService.interface=com.ikanow.aleph2.data_model.interfaces.data_analytics.IAnalyticsTechnologyService
service.BatchEnrichmentService.service=com.ikanow.aleph2.analytics.hadoop.services.HadoopTechnologyService

# CONFIG
# GLOBALS
# REPLACE THESE WITH FOLDERS ON YOUR LOCAL SYSTEM
globals.local_root_dir=C:/Users/Burch/Desktop/v2_dev_config/
globals.local_yarn_config_dir=C:/Users/Burch/Desktop/v2_dev_config/yarn-config/
globals.local_cached_jar_dir=C:/Users/Burch/Desktop/v2_dev_config/cached-jars/

# MANAGEMENT DB:
MongoDbManagementDbService.mongodb_connection=YOUR_MONGO_ADDRESS:PORT
MongoDbManagementDbService.v1_enabled=true

# CORE DISTRIBUTED SERVICES
CoreDistributedServices.application_name=DataImportManager
CoreDistributedServices.application_port.DataImportManager=2252

# SEARCH INDEX
ElasticsearchCrudService.elasticsearch_connection=YOUR_ES_ADDRESS:PORT
# (use whatever cluster name is running at "elasticsearch_connection")

# DATA IMPORT MANAGER:
DataImportManager.harvest_enabled=true
DataImportManager.streaming_enrichment_enabled=true
DataImportManager.batch_enrichment_enabled=false
```

Three things to note:

  1. The Elasticsearch address of your cluster: ElasticsearchCrudService.elasticsearch_connection

  2. The MongoDB address of your cluster: MongoDbManagementDbService.mongodb_connection

  3. The local globals folders: globals.local_root_dir, globals.local_yarn_config_dir, globals.local_cached_jar_dir
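
For items 1 and 2, the values are just host:port pairs for your existing cluster. For example (the host names and ports below are purely illustrative; substitute whatever your Mongo and Elasticsearch instances actually expose):

```properties
# Illustrative values only - replace with your own cluster addresses/ports
MongoDbManagementDbService.mongodb_connection=my.mongo.host:27017
ElasticsearchCrudService.elasticsearch_connection=my.es.host:9300
```

Item 3 (the local globals folders) is covered next.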

For item 3, create the following file structure locally and point the globals at it (a sketch of the matching globals entries follows the list). You also need to create an extra /lib/ folder and drop all the nightly jar files into it (previously I suggested that you need to remove some of the libs, but I no longer believe that to be the case):

  • {somewhere_on_my_pc}/cached-jars/ == empty folder
  • {somewhere_on_my_pc}/lib/ == nightly libs
  • {somewhere_on_my_pc}/yarn-config/ == config from local cluster (/opt/aleph2-home/yarn-config/)
  • {somewhere_on_my_pc}/my_config_file.properties == see above
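
As a sketch of how item 3 ties together, assuming the structure above were created under C:/dev/aleph2/ (a hypothetical path), the matching globals entries in the config file would be:

```properties
# Hypothetical base path - adjust to wherever you actually created the folders
globals.local_root_dir=C:/dev/aleph2/
globals.local_yarn_config_dir=C:/dev/aleph2/yarn-config/
globals.local_cached_jar_dir=C:/dev/aleph2/cached-jars/
```

(There is no separate globals entry for the /lib/ folder; presumably it is picked up relative to globals.local_root_dir.)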

Once you've got your file system set up like this and the properties file created, you should be able to run the sync service as mentioned above: run com.ikanow.aleph2.data_import_manager.modules.DataImportManagerModule with {somewhere_on_my_pc}/my_config_file.properties as its single program argument.

Additionally: if you are using Eclipse to run the DIM, you need to add all the other aleph2_* projects to the classpath of your run configuration AND explicitly reference the storm dependencies jar in {somewhere_on_my_pc}/lib/aleph2_storm_dependencies-* to force it to pull in the libs required for submitting jobs to Storm (if you need to test this). See the attached image for an example ("example DIM classpath").

Make sure the sync service isn't running on your nodes (currently only one instance can be running at a time).

Now when you create/modify any buckets on your cluster, you can debug the sync service locally.