Training Bulk Loading HTTP Based - tomgeudens/practical-neo4j GitHub Wiki

Context: Cut-and-paste commands for the Bulk Loading training (http edition).

Prerequisites

This document assumes you have a Neo4j instance up and running. You should also be able to reach the files below over http (which is not the same as being able to download them). Check this by running the first Inspection command in your Neo4j Browser. If it doesn't work, try the file-based edition.

| What | Location |
| --- | --- |
| Movie Nodes | import/movies/movies.csv |
| Person Nodes | import/movies/people.csv |
| ACTED_IN Relationships | import/movies/actors.csv |
| DIRECTED Relationships | import/movies/directors.csv |

Create a new database

Take a moment to appreciate how much difference the 4.x release makes here. In the old days you'd have to stop your instance, move the current data out of the way (if you wanted to keep it) and restart the instance, whereas now you just ...

:USE system
CREATE DATABASE movies;
:USE movies
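
To confirm the new database exists and is online, a quick check along these lines should do. Keep in mind that CREATE DATABASE needs Enterprise Edition, so on Community Edition the statement above is expected to fail. The sketch switches to the system database and back again, assuming the Neo4j Browser is used as in the rest of this document.

```cypher
:USE system
SHOW DATABASES;
:USE movies
```

The movies database should be listed with a current status of online before you continue.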

Inspection

How many lines in the files?

LOAD CSV FROM 'http://data.neo4j.com/intro/movies/movies.csv'
AS row
RETURN count(*);

LOAD CSV FROM 'http://data.neo4j.com/intro/movies/people.csv'
AS row
RETURN count(*);

LOAD CSV FROM 'http://data.neo4j.com/intro/movies/actors.csv'
AS row
RETURN count(*);

LOAD CSV FROM 'http://data.neo4j.com/intro/movies/directors.csv'
AS row
RETURN count(*);

What is in the files?

LOAD CSV FROM 'http://data.neo4j.com/intro/movies/movies.csv'
AS row
RETURN * LIMIT 5;

LOAD CSV FROM 'http://data.neo4j.com/intro/movies/people.csv'
AS row
RETURN * LIMIT 5;

LOAD CSV FROM 'http://data.neo4j.com/intro/movies/actors.csv'
AS row
RETURN * LIMIT 5;

LOAD CSV FROM 'http://data.neo4j.com/intro/movies/directors.csv'
AS row
RETURN * LIMIT 5;

What is in the files (again)?

LOAD CSV WITH HEADERS FROM 'http://data.neo4j.com/intro/movies/movies.csv'
AS row
RETURN row, keys(row) LIMIT 5;

LOAD CSV WITH HEADERS FROM 'http://data.neo4j.com/intro/movies/people.csv'
AS row
RETURN row, keys(row) LIMIT 5;

LOAD CSV WITH HEADERS FROM 'http://data.neo4j.com/intro/movies/actors.csv'
AS row
RETURN row, keys(row) LIMIT 5;

LOAD CSV WITH HEADERS FROM 'http://data.neo4j.com/intro/movies/directors.csv'
AS row
RETURN row, keys(row) LIMIT 5;
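
With WITH HEADERS the first line of the file supplies the keys and is no longer returned as a row. The line counts from the first Inspection step therefore include the header line. A minimal sketch to count only the data rows, shown for movies.csv (the other three files work the same way):

```cypher
// The header line becomes the map keys, so it is not counted here
LOAD CSV WITH HEADERS FROM 'http://data.neo4j.com/intro/movies/movies.csv'
AS row
RETURN count(*);
```

The result should be one lower than the count without headers for the same file.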

I don't like strings

LOAD CSV WITH HEADERS FROM 'http://data.neo4j.com/intro/movies/movies.csv'
AS row
RETURN row.title AS title, toInteger(row.released) AS released, row.tagline AS tagline
ORDER BY released DESC LIMIT 10;
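
LOAD CSV hands every field over as a string, which is why the conversion is needed. A small standalone sketch of how toInteger behaves on good, bad and missing input (the literal values are made up for illustration and do not come from the files):

```cypher
// toInteger returns null instead of failing when the value cannot be parsed
RETURN toInteger('1999') AS parsed,
       toInteger('n/a')  AS unparsable,
       toInteger(null)   AS missing;
// parsed = 1999, unparsable = null, missing = null
```

Since Neo4j does not store null property values, a row with an unparsable year would simply end up without that property.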

Plug

Your new bedside companion ... Cypher Reference Card

Trainer makes a mistake

So do not execute this ... unless you want to of course ...

LOAD CSV WITH HEADERS FROM 'http://data.neo4j.com/intro/movies/movies.csv' 
AS row
CREATE (m:Movie {title: row.title, released: toInteger(row.released), tagline: row.tagline})
RETURN m;
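
If you did execute it, the Movie nodes are now in the database, and the load in the Loading nodes section would either create them a second time or fail once the unique constraint exists. A sketch of one way to get back to a clean start, assuming nothing else in the movies database needs keeping:

```cypher
// Removes every Movie node (and any relationships attached to it)
MATCH (m:Movie)
DETACH DELETE m;
```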

Schema

Neo4j has schema ... you didn't see that coming, right?

CREATE CONSTRAINT cu_Movie_title ON (m:Movie) ASSERT m.title IS UNIQUE;
CREATE CONSTRAINT cu_Person_name ON (p:Person) ASSERT p.name IS UNIQUE;
CREATE INDEX i_Movie_tagline FOR (m:Movie) ON (m.tagline);

CALL db.constraints();
CALL db.indexes();
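
The statements above use the 4.x syntax this training targets. On later versions (an assumption about your setup: 4.4 accepts both forms, 5.x only the newer one, and db.constraints() and db.indexes() are gone in 5.x) the rough equivalent would be the following. Use it instead of the statements above, not in addition to them.

```cypher
// Newer constraint syntax: FOR ... REQUIRE replaces ON ... ASSERT
CREATE CONSTRAINT cu_Movie_title FOR (m:Movie) REQUIRE m.title IS UNIQUE;
CREATE CONSTRAINT cu_Person_name FOR (p:Person) REQUIRE p.name IS UNIQUE;
CREATE INDEX i_Movie_tagline FOR (m:Movie) ON (m.tagline);

// SHOW commands replace the db.constraints() and db.indexes() procedures
SHOW CONSTRAINTS;
SHOW INDEXES;
```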

Loading nodes

Movie

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'http://data.neo4j.com/intro/movies/movies.csv' AS row
CREATE (:Movie {title: row.title, released: toInteger(row.released), tagline: row.tagline});

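If that doesn't work, listen to the trainer explain why first (in short: USING PERIODIC COMMIT needs an auto-commit transaction, and the Neo4j Browser does not use one by default), then do this (Neo4j Browser only)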

:auto
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'http://data.neo4j.com/intro/movies/movies.csv' AS row
CREATE (:Movie {title: row.title, released: toInteger(row.released), tagline: row.tagline});

Person

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'http://data.neo4j.com/intro/movies/people.csv' AS row
CREATE (:Person {name: row.name, born: toInteger(row.born)});

or

:auto
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'http://data.neo4j.com/intro/movies/people.csv' AS row
CREATE (:Person {name: row.name, born: toInteger(row.born)});
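
USING PERIODIC COMMIT was deprecated in 4.4 and removed in 5.x. If you happen to be on such a version (an assumption, since this training targets 4.x), a sketch of the same Person load with CALL { ... } IN TRANSACTIONS looks like this. The batch size of 1000 rows is an arbitrary choice for illustration.

```cypher
:auto
LOAD CSV WITH HEADERS FROM 'http://data.neo4j.com/intro/movies/people.csv' AS row
CALL {
  // Each batch of rows is committed in its own transaction
  WITH row
  CREATE (:Person {name: row.name, born: toInteger(row.born)})
} IN TRANSACTIONS OF 1000 ROWS;
```

As before, the :auto line only applies in the Neo4j Browser. The Movie load can be rewritten the same way.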

And verify (statement by statement)

MATCH (m:Movie) RETURN count(*);
MATCH (p:Person) RETURN count(*);
MATCH (m:Movie) WHERE m.title = "Something's Gotta Give" RETURN m;

Relationships

ACTED_IN

Only the :auto version is shown here; drop the :auto line unless you're in the Neo4j Browser ...

:auto
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'http://data.neo4j.com/intro/movies/actors.csv' AS row
MATCH  (p:Person {name: row.person })
MATCH  (m:Movie  {title: row.movie})
MERGE (p)-[actedIn:ACTED_IN]->(m)
ON CREATE SET actedIn.roles = split(row.roles,';');

DIRECTED

:auto
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'http://data.neo4j.com/intro/movies/directors.csv' AS row
MATCH  (p:Person {name: row.person })
MATCH  (m:Movie  {title: row.movie})
MERGE (p)-[:DIRECTED]->(m);

And verify

MATCH (p:Person {name: "Tom Hanks"})-[a:ACTED_IN]->(m:Movie) RETURN p,a,m;
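
A further check, mirroring the node counts earlier: count the relationships per type (statement by statement). Because the relationship loads use MATCH, a row whose person or movie cannot be found is skipped silently, and MERGE collapses repeated person and movie pairs. A count lower than the number of data rows in the file would therefore point at such rows. This is a sketch, not part of the original training script.

```cypher
// One count per relationship type created in the Relationships section
MATCH (:Person)-[r:ACTED_IN]->(:Movie) RETURN count(r);
MATCH (:Person)-[r:DIRECTED]->(:Movie) RETURN count(r);
```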