Training Bulk Loading HTTP Based - tomgeudens/practical-neo4j GitHub Wiki
Context: Cut-and-paste commands for the Bulk Loading training (http edition).
Prerequisites
This document assumes you have a Neo4j instance up and running. You should also be able to reach the files below over HTTP (which is not the same as being able to download them). Check by running the first Inspection command in your Neo4j Browser. If it doesn't work, try the file-based edition.
What | Location |
---|---|
Movie Nodes | http://data.neo4j.com/intro/movies/movies.csv |
Person Nodes | http://data.neo4j.com/intro/movies/people.csv |
ACTED_IN Relationships | http://data.neo4j.com/intro/movies/actors.csv |
DIRECTED Relationships | http://data.neo4j.com/intro/movies/directors.csv |
Create a new database
Take a moment to appreciate how much difference the 4.x release makes here. In the old days you'd have to stop your instance, move the current data out of the way (if you wanted to keep it) and restart the instance, whereas now you just ...
:USE system
CREATE DATABASE movies;
:USE movies
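A new database may not be online the instant CREATE DATABASE returns. A quick check, as a sketch assuming Neo4j 4.x Enterprise Edition (Community Edition does not support multiple databases):

```cypher
:USE system
// currentStatus should read "online" before you continue
SHOW DATABASE movies;
:USE movies
```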
Inspection
How many lines in the files?
LOAD CSV FROM 'http://data.neo4j.com/intro/movies/movies.csv'
AS row
RETURN count(*);
LOAD CSV FROM 'http://data.neo4j.com/intro/movies/people.csv'
AS row
RETURN count(*);
LOAD CSV FROM 'http://data.neo4j.com/intro/movies/actors.csv'
AS row
RETURN count(*);
LOAD CSV FROM 'http://data.neo4j.com/intro/movies/directors.csv'
AS row
RETURN count(*);
What is in the files?
LOAD CSV FROM 'http://data.neo4j.com/intro/movies/movies.csv'
AS row
RETURN * LIMIT 5;
LOAD CSV FROM 'http://data.neo4j.com/intro/movies/people.csv'
AS row
RETURN * LIMIT 5;
LOAD CSV FROM 'http://data.neo4j.com/intro/movies/actors.csv'
AS row
RETURN * LIMIT 5;
LOAD CSV FROM 'http://data.neo4j.com/intro/movies/directors.csv'
AS row
RETURN * LIMIT 5;
What is in the files (again)?
LOAD CSV WITH HEADERS FROM 'http://data.neo4j.com/intro/movies/movies.csv'
AS row
RETURN row, keys(row) LIMIT 5;
LOAD CSV WITH HEADERS FROM 'http://data.neo4j.com/intro/movies/people.csv'
AS row
RETURN row, keys(row) LIMIT 5;
LOAD CSV WITH HEADERS FROM 'http://data.neo4j.com/intro/movies/actors.csv'
AS row
RETURN row, keys(row) LIMIT 5;
LOAD CSV WITH HEADERS FROM 'http://data.neo4j.com/intro/movies/directors.csv'
AS row
RETURN row, keys(row) LIMIT 5;
I don't like strings
LOAD CSV WITH HEADERS FROM 'http://data.neo4j.com/intro/movies/movies.csv'
AS row
RETURN row.title AS title, toInteger(row.released) AS released, row.tagline AS tagline
ORDER BY released DESC LIMIT 10;
Plug
Your new bedside companion ... Cypher Reference Card
Trainer makes a mistake
So do not execute this ... unless you want to of course ...
LOAD CSV WITH HEADERS FROM 'http://data.neo4j.com/intro/movies/movies.csv'
AS row
CREATE (m:Movie {title: row.title, released: toInteger(row.released), tagline: row.tagline})
RETURN m;
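If you did execute it, the Movie nodes now exist and the Movie load in the Loading nodes section would collide with the uniqueness constraint on title. A minimal cleanup sketch, assuming the movies database holds nothing else you want to keep:

```cypher
// See how many Movie nodes the statement created ...
MATCH (m:Movie) RETURN count(m);
// ... and remove them again (DETACH also drops any relationships they have)
MATCH (m:Movie) DETACH DELETE m;
```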
Schema
Neo4j has schema ... you didn't see that coming, right?
CREATE CONSTRAINT cu_Movie_title ON (m:Movie) ASSERT m.title IS UNIQUE;
CREATE CONSTRAINT cu_Person_name ON (p:Person) ASSERT p.name IS UNIQUE;
CREATE INDEX i_Movie_tagline FOR (m:Movie) ON (m.tagline);
CALL db.constraints();
CALL db.indexes();
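The statements above use the 4.x syntax this training was written for. On newer releases (4.4 and later, as far as this editor is aware) the constraint syntax moved to FOR ... REQUIRE and the listing procedures were replaced by SHOW commands. A sketch of the equivalents, to be checked against the version you actually run:

```cypher
// IF NOT EXISTS makes the statements safe to re-run
CREATE CONSTRAINT cu_Movie_title IF NOT EXISTS FOR (m:Movie) REQUIRE m.title IS UNIQUE;
CREATE CONSTRAINT cu_Person_name IF NOT EXISTS FOR (p:Person) REQUIRE p.name IS UNIQUE;
CREATE INDEX i_Movie_tagline IF NOT EXISTS FOR (m:Movie) ON (m.tagline);
SHOW CONSTRAINTS;
SHOW INDEXES;
```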
Loading nodes
Movie
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'http://data.neo4j.com/intro/movies/movies.csv' AS row
CREATE (:Movie {title: row.title, released: toInteger(row.released), tagline: row.tagline});
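If that doesn't work, listen to the trainer explain it first, then do this (Neo4j Browser only). The :auto prefix makes the Browser run the statement in an auto-commit transaction, which USING PERIODIC COMMIT requires.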
:auto
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'http://data.neo4j.com/intro/movies/movies.csv' AS row
CREATE (:Movie {title: row.title, released: toInteger(row.released), tagline: row.tagline});
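USING PERIODIC COMMIT belongs to the 4.x line. On 4.4 and later the same batching can be written with CALL { ... } IN TRANSACTIONS. A sketch of the Movie load in that form; the batch size of 1000 rows is an assumption, not something the training prescribes:

```cypher
:auto
LOAD CSV WITH HEADERS FROM 'http://data.neo4j.com/intro/movies/movies.csv' AS row
CALL {
  // the subquery only sees variables that are imported explicitly
  WITH row
  CREATE (:Movie {title: row.title, released: toInteger(row.released), tagline: row.tagline})
} IN TRANSACTIONS OF 1000 ROWS;
```

As with USING PERIODIC COMMIT, drop the :auto line unless you're in the Neo4j Browser.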
Person
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'http://data.neo4j.com/intro/movies/people.csv' AS row
CREATE (:Person {name: row.name, born: toInteger(row.born)});
or (Neo4j Browser only)
:auto
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'http://data.neo4j.com/intro/movies/people.csv' AS row
CREATE (:Person {name: row.name, born: toInteger(row.born)});
And verify (statement by statement)
MATCH (m:Movie) RETURN count(*);
MATCH (p:Person) RETURN count(*);
MATCH (m:Movie) WHERE m.title = "Something's Gotta Give" RETURN m;
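Another way to verify is to compare the number of data rows in the file with the number of nodes that ended up in the database. A sketch for the movies; the aliases rowsInFile and movieNodes are just illustrative names:

```cypher
LOAD CSV WITH HEADERS FROM 'http://data.neo4j.com/intro/movies/movies.csv' AS row
WITH count(*) AS rowsInFile
// OPTIONAL MATCH keeps the result row even when no Movie nodes exist yet
OPTIONAL MATCH (m:Movie)
RETURN rowsInFile, count(m) AS movieNodes;
```

The two numbers should match after a clean load.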
Relationships
ACTED_IN
Only the :auto variant is shown here. Drop the :auto line unless you're in the Neo4j Browser ...
:auto
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'http://data.neo4j.com/intro/movies/actors.csv' AS row
MATCH (p:Person {name: row.person})
MATCH (m:Movie {title: row.movie})
MERGE (p)-[actedIn:ACTED_IN]->(m)
ON CREATE SET actedIn.roles = split(row.roles,';');
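The split() call above turns the semicolon-separated roles field into a list property. To see what it does in isolation (the role names here are made-up sample values, not taken from the file):

```cypher
// Returns a list of two strings
RETURN split('Neo;The One', ';') AS roles;
```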
DIRECTED
:auto
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'http://data.neo4j.com/intro/movies/directors.csv' AS row
MATCH (p:Person {name: row.person})
MATCH (m:Movie {title: row.movie})
MERGE (p)-[:DIRECTED]->(m);
And verify
MATCH (p:Person {name: "Tom Hanks"})-[a:ACTED_IN]->(m:Movie) RETURN p,a,m;
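The query above only covers ACTED_IN for a single person. A few more checks, as a sketch (run them statement by statement):

```cypher
// How many relationships of each type were created?
MATCH ()-[r:ACTED_IN]->() RETURN count(r);
MATCH ()-[r:DIRECTED]->() RETURN count(r);
// roles should come back as a list, not as a single string
MATCH (p:Person {name: "Tom Hanks"})-[a:ACTED_IN]->(m:Movie) RETURN m.title, a.roles;
// A sample of the DIRECTED relationships
MATCH (p:Person)-[d:DIRECTED]->(m:Movie) RETURN p, d, m LIMIT 10;
```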