Neo4j database assignment: please see the attachment for details. It's a school assignment.

About the client

United States

Est. budget $ 50
Posted on Oct 11, 2017
Time remaining 2
1 Introduction
For this assignment, you will be interacting with a set of JSON documents in
Neo4j. The JSON documents are the output of the Google Cloud Vision API
applied to images returned from a Flickr API query for interesting images related
to the text “New York”.
You will write code in a language of your choice (Python, Java, Bash, etc) to
load the JSON into the database and query it. You will submit your code, the
output of your queries, and a brief report describing your approach.
2 Install and set up Neo4j
2.1. Download Neo4j community edition for your platform from
http://neo4j.com/download/ and unzip it.
2.2. Follow the guide on using the Neo4j browser:
http://neo4j.com/developer/guide-neo4j-browser/. When prompted, set the password for user
“neo4j” to “cisc7610” so that eventually I can use your code on my
own neo4j database with the same settings. It will guide you through
the process of starting the server running locally on your computer and
connecting to it through your laptop.
2.3. Learn about Neo4j. Run the following commands in the Neo4j web
interface; each one opens a set of “slides” to get you familiar with Neo4j.
(a) :play intro
(b) :play concepts
(c) :play movie graph
2.4. You might also want to read this article to get started:
http://neo4j.com/developer/graph-db-vs-rdbms/
2.5. After running the movie graph example, open the sidebar by clicking on
the three circles in the top left of the screen. Select the entire database by
clicking on the “*” button under “Node labels”. Take a screenshot of the
graph to be included in your report.
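The connection settings from step 2.2 can be kept in one place so that the import and query code share them. This is a minimal sketch, assuming the official Python driver (the `neo4j` package) and Neo4j's default local Bolt address `bolt://localhost:7687`; neither is specified by the assignment, and `open_driver` is a hypothetical helper name.

```python
# Credentials required by step 2.2 of the assignment.
NEO4J_USER = "neo4j"
NEO4J_PASSWORD = "cisc7610"
NEO4J_AUTH = (NEO4J_USER, NEO4J_PASSWORD)

# Assumption: the server runs locally on Neo4j's default Bolt port.
NEO4J_URI = "bolt://localhost:7687"


def open_driver(uri=NEO4J_URI, auth=NEO4J_AUTH):
    """Return a driver for the local server (hypothetical helper).

    The driver is imported lazily so this module can be loaded even when
    the `neo4j` package is not installed.
    """
    from neo4j import GraphDatabase
    return GraphDatabase.driver(uri, auth=auth)
```

A caller would typically use `driver = open_driver()` followed by `with driver.session() as session: ...`, and call `driver.close()` when finished.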
3 Download the data
3.1. Download the data to use for the project from
http://m.mr-pc.org/t/cisc7610/2017fa/homework2data.zip
The zip file contains JSON documents in directory data/json and the
corresponding images in data/jpg with the same filenames.
3.2. Extract the zip file
Figure 1: Schema to use for importing the JSON into neo4j
3.3. The zip file also contains a file called exampleJson.txt, which contains
a JSON document that has all of the fields that may be present in the
individual JSON documents. Note that this is not a valid JSON document
itself because in lists it only contains a single entry followed by “...”. Although
individual JSON documents may not contain all of these fields, if
they do, they will be in the structure shown in exampleJson.txt. The
conceptual schema to be used in importing the data into Neo4j is shown
in Figure 1.
3.4. The annotations from the Google Cloud Vision API are described in
the AnnotateImageResponse section of this page:
https://cloud.google.com/vision/docs/reference/rest/v1/images/annotate
I have only included the following annotations, however:
• landmarkAnnotations – identify geographical landmarks in photographs.
For the purposes of discussing entities in the database
schema, this will add Landmark entities, each of which can have zero or more
Locations.
• logoAnnotations – identify company logos in images. This will
add Logo entities to the schema.
• labelAnnotations – predict descriptive labels for images. This will
add Label entities to the schema.
• webDetection – predict the presence of web entities (things with
names) in images along with pages that contain the image and other
images that are similar. This will add WebEntity, Page, and Image
entities to the schema.
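Because any of the four annotation fields may be missing from a given document, it can help to load each file and check which fields are present before building the import query. The sketch below makes three assumptions that should be checked against the actual zip contents and exampleJson.txt: the files end in `.json` and `.jpg`, the annotation fields are top-level keys of each document, and the helper names are hypothetical.

```python
import json
from pathlib import Path

# Annotation fields listed in 3.4; each may be absent from a given document.
ANNOTATION_FIELDS = (
    "landmarkAnnotations",
    "logoAnnotations",
    "labelAnnotations",
    "webDetection",
)


def list_documents(data_dir):
    """Return sorted (json_path, jpg_path) pairs sharing a filename stem.

    Assumes the layout data/json/*.json and data/jpg/*.jpg.
    """
    data_dir = Path(data_dir)
    pairs = []
    for json_path in sorted((data_dir / "json").glob("*.json")):
        jpg_path = data_dir / "jpg" / (json_path.stem + ".jpg")
        pairs.append((json_path, jpg_path))
    return pairs


def load_document(json_path):
    """Parse one JSON document from disk."""
    with open(json_path, encoding="utf-8") as handle:
        return json.load(handle)


def present_annotations(doc):
    """List the annotation fields that this document actually contains."""
    return [field for field in ANNOTATION_FIELDS if field in doc]
```

Running `present_annotations` over every document gives a quick picture of how often each optional field appears.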
4 Write code to import the data into Neo4j
Importing JSON documents into Neo4j is rather complicated, but the payoff is that
querying the imported data is straightforward. The basic idea is that different sub-structures
of a JSON document should be represented by different node types in the
database, and these will all be created at the same time using a single, combined
query.
First, read this tutorial on importing JSON into Neo4j:
• https://neo4j.com/blog/cypher-load-json-from-url/
We will use the schema shown in Figure 1 for our data in Neo4j.
In order to import the JSON documents into Neo4j, we will write a query in
Neo4j’s Cypher language composed of several MERGE clauses and several FOREACH
loops. The MERGE clause finds an existing node or relationship matching a given
specification, or creates it if it doesn’t already exist. The syntax of the MERGE
clause is
MERGE (varName:LabelName {mustHaveProperty: value1})
ON CREATE SET varName.optionalProperty = value2
ON MATCH SET varName.optionalProperty = value3
In this case, the MERGE uses the name varName to refer to a node with a
label of LabelName and with a property called mustHaveProperty that has
a value of value1. If no such node exists, it is created, and the ON CREATE
SET clause is executed, setting the additional property of optionalProperty
to value2. If that node already exists, the ON MATCH SET clause is executed,
setting optionalProperty to value3. The ON CREATE SET and ON MATCH SET
clauses are both optional. Multiple property-value pairs can be specified by
separating them with commas.
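As an illustration of the node form of MERGE, here is a sketch of how one labelAnnotations entry might be merged from Python using query parameters. The property names `id` and `description` follow the wording of the queries in Section 5, `mid` is the field name used by the Vision API, and `timesSeen` is a purely illustrative counter; all of these are assumptions, and the real property names should come from Figure 1.

```python
# Node MERGE with separate ON CREATE and ON MATCH actions.
# `timesSeen` is an illustrative property, not part of the assignment schema.
MERGE_LABEL = """
MERGE (l:Label {id: $id})
ON CREATE SET l.description = $description, l.timesSeen = 1
ON MATCH SET l.timesSeen = l.timesSeen + 1
"""


def label_params(annotation):
    """Map one labelAnnotations entry to the parameters MERGE_LABEL expects."""
    return {
        "id": annotation["mid"],
        "description": annotation.get("description"),
    }


def merge_label(session, annotation):
    """Run the MERGE through an open driver session (any object with .run())."""
    return session.run(MERGE_LABEL, label_params(annotation))
```

Passing values as parameters rather than pasting them into the query text avoids quoting problems with descriptions that contain apostrophes or quotation marks.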
The MERGE clause can also be used to find or create relationships:
MERGE (n1 {property1: value1})
MERGE (n2 {property2: value2})
MERGE (n1)-[relVar:REL_LABEL {property3: value3}]->(n2)
ON CREATE SET relVar.optionalProperty = value4
ON MATCH SET relVar.optionalProperty = value5
This will create or find two nodes having the specified property-value pairs. The
nodes are called n1 and n2 for the purposes of this clause. The MERGE then finds
or creates a relationship between those two nodes with a property property3
having value value3, which it refers to as relVar. Again, if it is found, the ON
MATCH SET clause is executed, and if it is created, the ON CREATE SET clause is
executed. Notice that several MERGE clauses can follow one another to build up
more complicated structures and relationships.
Table 1: Number of entities that should be present after importing the data.
Label      Count
Image      546
Label      201
Landmark   17
Location   26
Logo       2
Page       415
WebEntity  429
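The relationship form of MERGE described above can be sketched from Python as follows. The relationship type `HAS_LABEL` and the property names `url`, `id`, and `score` are hypothetical; the real names should be taken from the schema in Figure 1. `RecordingSession` is a stand-in that only records calls, so the example runs without a database.

```python
# Two node MERGEs followed by a relationship MERGE between them.
# HAS_LABEL, url, id, and score are hypothetical names.
LINK_IMAGE_LABEL = """
MERGE (i:Image {url: $url})
MERGE (l:Label {id: $id})
MERGE (i)-[r:HAS_LABEL]->(l)
ON CREATE SET r.score = $score
ON MATCH SET r.score = $score
"""


def link_image_to_label(session, image_url, annotation):
    """Find or create an Image, a Label, and the relationship between them."""
    params = {
        "url": image_url,
        "id": annotation["mid"],
        "score": annotation.get("score"),
    }
    return session.run(LINK_IMAGE_LABEL, params)


class RecordingSession:
    """Stand-in for a driver session: records calls instead of executing them."""

    def __init__(self):
        self.calls = []

    def run(self, query, parameters=None):
        self.calls.append((query, parameters))


# Dry run: inspect the parameters that would be sent to the server.
session = RecordingSession()
link_image_to_label(session, "http://example.com/a.jpg",
                    {"mid": "/m/015kr", "score": 0.9})
```

Because every clause is a MERGE, running the same call twice leaves a single Image, a single Label, and a single relationship between them.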
The FOREACH clause loops over the elements of an array, setting a variable to
be each one in turn. In the case of a JSON array, it will loop over objects in
the array allowing you to write a MERGE clause for each one. The syntax of the
FOREACH statement is
FOREACH (jsonArrayElement in json.fields.array |
MERGE (nodeName:LabelName {property: jsonArrayElement.value}))
Note the use of the extra parentheses around the whole statement except
for the FOREACH keyword and the pipe character | separating the loop setup
from the loop body. The JSON structure is navigated using field names separated
by periods without any quotation marks or other punctuation, so the
statement json.fields.array will index into the array object within the
fields object within the json object. This statement will loop over the array
json.fields.array setting a variable called jsonArrayElement to each value in
turn and will then find or create a node called nodeName with label LabelName and
property property equal to the property value from jsonArrayElement.
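One way to use FOREACH from code is to pass the whole parsed JSON document as a single query parameter and let Cypher loop over one of its arrays. The sketch below assumes the parameter is named `$json`, that Label nodes carry `id` and `description` properties, and that `labelAnnotations` is a top-level field; `coalesce` substitutes an empty list when the field is missing. The Python helper `label_ids` is hypothetical and simply mirrors what the FOREACH would iterate over, which is handy for a dry run.

```python
# FOREACH over a JSON array passed in as the $json parameter.
# coalesce() falls back to an empty list when the field is absent.
FOREACH_LABELS = """
FOREACH (ann IN coalesce($json.labelAnnotations, []) |
  MERGE (l:Label {id: ann.mid})
  ON CREATE SET l.description = ann.description)
"""


def import_labels(session, doc):
    """Send the parsed JSON document to the server as the $json parameter."""
    return session.run(FOREACH_LABELS, {"json": doc})


def label_ids(doc):
    """Python mirror of the loop: the ids the FOREACH would merge."""
    return [ann["mid"] for ann in doc.get("labelAnnotations") or []]
```

For a document with no `labelAnnotations` field, `label_ids` returns an empty list, matching the no-op behaviour of the guarded FOREACH.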
The following resources could be helpful in importing the JSON data into
Neo4j:
• http://neo4j.com/docs/developer-manual/current/cypher/clauses/merge/
• http://neo4j.com/docs/developer-manual/current/cypher/clauses/foreach/
• https://neo4j.com/developer/language-guides/
Write a query to import a single JSON document into Neo4j and run it on each
of the JSON documents. Make sure the query is performed programmatically in
your code so that you can re-run it if necessary. After importing the JSON into
Neo4j, the total number of each label should be as shown in Table 1.
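After the import has run, the counts in Table 1 can be checked programmatically. This is a sketch under the assumption that `session` is an open session from the official Python driver; the function names are hypothetical. Node labels cannot be passed as query parameters, so they are interpolated from a trusted constant.

```python
# Expected node counts per label, copied from Table 1.
EXPECTED_COUNTS = {
    "Image": 546,
    "Label": 201,
    "Landmark": 17,
    "Location": 26,
    "Logo": 2,
    "Page": 415,
    "WebEntity": 429,
}


def fetch_counts(session, labels=EXPECTED_COUNTS):
    """Count the nodes carrying each label in the database."""
    counts = {}
    for label in labels:
        # Labels come from the constant above, never from user input.
        record = session.run(f"MATCH (n:{label}) RETURN count(n) AS c").single()
        counts[label] = record["c"]
    return counts


def count_mismatches(actual, expected=EXPECTED_COUNTS):
    """Return {label: (expected, actual)} for every label whose count differs."""
    return {
        label: (want, actual.get(label, 0))
        for label, want in expected.items()
        if actual.get(label, 0) != want
    }
```

An empty result from `count_mismatches(fetch_counts(session))` indicates that the import matches Table 1; any entry points at the label whose part of the import query to inspect.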
5 Write code to query Neo4j
For each of the following, write a Neo4j query in the Cypher language to retrieve
the requested information. Make sure the query is performed in your code and
the results of the query are saved in your results file and clearly identified as to
which query produced them.
5.1. Count the total number of Images that were annotated by the Google API
(i.e., corresponding to one JSON file). It should be 100.
5.2. Count the total number of Images, Labels, Landmarks, Locations, Logos,
Pages, and WebEntitys in the database. You should get the counts shown
in Table 1. If you don’t, you may have a problem with your import query.
5.3. List the URLs of all of the Images that are associated with the Label with
an id of “/m/015kr” (which has the description “bridge”)
5.4. List the descriptions of all of the WebEntitys that are applied to at least
two of the same Images as the Label with an id of “/m/015kr” (which
has the description “bridge”) along with the count of how many images
they share. These two articles might be useful
• https://simonthordal.github.io/neo4j/2016/06/24/finding-common-neighbors-in-neo4j/
• https://stackoverflow.com/q/18138752/2037288
5.5. List the descriptions of the two Logos in the dataset and the URLs of the
Images they are found in.
5.6. List the URLs of Images associated with Landmarks that are not “New
York” (id “/m/059rby”) or “New York City” (id “/m/02nd ”) along with
the description of the Landmark that they are associated with
5.7. List the descriptions of the 10 Labels that have been applied to the most
Images along with the number of Images each has been applied to
5.8. List the URLs of the 10 Pages that are linked to the most Images through
the webEntities.pagesWithMatchingImages JSON property along with
the number of Images linked to each one.
5.9. List the URLs of all of the pairs of Images that appear together on at least
three Pages through the webEntities.pagesWithMatchingImages JSON
property.
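To keep results clearly identified by query, each query can be stored under its item number and written to the results file with a header. The sketch below shows three of the items as possible starting points only: the property names (`id`, `url`, `description`) and the untyped, undirected relationship patterns are assumptions, since the real relationship types and directions come from Figure 1. The helper names are hypothetical, and `run_query` assumes a session from the official Python driver.

```python
# Query text and parameters keyed by item number. Patterns are assumptions:
# relationship types and directions should follow the schema in Figure 1.
QUERIES = {
    "5.3": (
        "MATCH (i:Image)--(l:Label {id: $id}) RETURN i.url AS url",
        {"id": "/m/015kr"},
    ),
    "5.4": (
        "MATCH (l:Label {id: $id})--(i:Image)--(w:WebEntity) "
        "WITH w, count(DISTINCT i) AS shared WHERE shared >= 2 "
        "RETURN w.description AS description, shared ORDER BY shared DESC",
        {"id": "/m/015kr"},
    ),
    "5.7": (
        "MATCH (l:Label)--(i:Image) "
        "RETURN l.description AS description, count(DISTINCT i) AS images "
        "ORDER BY images DESC LIMIT 10",
        {},
    ),
}


def run_query(session, cypher, params):
    """Execute one query and return its rows as plain dictionaries."""
    return [record.data() for record in session.run(cypher, params)]


def format_results(query_id, cypher, rows):
    """Render one query's results under a header naming the query."""
    lines = [f"=== Query {query_id} ===", cypher]
    lines += [", ".join(f"{k}={v}" for k, v in row.items()) for row in rows]
    lines.append(f"({len(rows)} rows)")
    return "\n".join(lines)


def write_report(session, path, queries=QUERIES):
    """Run every query and write the labelled results to a text file."""
    with open(path, "w", encoding="utf-8") as out:
        for query_id, (cypher, params) in queries.items():
            rows = run_query(session, cypher, params)
            out.write(format_results(query_id, cypher, rows) + "\n\n")
```

Including the query text next to its results makes it easy to see which query produced which output when the results file is read on its own.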
6 Write a brief document describing your project
6.1. Describe the language that you implemented your code in
6.2. Include instructions for how to run your code to populate the database
and to query the database
6.3. Include the screenshot of the movie database graph
6.4. Include the results from each of the queries
6.5. Describe any problems that you ran into in the course of this project
7 Submit this homework
Submit the following via the dropbox on Blackboard
• Your writeup, including the screenshot and the results of each of the
queries.
• A zip file containing your source code, an executable, and instructions for
running your executable to populate the database and query it