# estimating_variation - Gandhie/AICS-Project GitHub Wiki

Discussions of commit `f0d590c3d384f8ca92dd2b4ece7a15f54bc78793`

(Mehdi's code)

Amelie and Simon

- Where: variation of bounding boxes of targets and landmarks
  - Functional relations, two competing expectations: (i) the targets/landmarks will show high geometric variation; (ii) they will actually show low geometric variation, because functional relations are much more restricted to particular objects and therefore to particular locations
  - Bounding boxes:
    - Currently, boxes are normalised against the image dimensions, which puts all images on the same scale; even so, the same two objects will appear spatially very different depending on whether the image was taken from close up or from afar
    - We project x, y, w, h into a 100x100 mask, i.e. a matrix of 0s and 1s
  - To do:
    - How similar are different relations in terms of landmarks and targets? Plot similar graphs for every target-landmark pair; what do they look like? This will be qualitative, observational evidence.
    - Estimate the similarity between two graphs using cosine?
    - Estimate the variation of targets/landmarks of a particular relation and then rank all prepositions by this variation.
      - What variation?
        - ~~Currently stdev is calculated for the entire x, y, h, w: hence also on the heights and widths of objects; but objects are the same between the relations; use~~ `prepositions_bboxes_mask[p].std(0).mean()` (did Mehdi make a mistake here?)
        - Solution using cosine: create a mask with targets (or landmarks); compare targets pairwise with cosine; take the stdev of the resulting cosine similarities

|      | tar1 | tar2 | tar3 |
|------|------|------|------|
| tar1 | *    | *    | *    |
| tar2 |      | *    | *    |
| tar3 |      |      | *    |
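The two variation measures discussed above can be sketched as follows. This is a minimal illustration, not the code from the commit: the function names `bbox_to_mask`, `pixelwise_variation` and `pairwise_cosine_std` are hypothetical, the bounding boxes are assumed to be already normalised to the range 0 to 1, and self-comparisons (the diagonal of the pairwise matrix) are left out by default because their cosine is always 1.

```python
import numpy as np

MASK_SIZE = 100  # side of the square mask, as in the notes


def bbox_to_mask(x, y, w, h, size=MASK_SIZE):
    """Project a normalised box (values assumed in [0, 1]) into a size x size 0/1 mask."""
    mask = np.zeros((size, size), dtype=np.float64)
    # Scale to mask coordinates and keep the indices inside the mask.
    x0 = min(max(int(round(x * size)), 0), size)
    y0 = min(max(int(round(y * size)), 0), size)
    x1 = min(max(int(round((x + w) * size)), 0), size)
    y1 = min(max(int(round((y + h) * size)), 0), size)
    mask[y0:y1, x0:x1] = 1.0
    return mask


def pixelwise_variation(masks):
    """Per-pixel standard deviation across masks, averaged over pixels: std(0).mean()."""
    return float(np.asarray(masks).std(0).mean())


def pairwise_cosine_std(masks, include_self=False):
    """Stdev of the cosine similarities between every pair of masks.

    Only the upper triangle of the similarity matrix is used, so each
    pair is counted once. include_self=True also keeps the diagonal.
    """
    flat = np.asarray(masks, dtype=np.float64).reshape(len(masks), -1)
    norms = np.linalg.norm(flat, axis=1, keepdims=True)
    unit = flat / np.clip(norms, 1e-12, None)  # guard against empty masks
    sims = unit @ unit.T
    rows, cols = np.triu_indices(len(masks), k=0 if include_self else 1)
    return float(sims[rows, cols].std())
```

Calling both functions on the masks of one preposition, for example `pairwise_cosine_std(prepositions_bboxes_mask[p])`, gives one score per relation that can then be used for ranking, assuming `prepositions_bboxes_mask[p]` holds one mask per instance.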

- What: visual similarity of targets and landmarks
  - For each relation, extract visual features of targets and visual features of landmarks: `prepositions_bboxes[p].append([v_target, v_landmark])`
  - For every preposition p, calculate the cosine similarity between the visual features of every pair of `v_target` (and the same for `v_landmark`)
  - Estimate the variation of cosine for `v_targets` (and for `v_landmarks` separately)
  - Is there a difference between relations, i.e. are targets of functional relations more similar than those of geometric ones?
  - Rank the relations by the resulting target (or landmark) variation: do we get clustering of functional vs geometric relations?

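The ranking step for visual features could look roughly like the sketch below. It assumes `prepositions_bboxes` is a dictionary that maps each preposition to a list of `[v_target, v_landmark]` pairs, with each feature given as a 1-D vector of equal length; the helper names `pairwise_cosine` and `rank_by_variation` are made up for illustration and do not come from the commit.

```python
import numpy as np


def pairwise_cosine(vectors):
    """Cosine similarity for every unordered pair of distinct vectors."""
    v = np.asarray(vectors, dtype=np.float64)
    norms = np.linalg.norm(v, axis=1, keepdims=True)
    unit = v / np.clip(norms, 1e-12, None)  # guard against zero vectors
    sims = unit @ unit.T
    rows, cols = np.triu_indices(len(v), k=1)  # upper triangle, no diagonal
    return sims[rows, cols]


def rank_by_variation(prepositions_bboxes, index=0):
    """Rank prepositions by the stdev of pairwise cosine similarity.

    index=0 uses the target features, index=1 the landmark features.
    Prepositions with fewer than two instances are skipped because
    they have no pair to compare.
    """
    scores = {}
    for p, pairs in prepositions_bboxes.items():
        vectors = [pair[index] for pair in pairs]
        if len(vectors) < 2:
            continue
        scores[p] = float(pairwise_cosine(vectors).std())
    # Lowest variation first.
    return sorted(scores.items(), key=lambda item: item[1])
```

Running `rank_by_variation(prepositions_bboxes, index=0)` and `rank_by_variation(prepositions_bboxes, index=1)` gives the target and landmark rankings separately, which can then be inspected for a functional vs geometric grouping.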