bash join - ghdrako/doc_snipets GitHub Wiki

Join using grep

$ cat joinlines.sh
inputfile=test1.csv
outputfile=joinedlines.csv
tmpfile2=tmpfile2
# patterns to match:
klm1=1,KLM,̎
klm5=5,KLM,̎
xyz1=1,XYZ,̎
xyz5=5,XYZ,̎
#output:
#klm1,xyz1
#klm5,xyz5
# step 1: match patterns with CSV file:
klm1line="`grep $klm1 $inputfile`"
klm5line="`grep $klm5 $inputfile`"
xyz1line="`grep $xyz1 $inputfile`"
# $xyz5 matches 2 lines (we want first line):
grep $xyz5 $inputfile > $tmpfile2
xyz5line=head -1 $tmpfile2
echo klm1line: $klm1line
echo klm5line: $klm5line
echo xyz1line: $xyz1line
echo xyz5line: $xyz5line
# step 3: create summary file:
echo $klm1line | tr -d \n > $outputfile
echo $xyz1line >> $outputfile
echo $klm5line | tr -d \n >> $outputfile
echo $xyz5line >> $outputfile
echo; echo

Output:
1,KLM,,1.4,,0.8,,1.2,,1.1,,,2.2,,,1.41,XYZ,,4.03,3.96,,3.99,,3.84,4.12,,,,4.04,,
5,KLM,,0.7,,0.8,,1.0,0.8,,0.5,,,1.1,,5,KLM,,0.03,,0.03,,0.04,0.04,,0.02,,,0.04,,5,XYZ,,4.73,,4.48,,4.49,4.40,,,4.59,,,4.63,

Join command

The join command in UNIX is a command line utility for joining lines of two files on a common field. It can be used to join two files by selecting fields within the line and joining the files on them. The result is written to standard output.The files to be joined should be sorted according to the tagged field for the matchups to work properly.

Syntax:

$join [OPTION] FILE1 FILE2

Join two files on first field

cat foodtypes.txt
1 Protein
2 Carbohydrate
3 Fat

cat foods.txt
1 Cheese 
2 Potato
3 Butter

#These files share a join field as the first field and can be joined.

join foodtypes foods.txt
1 Protein Cheese
2 Carbohydrate Potato
3 Fat Butter

join two files on different fields

cat wine.txt
Red Beaunes France
White Reisling Germany
Red Riocha Spain

cat reviews.txt
Beaunes Great!
Reisling Terrible!
Riocha Meh

# In wine.txt 2nd field and in reviews.txt is 1st field

join -1 2 -2 1 wine.txt reviews.txt
Beaunes Red France Great!
Reisling White Germany Terrible!
Riocha Red Spain Meh

If in both file comon field in the smae position -j

$join -1 2 -2 2 file1.txt file2.txt
# or 
$join -j2 file1.txt file2.txt

Join expects that files will be sorted before joining. Running join on unsorted files results in an error.

join -1 2 -2 1 <(sort -k 2 wine.txt) <(sort reviews.txt)
Beaunes Red France Great!
Reisling White Germany Terrible!
Riocha Red Spain Meh

In order to remove this error/warning then we have not sorted files to use --nocheck-order

Specify a field separator for joining

cat names.csv
1,John Smith,London
2,Arthur Dent, Newcastle
3,Sophie Smith,London

cat transactions.csv
£1234,Deposit,John Smith
£4534,Withdrawal,Arthur Dent
£4675,Deposit,Sophie Smith

# Using the -t option the comma can set as the delimiter.

join -1 2 -2 3 -t , names.csv transactions.csv
John Smith,1,London,£1234,Deposit
Arthur Dent,2, Newcastle,£4534,Withdrawal
Sophie Smith,3,London,£4675,Deposit

Specify the output format

To specify the order the list of fields are passed to -o. For this example this is -o 1.1,1.2,1.3,2.2,2.1. This formats the output in the order desired.

join -1 2 -2 3 -t , -o 1.1,1.2,1.3,2.2,2.1 names.csv transactions.csv
1,John Smith,London,Deposit,£1234
2,Arthur Dent, Newcastle,Withdrawal,£4534
3,Sophie Smith,London,Deposit,£4675

Ignoring case in compare

$cat file1.txt
A AAYUSH
B APAAR
C HEMANT
D KARTIK

$cat file2.txt
a 101
b 102
c 103
d 104

$join -i file1.txt file2.txt
A AAYUSH 101
B APAAR 102
C HEMANT 103
D KARTIK 104

Display unpairable fields (left/right jon)

$cat file1.txt
1 AAYUSH
2 APAAR
3 HEMANT
4 KARTIK
5 DEEPAK

$cat file2.txt
1 101
2 102
3 103
4 104

$join file1.txt file2.txt
1 AAYUSH 101
2 APAAR 102
3 HEMANT 103
4 KARTIK 104


$join file1.txt file2.txt -a 1
1 AAYUSH 101
2 APAAR 102
3 HEMANT 103
4 KARTIK 104
5 DEEPAK

Example

how to join two files using "Join" command with one common field

file1:
Toronto:12439755:1076359:July 1, 1867:6
Quebec City:7560592:1542056:July 1, 1867:5
Halifax:938134:55284:July 1, 1867:4
Fredericton:751400:72908:July 1, 1867:3
Winnipeg:1170300:647797:July 15, 1870:7
Victoria:4168123:944735:July 20, 1871:10
Charlottetown:137900:5660:July 1, 1873:2
Regina:996194:651036:September 1, 1905:8
Edmonton:3183312:661848:September 1, 1905:9
St.John's:517000:405212:March 31, 1949:1
Yellowknife:42800:1346106:July 15, 1870:11
Whitehorse:31200:482443:June 13, 1898:12
Iqaluit:29300:2093190:April 1, 1999:13
file2:
Alberta:9:AB
British Comumbia:10:BC
Manitoba:7:MB
New Brunswick:3:NB
Newfoundland and Labrador:1:NL
Northwest Territories:11:NT
Nova Scotia:4:NS
Nunavut:13:NU
Ontario:6:ON
Prince Edward Island:2:PE
Quebec:5:QC
Saskatchewan:8:SK
Yukon Territories:12:YT
bash-3.2$ join -t: -1 5 -2 2 -o 1.1 2.1 2.2 2.3 <(sort -t: -k5 file1) <(sort -t: -k2 file2)
Victoria:British Comumbia:10:BC
Yellowknife:Northwest Territories:11:NT
Whitehorse:Yukon Territories:12:YT
Iqaluit:Nunavut:13:NU
Charlottetown:Prince Edward Island:2:PE
Fredericton:New Brunswick:3:NB
Halifax:Nova Scotia:4:NS
Quebec City:Quebec:5:QC
Toronto:Ontario:6:ON
Winnipeg:Manitoba:7:MB
Regina:Saskatchewan:8:SK
Edmonton:Alberta:9:AB