bash join - ghdrako/doc_snipets GitHub Wiki

Join using grep

$ cat joinlines.sh
inputfile=test1.csv
outputfile=joinedlines.csv
tmpfile2=tmpfile2
# patterns to match:
klm1="1,KLM,"
klm5="5,KLM,"
xyz1="1,XYZ,"
xyz5="5,XYZ,"
#output:
#klm1,xyz1
#klm5,xyz5
# step 1: match patterns with CSV file:
klm1line="$(grep "$klm1" "$inputfile")"
klm5line="$(grep "$klm5" "$inputfile")"
xyz1line="$(grep "$xyz1" "$inputfile")"
# $xyz5 matches 2 lines (we want first line):
grep "$xyz5" "$inputfile" > "$tmpfile2"
xyz5line="$(head -n 1 "$tmpfile2")"
# step 2: display the matched lines:
echo "klm1line: $klm1line"
echo "klm5line: $klm5line"
echo "xyz1line: $xyz1line"
echo "xyz5line: $xyz5line"
# step 3: create summary file (tr -d '\n' strips the newline so the next line is appended to the same row):
echo "$klm1line" | tr -d '\n' > "$outputfile"
echo "$xyz1line" >> "$outputfile"
echo "$klm5line" | tr -d '\n' >> "$outputfile"
echo "$xyz5line" >> "$outputfile"
echo; echo

Output:
1,KLM,,1.4,,0.8,,1.2,,1.1,,,2.2,,,1.41,XYZ,,4.03,3.96,,3.99,,3.84,4.12,,,,4.04,,
5,KLM,,0.7,,0.8,,1.0,0.8,,0.5,,,1.1,,5,KLM,,0.03,,0.03,,0.04,0.04,,0.02,,,0.04,,5,XYZ,,4.73,,4.48,,4.49,4.40,,,4.59,,,4.63,
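
As a rough sketch of the same idea without a temporary file, the first matching line for each pattern can be taken with grep -m 1 and the pairs glued together with printf. The sample data below is hypothetical (it only stands in for test1.csv), join_pair is an illustrative helper name, and -m is assumed to be available (it is in GNU and BSD grep, but not required by POSIX). A comma is inserted between the two halves so the field boundary stays visible, which differs from the script above.

```shell
#!/bin/sh
# Hypothetical sample data standing in for test1.csv (not the original file).
workdir=$(mktemp -d)
inputfile="$workdir/test1.csv"
cat > "$inputfile" <<'EOF'
1,KLM,0.8,1.2
1,XYZ,4.03,3.96
5,KLM,0.7,0.8
5,XYZ,4.73,4.48
5,XYZ,9.99,9.99
EOF

# join_pair PREFIX1 PREFIX2 (illustrative helper):
# print the first line starting with each prefix, joined by a comma.
join_pair() {
    first=$(grep -m 1 "^$1" "$inputfile")
    second=$(grep -m 1 "^$2" "$inputfile")
    printf '%s,%s\n' "$first" "$second"
}

summary=$(join_pair "1,KLM," "1,XYZ,"; join_pair "5,KLM," "5,XYZ,")
printf '%s\n' "$summary"
# 1,KLM,0.8,1.2,1,XYZ,4.03,3.96
# 5,KLM,0.7,0.8,5,XYZ,4.73,4.48

rm -rf "$workdir"
```

Anchoring each pattern with ^ keeps a prefix such as 1,KLM, from matching in the middle of a longer line.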

Join command

The join command in UNIX is a command line utility for joining lines of two files on a common field. It can be used to join two files by selecting fields within the line and joining the files on them. The result is written to standard output.

Syntax:

$ join [OPTION]... FILE1 FILE2

Join two files on first field

cat foodtypes.txt
1 Protein
2 Carbohydrate
3 Fat

cat foods.txt
1 Cheese 
2 Potato
3 Butter

#These files share a join field as the first field and can be joined.

join foodtypes.txt foods.txt
1 Protein Cheese
2 Carbohydrate Potato
3 Fat Butter

Join two files on different fields

cat wine.txt
Red Beaunes France
White Reisling Germany
Red Riocha Spain

cat reviews.txt
Beaunes Great!
Reisling Terrible!
Riocha Meh

# The join field is the 2nd field in wine.txt and the 1st field in reviews.txt.

join -1 2 -2 1 wine.txt reviews.txt
Beaunes Red France Great!
Reisling White Germany Terrible!
Riocha Red Spain Meh

If the common field is in the same position in both files, -j can be used instead of -1 and -2:

$ join -1 2 -2 2 file1.txt file2.txt
# or
$ join -j 2 file1.txt file2.txt

join expects both files to be sorted on the join field. With GNU join, unsorted input that leaves lines unpaired produces a diagnostic and a nonzero exit status, and matching lines may be missed.

join -1 2 -2 1 <(sort -k 2,2 wine.txt) <(sort -k 1,1 reviews.txt)
Beaunes Red France Great!
Reisling White Germany Terrible!
Riocha Red Spain Meh

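To suppress this diagnostic when the files are not sorted, use --nocheck-order (a GNU join option). This only disables the check: lines can still go unmatched, so sorting the input is the safer fix.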
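
The sorting requirement can be checked before joining. The following is a minimal sketch, assuming POSIX sort and join: it reuses the wine and review rows from above in a deliberately shuffled order, tests one file with sort -c, then sorts each file on its join field only. Setting LC_ALL=C is an assumption made here so that sort and join agree on ordering; any locale works as long as both commands use the same one. The *.sorted file names are illustrative.

```shell
#!/bin/sh
# Rows copied from wine.txt and reviews.txt above, shuffled on purpose.
workdir=$(mktemp -d)
printf '%s\n' 'Red Riocha Spain' 'Red Beaunes France' 'White Reisling Germany' > "$workdir/wine.txt"
printf '%s\n' 'Reisling Terrible!' 'Beaunes Great!' 'Riocha Meh' > "$workdir/reviews.txt"

# Use one collation for both sort and join so they agree on ordering.
LC_ALL=C
export LC_ALL

# sort -c exits nonzero if the input is not already sorted on the given key.
if sort -c -k 2,2 "$workdir/wine.txt" 2>/dev/null; then
    wine_sorted=yes
else
    wine_sorted=no
fi
echo "wine.txt sorted on field 2: $wine_sorted"

# Sort each file on its join field only (-k N,N), then join.
sort -k 2,2 "$workdir/wine.txt" > "$workdir/wine.sorted"
sort -k 1,1 "$workdir/reviews.txt" > "$workdir/reviews.sorted"
joined=$(join -1 2 -2 1 "$workdir/wine.sorted" "$workdir/reviews.sorted")
printf '%s\n' "$joined"
# Beaunes Red France Great!
# Reisling White Germany Terrible!
# Riocha Red Spain Meh

rm -rf "$workdir"
```

Writing the key as -k 2,2 rather than -k 2 matters: -k 2 sorts on everything from field 2 to the end of the line, which can order lines differently from the way join compares them.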

Specify a field separator for joining

cat names.csv
1,John Smith,London
2,Arthur Dent, Newcastle
3,Sophie Smith,London

cat transactions.csv
£1234,Deposit,John Smith
£4534,Withdrawal,Arthur Dent
£4675,Deposit,Sophie Smith

# The -t option sets the field delimiter, here a comma.

join -1 2 -2 3 -t , names.csv transactions.csv
John Smith,1,London,£1234,Deposit
Arthur Dent,2, Newcastle,£4534,Withdrawal
Sophie Smith,3,London,£4675,Deposit

Specify the output format

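To control which fields are printed and in what order, pass a list of FILENUM.FIELD specifiers to -o. In this example the list is -o 1.1,1.2,1.3,2.2,2.1: all three fields of names.csv, followed by the transaction type and the amount from transactions.csv.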

join -1 2 -2 3 -t , -o 1.1,1.2,1.3,2.2,2.1 names.csv transactions.csv
1,John Smith,London,Deposit,£1234
2,Arthur Dent, Newcastle,Withdrawal,£4534
3,Sophie Smith,London,Deposit,£4675
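
The -o list can also drop fields, and the special specifier 0 stands for the join field itself. The sketch below, which assumes POSIX sort and join, reuses the names.csv and transactions.csv rows from above, sorts both on the join field first (the *.sorted names and LC_ALL=C are choices made for this illustration), and prints only the name, the city and the transaction type.

```shell
#!/bin/sh
# Rows copied from names.csv and transactions.csv above.
workdir=$(mktemp -d)
printf '%s\n' '1,John Smith,London' '2,Arthur Dent, Newcastle' '3,Sophie Smith,London' > "$workdir/names.csv"
printf '%s\n' '£1234,Deposit,John Smith' '£4534,Withdrawal,Arthur Dent' '£4675,Deposit,Sophie Smith' > "$workdir/transactions.csv"

# Same collation for sort and join.
LC_ALL=C
export LC_ALL

# Sort each file on its own join field.
sort -t , -k 2,2 "$workdir/names.csv" > "$workdir/names.sorted"
sort -t , -k 3,3 "$workdir/transactions.csv" > "$workdir/transactions.sorted"

# 0 = the join field, 1.3 = city from names.csv, 2.2 = transaction type.
report=$(join -t , -1 2 -2 3 -o 0,1.3,2.2 "$workdir/names.sorted" "$workdir/transactions.sorted")
printf '%s\n' "$report"
# Arthur Dent, Newcastle,Withdrawal
# John Smith,London,Deposit
# Sophie Smith,London,Deposit

rm -rf "$workdir"
```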

Ignore case when comparing

$ cat file1.txt
A AAYUSH
B APAAR
C HEMANT
D KARTIK

$ cat file2.txt
a 101
b 102
c 103
d 104

$ join -i file1.txt file2.txt
A AAYUSH 101
B APAAR 102
C HEMANT 103
D KARTIK 104

Display unpairable lines (left/right join)

$ cat file1.txt
1 AAYUSH
2 APAAR
3 HEMANT
4 KARTIK
5 DEEPAK

$ cat file2.txt
1 101
2 102
3 103
4 104

$ join file1.txt file2.txt
1 AAYUSH 101
2 APAAR 102
3 HEMANT 103
4 KARTIK 104


# -a 1 also prints the lines of file1.txt that have no match in file2.txt (left join)
$ join -a 1 file1.txt file2.txt
1 AAYUSH 101
2 APAAR 102
3 HEMANT 103
4 KARTIK 104
5 DEEPAK
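
The other directions work the same way. The sketch below assumes POSIX join and shows a right join (-a 2), the unmatched lines only (-v 1), and a full outer join with a placeholder for missing fields (-a 1 -a 2 together with -e and -o). It reuses file1.txt and file2.txt from above, plus one hypothetical extra row (6 106) in file2.txt so that the second file also has an unpairable line.

```shell
#!/bin/sh
workdir=$(mktemp -d)
printf '%s\n' '1 AAYUSH' '2 APAAR' '3 HEMANT' '4 KARTIK' '5 DEEPAK' > "$workdir/file1.txt"
# file2.txt from above plus a hypothetical row "6 106" with no partner in file1.txt.
printf '%s\n' '1 101' '2 102' '3 103' '4 104' '6 106' > "$workdir/file2.txt"

# Right join: paired lines plus the unpairable lines of file 2.
right=$(join -a 2 "$workdir/file1.txt" "$workdir/file2.txt")
printf '%s\n' "$right"
# last line: 6 106

# Only the lines of file 1 that have no match in file 2.
only_left=$(join -v 1 "$workdir/file1.txt" "$workdir/file2.txt")
printf '%s\n' "$only_left"
# 5 DEEPAK

# Full outer join: -e fills empty output fields, which requires an explicit -o list.
full=$(join -a 1 -a 2 -e NULL -o 0,1.2,2.2 "$workdir/file1.txt" "$workdir/file2.txt")
printf '%s\n' "$full"
# last two lines:
# 5 DEEPAK NULL
# 6 NULL 106

rm -rf "$workdir"
```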

Example

How to join two files on one common field with the join command

file1:
Toronto:12439755:1076359:July 1, 1867:6
Quebec City:7560592:1542056:July 1, 1867:5
Halifax:938134:55284:July 1, 1867:4
Fredericton:751400:72908:July 1, 1867:3
Winnipeg:1170300:647797:July 15, 1870:7
Victoria:4168123:944735:July 20, 1871:10
Charlottetown:137900:5660:July 1, 1873:2
Regina:996194:651036:September 1, 1905:8
Edmonton:3183312:661848:September 1, 1905:9
St.John's:517000:405212:March 31, 1949:1
Yellowknife:42800:1346106:July 15, 1870:11
Whitehorse:31200:482443:June 13, 1898:12
Iqaluit:29300:2093190:April 1, 1999:13
file2:
Alberta:9:AB
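British Columbia:10:BC
Manitoba:7:MB
New Brunswick:3:NB
Newfoundland and Labrador:1:NL
Northwest Territories:11:NT
Nova Scotia:4:NS
Nunavut:13:NU
Ontario:6:ON
Prince Edward Island:2:PE
Quebec:5:QC
Saskatchewan:8:SK
Yukon Territories:12:YT
# Limit each sort key to the join field (-k5,5 and -k2,2) so both files are ordered exactly as join compares them.
# With -k2 alone, file2 is sorted on "1:NL", "10:BC", ... and the row with key 1 is not matched.
$ join -t: -1 5 -2 2 -o 1.1,2.1,2.2,2.3 <(sort -t: -k5,5 file1) <(sort -t: -k2,2 file2)
St.John's:Newfoundland and Labrador:1:NL
Victoria:British Columbia:10:BC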
Yellowknife:Northwest Territories:11:NT
Whitehorse:Yukon Territories:12:YT
Iqaluit:Nunavut:13:NU
Charlottetown:Prince Edward Island:2:PE
Fredericton:New Brunswick:3:NB
Halifax:Nova Scotia:4:NS
Quebec City:Quebec:5:QC
Toronto:Ontario:6:ON
Winnipeg:Manitoba:7:MB
Regina:Saskatchewan:8:SK
Edmonton:Alberta:9:AB