bash join - ghdrako/doc_snipets GitHub Wiki
Join using grep
$ cat joinlines.sh
inputfile=test1.csv
outputfile=joinedlines.csv
tmpfile2=tmpfile2
# patterns to match:
klm1=1,KLM,̎
klm5=5,KLM,̎
xyz1=1,XYZ,̎
xyz5=5,XYZ,̎
#output:
#klm1,xyz1
#klm5,xyz5
# step 1: match patterns with CSV file:
klm1line="`grep $klm1 $inputfile`"
klm5line="`grep $klm5 $inputfile`"
xyz1line="`grep $xyz1 $inputfile`"
# $xyz5 matches 2 lines (we want first line):
grep $xyz5 $inputfile > $tmpfile2
xyz5line=head -1 $tmpfile2
echo klm1line: $klm1line
echo klm5line: $klm5line
echo xyz1line: $xyz1line
echo xyz5line: $xyz5line
# step 3: create summary file:
echo $klm1line | tr -d \n > $outputfile
echo $xyz1line >> $outputfile
echo $klm5line | tr -d \n >> $outputfile
echo $xyz5line >> $outputfile
echo; echo
Output:
1,KLM,,1.4,,0.8,,1.2,,1.1,,,2.2,,,1.41,XYZ,,4.03,3.96,,3.99,,3.84,4.12,,,,4.04,,
5,KLM,,0.7,,0.8,,1.0,0.8,,0.5,,,1.1,,5,KLM,,0.03,,0.03,,0.04,0.04,,0.02,,,0.04,,5,XYZ,,4.73,,4.48,,4.49,4.40,,,4.59,,,4.63,
Join command
The join command in UNIX is a command line utility for joining lines of two files on a common field. It can be used to join two files by selecting fields within the line and joining the files on them. The result is written to standard output.The files to be joined should be sorted according to the tagged field for the matchups to work properly.
Syntax:
$join [OPTION] FILE1 FILE2
Join two files on first field
cat foodtypes.txt
1 Protein
2 Carbohydrate
3 Fat
cat foods.txt
1 Cheese
2 Potato
3 Butter
#These files share a join field as the first field and can be joined.
join foodtypes foods.txt
1 Protein Cheese
2 Carbohydrate Potato
3 Fat Butter
join two files on different fields
cat wine.txt
Red Beaunes France
White Reisling Germany
Red Riocha Spain
cat reviews.txt
Beaunes Great!
Reisling Terrible!
Riocha Meh
# In wine.txt 2nd field and in reviews.txt is 1st field
join -1 2 -2 1 wine.txt reviews.txt
Beaunes Red France Great!
Reisling White Germany Terrible!
Riocha Red Spain Meh
If in both file comon field in the smae position -j
$join -1 2 -2 2 file1.txt file2.txt
# or
$join -j2 file1.txt file2.txt
Join expects that files will be sorted before joining. Running join on unsorted files results in an error.
join -1 2 -2 1 <(sort -k 2 wine.txt) <(sort reviews.txt)
Beaunes Red France Great!
Reisling White Germany Terrible!
Riocha Red Spain Meh
In order to remove this error/warning then we have not sorted files to use --nocheck-order
Specify a field separator for joining
cat names.csv
1,John Smith,London
2,Arthur Dent, Newcastle
3,Sophie Smith,London
cat transactions.csv
£1234,Deposit,John Smith
£4534,Withdrawal,Arthur Dent
£4675,Deposit,Sophie Smith
# Using the -t option the comma can set as the delimiter.
join -1 2 -2 3 -t , names.csv transactions.csv
John Smith,1,London,£1234,Deposit
Arthur Dent,2, Newcastle,£4534,Withdrawal
Sophie Smith,3,London,£4675,Deposit
Specify the output format
To specify the order the list of fields are passed to -o. For this example this is -o 1.1,1.2,1.3,2.2,2.1. This formats the output in the order desired.
join -1 2 -2 3 -t , -o 1.1,1.2,1.3,2.2,2.1 names.csv transactions.csv
1,John Smith,London,Deposit,£1234
2,Arthur Dent, Newcastle,Withdrawal,£4534
3,Sophie Smith,London,Deposit,£4675
Ignoring case in compare
$cat file1.txt
A AAYUSH
B APAAR
C HEMANT
D KARTIK
$cat file2.txt
a 101
b 102
c 103
d 104
$join -i file1.txt file2.txt
A AAYUSH 101
B APAAR 102
C HEMANT 103
D KARTIK 104
Display unpairable fields (left/right jon)
$cat file1.txt
1 AAYUSH
2 APAAR
3 HEMANT
4 KARTIK
5 DEEPAK
$cat file2.txt
1 101
2 102
3 103
4 104
$join file1.txt file2.txt
1 AAYUSH 101
2 APAAR 102
3 HEMANT 103
4 KARTIK 104
$join file1.txt file2.txt -a 1
1 AAYUSH 101
2 APAAR 102
3 HEMANT 103
4 KARTIK 104
5 DEEPAK
Example
how to join two files using "Join" command with one common field
file1:
Toronto:12439755:1076359:July 1, 1867:6
Quebec City:7560592:1542056:July 1, 1867:5
Halifax:938134:55284:July 1, 1867:4
Fredericton:751400:72908:July 1, 1867:3
Winnipeg:1170300:647797:July 15, 1870:7
Victoria:4168123:944735:July 20, 1871:10
Charlottetown:137900:5660:July 1, 1873:2
Regina:996194:651036:September 1, 1905:8
Edmonton:3183312:661848:September 1, 1905:9
St.John's:517000:405212:March 31, 1949:1
Yellowknife:42800:1346106:July 15, 1870:11
Whitehorse:31200:482443:June 13, 1898:12
Iqaluit:29300:2093190:April 1, 1999:13
file2:
Alberta:9:AB
British Comumbia:10:BC
Manitoba:7:MB
New Brunswick:3:NB
Newfoundland and Labrador:1:NL
Northwest Territories:11:NT
Nova Scotia:4:NS
Nunavut:13:NU
Ontario:6:ON
Prince Edward Island:2:PE
Quebec:5:QC
Saskatchewan:8:SK
Yukon Territories:12:YT
bash-3.2$ join -t: -1 5 -2 2 -o 1.1 2.1 2.2 2.3 <(sort -t: -k5 file1) <(sort -t: -k2 file2)
Victoria:British Comumbia:10:BC
Yellowknife:Northwest Territories:11:NT
Whitehorse:Yukon Territories:12:YT
Iqaluit:Nunavut:13:NU
Charlottetown:Prince Edward Island:2:PE
Fredericton:New Brunswick:3:NB
Halifax:Nova Scotia:4:NS
Quebec City:Quebec:5:QC
Toronto:Ontario:6:ON
Winnipeg:Manitoba:7:MB
Regina:Saskatchewan:8:SK
Edmonton:Alberta:9:AB