FAQ
overflow

Great Answers to
Questions About Everything

QUESTION

How can we create a Vlookup function, similar to what we have in Excel? So if we have a table a and another b, we could add columns to table a, based on a common key to both, in order to create a table c.

Tables

{ asked by Murta }

ANSWER

Inspired in this solution from @Mr.Wizard. That is the function that I made in order to simulate Excel vLookup:

vLookup[data1_, data2_, pk1_:1, pk2_:1,null_:Null] := Module[{index,pickList},
    pickList=Complement[Range[Length[data2[[1]]]],{pk2}];
    SetAttributes[index,Listable];
    (index[#[[pk2]]]=#[[pickList]])&/@data2;
    index[_]=ConstantArray[null,Length[pickList]];
    Join[data1,index@data1[[All,pk1]],2]
]

This function worked just nice, let's do some time test. Doing:

a = RandomInteger[1000, {10000, 2}];
b1 = RandomInteger[1000, {10, 2}];
b2 = RandomInteger[1000, {100, 2}];
b3 = RandomInteger[1000, {1000, 2}];
b4 = RandomInteger[1000, {10000, 2}];

t1 = vLookup[a, b1] // AbsoluteTiming // #[[1]] &;
t2 = vLookup[a, b2] // AbsoluteTiming // #[[1]] &;
t3 = vLookup[a, b3] // AbsoluteTiming // #[[1]] &;
t4 = vLookup[a, b4] // AbsoluteTiming // #[[1]] &;

{t1, t2, t3, t4}
Ratios[{t1, t2, t3, t4}]

and get:

{0.016816, 0.013729, 0.018197, 0.056836}
{0.816, 1.325, 3.123}

Use index Downvalues is a nice way create a "indexed table" in Mathematica, so we can grow b quantity of lines in a order of magnitude with no linear impact in the algorithm time performance. Maybe it could be done with Dispatch and Rule list too.

It's important to state that when the b key value repeats, this vLookup takes the line of the last key value. Excel would return the first value.

Update 1

Instead of use:

(index[#[[pk2]]]=#[[pickList]])&/@data2;

I get a little better performace using

Scan[(index[#[[pk2]]]=#[[pickList]])&,data2];

Update 2

There is a new version with some new implementation:

  1. Option instead of arguments. I think that the code get a much better to read and control.
  2. Match key columns could be more then one, so you could use "Key1" -> {1, 2} and "Key2" -> {2, 3} as match columns, and not just one.
  3. Columns option, where you can control witch column you want to retrive in your vlookup operation.

Here is the code:

Options[vLookup2]={"Key1"->{1},"Key2"->{1},"NullValue"->Null,"Columns"->Automatic};

vLookup2[data1_,data2_,OptionsPattern[]]:=Module[{index,pickList,pKey1,pKey2,null,columns},
    null=OptionValue["NullValue"];
    pKey1=OptionValue["Key1"]//List//Flatten;
    pKey2=OptionValue["Key2"]//List//Flatten;
    columns=OptionValue["Columns"]//List//Flatten;
    pickList=If[columns==={Automatic},Complement[Range[Length[data2[[1]]]],pKey2],columns];
    Scan[(index[#[[pKey2]]]=#[[pickList]])&,data2];
    index[_]=ConstantArray[null,Length[pickList]];
    Join[data1,index/@data1[[All,pKey1]],2]
]

The use is very simple:

a=RandomInteger[5,{20,3}];
b=RandomInteger[5,{10,3}];
vLookup2[a,b,"Key1"-> {1,2},"Key2"-> {2,3},"NullValue"->Null,"Columns"->{2,3}]//TableForm

Update 3 (Dispatch with Rule)

I did not resisted to do the code using Dispatch. As follows:

Options[vLookup3]={"Key1"->{1},"Key2"->{1},"NullValue"->Null,"Columns"->Automatic};

vLookup3[data1_,data2_,OptionsPattern[]]:=Module[{index,pickList,pKey1,pKey2,null,columns,rule},
    null=OptionValue["NullValue"];
    pKey1=OptionValue["Key1"]//List//Flatten;
    pKey2=OptionValue["Key2"]//List//Flatten;
    columns=OptionValue["Columns"]//List//Flatten;
    pickList=If[columns==={Automatic},Complement[Range[Length[data2[[1]]]],pKey2],columns];
    rule=#[[pKey2]]->#[[pickList]]&/@data2;
    rule=Dispatch[Reverse[rule]~Join~{_-> ConstantArray[null,Length[pickList]]}];
    Join[data1,#/.rule&/@data1[[All,pKey1]],2]
]

Code tips for tuning and simplifications are always welcome!

{ answered by Murta }
Tweet