Running multiple conditions at once in R









up vote
-3
down vote

favorite












I wrote a code to apply a function to a data frame input:



 set.seed(1234) 
n = 5000000
input <- as.matrix(data.frame(c1 = sample(1:10, n, replace = T), c2 = sample(1:10, n, replace = T), c3 = sample(1:10, n, replace = T), c4 = sample(1:10, n, replace = T)))

system.time(
test <- input %>%
split(1:nrow(input)) %>%
map(~ func1(.x, 2, 2, "test_1")) %>%
do.call("rbind", .))

## Here is the function used:

func1 <- function(dataC, PR, DB, MT) (c3 == newc1 && c4 == newc2))

if(c3 == newc1 && c4 == newc2)
mat_V[choiceC[1], choiceC[2]] <- NaN
## print(mat_V)


choiceC <- which(mat_V == max(mat_V, na.rm = TRUE), arr.ind = TRUE)
## print(choiceC)
## If there are several maximum values
if(nrow(choiceC) > 1)
choiceC <- choiceC[sample(1:nrow(choiceC), 1), ]


if(choiceC[1]==1 & choiceC[2]==1)

newC <- matrix(c(x = c1 - 1, y = c2 + 1), ncol = 2)

else if(choiceC[1]==1 & choiceC[2]==2)

newC <- matrix(c(x = c1, y = c2 + 1), ncol = 2)

else if(choiceC[1]==1 & choiceC[2]==3)

newC <- matrix(c(x = c1 + 1, y = c2 + 1), ncol = 2)

else if(choiceC[1]==2 & choiceC[2]==1)

newC <- matrix(c(x = c1 - 1, y = c2), ncol = 2)

else if(choiceC[1]==2 & choiceC[2]==3)

newC <- matrix(c(x = c1 + 1, y = c2), ncol = 2)

else if(choiceC[1]==3 & choiceC[2]==1)

newC <- matrix(c(x = c1 - 1, y = c2 - 1), ncol = 2)

else if(choiceC[1]==3 & choiceC[2]==2)

newC <- matrix(c(x = c1, y = c2 - 1), ncol = 2)

else if(choiceC[1]==3 & choiceC[2]==3)

newC <- matrix(c(x = c1 + 1, y = c2 - 1), ncol = 2)


newc1 <- as.vector(newC[,1])
newc2 <- as.vector(newC[,2])



return(newC)




The code works for small datasets but when the data frame contains more than 1 million rows, it is very slow. I think that there are many lines of code repeated in the function (e.g., condition if else) which decrease the speed. Are there ways to do all calculations in the function at once? I would really appreciate for any advice.










share|improve this question















This question has an open bounty worth +50
reputation from Pierre ending ending at 2018-11-16 17:03:57Z">in 2 days.


This question has not received enough attention.











  • 3




    I guess you would get more (and faster) answers if you provide a minimal reproducible example as code instead of asking to refactor your "production" code. Just my opinion...
    – R Yoda
    Nov 9 at 17:16










  • Please add: Which packages are you using? Is it OK to use data.table instead of data.frame? First impression for optimization: Looping over each row is very inefficient, vectorization would be faster (e. g. ifelse instead of if)
    – R Yoda
    Nov 9 at 17:25






  • 2




    While one could inspect your code line by line and try to improve it, it would be much better if you clearly described (in your question) what your function does in words; perhaps then it will become clear that it's better to completely rewrite it rather than to debate on individual things like ifelse vs if.
    – Julius Vainora
    Nov 9 at 19:31















up vote
-3
down vote

favorite












I wrote a code to apply a function to a data frame input:



 set.seed(1234) 
n = 5000000
input <- as.matrix(data.frame(c1 = sample(1:10, n, replace = T), c2 = sample(1:10, n, replace = T), c3 = sample(1:10, n, replace = T), c4 = sample(1:10, n, replace = T)))

system.time(
test <- input %>%
split(1:nrow(input)) %>%
map(~ func1(.x, 2, 2, "test_1")) %>%
do.call("rbind", .))

## Here is the function used:

func1 <- function(dataC, PR, DB, MT) (c3 == newc1 && c4 == newc2))

if(c3 == newc1 && c4 == newc2)
mat_V[choiceC[1], choiceC[2]] <- NaN
## print(mat_V)


choiceC <- which(mat_V == max(mat_V, na.rm = TRUE), arr.ind = TRUE)
## print(choiceC)
## If there are several maximum values
if(nrow(choiceC) > 1)
choiceC <- choiceC[sample(1:nrow(choiceC), 1), ]


if(choiceC[1]==1 & choiceC[2]==1)

newC <- matrix(c(x = c1 - 1, y = c2 + 1), ncol = 2)

else if(choiceC[1]==1 & choiceC[2]==2)

newC <- matrix(c(x = c1, y = c2 + 1), ncol = 2)

else if(choiceC[1]==1 & choiceC[2]==3)

newC <- matrix(c(x = c1 + 1, y = c2 + 1), ncol = 2)

else if(choiceC[1]==2 & choiceC[2]==1)

newC <- matrix(c(x = c1 - 1, y = c2), ncol = 2)

else if(choiceC[1]==2 & choiceC[2]==3)

newC <- matrix(c(x = c1 + 1, y = c2), ncol = 2)

else if(choiceC[1]==3 & choiceC[2]==1)

newC <- matrix(c(x = c1 - 1, y = c2 - 1), ncol = 2)

else if(choiceC[1]==3 & choiceC[2]==2)

newC <- matrix(c(x = c1, y = c2 - 1), ncol = 2)

else if(choiceC[1]==3 & choiceC[2]==3)

newC <- matrix(c(x = c1 + 1, y = c2 - 1), ncol = 2)


newc1 <- as.vector(newC[,1])
newc2 <- as.vector(newC[,2])



return(newC)




The code works for small datasets but when the data frame contains more than 1 million rows, it is very slow. I think that there are many lines of code repeated in the function (e.g., condition if else) which decrease the speed. Are there ways to do all calculations in the function at once? I would really appreciate for any advice.










share|improve this question















This question has an open bounty worth +50
reputation from Pierre ending ending at 2018-11-16 17:03:57Z">in 2 days.


This question has not received enough attention.











  • 3




    I guess you would get more (and faster) answers if you provide a minimal reproducible example as code instead of asking to refactor your "production" code. Just my opinion...
    – R Yoda
    Nov 9 at 17:16










  • Please add: Which packages are you using? Is it OK to use data.table instead of data.frame? First impression for optimization: Looping over each row is very inefficient, vectorization would be faster (e. g. ifelse instead of if)
    – R Yoda
    Nov 9 at 17:25






  • 2




    While one could inspect your code line by line and try to improve it, it would be much better if you clearly described (in your question) what your function does in words; perhaps then it will become clear that it's better to completely rewrite it rather than to debate on individual things like ifelse vs if.
    – Julius Vainora
    Nov 9 at 19:31













up vote
-3
down vote

favorite









up vote
-3
down vote

favorite











I wrote a code to apply a function to a data frame input:



 set.seed(1234) 
n = 5000000
input <- as.matrix(data.frame(c1 = sample(1:10, n, replace = T), c2 = sample(1:10, n, replace = T), c3 = sample(1:10, n, replace = T), c4 = sample(1:10, n, replace = T)))

system.time(
test <- input %>%
split(1:nrow(input)) %>%
map(~ func1(.x, 2, 2, "test_1")) %>%
do.call("rbind", .))

## Here is the function used:

func1 <- function(dataC, PR, DB, MT) (c3 == newc1 && c4 == newc2))

if(c3 == newc1 && c4 == newc2)
mat_V[choiceC[1], choiceC[2]] <- NaN
## print(mat_V)


choiceC <- which(mat_V == max(mat_V, na.rm = TRUE), arr.ind = TRUE)
## print(choiceC)
## If there are several maximum values
if(nrow(choiceC) > 1)
choiceC <- choiceC[sample(1:nrow(choiceC), 1), ]


if(choiceC[1]==1 & choiceC[2]==1)

newC <- matrix(c(x = c1 - 1, y = c2 + 1), ncol = 2)

else if(choiceC[1]==1 & choiceC[2]==2)

newC <- matrix(c(x = c1, y = c2 + 1), ncol = 2)

else if(choiceC[1]==1 & choiceC[2]==3)

newC <- matrix(c(x = c1 + 1, y = c2 + 1), ncol = 2)

else if(choiceC[1]==2 & choiceC[2]==1)

newC <- matrix(c(x = c1 - 1, y = c2), ncol = 2)

else if(choiceC[1]==2 & choiceC[2]==3)

newC <- matrix(c(x = c1 + 1, y = c2), ncol = 2)

else if(choiceC[1]==3 & choiceC[2]==1)

newC <- matrix(c(x = c1 - 1, y = c2 - 1), ncol = 2)

else if(choiceC[1]==3 & choiceC[2]==2)

newC <- matrix(c(x = c1, y = c2 - 1), ncol = 2)

else if(choiceC[1]==3 & choiceC[2]==3)

newC <- matrix(c(x = c1 + 1, y = c2 - 1), ncol = 2)


newc1 <- as.vector(newC[,1])
newc2 <- as.vector(newC[,2])



return(newC)




The code works for small datasets but when the data frame contains more than 1 million rows, it is very slow. I think that there are many lines of code repeated in the function (e.g., condition if else) which decrease the speed. Are there ways to do all calculations in the function at once? I would really appreciate for any advice.










share|improve this question













I wrote a code to apply a function to a data frame input:



 set.seed(1234) 
n = 5000000
input <- as.matrix(data.frame(c1 = sample(1:10, n, replace = T), c2 = sample(1:10, n, replace = T), c3 = sample(1:10, n, replace = T), c4 = sample(1:10, n, replace = T)))

system.time(
test <- input %>%
split(1:nrow(input)) %>%
map(~ func1(.x, 2, 2, "test_1")) %>%
do.call("rbind", .))

## Here is the function used:

func1 <- function(dataC, PR, DB, MT) (c3 == newc1 && c4 == newc2))

if(c3 == newc1 && c4 == newc2)
mat_V[choiceC[1], choiceC[2]] <- NaN
## print(mat_V)


choiceC <- which(mat_V == max(mat_V, na.rm = TRUE), arr.ind = TRUE)
## print(choiceC)
## If there are several maximum values
if(nrow(choiceC) > 1)
choiceC <- choiceC[sample(1:nrow(choiceC), 1), ]


if(choiceC[1]==1 & choiceC[2]==1)

newC <- matrix(c(x = c1 - 1, y = c2 + 1), ncol = 2)

else if(choiceC[1]==1 & choiceC[2]==2)

newC <- matrix(c(x = c1, y = c2 + 1), ncol = 2)

else if(choiceC[1]==1 & choiceC[2]==3)

newC <- matrix(c(x = c1 + 1, y = c2 + 1), ncol = 2)

else if(choiceC[1]==2 & choiceC[2]==1)

newC <- matrix(c(x = c1 - 1, y = c2), ncol = 2)

else if(choiceC[1]==2 & choiceC[2]==3)

newC <- matrix(c(x = c1 + 1, y = c2), ncol = 2)

else if(choiceC[1]==3 & choiceC[2]==1)

newC <- matrix(c(x = c1 - 1, y = c2 - 1), ncol = 2)

else if(choiceC[1]==3 & choiceC[2]==2)

newC <- matrix(c(x = c1, y = c2 - 1), ncol = 2)

else if(choiceC[1]==3 & choiceC[2]==3)

newC <- matrix(c(x = c1 + 1, y = c2 - 1), ncol = 2)


newc1 <- as.vector(newC[,1])
newc2 <- as.vector(newC[,2])



return(newC)




The code works for small datasets but when the data frame contains more than 1 million rows, it is very slow. I think that there are many lines of code repeated in the function (e.g., condition if else) which decrease the speed. Are there ways to do all calculations in the function at once? I would really appreciate for any advice.







r






share|improve this question













share|improve this question











share|improve this question




share|improve this question










asked Nov 7 at 15:33









Pierre

30111




30111






This question has an open bounty worth +50
reputation from Pierre ending ending at 2018-11-16 17:03:57Z">in 2 days.


This question has not received enough attention.








This question has an open bounty worth +50
reputation from Pierre ending ending at 2018-11-16 17:03:57Z">in 2 days.


This question has not received enough attention.









  • 3




    I guess you would get more (and faster) answers if you provide a minimal reproducible example as code instead of asking to refactor your "production" code. Just my opinion...
    – R Yoda
    Nov 9 at 17:16










  • Please add: Which packages are you using? Is it OK to use data.table instead of data.frame? First impression for optimization: Looping over each row is very inefficient, vectorization would be faster (e. g. ifelse instead of if)
    – R Yoda
    Nov 9 at 17:25






  • 2




    While one could inspect your code line by line and try to improve it, it would be much better if you clearly described (in your question) what your function does in words; perhaps then it will become clear that it's better to completely rewrite it rather than to debate on individual things like ifelse vs if.
    – Julius Vainora
    Nov 9 at 19:31













  • 3




    I guess you would get more (and faster) answers if you provide a minimal reproducible example as code instead of asking to refactor your "production" code. Just my opinion...
    – R Yoda
    Nov 9 at 17:16










  • Please add: Which packages are you using? Is it OK to use data.table instead of data.frame? First impression for optimization: Looping over each row is very inefficient, vectorization would be faster (e. g. ifelse instead of if)
    – R Yoda
    Nov 9 at 17:25






  • 2




    While one could inspect your code line by line and try to improve it, it would be much better if you clearly described (in your question) what your function does in words; perhaps then it will become clear that it's better to completely rewrite it rather than to debate on individual things like ifelse vs if.
    – Julius Vainora
    Nov 9 at 19:31








3




3




I guess you would get more (and faster) answers if you provide a minimal reproducible example as code instead of asking to refactor your "production" code. Just my opinion...
– R Yoda
Nov 9 at 17:16




I guess you would get more (and faster) answers if you provide a minimal reproducible example as code instead of asking to refactor your "production" code. Just my opinion...
– R Yoda
Nov 9 at 17:16












Please add: Which packages are you using? Is it OK to use data.table instead of data.frame? First impression for optimization: Looping over each row is very inefficient, vectorization would be faster (e. g. ifelse instead of if)
– R Yoda
Nov 9 at 17:25




Please add: Which packages are you using? Is it OK to use data.table instead of data.frame? First impression for optimization: Looping over each row is very inefficient, vectorization would be faster (e. g. ifelse instead of if)
– R Yoda
Nov 9 at 17:25




2




2




While one could inspect your code line by line and try to improve it, it would be much better if you clearly described (in your question) what your function does in words; perhaps then it will become clear that it's better to completely rewrite it rather than to debate on individual things like ifelse vs if.
– Julius Vainora
Nov 9 at 19:31





While one could inspect your code line by line and try to improve it, it would be much better if you clearly described (in your question) what your function does in words; perhaps then it will become clear that it's better to completely rewrite it rather than to debate on individual things like ifelse vs if.
– Julius Vainora
Nov 9 at 19:31













1 Answer
1






active

oldest

votes

















up vote
6
down vote













First a bit of tough love but I strongly encourage you to cover your bases, your code is a concentrate of bad practices and you'll get a huge ROI by spending a bit of time studying vectorisation etc... Consider also posting this on https://codereview.stackexchange.com/questions/tagged/r next time as it is a more appropriate question for there.



Your bottleneck is not the nested ifs but the inadequate use of expand.grid.



You create in your codes data frames through expand.grid, that you improperly call listC (they're not lists). Then this costly data.frame is only used for its number of rows, which you get with dim(listC)[1] which would be more idiomatic typed nrow(listC).



This value (dim(listC)[1]) can only be PR^2 or 3*PR in practice, so you could compute those first and just reuse them.



The nested ifs can be replaced with nested switch statements, more readable, and
by testing the first choice only once we're also more efficient.



It allows us to see that you forgot one condition in your code. See your improved code below.



When looking at it once it's more tidy, we see that we could actually replace it by simply newC <- c(c1 - 2 + choice[2], c2 + 2 - choice[1]).



Additional observations



  • comment your code, not for us, for you (and then for us when you decide to post a question)


  • c2 <- as.vector(dataC[2]) can be replaced by c2 <- dataC[[2]]

  • A matrix of 2 columns and one row can be built by t(c(1,2)) instead of matrix(c(x = 1, y = 2), ncol = 2), but if you're going to use as.vector on it in the end, do c(1,2) in the first place

  • the code could probably be optimized much further

modified code



func1 <- function(dataC, PR, DB, MT)

c1 <- dataC[[1]]
c2 <- dataC[[2]]
c3 <- dataC[[3]]
c4 <- dataC[[4]]

fun <- if(MT=="test_1") mean else if(MT=="test_2") harmonic.mean
fun2 <- function(size,mult)
fun(sample(1:10, size = size, replace = TRUE)) * mult

pr_sq <- PR^2
pr_3 <- 3*PR
sqrt_2_DB <- sqrt(2) * DB
V1 <- fun2(pr_sq, sqrt_2_DB)
V2 <- fun2(pr_3, DB)
V3 <- fun2(pr_sq, sqrt_2_DB)
V4 <- fun2(pr_3, DB)
V5 <- 0
V6 <- fun2(pr_3, DB)
V7 <- fun2(pr_sq, sqrt_2_DB)
V8 <- fun2(pr_3, DB)
V9 <- fun2(pr_sq, sqrt_2_DB)

inv <- 1/c(V1, V2, V3, V4, V6, V7, V8, V9)
tot <- sum(inv, na.rm = TRUE)
mat_V <- matrix(data = c(inv[1:4], V5, inv[5:8]) / tot,
nrow = 3, ncol = 3, byrow = TRUE)

newC <- NULL
while(is.null(newC)





share|improve this answer






















  • Note I corrected an error, I had left an extra sqrt(2)
    – Moody_Mudskipper
    yesterday










Your Answer






StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);













 

draft saved


draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53192662%2frunning-multiple-conditions-at-once-in-r%23new-answer', 'question_page');

);

Post as a guest






























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes








up vote
6
down vote













First a bit of tough love but I strongly encourage you to cover your bases, your code is a concentrate of bad practices and you'll get a huge ROI by spending a bit of time studying vectorisation etc... Consider also posting this on https://codereview.stackexchange.com/questions/tagged/r next time as it is a more appropriate question for there.



Your bottleneck is not the nested ifs but the inadequate use of expand.grid.



You create in your codes data frames through expand.grid, that you improperly call listC (they're not lists). Then this costly data.frame is only used for its number of rows, which you get with dim(listC)[1] which would be more idiomatic typed nrow(listC).



This value (dim(listC)[1]) can only be PR^2 or 3*PR in practice, so you could compute those first and just reuse them.



The nested ifs can be replaced with nested switch statements, more readable, and
by testing the first choice only once we're also more efficient.



It allows us to see that you forgot one condition in your code. See your improved code below.



When looking at it once it's more tidy, we see that we could actually replace it by simply newC <- c(c1 - 2 + choice[2], c2 + 2 - choice[1]).



Additional observations



  • comment your code, not for us, for you (and then for us when you decide to post a question)


  • c2 <- as.vector(dataC[2]) can be replaced by c2 <- dataC[[2]]

  • A matrix of 2 columns and one row can be built by t(c(1,2)) instead of matrix(c(x = 1, y = 2), ncol = 2), but if you're going to use as.vector on it in the end, do c(1,2) in the first place

  • the code could probably be optimized much further

modified code



func1 <- function(dataC, PR, DB, MT)

c1 <- dataC[[1]]
c2 <- dataC[[2]]
c3 <- dataC[[3]]
c4 <- dataC[[4]]

fun <- if(MT=="test_1") mean else if(MT=="test_2") harmonic.mean
fun2 <- function(size,mult)
fun(sample(1:10, size = size, replace = TRUE)) * mult

pr_sq <- PR^2
pr_3 <- 3*PR
sqrt_2_DB <- sqrt(2) * DB
V1 <- fun2(pr_sq, sqrt_2_DB)
V2 <- fun2(pr_3, DB)
V3 <- fun2(pr_sq, sqrt_2_DB)
V4 <- fun2(pr_3, DB)
V5 <- 0
V6 <- fun2(pr_3, DB)
V7 <- fun2(pr_sq, sqrt_2_DB)
V8 <- fun2(pr_3, DB)
V9 <- fun2(pr_sq, sqrt_2_DB)

inv <- 1/c(V1, V2, V3, V4, V6, V7, V8, V9)
tot <- sum(inv, na.rm = TRUE)
mat_V <- matrix(data = c(inv[1:4], V5, inv[5:8]) / tot,
nrow = 3, ncol = 3, byrow = TRUE)

newC <- NULL
while(is.null(newC)





share|improve this answer






















  • Note I corrected an error, I had left an extra sqrt(2)
    – Moody_Mudskipper
    yesterday














up vote
6
down vote













First a bit of tough love but I strongly encourage you to cover your bases, your code is a concentrate of bad practices and you'll get a huge ROI by spending a bit of time studying vectorisation etc... Consider also posting this on https://codereview.stackexchange.com/questions/tagged/r next time as it is a more appropriate question for there.



Your bottleneck is not the nested ifs but the inadequate use of expand.grid.



You create in your codes data frames through expand.grid, that you improperly call listC (they're not lists). Then this costly data.frame is only used for its number of rows, which you get with dim(listC)[1] which would be more idiomatic typed nrow(listC).



This value (dim(listC)[1]) can only be PR^2 or 3*PR in practice, so you could compute those first and just reuse them.



The nested ifs can be replaced with nested switch statements, more readable, and
by testing the first choice only once we're also more efficient.



It allows us to see that you forgot one condition in your code. See your improved code below.



When looking at it once it's more tidy, we see that we could actually replace it by simply newC <- c(c1 - 2 + choice[2], c2 + 2 - choice[1]).



Additional observations



  • comment your code, not for us, for you (and then for us when you decide to post a question)


  • c2 <- as.vector(dataC[2]) can be replaced by c2 <- dataC[[2]]

  • A matrix of 2 columns and one row can be built by t(c(1,2)) instead of matrix(c(x = 1, y = 2), ncol = 2), but if you're going to use as.vector on it in the end, do c(1,2) in the first place

  • the code could probably be optimized much further

modified code



func1 <- function(dataC, PR, DB, MT)

c1 <- dataC[[1]]
c2 <- dataC[[2]]
c3 <- dataC[[3]]
c4 <- dataC[[4]]

fun <- if(MT=="test_1") mean else if(MT=="test_2") harmonic.mean
fun2 <- function(size,mult)
fun(sample(1:10, size = size, replace = TRUE)) * mult

pr_sq <- PR^2
pr_3 <- 3*PR
sqrt_2_DB <- sqrt(2) * DB
V1 <- fun2(pr_sq, sqrt_2_DB)
V2 <- fun2(pr_3, DB)
V3 <- fun2(pr_sq, sqrt_2_DB)
V4 <- fun2(pr_3, DB)
V5 <- 0
V6 <- fun2(pr_3, DB)
V7 <- fun2(pr_sq, sqrt_2_DB)
V8 <- fun2(pr_3, DB)
V9 <- fun2(pr_sq, sqrt_2_DB)

inv <- 1/c(V1, V2, V3, V4, V6, V7, V8, V9)
tot <- sum(inv, na.rm = TRUE)
mat_V <- matrix(data = c(inv[1:4], V5, inv[5:8]) / tot,
nrow = 3, ncol = 3, byrow = TRUE)

newC <- NULL
while(is.null(newC)





share|improve this answer






















  • Note I corrected an error, I had left an extra sqrt(2)
    – Moody_Mudskipper
    yesterday












up vote
6
down vote










up vote
6
down vote









First a bit of tough love but I strongly encourage you to cover your bases, your code is a concentrate of bad practices and you'll get a huge ROI by spending a bit of time studying vectorisation etc... Consider also posting this on https://codereview.stackexchange.com/questions/tagged/r next time as it is a more appropriate question for there.



Your bottleneck is not the nested ifs but the inadequate use of expand.grid.



You create in your codes data frames through expand.grid, that you improperly call listC (they're not lists). Then this costly data.frame is only used for its number of rows, which you get with dim(listC)[1] which would be more idiomatic typed nrow(listC).



This value (dim(listC)[1]) can only be PR^2 or 3*PR in practice, so you could compute those first and just reuse them.



The nested ifs can be replaced with nested switch statements, more readable, and
by testing the first choice only once we're also more efficient.



It allows us to see that you forgot one condition in your code. See your improved code below.



When looking at it once it's more tidy, we see that we could actually replace it by simply newC <- c(c1 - 2 + choice[2], c2 + 2 - choice[1]).



Additional observations



  • comment your code, not for us, for you (and then for us when you decide to post a question)


  • c2 <- as.vector(dataC[2]) can be replaced by c2 <- dataC[[2]]

  • A matrix of 2 columns and one row can be built by t(c(1,2)) instead of matrix(c(x = 1, y = 2), ncol = 2), but if you're going to use as.vector on it in the end, do c(1,2) in the first place

  • the code could probably be optimized much further

modified code



func1 <- function(dataC, PR, DB, MT)

c1 <- dataC[[1]]
c2 <- dataC[[2]]
c3 <- dataC[[3]]
c4 <- dataC[[4]]

fun <- if(MT=="test_1") mean else if(MT=="test_2") harmonic.mean
fun2 <- function(size,mult)
fun(sample(1:10, size = size, replace = TRUE)) * mult

pr_sq <- PR^2
pr_3 <- 3*PR
sqrt_2_DB <- sqrt(2) * DB
V1 <- fun2(pr_sq, sqrt_2_DB)
V2 <- fun2(pr_3, DB)
V3 <- fun2(pr_sq, sqrt_2_DB)
V4 <- fun2(pr_3, DB)
V5 <- 0
V6 <- fun2(pr_3, DB)
V7 <- fun2(pr_sq, sqrt_2_DB)
V8 <- fun2(pr_3, DB)
V9 <- fun2(pr_sq, sqrt_2_DB)

inv <- 1/c(V1, V2, V3, V4, V6, V7, V8, V9)
tot <- sum(inv, na.rm = TRUE)
mat_V <- matrix(data = c(inv[1:4], V5, inv[5:8]) / tot,
nrow = 3, ncol = 3, byrow = TRUE)

newC <- NULL
while(is.null(newC)





share|improve this answer














First a bit of tough love but I strongly encourage you to cover your bases, your code is a concentrate of bad practices and you'll get a huge ROI by spending a bit of time studying vectorisation etc... Consider also posting this on https://codereview.stackexchange.com/questions/tagged/r next time as it is a more appropriate question for there.



Your bottleneck is not the nested ifs but the inadequate use of expand.grid.



You create in your codes data frames through expand.grid, that you improperly call listC (they're not lists). Then this costly data.frame is only used for its number of rows, which you get with dim(listC)[1] which would be more idiomatic typed nrow(listC).



This value (dim(listC)[1]) can only be PR^2 or 3*PR in practice, so you could compute those first and just reuse them.



The nested ifs can be replaced with nested switch statements, more readable, and
by testing the first choice only once we're also more efficient.



It allows us to see that you forgot one condition in your code. See your improved code below.



When looking at it once it's more tidy, we see that we could actually replace it by simply newC <- c(c1 - 2 + choice[2], c2 + 2 - choice[1]).



Additional observations



  • comment your code, not for us, for you (and then for us when you decide to post a question)


  • c2 <- as.vector(dataC[2]) can be replaced by c2 <- dataC[[2]]

  • A matrix of 2 columns and one row can be built by t(c(1,2)) instead of matrix(c(x = 1, y = 2), ncol = 2), but if you're going to use as.vector on it in the end, do c(1,2) in the first place

  • the code could probably be optimized much further

modified code



func1 <- function(dataC, PR, DB, MT)

c1 <- dataC[[1]]
c2 <- dataC[[2]]
c3 <- dataC[[3]]
c4 <- dataC[[4]]

fun <- if(MT=="test_1") mean else if(MT=="test_2") harmonic.mean
fun2 <- function(size,mult)
fun(sample(1:10, size = size, replace = TRUE)) * mult

pr_sq <- PR^2
pr_3 <- 3*PR
sqrt_2_DB <- sqrt(2) * DB
V1 <- fun2(pr_sq, sqrt_2_DB)
V2 <- fun2(pr_3, DB)
V3 <- fun2(pr_sq, sqrt_2_DB)
V4 <- fun2(pr_3, DB)
V5 <- 0
V6 <- fun2(pr_3, DB)
V7 <- fun2(pr_sq, sqrt_2_DB)
V8 <- fun2(pr_3, DB)
V9 <- fun2(pr_sq, sqrt_2_DB)

inv <- 1/c(V1, V2, V3, V4, V6, V7, V8, V9)
tot <- sum(inv, na.rm = TRUE)
mat_V <- matrix(data = c(inv[1:4], V5, inv[5:8]) / tot,
nrow = 3, ncol = 3, byrow = TRUE)

newC <- NULL
while(is.null(newC)






share|improve this answer














share|improve this answer



share|improve this answer








edited 4 hours ago

























answered Nov 10 at 11:56









Moody_Mudskipper

20.2k32458




20.2k32458











  • Note I corrected an error, I had left an extra sqrt(2)
    – Moody_Mudskipper
    yesterday
















  • Note I corrected an error, I had left an extra sqrt(2)
    – Moody_Mudskipper
    yesterday















Note I corrected an error, I had left an extra sqrt(2)
– Moody_Mudskipper
yesterday




Note I corrected an error, I had left an extra sqrt(2)
– Moody_Mudskipper
yesterday

















 

draft saved


draft discarded















































 


draft saved


draft discarded














StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53192662%2frunning-multiple-conditions-at-once-in-r%23new-answer', 'question_page');

);

Post as a guest














































































Popular posts from this blog

Top Tejano songwriter Luis Silva dead of heart attack at 64

ReactJS Fetched API data displays live - need Data displayed static

Evgeni Malkin