Efficient way to store large lists in a file (or files)



























Let us assume that I need to store, in one or more files, data of the following form:



alist = 
Table[Table[RandomReal[{0, 1}, 2], {i, 1, 200}], {j, 1, 10000}];
blist =
Table[Table[{RandomReal[{0, 1}, 6], RandomReal[],
RandomReal[{0, 1}, 6], RandomReal[{0, 1}, 6]}, {i, 1, 200}], {j,
1, 10000}];
clist = RandomReal[{0, 1}, 10000];


In a few seconds, Mathematica is able to create these lists.



However, when I try to export this data to a .txt file, using



Export["C:\\pathname\\abclists.txt", {alist, blist, clist}];


My computer starts to freeze, and I have to abort the evaluation. When I check the result, Mathematica has created a 600 MB .txt file.



Is there a more efficient way to store this data in one or more files? I have no preference for the file type, as long as I can easily rebuild the original data, preserving the list levels.






























  • 1





    Have you tried DumpSave?

    – Roman
    2 days ago
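
A minimal sketch of the DumpSave route suggested in this comment, using small stand-in lists instead of the full data. The names alistDemo and clistDemo and the temporary file path are assumptions made for illustration only. DumpSave writes the definitions of the named symbols in the binary MX format, and Get restores them under the same names.

```mathematica
(* Small stand-in data; the real alist/blist/clist would be used instead *)
alistDemo = RandomReal[{0, 1}, {5, 3, 2}];
clistDemo = RandomReal[{0, 1}, 4];

(* Hypothetical file location chosen for this sketch *)
file = FileNameJoin[{$TemporaryDirectory, "abclists-demo.mx"}];

(* Save the definitions of both symbols into one MX file *)
DumpSave[file, {alistDemo, clistDemo}];

(* Keep a copy for comparison, then clear the symbols *)
saved = {alistDemo, clistDemo};
Clear[alistDemo, clistDemo];

(* Get restores the definitions under their original names *)
Get[file];
roundTripOK = ({alistDemo, clistDemo} === saved)

DeleteFile[file];
```

MX files are generally tied to the platform and version that wrote them, so this route fits a local cache better than long-term storage or exchange between machines.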
















export compression storage














edited 2 days ago by gwr
asked 2 days ago by An old man in the sea.


















1 Answer














Using ByteCount will tell you to expect more than 1 GB:



ByteCount @ { alist, blist, clist }

(* 1 136 880 448 *)
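
As a rough cross-check on that figure, the number of machine reals can be counted directly from the list dimensions in the question. This is only a sketch of the arithmetic, assuming every element is an 8-byte machine real and that the bare RandomReal in blist was meant to be a single number; the variable names are made up for illustration.

```mathematica
(* Number of reals in each list, read off the Table dimensions *)
nA = 10000*200*2;                 (* alist: 2 reals per inner entry *)
nB = 10000*200*(6 + 1 + 6 + 6);   (* blist: 19 reals per inner entry *)
nC = 10000;                       (* clist: flat vector *)

total = nA + nB + nC      (* 42 010 000 reals *)
rawBytes = 8*total        (* 336 080 000 bytes, about 336 MB *)
```

The raw payload is therefore about 336 MB. ByteCount reports much more, presumably because a ragged nested list such as blist cannot be held as one packed array, so the in-memory form carries extra per-element overhead.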


Compress



I would thus try Compress:



cexpr = Compress @ { alist, blist, clist };
(* Export["pathname\\data.m", cexpr ]; *)
ByteCount @ cexpr

(* 437 359 280 *)


I am getting a 433 MB file (which roughly matches ByteCount). You can Uncompress the expression after loading.
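
A minimal round-trip sketch for the Compress route, on small stand-in data. It uses Put and Get rather than Export and Import, which is a substitution made here for simplicity and not something the answer prescribes; the file name and the variable names are likewise assumptions.

```mathematica
(* Small stand-in for {alist, blist, clist} *)
small = {RandomReal[{0, 1}, {5, 3, 2}], RandomReal[{0, 1}, 4]};

(* Hypothetical file location chosen for this sketch *)
file = FileNameJoin[{$TemporaryDirectory, "abclists-compressed.m"}];

(* Compress returns a string; Put writes it to the file as an expression *)
Put[Compress[small], file];

(* Get reads the string back; Uncompress rebuilds the nested lists *)
restored = Uncompress[Get[file]];

(* True if the round trip preserved structure and values *)
roundTripOK = (restored === small)

DeleteFile[file];
```

Because the file holds a single string expression, Get reads it in one step and there is no need for ReadList or for Import elements.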



BinarySerialize



Another possibility, available in version 11.1 and later, is BinarySerialize:



bexpr = BinarySerialize @ { alist, blist, clist };
ByteCount @ bexpr

(* 378 170 128 *)


So we are down to about 378 MB (the file is 312 MB on my computer). You can use BinaryDeserialize to get the original expression again (see below for explicit instructions for writing/reading binary data).



If we give the option PerformanceGoal -> "Size"



bexpr = BinarySerialize[ {alist,blist,clist}, PerformanceGoal -> "Size" ];
ByteCount @ bexpr

(* 327 494 245 *)


we are down to about 327 MB.



Writing and Reading Binary Data



The documentation tells you how to write/read binary data:



stream = OpenWrite[ "pathname\\data.mx", BinaryFormat -> True ];
BinaryWrite[ stream, bexpr ];
Close @ stream;


Reading the data:



data = BinaryDeserialize @ ByteArray @ BinaryReadList[ "pathname\\data.mx", "Byte" ];































  • Thanks for your answer. To load the data, would we use ReadList or Import? If it's with import, could you show me how? The documentation talks about 'elements', but I don't know what they are, and how they relate to our compressed data. I tried using ReadList, but I had to abort the evaluation...

    – An old man in the sea.
    2 days ago








  • 1





    Considering that the structure contains $4.2\times10^7$ fully random elements with 8 bytes each, I don't think you can do much better, even with an approach that stores the structure of the data more efficiently

    – Lukas Lang
    2 days ago













  • @LukasLang Yes, that should indeed be mentioned: The random data here are a "worst case" - but the problem still is a general one.

    – gwr
    2 days ago






  • 1





    Thanks gwr. I've tried BinarySerialize, but Mathematica doesn't recognize it as a built-in function... Also, I've noticed that if we split the data into 3 files, one for each list, then the combined time for import, export, compress, and uncompress is much lower in total...

    – An old man in the sea.
    2 days ago
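
The per-list split described in this comment could be sketched as follows. It uses Compress with Put and Get so that it does not depend on BinarySerialize; the helper fileFor, the file names, and the small stand-in data are all assumptions made for illustration.

```mathematica
(* Stand-in data keyed by list name; the real lists would go here *)
lists = {"alist" -> RandomReal[{0, 1}, {5, 3, 2}],
         "clist" -> RandomReal[{0, 1}, 4]};

(* Hypothetical helper: one file per list name *)
fileFor[name_String] :=
  FileNameJoin[{$TemporaryDirectory, name <> "-compressed.m"}];

(* Write each list to its own file *)
Scan[Put[Compress[Last[#]], fileFor[First[#]]] &, lists];

(* Read the lists back one at a time *)
restored = Map[First[#] -> Uncompress[Get[fileFor[First[#]]]] &, lists];

(* True if every list survived the round trip *)
roundTripOK = (restored === lists)

Scan[DeleteFile[fileFor[First[#]]] &, lists];
```

Splitting this way also lets a single list be reloaded without reading the others.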











  • @Anoldmaninthesea. One has to play around with it. I added the information that BinarySerialize is only available in version 11.1 or later.

    – gwr
    2 days ago











edited 2 days ago
answered 2 days ago by gwr












