Efficient way to store large lists in a file (or files)
Let us assume that I need to store, in a file or files, data of the following form:

alist = Table[Table[RandomReal[{0, 1}, 2], {i, 1, 200}], {j, 1, 10000}];
blist = Table[
   Table[{RandomReal[{0, 1}, 6], RandomReal[],
      RandomReal[{0, 1}, 6], RandomReal[{0, 1}, 6]}, {i, 1, 200}],
   {j, 1, 10000}];
clist = RandomReal[{0, 1}, 10000];

In a few seconds, Mathematica is able to create these lists. However, when I try to export this data to a .txt file, using

Export["C:\\pathname\\abclists.txt", {alist, blist, clist}];

my computer starts to freeze, and I have to abort the evaluation. When I check the result, Mathematica has created a 600 MB .txt file.

Is there a more efficient way to store data in a file or files? I have no preference for the file type, as long as I can easily rebuild the original data, preserving the list levels.
Tags: export, compression, storage
Have you tried DumpSave? – Roman (2 days ago)
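Roman's suggestion can be sketched as follows; this is a minimal example, with data.mx as a placeholder file name. DumpSave writes the definitions of symbols in the Wolfram Language's binary MX format, which is fast to write and read, though the documentation notes MX files are tied to the Wolfram Language and not guaranteed to be exchangeable across all systems and versions.

```mathematica
(* Save the definitions of the three symbols to a binary MX file. *)
DumpSave["data.mx", {alist, blist, clist}];

(* Later, possibly in a fresh session, restore alist, blist and clist: *)
Get["data.mx"]  (* equivalent to << data.mx *)
```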
asked 2 days ago by An old man in the sea.; edited 2 days ago by gwr
1 Answer
Using ByteCount will tell you to expect more than 1 GB:

ByteCount @ {alist, blist, clist}
(* 1 136 880 448 *)

Compress

I would thus try Compress:

cexpr = Compress @ {alist, blist, clist};
(* Export["pathname\\data.m", cexpr]; *)
ByteCount @ cexpr
(* 437 359 280 *)

I am getting a 433 MB file (which roughly matches ByteCount). You can Uncompress the expression after loading.
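To make the round trip explicit, one way (a sketch, with data.m as a placeholder file name) is to note that a compressed expression is just a string, so it can be written and read as plain text and then rebuilt with Uncompress:

```mathematica
cexpr = Compress @ {alist, blist, clist};

(* Write the compressed string as plain text: *)
Export["data.m", cexpr, "String"];

(* Read it back and uncompress to restore the full nested structure: *)
{alist2, blist2, clist2} = Uncompress @ Import["data.m", "String"];
```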
BinarySerialize

Another possibility, as of Version 11.1 or later, is BinarySerialize:

bexpr = BinarySerialize @ {alist, blist, clist};
ByteCount @ bexpr
(* 378 170 128 *)

So we are down to about 378 MB (the file is 312 MB on my computer). You can use BinaryDeserialize to get the original expression back (see below for explicit instructions for writing/reading binary data).

If we give the option PerformanceGoal -> "Size":

bexpr = BinarySerialize[{alist, blist, clist}, PerformanceGoal -> "Size"];
ByteCount @ bexpr
(* 327 494 245 *)

we are down to about 327 MB.
Writing and Reading Binary Data

The documentation tells you how to write/read binary data:

stream = OpenWrite["pathname\\data.mx", BinaryFormat -> True];
BinaryWrite[stream, bexpr];
Close @ stream;

Reading the data:

data = BinaryDeserialize @ ByteArray @ BinaryReadList["pathname\\data.mx", "Byte"];
Comments:

Thanks for your answer. To load the data, would we use ReadList or Import? If it's with Import, could you show me how? The documentation talks about 'elements', but I don't know what they are, or how they relate to our compressed data. I tried using ReadList, but I had to abort the evaluation... – An old man in the sea. (2 days ago)

Considering that the structure contains $4.2\times10^7$ fully random elements of 8 bytes each, I don't think you can do much better, even with an approach that stores the structure of the data more efficiently. – Lukas Lang (2 days ago)

@LukasLang Yes, that should indeed be mentioned: the random data here are a "worst case" – but the problem is still a general one. – gwr (2 days ago)

Thanks gwr. I've tried BinarySerialize, but Mathematica doesn't recognize it as a built-in function... Also, I've noticed that if we split the file into three, one per list, then the combined time for import, export, compress, and uncompress is much lower in total... – An old man in the sea. (2 days ago)

@Anoldmaninthesea. One has to play around with it. I added the information that BinarySerialize is only available in Version 11.1 or later. – gwr (2 days ago)
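The observation about splitting the data into one file per list can be sketched like this (the file names are placeholders); writing each list separately keeps the peak memory footprint smaller and lets you reload only the list you need:

```mathematica
(* Serialize and write each list to its own binary file. *)
Do[
  Module[{stream},
    stream = OpenWrite[name <> ".mx", BinaryFormat -> True];
    BinaryWrite[stream, BinarySerialize[Symbol[name]]];
    Close[stream]
  ],
  {name, {"alist", "blist", "clist"}}
]

(* Reload a single list on demand: *)
alist = BinaryDeserialize @ ByteArray @ BinaryReadList["alist.mx", "Byte"];
```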
answered 2 days ago by gwr (edited 2 days ago)