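How can I serialize a numpy array while preserving matrix dimensions?

numpy.array.tostring doesn't seem to preserve information about matrix dimensions (see this question), requiring the user to issue a call to numpy.array.reshape.

Is there a way to serialize a numpy array to JSON format while preserving this information?

Note: The arrays may contain ints, floats or bools. It's reasonable to expect a transposed array.

Note 2: This is being done with the intent of passing the numpy array through a Storm topology using streamparse, in case such information ends up being relevant.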

Solutions

==============================
1. pickle.dumps or numpy.save encode all the information needed to reconstruct an arbitrary NumPy array, even in the presence of endianness issues, non-contiguous arrays, or weird tuple dtypes. Endianness issues are probably the most important; you don't want array([1]) to suddenly become array([16777216]) because you loaded your array on a big-endian machine. pickle is probably the more convenient option, though save has its own benefits, given in the npy format rationale.
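
To make the endianness point concrete, here is a small sketch (variable names are illustrative, not from the answer). Raw bytes carry no dtype or byte-order information, so the same four bytes can be read back as a different number:

```python
import numpy as np

a = np.array([1], dtype="<i4")           # little-endian 32-bit integer
raw = a.tobytes()                        # raw bytes only: no dtype, shape, or byte order

wrong = np.frombuffer(raw, dtype=">i4")  # same bytes interpreted as big-endian
right = np.frombuffer(raw, dtype="<i4")  # interpreted with the original byte order

print(wrong, right)
```

pickle and the npy format avoid this because they store the dtype, including its byte order, alongside the data.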

The pickle option:
```
import pickle
import numpy as np
a = np.arange(6).reshape(2, 3)  # some NumPy array
serialized = pickle.dumps(a, protocol=0)  # protocol 0 is printable ASCII
deserialized_a = pickle.loads(serialized)
```
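If the pickled bytes need to travel inside a JSON document, one possible approach (an assumption on my part, not something the answer prescribes) is to base64-encode them first. The sketch below uses a transposed, non-contiguous array to check that shape and dtype survive:

```python
import base64
import json
import pickle

import numpy as np

# A transposed view is non-contiguous in memory; pickle still handles it.
a = np.arange(6, dtype=np.float32).reshape(2, 3).T

# Pickle to bytes, then base64 so the result is a plain ASCII string for JSON.
payload = json.dumps(base64.b64encode(pickle.dumps(a)).decode("ascii"))

# Reverse the steps. Only unpickle data that comes from a trusted source.
b = pickle.loads(base64.b64decode(json.loads(payload)))

print(b.shape, b.dtype)
```

Keep in mind that pickle.loads can execute arbitrary code, so this only suits pipelines where both ends are trusted.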
numpy.save uses a binary format and needs to write to a file, but you can get around that with an in-memory buffer (io.BytesIO in Python 3; the original answer used Python 2's StringIO):
```
import io
import json
import numpy

a = numpy.arange(6).reshape(2, 3)  # any NumPy array
memfile = io.BytesIO()
numpy.save(memfile, a)
memfile.seek(0)
serialized = json.dumps(memfile.read().decode('latin-1'))
# latin-1 maps byte n to unicode code point n
```
And to deserialize:
```
memfile = io.BytesIO()
memfile.write(json.loads(serialized).encode('latin-1'))
memfile.seek(0)
a = numpy.load(memfile)
```
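The two steps can also be wrapped in a pair of helpers. This is only a sketch: array_to_json and json_to_array are hypothetical names, and base64 is swapped in for the latin-1 trick so the payload stays plain ASCII, at the cost of roughly one third more bytes. Passing allow_pickle=False keeps the npy data free of pickled objects.

```python
import base64
import io
import json

import numpy as np


def array_to_json(arr):
    """Hypothetical helper: .npy bytes -> base64 text -> JSON string."""
    buf = io.BytesIO()
    np.save(buf, arr, allow_pickle=False)
    return json.dumps(base64.b64encode(buf.getvalue()).decode("ascii"))


def json_to_array(text):
    """Hypothetical helper: inverse of array_to_json."""
    buf = io.BytesIO(base64.b64decode(json.loads(text)))
    return np.load(buf, allow_pickle=False)


mask = np.array([[True, False], [False, True]])
restored = json_to_array(array_to_json(mask))
print(restored.dtype, restored.shape)
```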
==============================
2. EDIT: As one can read in the comments on the question, this solution deals with "normal" numpy arrays (floats, ints, bools ...) and not with multi-type structured arrays.

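Solution for serializing a numpy array of any dimension and data type

As far as I know, you cannot simply serialize a numpy array of any data type and any dimension, but you can store its data type, dimensions and contents in a list representation and then serialize that using JSON.

Imports needed:
```
import json
import base64
import numpy
```
For encoding you could use the following (nparray is some numpy array of any data type and any dimensionality):
```
# tobytes() gives the raw data in C order; decode() turns the base64 bytes into a JSON-safe str
json.dumps([str(nparray.dtype), base64.b64encode(nparray.tobytes()).decode('ascii'), nparray.shape])
```
After this you get a JSON dump (string) of your data, containing a list representation of its data type and shape as well as the array's data/contents, base64-encoded.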

And for decoding, this does the work (encStr is the encoded JSON string, loaded from somewhere):
```
# get the encoded json dump
enc = json.loads(encStr)

# build the numpy data type
dataType = numpy.dtype(enc[0])

# decode the base64 encoded numpy array data and create a new numpy array with this data & type
# (base64.decodestring was removed in Python 3.9; b64decode is the replacement)
dataArray = numpy.frombuffer(base64.b64decode(enc[1]), dataType)

# if the array had more than one data set it has to be reshaped
if len(enc) > 2:
    dataArray = dataArray.reshape(enc[2])   # reshape returns a new array, so keep the result
```
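As a rough sketch, the encode and decode steps can be bundled into two functions. The names encode_array and decode_array are hypothetical, and dtype.str is used instead of str(dtype) on the assumption that recording the byte order explicitly (for example '<i4') is desirable:

```python
import base64
import json

import numpy as np


def encode_array(arr):
    """Hypothetical helper: pack dtype, base64 data, and shape into a JSON list."""
    data = base64.b64encode(arr.tobytes()).decode("ascii")  # tobytes() copies in C order
    return json.dumps([arr.dtype.str, data, arr.shape])     # dtype.str includes byte order


def decode_array(text):
    """Hypothetical helper: inverse of encode_array."""
    dtype_str, data, shape = json.loads(text)
    flat = np.frombuffer(base64.b64decode(data), dtype=np.dtype(dtype_str))
    return flat.reshape(shape).copy()  # copy() makes the result writable


original = np.arange(12, dtype=np.int32).reshape(3, 4).T  # transposed on purpose
restored = decode_array(encode_array(original))
print(restored.shape, restored.dtype)
```

Like the answer itself, this covers plain numeric and bool dtypes, not structured arrays.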
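JSON dumps are efficient and cross-compatible for many reasons, but using plain JSON alone leads to unexpected results if you want to store and load numpy arrays of any type and any dimension.

This solution stores and loads numpy arrays regardless of type or dimension and restores them correctly (data type, dimensions, ...).

I tried several solutions myself months ago, and this was the only efficient, versatile solution I came across.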

==============================

3. I found the code in Msgpack-numpy helpful. https://github.com/lebedov/msgpack-numpy/blob/master/msgpack_numpy.py

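I modified the serialised dict slightly and added base64 encoding to reduce the serialised size.

By using the same interface as json (providing load(s) and dump(s)), you can provide a drop-in replacement for json serialisation.

This same logic can be extended to add any non-automatic serialisation, such as datetime objects.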
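
As an illustration of that kind of extension, here is a minimal sketch for datetime objects. The function names encode_extra and decode_extra and the '__datetime__' key are hypothetical choices of mine, not part of the answer's module:

```python
import json
from datetime import datetime


def encode_extra(obj):
    """Called by json for objects it cannot serialise natively."""
    if isinstance(obj, datetime):
        return {'__datetime__': obj.isoformat()}
    raise TypeError('Unable to serialise object of type {}'.format(type(obj)))


def decode_extra(obj):
    """Called by json for every decoded dict."""
    if '__datetime__' in obj:
        return datetime.fromisoformat(obj['__datetime__'])
    return obj


stamp = datetime(2018, 11, 15, 12, 30, 0)
text = json.dumps({'created': stamp}, default=encode_extra)
restored = json.loads(text, object_hook=decode_extra)
print(restored['created'] == stamp)
```

The pattern is the same for any type: tag the value with a marker key on the way out, and look for that key in the object hook on the way back in.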

EDIT: I've written a generic, modular parser that does this and more. https://github.com/someones/jaweson

My code is as follows:

np_json.py

```
from json import *
import json
import numpy as np
import base64

def to_json(obj):
    if isinstance(obj, (np.ndarray, np.generic)):
        if isinstance(obj, np.ndarray):
            return {
                # tobytes() replaces the deprecated tostring();
                # decode() makes the base64 bytes a JSON-safe str
                '__ndarray__': base64.b64encode(obj.tobytes()).decode('ascii'),
                'dtype': obj.dtype.str,
                'shape': obj.shape,
            }
        elif isinstance(obj, (np.bool_, np.number)):
            return {
                '__npgeneric__': base64.b64encode(obj.tobytes()).decode('ascii'),
                'dtype': obj.dtype.str,
            }
    if isinstance(obj, set):
        return {'__set__': list(obj)}
    if isinstance(obj, tuple):
        return {'__tuple__': list(obj)}
    if isinstance(obj, complex):
        return {'__complex__': obj.__repr__()}

    # Nothing matched: raise the same TypeError the json module would
    raise TypeError('Unable to serialise object of type {}'.format(type(obj)))


def from_json(obj):
    # check for numpy
    if isinstance(obj, dict):
        if '__ndarray__' in obj:
            # frombuffer replaces the deprecated fromstring
            return np.frombuffer(
                base64.b64decode(obj['__ndarray__']),
                dtype=np.dtype(obj['dtype'])
            ).reshape(obj['shape'])
        if '__npgeneric__' in obj:
            return np.frombuffer(
                base64.b64decode(obj['__npgeneric__']),
                dtype=np.dtype(obj['dtype'])
            )[0]
        if '__set__' in obj:
            return set(obj['__set__'])
        if '__tuple__' in obj:
            return tuple(obj['__tuple__'])
        if '__complex__' in obj:
            return complex(obj['__complex__'])

    return obj

# over-write the load(s)/dump(s) functions
def load(*args, **kwargs):
    kwargs['object_hook'] = from_json
    return json.load(*args, **kwargs)


def loads(*args, **kwargs):
    kwargs['object_hook'] = from_json
    return json.loads(*args, **kwargs)


def dump(*args, **kwargs):
    kwargs['default'] = to_json
    return json.dump(*args, **kwargs)


def dumps(*args, **kwargs):
    kwargs['default'] = to_json
    return json.dumps(*args, **kwargs)
```

You should then be able to do the following:

```
import numpy as np
import np_json as json
np_data = np.zeros((10,10), dtype=np.float32)
new_data = json.loads(json.dumps(np_data))
assert (np_data == new_data).all()
```

==============================
4. If it needs to be human readable and you know that this is a numpy array:
```
import json
import numpy as np

a = np.random.normal(size=(50, 120, 150))
a_reconstructed = np.asarray(json.loads(json.dumps(a.tolist())))
print(np.allclose(a, a_reconstructed))
print((a == a_reconstructed).all())
```
This may not be the most efficient approach as array sizes grow, but it works for smaller arrays.
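
One caveat worth checking with this approach: nested lists keep the shape, but the dtype is re-inferred on load. The sketch below (variable names are illustrative) shows a float32 array coming back as float64 unless the dtype is passed explicitly:

```python
import json

import numpy as np

a = np.arange(6, dtype=np.float32).reshape(2, 3)
text = json.dumps(a.tolist())                            # nested lists keep the shape

restored = np.asarray(json.loads(text))                  # dtype is inferred from Python floats
fixed = np.asarray(json.loads(text), dtype=np.float32)   # dtype supplied by the caller

print(restored.shape, restored.dtype, fixed.dtype)
```

If the dtype matters, it has to be sent alongside the list, for example as an extra field in the JSON document.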
==============================
5. For Msgpack's serialization performance, see: http://www.benfrederickson.com/dont-pickle-your-data/

Use msgpack-numpy. See https://github.com/lebedov/msgpack-numpy

Install it:
```
pip install msgpack-numpy
```
Then:
```
import msgpack
import msgpack_numpy as m
import numpy as np

x = np.random.rand(5)
x_enc = msgpack.packb(x, default=m.encode)
x_rec = msgpack.unpackb(x_enc, object_hook=m.decode)
```
==============================
6. Try traitschema: https://traitschema.readthedocs.io/en/latest/
==============================
7. Try using numpy.array_repr or numpy.array_str.
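
For reference, a short sketch of what these two functions produce. Both return display strings, and with NumPy's default print options large arrays are summarized with an ellipsis, so the output is not a lossless encoding in general:

```python
import numpy as np

small = np.array([[1, 2], [3, 4]])
print(np.array_repr(small))  # includes the array(...) wrapper
print(np.array_str(small))   # just the bracketed values

# Arrays above the default print threshold are abbreviated.
large_text = np.array_repr(np.arange(2000))
print("..." in large_text)
```

This option therefore seems best suited to small arrays meant for human inspection rather than round-tripping.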

from https://stackoverflow.com/questions/30698004/how-can-i-serialize-a-numpy-array-while-preserving-matrix-dimensions by cc-by-sa and MIT license
