How to Read in Text File Into Matrix of Numbers in Python

11. Reading and Writing Data Files: ndarrays

By Bernd Klein. Last modified: 01 Feb 2022.

There are many ways to read from and write to data files in numpy. We will discuss the different ways and the corresponding functions in this chapter:

  • savetxt
  • loadtxt
  • tofile
  • fromfile
  • save
  • load
  • genfromtxt

Saving textfiles with savetxt

The first two functions we will cover are savetxt and loadtxt.

In the following simple case, we define an array x and save it as a textfile with savetxt:

    import numpy as np

    x = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]], np.int32)

    np.savetxt("test.txt", x)

The file "test.txt" is a textfile and its content looks like this:

    ~/Dropbox/notebooks/numpy$ more test.txt
    1.000000000000000000e+00 2.000000000000000000e+00 3.000000000000000000e+00
    4.000000000000000000e+00 5.000000000000000000e+00 6.000000000000000000e+00
    7.000000000000000000e+00 8.000000000000000000e+00 9.000000000000000000e+00

Attention: The above output has been created on the Linux command prompt!

It's also possible to write the array in a special format, for example with three decimal places, or as integers that are padded with leading zeros if the number has fewer than four digits. For this purpose we assign a format string to the third parameter 'fmt'. We saw in our first example that the default delimiter is a blank. We can change this behaviour by assigning a string to the parameter "delimiter". In most cases this string will consist of a single character, but it can also be a sequence of characters, like a smiley " :-) ":

    np.savetxt("test2.txt", x, fmt="%2.3f", delimiter=",")
    np.savetxt("test3.txt", x, fmt="%04d", delimiter=" :-) ")

The newly created files look like this:

    ~/Dropbox/notebooks/numpy$ more test2.txt
    1.000,2.000,3.000
    4.000,5.000,6.000
    7.000,8.000,9.000
    ~/Dropbox/notebooks/numpy$ more test3.txt
    0001 :-) 0002 :-) 0003
    0004 :-) 0005 :-) 0006
    0007 :-) 0008 :-) 0009

The complete syntax of savetxt looks like this:

    savetxt(fname, X, fmt='%.18e', delimiter=' ', newline='\n', header='', footer='', comments='# ')

Parameter   Meaning
X           array_like. Data to be saved to a text file.
fmt         str or sequence of strs, optional. A single format (%10.5f), a sequence of formats, or a multi-format string, e.g. 'Iteration %d -- %10.5f', in which case 'delimiter' is ignored. For complex 'X', the legal options for 'fmt' are:
            a) a single specifier, "fmt='%.4e'", resulting in numbers formatted like "' (%s+%sj)' % (fmt, fmt)"
            b) a full string specifying every real and imaginary part, e.g. "' %.4e %+.4ej %.4e %+.4ej %.4e %+.4ej'" for 3 columns
            c) a list of specifiers, one per column. In this case, the real and imaginary part must have separate specifiers, e.g. "['%.3e + %.3ej', '(%.15e%+.15ej)']" for 2 columns
delimiter   A string used for separating the columns.
newline     A string (e.g. "\n", "\r\n" or ",\n") which will end a line instead of the default line ending.
header      A string that will be written at the beginning of the file.
footer      A string that will be written at the end of the file.
comments    A string that will be prepended to the 'header' and 'footer' strings, to mark them as comments. The hash character '#' is used as the default.
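
The effect of 'header', 'footer' and 'comments' is easiest to see in a small experiment. The following is only a sketch: it writes into an in-memory buffer instead of a file, and the header and footer texts are made up for illustration.

```python
import io
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], np.int32)

# Write into an in-memory text buffer instead of a file on disk.
buffer = io.StringIO()
np.savetxt(buffer, x, fmt="%d", delimiter=";",
           header="a;b;c", footer="end of data")  # made-up header/footer texts

text = buffer.getvalue()
print(text)

# loadtxt skips the lines that start with the comment character '#',
# so the header and the footer do not disturb reading the data back.
buffer.seek(0)
y = np.loadtxt(buffer, delimiter=";")
print(y)
```

The header and the footer each show up as a line starting with "# ", which is why loadtxt can ignore them without further arguments.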

Loading Textfiles with loadtxt

We will now read in the file "test.txt", which we wrote in the previous subchapter:

    y = np.loadtxt("test.txt")
    print(y)

OUTPUT:

    [[ 1.  2.  3.]
     [ 4.  5.  6.]
     [ 7.  8.  9.]]

    y = np.loadtxt("test2.txt", delimiter=",")
    print(y)

OUTPUT:

    [[ 1.  2.  3.]
     [ 4.  5.  6.]
     [ 7.  8.  9.]]

Nothing new happens if we read in the text file in which we used a smiley as separator:

    y = np.loadtxt("test3.txt", delimiter=" :-) ")
    print(y)

OUTPUT:

    [[ 1.  2.  3.]
     [ 4.  5.  6.]
     [ 7.  8.  9.]]
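
As far as we know, recent NumPy releases accept only a single character as delimiter in loadtxt, so the smiley may be rejected there. A possible workaround, sketched below under that assumption, is to replace the multi-character delimiter line by line before the data reaches loadtxt. The helper name replace_delimiter is made up, and the string smiley_text merely stands in for the content of "test3.txt".

```python
import io
import numpy as np

# Stand-in for the content of "test3.txt", so that the sketch is self-contained.
smiley_text = ("0001 :-) 0002 :-) 0003\n"
               "0004 :-) 0005 :-) 0006\n"
               "0007 :-) 0008 :-) 0009\n")


def replace_delimiter(lines, old=" :-) ", new=","):
    """Yield each line with the multi-character delimiter swapped for one character."""
    for line in lines:
        yield line.replace(old, new)


# loadtxt also accepts a generator of strings instead of a file name.
y = np.loadtxt(replace_delimiter(io.StringIO(smiley_text)), delimiter=",")
print(y)
```

With a real file, the StringIO object would simply be replaced by the opened file.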

It's also possible to choose the columns by index:

    y = np.loadtxt("test3.txt", delimiter=" :-) ", usecols=(0, 2))
    print(y)

OUTPUT:

    [[ 1.  3.]
     [ 4.  6.]
     [ 7.  9.]]

In our next example we will read in the file "times_and_temperatures.txt", which we created in the chapter on Generators of our Python tutorial. Every line contains a time in the format "hh:mm:ss" and a random temperature between 10.0 and 25.0 degrees. We have to convert the time string into float numbers. The time will be in minutes, with the seconds expressed as a decimal fraction of a minute. We first define a function which converts "hh:mm:ss" into minutes:

    def time2float_minutes(time):
        if type(time) == bytes:
            time = time.decode()
        t = time.split(":")
        # 0.05 / 3 equals 1 / 60, i.e. seconds are turned into minutes
        minutes = float(t[0]) * 60 + float(t[1]) + float(t[2]) * 0.05 / 3
        return minutes

    for t in ["06:00:10", "06:27:45", "12:59:59"]:
        print(time2float_minutes(t))

OUTPUT:

    360.1666666666667
    387.75
    779.9833333333333

You might have noticed that we check whether time is of type bytes. The reason for this is the use of our function "time2float_minutes" in loadtxt in the following example. The keyword parameter converters contains a dictionary which can hold a function for a column (the key of the dictionary corresponds to the index of the column) to convert the string data of this column into a float. The string data is passed as a byte string, at least by older NumPy versions. That is why we had to convert it into a unicode string in our function:

    y = np.loadtxt("times_and_temperatures.txt",
                   converters={0: time2float_minutes})
    print(y)

OUTPUT:

    [[  360.     20.1]
     [  361.5    16.1]
     [  363.     16.9]
     ...,
     [ 1375.5    22.5]
     [ 1377.     11.1]
     [ 1378.5    15.2]]

    # delimiter = ";" , # i.e. use ";" as delimiter instead of whitespace
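
Since "times_and_temperatures.txt" is not part of this chapter, here is a self-contained sketch of the same idea. The three sample lines are made up and only imitate the layout of that file; the converter is a slightly restructured variant of the function above.

```python
import io
import numpy as np


def time2float_minutes(time):
    # Depending on the NumPy version, the converter may receive bytes or str.
    if isinstance(time, bytes):
        time = time.decode()
    hours, minutes, seconds = (float(part) for part in time.split(":"))
    return hours * 60 + minutes + seconds / 60


# Made-up sample in the layout "hh:mm:ss temperature".
sample = io.StringIO("06:00:00 20.1\n06:01:30 16.1\n06:03:00 16.9\n")

# Column 0 is converted by our function, column 1 is parsed as a float as usual.
y = np.loadtxt(sample, converters={0: time2float_minutes})
print(y)
```

The result is a two-column float array with the time in minutes in the first column and the temperature in the second.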

tofile

tofile is a function to write the content of an array to a file, either in binary format, which is the default, or in text format.

A.tofile(fid, sep="", format="%s")

The data of the ndarray A is always written in 'C' order, regardless of the order of A.

The data file written by this method can be reloaded with the function fromfile().

Parameter   Meaning
fid         can be either an open file object, or a string containing a filename.
sep         The string 'sep' defines the separator between array items for text output. If it is empty (''), a binary file is written, equivalent to file.write(a.tobytes()).
format      Format string for text file output. Each entry in the array is formatted to text by first converting it to the closest Python type, and then using 'format' % item.

Remark:

Information on endianness and precision is lost. Therefore it may not be a good idea to use this function to archive data or to transport data between machines with different endianness. Some of these problems can be overcome by outputting the data as text files, at the expense of speed and file size.
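
The text output mentioned in the remark can be sketched as follows. The file name "tofile_demo.txt" and the temporary directory are arbitrary choices for this illustration.

```python
import os
import tempfile
import numpy as np

a = np.arange(6, dtype=np.int32).reshape(2, 3)

# Hypothetical file name inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "tofile_demo.txt")

# A non-empty 'sep' switches tofile from binary to text output.
a.tofile(path, sep=",", format="%d")

with open(path) as fh:
    content = fh.read()
print(content)

# The shape is not stored in the file, so it has to be restored by hand.
b = np.fromfile(path, dtype=np.int32, sep=",").reshape(2, 3)
print(b)
```

Note that even the text variant stores only the flat sequence of items. Shape and data type still have to be known by whoever reads the file.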

    dt = np.dtype([('time', [('min', int), ('sec', int)]),
                   ('temp', float)])
    x = np.zeros((1,), dtype=dt)
    x['time']['min'] = 10
    x['temp'] = 98.25
    print(x)

    # binary mode; the with block closes the file after writing
    with open("test6.txt", "bw") as fh:
        x.tofile(fh)

OUTPUT:

    [((10, 0), 98.25)]

fromfile

fromfile is used to read in data which has been written with the tofile function. It's possible to read binary data if the data type is known. It's also possible to parse simply formatted text files. The data from the file is turned into an array.

The general syntax looks like this:

numpy.fromfile(file, dtype=float, count=-1, sep='')

Parameter   Meaning
file        'file' can be either a file object or the name of the file to read.
dtype       defines the data type of the array which will be constructed from the file data. For binary files, it is used to determine the size and byte-order of the items in the file.
count       defines the number of items which will be read. -1 means all items will be read.
sep         The string 'sep' defines the separator between the items, if the file is a text file. If it is empty (''), the file will be treated as a binary file. A space (" ") in a separator matches zero or more whitespace characters. A separator consisting solely of spaces has to match at least one whitespace.

    # read back the record written to "test6.txt" above;
    # dt is the dtype defined in the tofile example
    fh = open("test6.txt", "rb")
    np.fromfile(fh, dtype=dt)

OUTPUT:

    array([((10, 0), 98.25)],
          dtype=[('time', [('min', '<i8'), ('sec', '<i8')]), ('temp', '<f8')])

    import numpy as np
    import os

    # platform dependent: difference between Linux and Windows
    #data = np.arange(50, dtype=np.int)
    data = np.arange(50, dtype=np.int32)
    data.tofile("test4.txt")

    fh = open("test4.txt", "rb")
    # skip the first 32 items of 4 bytes each: 4 * 32 = 128
    fh.seek(128, os.SEEK_SET)

    x = np.fromfile(fh, dtype=np.int32)
    print(x)

OUTPUT:

    [32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49]
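
The byte offset in the example above is the number of items to skip multiplied by the size of one item. The following sketch makes this arithmetic explicit and additionally uses the parameter count to read only a few items. The file name "count_demo.bin" and the variable names are chosen just for this illustration.

```python
import os
import tempfile
import numpy as np

data = np.arange(50, dtype=np.int32)

# Hypothetical file name inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "count_demo.bin")
data.tofile(path)

items_to_skip = 32
with open(path, "rb") as fh:
    # itemsize is 4 bytes for int32, so we skip 32 * 4 = 128 bytes
    fh.seek(items_to_skip * data.itemsize, os.SEEK_SET)
    # count=5 reads only the next five items instead of the rest of the file
    first_five = np.fromfile(fh, dtype=np.int32, count=5)

print(first_five)
```

Computing the offset from itemsize avoids hard-coding the number 128, which would be wrong for any other data type.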

Attention:

It can cause problems to use tofile and fromfile for data storage, because the binary files generated are not platform independent. There is no byte-order or data-type information saved by tofile. Data can be stored in the platform independent .npy format using save and load instead.

Best Practice to Load and Save Data

The recommended way to store and load data with Numpy in Python consists in using load and save. We also use a temporary file in the following example:

    import numpy as np

    print(x)  # x still holds the array from the previous example

    from tempfile import TemporaryFile

    outfile = TemporaryFile()

    x = np.arange(10)
    np.save(outfile, x)

    outfile.seek(0)  # Only needed here to simulate closing & reopening file
    np.load(outfile)

OUTPUT:

    [32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49]
    array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
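
In contrast to tofile, the .npy format also records the data type and the shape of the array. A minimal sketch with a named file, where the file name "demo.npy" is again just an assumption for this illustration:

```python
import os
import tempfile
import numpy as np

# Hypothetical file name inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "demo.npy")

x = np.arange(12, dtype=np.int16).reshape(3, 4)
np.save(path, x)

# Neither dtype nor shape has to be passed to load.
y = np.load(path)
print(y.dtype, y.shape)
print(y)
```

The loaded array has the same data type and shape as the saved one without any extra bookkeeping on our side.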

and yet another way: genfromtxt

There is yet another way to read tabular input from a file to create arrays. As the name implies, the input file is supposed to be a text file. The text file can be in the form of an archive file as well. genfromtxt can process the archive formats gzip and bzip2. The type of the archive is determined by the extension of the file, i.e. '.gz' for gzip and '.bz2' for bzip2.

genfromtxt is slower than loadtxt, but it is capable of coping with missing data. It processes the file data in two passes. At first it converts the lines of the file into strings. Thereupon it converts the strings into the requested data type. loadtxt on the other hand works in one go, which is the reason why it is faster.
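
How genfromtxt copes with missing data can be sketched with a small made-up input in which two fields are empty. By default the gaps become nan in a float array; with the parameter filling_values we can choose a replacement value instead. The value -1 is an arbitrary choice for this illustration.

```python
import io
import numpy as np

# Made-up CSV data: the middle field of line 2 and the last field of line 3 are missing.
csv_text = "1,2,3\n4,,6\n7,8,\n"

# Default behaviour: missing float entries are turned into nan.
with_nan = np.genfromtxt(io.StringIO(csv_text), delimiter=",")
print(with_nan)

# filling_values replaces the missing entries by a value of our choice.
filled = np.genfromtxt(io.StringIO(csv_text), delimiter=",", filling_values=-1)
print(filled)
```

Both calls return a 3x3 array, so the rows with missing fields are kept rather than dropped.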

recfromcsv(fname, **kwargs)

This is not really another way to read in csv data. 'recfromcsv' is basically a shortcut for

np.genfromtxt(filename, delimiter=",", dtype=None)
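
A sketch of such a call on made-up CSV data follows. The argument names=True is an addition for this illustration and is not part of the shortcut shown above. It tells genfromtxt to take the column names from the first line, so that the columns can be accessed by name.

```python
import io
import numpy as np

# Made-up CSV data with a header line.
csv_text = io.StringIO("time,temp\n360.0,20.1\n361.5,16.1\n")

# dtype=None lets genfromtxt determine the type of each column itself.
table = np.genfromtxt(csv_text, delimiter=",", dtype=None, names=True)

print(table.dtype.names)
print(table["temp"])
```

The result is a structured array with one record per line of data.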


Source: https://python-course.eu/numerical-programming/reading-and-writing-data-files-ndarrays.php
