13.1. `csv` - CSV File Reading and Writing

New in version 2.3.

The so-called CSV (Comma Separated Values) format is the most common import and export format for spreadsheets and databases. There is no “CSV standard”, so the format is operationally defined by the many applications which read and write it. The lack of a standard means that subtle differences often exist in the data produced and consumed by different applications. These differences can make it annoying to process CSV files from multiple sources. Still, while the delimiters and quoting characters vary, the overall format is similar enough that it is possible to write a single module which can efficiently manipulate such data, hiding the details of reading and writing the data from the programmer.

The csv module implements classes to read and write tabular data in CSV format. It allows programmers to say, “write this data in the format preferred by Excel,” or “read data from this file which was generated by Excel,” without knowing the precise details of the CSV format used by Excel. Programmers can also describe the CSV formats understood by other applications or define their own special-purpose CSV formats.

The csv module's reader and writer objects read and write sequences. Programmers can also read and write data in dictionary form using the DictReader and DictWriter classes.

Note

This version of the csv module doesn't support Unicode input. Also, there are currently some issues regarding ASCII NUL characters. Accordingly, all input should be UTF-8 or printable ASCII to be safe; see the examples in section Examples.

See also

PEP 305 - CSV File API

13.1.1. Module Contents

The csv module defines the following functions:

csv.reader(csvfile, dialect='excel', **fmtparams)

csv.writer(csvfile, dialect='excel', **fmtparams)

csv.register_dialect(name, [dialect, ] **fmtparams)

csv.unregister_dialect(name)

csv.get_dialect(name)

csv.list_dialects()
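As a rough illustration of how the four dialect-registry functions fit together, the sketch below registers a dialect, looks it up, uses it by name, and removes it again. The dialect name 'pipes' and the sample rows are made up for this example; the input is an in-memory list of strings so no file is needed.

```python
import csv

# 'pipes' is a hypothetical dialect name chosen only for this sketch.
csv.register_dialect('pipes', delimiter='|', quoting=csv.QUOTE_NONE)

# The new name now shows up in the registry and can be looked up.
registered = 'pipes' in csv.list_dialects()
delimiter = csv.get_dialect('pipes').delimiter

# A registered dialect can be passed to reader() by name.
rows = list(csv.reader(['a|b|c', '1|2|3'], 'pipes'))

# Remove the name again so the registry is left as it was found.
csv.unregister_dialect('pipes')
still_registered = 'pipes' in csv.list_dialects()
```

After unregister_dialect() the name can no longer be passed to reader() or writer().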

csv.field_size_limit([new_limit])
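A minimal sketch of field_size_limit(), assuming the usual behaviour that the call returns the limit in force before the call and that an over-long field makes the parser raise csv.Error. The limit of 10 and the sample line are arbitrary values picked for the demonstration.

```python
import csv

# Called without an argument, the function only reports the current limit.
old_limit = csv.field_size_limit()

# Called with an argument, it installs the new limit and returns the old one.
previous = csv.field_size_limit(10)

try:
    # The second field is 50 characters long, which exceeds the limit of 10.
    list(csv.reader(['short,' + 'x' * 50]))
    too_long = False
except csv.Error:
    too_long = True
finally:
    # Restore the original limit; it is a module-wide setting.
    csv.field_size_limit(old_limit)
```

Because the limit is global to the module, restoring it in a finally clause keeps the change from leaking into unrelated code.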

The csv module defines the following classes:

class csv.DictReader(f, fieldnames=None, restkey=None, restval=None, dialect='excel', *args, **kwds)

class csv.DictWriter(f, fieldnames, restval='', extrasaction='raise', dialect='excel', *args, **kwds)
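The restkey, restval and extrasaction parameters are easiest to see on rows that do not match the field names. The following is a small sketch, not a reference implementation: ListSink is a hypothetical stand-in for a file object (anything with a write() method is enough for a writer), and the names and data are invented.

```python
import csv

class ListSink(object):
    """Hypothetical minimal 'file': collects everything passed to write()."""
    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)

# Reading: the first row supplies the field names. Surplus values are
# collected under restkey, missing values are filled in with restval.
lines = ['name,dept', 'Ann,IT,extra1,extra2', 'Bob']
reader = csv.DictReader(lines, restkey='overflow', restval='n/a')
records = [dict(record) for record in reader]

# Writing: missing keys are written as restval, and with
# extrasaction='ignore' keys not listed in fieldnames are dropped.
sink = ListSink()
writer = csv.DictWriter(sink, fieldnames=['name', 'dept'],
                        restval='?', extrasaction='ignore')
writer.writerow({'name': 'Ann', 'dept': 'IT', 'badge': 7})
writer.writerow({'name': 'Bob'})
output = ''.join(sink.chunks)
```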

class csv.Dialect

class csv.excel

class csv.excel_tab
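For orientation, here is a short sketch of the tab-delimited variant. It assumes the excel_tab class is registered under the dialect name 'excel-tab', so either the string or the class can be passed; the sample line is invented.

```python
import csv

line = 'a\tb,c\td'

# Select the dialect by its registered name ...
by_name = list(csv.reader([line], dialect='excel-tab'))

# ... or by passing the class itself.
by_class = list(csv.reader([line], dialect=csv.excel_tab))
```

With a tab as the delimiter, the comma in the middle field is ordinary data and needs no quoting.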

class csv.Sniffer

sniff(sample, delimiters=None)

has_header(sample)

An example for Sniffer use:

import csv

with open('example.csv', 'rb') as csvfile:
    dialect = csv.Sniffer().sniff(csvfile.read(1024))
    csvfile.seek(0)
    reader = csv.reader(csvfile, dialect)
    # ... process CSV file contents here ...
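The same idea can be tried without a file. The sketch below runs both Sniffer methods on an in-memory sample; the sample text is invented, and since the Sniffer works by heuristics its guesses on real-world data should be treated as suggestions rather than guarantees.

```python
import csv

# A small, regular sample: one header row and three data rows.
sample = 'name;age\nAnn;31\nBob;27\nCid;45\n'

sniffer = csv.Sniffer()

# Restrict the candidate delimiters to ';' and ',' to make the guess easier.
dialect = sniffer.sniff(sample, delimiters=';,')

# The returned dialect can be handed straight to reader().
rows = list(csv.reader(sample.splitlines(), dialect))

# has_header() compares the first row against the rows that follow it.
header_guess = sniffer.has_header(sample)
```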

The csv module defines the following constants:

csv.QUOTE_ALL

csv.QUOTE_MINIMAL

csv.QUOTE_NONNUMERIC

csv.QUOTE_NONE
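To give a feel for the quoting constants, the sketch below writes the same row under three of them and reads one line back with QUOTE_NONNUMERIC. ListSink and render() are hypothetical helpers introduced only for this example; the row contents are arbitrary.

```python
import csv

class ListSink(object):
    """Hypothetical minimal 'file': collects everything passed to write()."""
    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)

def render(row, quoting):
    """Write one row with the given quoting mode and return the text."""
    sink = ListSink()
    csv.writer(sink, quoting=quoting).writerow(row)
    return ''.join(sink.chunks)

row = ['id 7', 42, 'say "hi"']

minimal = render(row, csv.QUOTE_MINIMAL)        # quote only where required
everything = render(row, csv.QUOTE_ALL)         # quote every field
nonnumeric = render(row, csv.QUOTE_NONNUMERIC)  # quote every non-number

# On input, QUOTE_NONNUMERIC converts unquoted fields to float.
parsed = list(csv.reader(['"a",1.5'], quoting=csv.QUOTE_NONNUMERIC))
```

QUOTE_NONE is left out of the writer half on purpose: with that mode a field containing the delimiter or quote character can only be written if an escapechar is also set.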

The csv module defines the following exception:

exception csv.Error

13.1.2. Dialects and Formatting Parameters

To make it easier to specify the format of input and output records, specific formatting parameters are grouped together into dialects. A dialect is a subclass of the Dialect class having a set of specific attributes and a single validate() method. When creating reader or writer objects, the programmer can specify a string or a subclass of the Dialect class as the dialect parameter. In addition to, or instead of, the dialect parameter, the programmer can also specify individual formatting parameters, which have the same names as the attributes defined below for the Dialect class.
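One way to picture this is a small Dialect subclass combined with a per-call override. The class name SemicolonDialect and its attribute values are assumptions made for this sketch; they are not part of the module.

```python
import csv

class SemicolonDialect(csv.Dialect):
    """Hypothetical dialect: semicolon-separated, minimal quoting."""
    delimiter = ';'
    quotechar = '"'
    doublequote = True
    skipinitialspace = False
    lineterminator = '\n'
    quoting = csv.QUOTE_MINIMAL

# Pass the subclass as the dialect parameter.
rows = list(csv.reader(['a;"b;c";d'], dialect=SemicolonDialect))

# An individual formatting parameter overrides the dialect's attribute.
overridden = list(csv.reader(['a|b'], dialect=SemicolonDialect,
                             delimiter='|'))
```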

Dialects support the following attributes:

Dialect.delimiter

Dialect.doublequote

Dialect.escapechar

Dialect.lineterminator

Dialect.quotechar

Dialect.quoting

Dialect.skipinitialspace

Dialect.strict: When True, raise exception Error on bad CSV input. The default is False.
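The effect of several of these attributes can be sketched by passing them as keyword arguments to reader(). The sample lines are invented, and each call changes only the one parameter named in its comment.

```python
import csv

# skipinitialspace: whitespace right after the delimiter is ignored.
spaced = list(csv.reader(['a, b, "c, d"'], skipinitialspace=True))

# escapechar: the character after the escape is taken literally.
escaped = list(csv.reader(['a\\,b,c'], escapechar='\\'))

# doublequote (on by default): two quote characters inside a quoted
# field stand for one literal quote character.
doubled = list(csv.reader(['"say ""hi""",x']))

# strict: data after a closing quote is tolerated by default ...
lenient = list(csv.reader(['"abc"x,y']))

# ... but raises Error when strict is True.
try:
    list(csv.reader(['"abc"x,y'], strict=True))
    strict_error = False
except csv.Error:
    strict_error = True
```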

13.1.3. Reader Objects

Reader objects (DictReader instances and objects returned by the reader() function) have the following public methods:

csvreader.next(): Return the next row of the reader's iterable object as a list, parsed according to the current dialect.

Reader objects have the following public attributes:

csvreader.dialect: A read-only description of the dialect in use by the parser.

csvreader.line_num

The number of lines read from the source iterator. This is not the same as the number of records returned, as records can span multiple lines.

New in version 2.5.
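The difference between lines and records shows up as soon as a quoted field contains a line break. In the sketch below (with made-up data) the second record spans two source lines, so line_num jumps from 1 to 3.

```python
import csv

# Four source lines, but only three records: the quoted field in the
# second record contains an embedded newline.
source = ['id,comment\n', '1,"first line\n', 'second line"\n', '2,plain\n']

reader = csv.reader(source)
seen = []
for row in reader:
    # line_num counts source lines consumed so far, not records returned.
    seen.append((reader.line_num, row))
```

This is why line_num is the more useful value to include in error messages that point back at the input.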

DictReader objects have the following public attribute:

csvreader.fieldnames

If not passed as a parameter when creating the object, this attribute is initialized upon first access or when the first record is read from the file.

Changed in version 2.6.
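A brief sketch of both ways fieldnames can get its value, using invented in-memory data: taken from the first row on first access, or supplied explicitly so that the first row is treated as data.

```python
import csv

# No fieldnames given: accessing the attribute reads the header row.
reader = csv.DictReader(['x,y', '1,2'])
names = reader.fieldnames
first = dict(next(reader))

# Explicit fieldnames: the first row is data, not a header.
explicit = csv.DictReader(['1,2'], fieldnames=['x', 'y'])
only = dict(next(explicit))
```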

13.1.4. Writer Objects

Writer objects (DictWriter instances and objects returned by the writer() function) have the following public methods. A row must be a sequence of strings or numbers for Writer objects and a dictionary mapping fieldnames to strings or numbers (by passing them through str() first) for DictWriter objects. Note that complex numbers are written out surrounded by parentheses. This may cause some problems for other programs which read CSV files (assuming they support complex numbers at all).

csvwriter.writerow(row): Write the row parameter to the writer's file object, formatted according to the current dialect.

csvwriter.writerows(rows): Write all elements in rows (an iterable of row objects as described above) to the writer's file object, formatted according to the current dialect.
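The two methods, and the remark about complex numbers, can be sketched as follows. ListSink is a hypothetical stand-in for a file object opened for writing; the rows are arbitrary.

```python
import csv

class ListSink(object):
    """Hypothetical minimal 'file': collects everything passed to write()."""
    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)

sink = ListSink()
writer = csv.writer(sink)

# One row at a time ...
writer.writerow(['id', 'value'])

# ... or several rows in one call. Numbers are converted to strings,
# and the complex number ends up wrapped in parentheses.
writer.writerows([[1, 'plain'], [2, 'needs,quotes'], [3, 3 + 4j]])

written = ''.join(sink.chunks)
```

The default 'excel' dialect terminates every row with '\r\n' and quotes only the field that contains the delimiter.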

Writer objects have the following public attribute:

csvwriter.dialect: A read-only description of the dialect in use by the writer.

DictWriter objects have the following public method:

DictWriter.writeheader()

Write a row with the field names (as specified in the constructor).

New in version 2.7.
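A short sketch of writeheader() together with the default extrasaction='raise'. As before, ListSink is a hypothetical in-memory replacement for a file, and the field names are invented.

```python
import csv

class ListSink(object):
    """Hypothetical minimal 'file': collects everything passed to write()."""
    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)

sink = ListSink()
writer = csv.DictWriter(sink, fieldnames=['name', 'dept'])

# The header row uses the field names in constructor order.
writer.writeheader()

# Values are written in fieldnames order, whatever the dict's own order.
writer.writerow({'dept': 'IT', 'name': 'Ann'})

# With the default extrasaction='raise', an unknown key is rejected.
try:
    writer.writerow({'name': 'Bob', 'badge': 7})
    rejected = False
except ValueError:
    rejected = True

output = ''.join(sink.chunks)
```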

13.1.5. Examples

The simplest example of reading a CSV file:

import csv
with open('some.csv', 'rb') as f:
    reader = csv.reader(f)
    for row in reader:
        print row

Reading a file with an alternate format:

import csv
with open('passwd', 'rb') as f:
    reader = csv.reader(f, delimiter=':', quoting=csv.QUOTE_NONE)
    for row in reader:
        print row

The corresponding simplest possible writing example is:

import csv
with open('some.csv', 'wb') as f:
    writer = csv.writer(f)
    writer.writerows(someiterable)

Registering a new dialect:

import csv
csv.register_dialect('unixpwd', delimiter=':', quoting=csv.QUOTE_NONE)
with open('passwd', 'rb') as f:
    reader = csv.reader(f, 'unixpwd')

A slightly more advanced use of the reader — catching and reporting errors:

import csv, sys
filename = 'some.csv'
with open(filename, 'rb') as f:
    reader = csv.reader(f)
    try:
        for row in reader:
            print row
    except csv.Error as e:
        sys.exit('file %s, line %d: %s' % (filename, reader.line_num, e))

And while the module doesn’t directly support parsing strings, it can easily be done:

import csv
for row in csv.reader(['one,two,three']):
    print row

The csv module doesn’t directly support reading and writing Unicode, but it is 8-bit-clean save for some problems with ASCII NUL characters. So you can write functions or classes that handle the encoding and decoding for you as long as you avoid encodings like UTF-16 that use NULs. UTF-8 is recommended.

unicode_csv_reader() below is a generator that wraps csv.reader to handle Unicode CSV data (a list of Unicode strings). utf_8_encoder() is a generator that encodes the Unicode strings as UTF-8, one string (or row) at a time. The encoded strings are parsed by the CSV reader, and unicode_csv_reader() decodes the UTF-8-encoded cells back into Unicode:

import csv

def unicode_csv_reader(unicode_csv_data, dialect=csv.excel, **kwargs):
    # csv.py doesn't do Unicode; encode temporarily as UTF-8:
    csv_reader = csv.reader(utf_8_encoder(unicode_csv_data),
                            dialect=dialect, **kwargs)
    for row in csv_reader:
        # decode UTF-8 back to Unicode, cell by cell:
        yield [unicode(cell, 'utf-8') for cell in row]

def utf_8_encoder(unicode_csv_data):
    for line in unicode_csv_data:
        yield line.encode('utf-8')

For all other encodings the following UnicodeReader and UnicodeWriter classes can be used. They take an additional encoding parameter in their constructor and make sure that the data passes the real reader or writer encoded as UTF-8:

import csv, codecs, cStringIO

class UTF8Recoder:
    """
    Iterator that reads an encoded stream and reencodes the input to UTF-8
    """
    def __init__(self, f, encoding):
        self.reader = codecs.getreader(encoding)(f)

    def __iter__(self):
        return self

    def next(self):
        return self.reader.next().encode("utf-8")

class UnicodeReader:
    """
    A CSV reader which will iterate over lines in the CSV file "f",
    which is encoded in the given encoding.
    """
    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        f = UTF8Recoder(f, encoding)
        self.reader = csv.reader(f, dialect=dialect, **kwds)

    def next(self):
        row = self.reader.next()
        return [unicode(s, "utf-8") for s in row]

    def __iter__(self):
        return self

class UnicodeWriter:
    """
    A CSV writer which will write rows to CSV file "f",
    which is encoded in the given encoding.
    """
    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        # Redirect output to a queue
        self.queue = cStringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        self.writer.writerow([s.encode("utf-8") for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)
