<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p><br>
</p>
<div class="moz-cite-prefix">On 18.10.21 09:42, felix wrote:<br>
</div>
<blockquote type="cite" cite="mid:YW0lXu4%2FjEZxMoo1@medium.hauri">
<pre class="moz-quote-pre" wrap="">
It's been a long time since I last posted anything...
This is about reading a CSV file that conforms to
RFC 4180 Common Format and MIME Type for Comma-Separated Values (CSV) Files</pre>
</blockquote>
<p>Well... An RFC does exist, but it also leaves the door open to
"implementations" that vary and therefore fall outside the
definition of this standard. Among other things, RFC 4180 states
clearly:</p>
<pre class="newpage" style="font-size: 13.3333px; margin-top: 0px; margin-bottom: 0px; break-before: page; color: rgb(0, 0, 0); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial;">Interoperability considerations:
Due to lack of a single specification, there are considerable
differences among implementations. Implementors should "be
conservative in what you do, be liberal in what you accept from
others" (<a href="https://datatracker.ietf.org/doc/html/rfc793">RFC 793</a> [<a href="https://datatracker.ietf.org/doc/html/rfc4180#ref-8" title="Transmission Control Protocol">8</a>]) when processing CSV files. An attempt at a
common definition can be found in <a href="https://datatracker.ietf.org/doc/html/rfc4180#section-2">Section 2</a>.
Likewise, the documentation for the bash CSV module says:
<strong>This method is recommended only for simple CSV files</strong> with no text fields containing extra comma <code>,</code> delimiter, or return lines.
For more complex CSV support, see the next section to <a href="https://www.shell-tips.com/bash/how-to-parse-csv-file/#using-the-awk-command-line">parse CSV with AWK</a>.
</pre>
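<p>To make that warning concrete, here is a minimal sketch in Python
of what goes wrong when a record is split on every comma. The sample
records are made up for illustration; they are not taken from the
original example.</p>

```python
# One record with three fields; the middle field contains a comma
# and is therefore quoted, as RFC 4180 allows.
record = '1,"Doe, Jane",42'

# Naive approach: split on every comma, ignoring the quotes.
naive_fields = record.split(',')

print(len(naive_fields))  # 4 pieces instead of the 3 real fields
print(naive_fields)

# A quoted field may also contain a line break, so reading the data
# line by line cuts this single record in two.
multiline_record = '2,"first line\nsecond line",7\n'
physical_lines = multiline_record.splitlines()
print(len(physical_lines))  # 2 lines for 1 record
```

<p>Any approach that treats the comma and the line break as plain
separators runs into the same two problems.</p>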
<p>As it happens, AWK was the first thing I thought of when I saw
your example :-)</p>
<p>Looking more closely at Python's CSV module, I found the
following:</p>
<p><dt id="csv.Dialect"><em class="property">class </em><code class="sig-prename descclassname">csv.</code><code class="sig-name descname">Dialect</code></dt>
<dd>
<p>The <a class="reference internal" href="https://docs.python.org/3/library/csv.html#csv.Dialect" title="csv.Dialect"><code class="xref py py-class docutils literal notranslate"><span class="pre">Dialect</span></code></a> class
is a container class whose attributes contain information for
how to handle doublequotes, whitespace, delimiters, etc. Due
to the lack of a strict CSV specification, different
applications produce subtly different CSV data. <a class="reference internal" href="https://docs.python.org/3/library/csv.html#csv.Dialect" title="csv.Dialect"><code class="xref py py-class docutils literal notranslate"><span class="pre">Dialect</span></code></a> instances
define how <a class="reference internal" href="https://docs.python.org/3/library/csv.html#csv.reader" title="csv.reader"><code class="xref py py-class docutils literal notranslate"><span class="pre">reader</span></code></a> and <a class="reference internal" href="https://docs.python.org/3/library/csv.html#csv.writer" title="csv.writer"><code class="xref py py-class docutils literal notranslate"><span class="pre">writer</span></code></a> instances
behave.</p>
<p>All available <a class="reference internal" href="https://docs.python.org/3/library/csv.html#csv.Dialect" title="csv.Dialect"><code class="xref py py-class docutils literal notranslate"><span class="pre">Dialect</span></code></a> names
are returned by <a class="reference internal" href="https://docs.python.org/3/library/csv.html#csv.list_dialects" title="csv.list_dialects"><code class="xref py py-func docutils literal notranslate"><span class="pre">list_dialects()</span></code></a>,
and they can be registered with specific <a class="reference internal" href="https://docs.python.org/3/library/csv.html#csv.reader" title="csv.reader"><code class="xref py py-class docutils literal notranslate"><span class="pre">reader</span></code></a> and <a class="reference internal" href="https://docs.python.org/3/library/csv.html#csv.writer" title="csv.writer"><code class="xref py py-class docutils literal notranslate"><span class="pre">writer</span></code></a> classes
through their initializer (<code class="docutils literal notranslate"><span class="pre">__init__</span></code>)
functions like this:</p>
<div class="highlight-python3 notranslate">
<div class="highlight">
<pre><span class="kn">import</span> <span class="nn">csv</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s1">'students.csv'</span><span class="p">,</span> <span class="s1">'w'</span><span class="p">,</span> <span class="n">newline</span><span class="o">=</span><span class="s1">''</span><span class="p">)</span> <span class="k">as</span> <span class="n">csvfile</span><span class="p">:</span>
    <span class="n">writer</span> <span class="o">=</span> <span class="n">csv</span><span class="o">.</span><span class="n">writer</span><span class="p">(</span><span class="n">csvfile</span><span class="p">,</span> <span class="n">dialect</span><span class="o">=</span><span class="s1">'unix'</span><span class="p">)</span>
<span class="c1">#                                ^^^^^^^^^^^^^^</span></pre>
</div>
</div>
</dd>
</p>
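<p>As a rough illustration of the passage above, the sketch below
lists the built-in dialects, writes the same rows with the
<code>unix</code> dialect, and then with a custom one. The dialect
name <code>semicolon</code>, the helper <code>write_rows</code>, and
the sample rows are assumptions made up for this example.</p>

```python
import csv
import io

# Names of the dialects known to the csv module; 'unix' is among them.
print(csv.list_dialects())


def write_rows(rows, dialect):
    """Serialize rows to a CSV string using the given dialect name."""
    buffer = io.StringIO()
    csv.writer(buffer, dialect=dialect).writerows(rows)
    return buffer.getvalue()


rows = [['name', 'age'], ['Doe, Jane', 42]]

# The 'unix' dialect quotes every field and ends lines with '\n'.
unix_text = write_rows(rows, 'unix')
print(unix_text)

# Hypothetical dialect for semicolon-separated files.
csv.register_dialect('semicolon', delimiter=';', lineterminator='\n')
semicolon_text = write_rows(rows, 'semicolon')
print(semicolon_text)
```

<p>With the semicolon as delimiter, the comma in the name is ordinary
text and needs no quoting, which is the kind of variation a dialect
is meant to capture.</p>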
<p>My conclusion is that a Python implementation will shield us
better from CSV variants: it lets us write more readable code and
avoids bloating that code to handle the special cases.
</p>
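<p>A small sketch of what this looks like in practice, using only
the standard <code>csv</code> module. The sample data is invented
for illustration; it contains a quoted comma, a doubled quote, and a
line break inside a quoted field.</p>

```python
import csv
import io

# Invented sample data covering three awkward cases from RFC 4180.
sample = (
    'id,name,comment\n'
    '1,"Doe, Jane","said ""hi"""\n'
    '2,"Smith, Bob","first line\nsecond line"\n'
)

# newline='' leaves line breaks untouched so that the csv module can
# deal with the ones that sit inside quoted fields.
rows = list(csv.reader(io.StringIO(sample, newline='')))

for row in rows:
    print(row)
```

<p>The reader returns three rows of three fields each, with no
special-case code on our side.</p>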
<blockquote type="cite" cite="mid:YW0lXu4%2FjEZxMoo1@medium.hauri">
<pre class="moz-quote-pre" wrap="">For some time now, bash has supported loadable modules. The
distribution's source tree contains an examples directory with
many loadable modules.
</pre>
</blockquote>
<p>Thanks for the info. Indeed, these commands run faster than the
ones found in /usr/bin because they avoid a fork/exec. So that is
a real gain in performance, on top of providing commands that are
not available by default (such as csv).</p>
<p>dc<br>
</p>
</body>
</html>