Folded Block Scalars - yaml/YAML2 GitHub Wiki

Folded Block Scalars should be removed from the language. They offer almost nothing that the other scalar forms do not, yet they have many edge cases and are hardly ever implemented correctly.

folded: >
  This content is
  folded and has trailing newline.
quoted:
  "This content is
  folded and has trailing newline.\n"

NOTE: We should write some tests to see which implementations get this right.

Current YAML Behaviour

This:

x: >
  foo
  bar
  baz

Produces:

{"x": "foo bar baz\n"}

This:

x: >
  foo
   bar
  baz

Produces:

{"x": "foo\n bar\nbaz\n"}

That rule (a more-indented line keeps the line breaks around it instead of being folded) is probably too subtle. It is a wiki-style convention that doesn't belong in YAML: no emitter would produce it and no human would remember how it works, so it is not useful.
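A rough Python sketch of the folding rule behind the two results above. It assumes default (clip) chomping, no indentation indicator, and a body whose first line is non-empty; `fold_block` is a hypothetical helper written for illustration, not any library's API.

```python
def fold_block(body: str) -> str:
    """Fold the body of a `>` block scalar (simplified model, clip chomping)."""
    lines = body.split("\n")
    # Clip chomping: trailing blank lines collapse into one final newline.
    while lines and lines[-1].strip() == "":
        lines.pop()
    if not lines:
        return ""
    # The first (non-empty) line sets the indentation of the block.
    indent = len(lines[0]) - len(lines[0].lstrip(" "))
    text = [line[indent:] if line.strip() else "" for line in lines]

    out = text[0]
    prev_spaced = text[0].startswith((" ", "\t"))
    i = 1
    while i < len(text):
        blanks = 0
        while text[i] == "":          # count blank lines before the next text line
            blanks += 1
            i += 1
        cur = text[i]
        cur_spaced = cur.startswith((" ", "\t"))
        if prev_spaced or cur_spaced:
            sep = "\n" * (blanks + 1)  # breaks next to more-indented lines are kept
        elif blanks:
            sep = "\n" * blanks        # first break is dropped, blank lines remain
        else:
            sep = " "                  # a plain line break folds into a space
        out += sep + cur
        prev_spaced = cur_spaced
        i += 1
    return out + "\n"


print(repr(fold_block("  foo\n  bar\n  baz\n")))
print(repr(fold_block("  foo\n   bar\n  baz\n")))
```

Even this reduced version needs three separate cases for a single line break, which is the kind of complexity the page is complaining about.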

Proposal to replace folded with new quoting rules

Proposal 1: Using "

The only thing that folded offers us over quoted folding is a trailing newline.

Currently:

x: "foo
  "
y: "foo "

produce the same value. The first form is probably never used in practice. So we should make this work:

x: "foo
  "
y: "foo\n"
z: >
  foo

all produce the same value.

Further, these two are currently the same:

x: "
  foo"
y: " foo"

You would never see the first in real life. So we make it:

x: "
  foo"
y: "foo"

Then we can make the following work:

x1: "
  foo"
y1: "foo"
z1: >-
  foo
x2: "
  foo
  "
y2: "foo\n"
z2: >
  foo

This means that when you have a folded paragraph, you don't need to put the first line on the same line as the key (to avoid the extra space).

folded paragraph: "
  This means that when you have a folded paragraph, you don't need to put
  the first line on the same line as the key (to avoid the extra space)."

Looks great! We have completely obviated any usefulness of the folded scalar form.

Getting rid of the folded form, which nobody really understands, will be a good move for YAML, whose detractors think it is too complicated.
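The two rule changes in Proposal 1 can be sketched in Python. `fold_quoted` is a hypothetical helper that takes the raw text between the double quotes; it assumes no escape sequences and no blank lines inside the scalar, so it is only an illustration of the idea.

```python
def fold_quoted(raw: str, proposed: bool = False) -> str:
    """Fold a multi-line double-quoted scalar (simplified model)."""
    lines = raw.split("\n")
    if len(lines) == 1:
        return raw  # single-line scalars are unaffected by folding

    # Whitespace around every line break is stripped.
    parts = [lines[0].rstrip(" \t")]
    parts += [line.strip(" \t") for line in lines[1:-1]]
    parts.append(lines[-1].lstrip(" \t"))

    tail = ""
    if proposed:
        # Proposed: a line break right after the opening quote adds nothing.
        if len(parts) > 1 and parts[0] == "":
            parts.pop(0)
        # Proposed: a line break right before the closing quote is a newline.
        if len(parts) > 1 and parts[-1] == "":
            parts.pop()
            tail = "\n"

    # Every remaining line break folds into a single space.
    return " ".join(parts) + tail


print(repr(fold_quoted("foo\n  ")))                      # current rules
print(repr(fold_quoted("foo\n  ", proposed=True)))       # proposed rules
print(repr(fold_quoted("\n  foo\n  ", proposed=True)))
```

With `proposed=False` the leading and trailing line breaks each become a space, which is the current behaviour described above.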

Proposal 2: Using `::` as block mode declaration, with modifier commands

Instead of using | for a newline-preserving block scalar and > for a folded block scalar, let's use :: to indicate block mode, followed by modifiers that change its behaviour. The default should be "newline preserved", as that is what most people would expect.

PSEUDO-CODE::

	IF `::` THEN // work out which block mode
	
		//// Let's avoid too much complexity in parsing logic...
		//IF `'` THEN explicitly folded block mode (with automatic indent level detection)
		//IF `"` THEN explicitly newline preserved (with automatic indent level detection)
		
		IF ( `"` OR (`\n` then ASCII) )  THEN // detects newline-preserved1
			implied/explicitly newline preserved block mode ( indent level autodetected ) 
		
		IF ( `\n` then `"`) THEN // detects newline-preserved2
			explicitly newline preserved block mode ( indent level specified ) 
			
		IF ( `'` ) THEN // detects folded-block1 
			implied folded block mode ( indent level autodetected ) 
		
		IF ( `\n` then `'`) THEN // detects folded-block2
			explicitly folded block mode ( indent level specified ) 

		// More experimental proposal 
		IF ( Not (space or `\n` or number) right after `::` (This is your "boundary string") ) THEN // detects NEWLINEPRESERVED-experimental1
			read the word after `::` (e.g. `::frontier` would yield boundary=frontier ) into boundary variable
			// This functions similar to https://en.wikipedia.org/wiki/MIME#Multipart_messages boundary=frontier 
			explicitly newline preserved, but keep reading at any indent level 
			Ignore the first line if it matches '::<var boundary>' (It's optional for setting indent level)
			(even if it's below parent indent level e.g. indent level 0 )
			Keep reading in until a line starting with '::<var boundary>' (that many characters or more), on its own line, is detected at the right indent level,
			Or end of document
			( For practicality, it)
		
		IF ( NUMBER ) THEN // detects NEWLINEPRESERVED-experimental2
			read in a specific number of lines as specified by NUMBER
			good for immutable records. Has speed advantage over the more flexible option above.
		
	ELSEIF `:` THEN
		Might be something else! Keep parsing
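One way to read the pseudo-code above, as a small Python sketch. The mode names and the `detect_block_mode` function are made up for illustration and are not part of the proposal. Note one deviation: the quote-on-the-next-line cases are checked before the plain-text fallback, because a quote character is itself ASCII and would otherwise be caught by the first rule.

```python
from typing import Optional, Tuple


def detect_block_mode(rest: str, next_line: str = "") -> Tuple[str, Optional[str]]:
    """Classify a `::` block.

    rest      -- the text immediately after `::` on the key line
    next_line -- the line that follows the key line
    """
    nxt = next_line.strip()
    if rest == "":
        if nxt == '"':
            return ("preserved-indent-specified", None)  # newline-preserved2
        if nxt == "'":
            return ("folded-indent-specified", None)     # folded-block2
        return ("preserved", None)                       # newline-preserved1
    if rest == '"':
        return ("preserved", None)                       # explicit form of the default
    if rest == "'":
        return ("folded", None)                          # folded-block1
    if rest.isdigit():
        return ("line-count", rest)                      # experimental2
    if not rest[0].isspace():
        return ("boundary", rest)                        # experimental1
    return ("unknown", None)                             # might be something else


print(detect_block_mode(""))
print(detect_block_mode("FRONTIER"))
print(detect_block_mode("4"))
```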

NEWLINEPRESERVED:

	newline-preserved1::
		This is the default behaviour
		where it will save all newlines

	newline-preserved1-alt::"
		Same behaviour
		as the above
		
	newline-preserved2::
		"
			This allows for beginning spaces
		to be preserved
		
FOLDEDBLOCK:

	folded-block1::'
	    This is a folded block
		might as well treat it like this
		as it is easier to deal with
		
	folded-block2::
		'
		This is also a folded block,
		fortunately, since newline is ignored
		the same parsing logic will work for both 
		folded-block1 and folded-block2

NEWLINEPRESERVED-experimental1:
	data:text/html::______________________________________________________________
	<html>
	This is for preserving newlines, where the source is all the way at the bottom
	this make it easier to copy paste codes.
	
	Also nicer for QR codes too.
	
	</html>
	::____________________________________________________________________________
		
	data:text/html::FRONTIER
	::FRONTIER START
	<html>
	This is for preserving newlines, where the source is all the way at the bottom
	this make it easier to copy paste codes.
	
	Also nicer for QR codes too.
	
	</html>
	::FRONTIER END
	
	data:text/html::FRONTIER
::FRONTIER START
<html>
This is for preserving newlines, where the source is all the way at the bottom
this make it easier to copy paste codes.

Also nicer for QR codes too.

</html>
::FRONTIER END

	otherData: 42
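A minimal Python sketch of how a reader for the boundary form might work, based on the examples above. `read_boundary_block` is a hypothetical name; the sketch ignores indent levels entirely (it strips each line before comparing) and returns the body as a list of lines, since the proposal does not say how the final newline should be handled.

```python
from typing import List


def read_boundary_block(lines: List[str], boundary: str) -> List[str]:
    """Collect the lines that follow a `key::<boundary>` line.

    Reading stops at the first line that starts with `::<boundary>`,
    or at the end of the document if no such line exists.
    """
    marker = "::" + boundary
    body = []
    for i, line in enumerate(lines):
        is_marker = line.strip().startswith(marker)
        if i == 0 and is_marker:
            continue  # optional opening marker, e.g. `::FRONTIER START`
        if is_marker:
            break     # closing marker, e.g. `::FRONTIER END`
        body.append(line)
    return body


doc = [
    "::FRONTIER START",
    "<html>",
    "Also nicer for QR codes too.",
    "</html>",
    "::FRONTIER END",
    "otherData: 42",
]
print(read_boundary_block(doc, "FRONTIER"))
```

Because the closing test is a prefix match, a longer run of underscores also closes a block opened with a shorter one, as in the first example.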
	
NEWLINEPRESERVED-experimental2:
	data:text/html::4
	<html>
	This is for preserving newlines, where the source is all the way at the bottom
	this make it easier to copy paste codes.
	</html>
	
	otherData: 42
	otherDat2: lol

NEWLINEPRESERVED-experimental3:
	data:text/html:::
	<html>
	This is for preserving newlines, where the source is all the way at the bottom
	this make it easier to copy paste codes.
	</html>

	//END OF DOCUMENT SIGNAL HERE

Comments

At first read, your conclusion seems like a nice one. But then that trailing quote strikes me as out of place relative to the first, and I start to wonder whether we really gained anything in this trade. I also want to ask how this picture might change if YAML adopts the literal format as the default format.